gratools specific_groups_sample

This command identifies segments that are shared by every sample in a group (Group A) and, when an optional second group (Group B) is supplied, the subset of those segments that is specific to Group A. It applies strict presence/absence rules: a “shared” segment must be present in all samples of Group A, and a “specific” segment must be present in all samples of Group A and absent from all samples of Group B.

Options

Complete Usage Example

This example identifies segments shared by samples CG14 and Og20 (Group A), and among that shared set, those that are absent from sample Tog5681 (Group B). The results are displayed in a table, and the segment lists are saved to CSV files.

Input Files:

File: `list_A.txt`

```text
CG14
Og20
```

File: `list_B.txt`

```text
Tog5681
```

Command:

$ gratools specific_groups_sample --gfa Og_cactus.gfa.gz \
    --samples-list-A list_A.txt \
    --samples-list-B list_B.txt \
    --output-csv --suffix test

Terminal Output:

The terminal displays a summary of the shared and specific results.

╭────────────────────────────────── 📊 Shared & Specific Segment Analysis ───────────────────────────────────╮
│  Segments shared by 2 in list ['CG14', 'Og20']                                                             │
│    ├─ Count                                                                           980,985 / 2,354,995  │
│    ├─ Percentage                                                                                   41.66%  │
│    └─ Total Length                                                                          48,986,534 bp  │
│                                                                                                            │
│  Segments specific to 2 in list ['CG14', 'Og20'] and absent in 1 in list ['Tog5681']                       │
│    ├─ Count                                                                           214,326 / 2,354,995  │
│    ├─ Percentage                                                                                    9.10%  │
│    └─ Total Length                                                                           1,020,689 bp  │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
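The percentages in the panel are consistent with each count divided by the total number of segments in the graph (2,354,995). The short check below reproduces that arithmetic; the constant and function names are illustrative and not part of gratools.

```python
# Counts copied from the terminal output above.
TOTAL_SEGMENTS = 2_354_995
SHARED_COUNT = 980_985
SPECIFIC_COUNT = 214_326


def percentage(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total


shared_pct = percentage(SHARED_COUNT, TOTAL_SEGMENTS)
specific_pct = percentage(SPECIFIC_COUNT, TOTAL_SEGMENTS)

print(f"{shared_pct:.2f}%")    # 41.66%
print(f"{specific_pct:.2f}%")  # 9.10%
```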

CSV Output Files:

Two CSV files are generated, containing not only the segment IDs but also their coordinates for each sample in Group A.

Contents of `Og_cactus_segment_shared_test.csv`:

```csv
# Samples in list A: CG14,Og20
NODE_ID,CG14,Og20
10,CG14_Chr07:6-7,Og20_Chr07:13648-13649
100,CG14_Chr07:59-64,Og20_Chr07:14476-14481
...
```

Contents of `Og_cactus_segment_specific_test.csv`:

```csv
# Samples in list A: CG14,Og20
# Samples in list B: Tog5681
NODE_ID,CG14,Og20
1000006,CG14_Chr07:20263860-20263861,Og20_Chr07:20222839-20222840
1000023,CG14_Chr07:20263916-20263917,Og20_Chr07:20222895-20222896
...
```
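For downstream analysis, files laid out like this can be loaded with Python's standard `csv` module once the leading `#` comment lines are skipped. The sketch below is a minimal example, not part of gratools: `parse_coordinate` and `read_segment_csv` are hypothetical helper names, and it assumes each cell holds a single coordinate of the form `<path>:<start>-<end>`, as in the rows shown above. It reads from an in-memory string so that it runs without the real output files.

```python
import csv
import io

# First rows of the shared-segments file shown above (the "..." elision is omitted).
SAMPLE = """\
# Samples in list A: CG14,Og20
NODE_ID,CG14,Og20
10,CG14_Chr07:6-7,Og20_Chr07:13648-13649
100,CG14_Chr07:59-64,Og20_Chr07:14476-14481
"""


def parse_coordinate(text):
    """Split '<path>:<start>-<end>' into (path, start, end).

    Assumption: exactly one coordinate per cell, in this format.
    """
    path, span = text.rsplit(":", 1)
    start, end = span.split("-")
    return path, int(start), int(end)


def read_segment_csv(handle):
    """Map each NODE_ID to {sample: (path, start, end)}, skipping '#' lines."""
    data_lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(data_lines)
    segments = {}
    for row in reader:
        node_id = row.pop("NODE_ID")
        segments[node_id] = {
            sample: parse_coordinate(coord) for sample, coord in row.items()
        }
    return segments


segments = read_segment_csv(io.StringIO(SAMPLE))
print(segments["100"]["CG14"])  # ('CG14_Chr07', 59, 64)
```

To read a real file, pass an open file handle instead of the `io.StringIO` object, for example `open("Og_cactus_segment_shared_test.csv", newline="")`.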

Illustrated Example

Scenario 1: Shared Segments

The command finds the intersection of the sets of segments present in each sample of Group A. A segment must be in Sample A1 AND Sample A2.

digraph shared_segments {
    // --- Global settings ---
    graph [
        labelloc="t",
        label="Shared Segments (Intersection)",
        fontsize=14,
        fontname="Helvetica"
    ];
    node [
        shape=ellipse,
        style=filled,
        fontname="Helvetica"
    ];

    // --- Subgraph to group the input samples ---
    subgraph cluster_group_a {
        label = "Group A";
        labelloc = "t";
        style = "dotted";
        bgcolor = "#e0f0ff"; // Light blue background

        // Sample Nodes
        A1 [label="Sample A1", fillcolor=lightblue];
        A2 [label="Sample A2", fillcolor=lightblue];
    }

    // --- Result Node ---
    Shared [
        label="Shared Segments\n(Present in A1 AND A2)",
        fillcolor=lightgreen,
        shape=box
    ];

    // --- Connections showing the flow ---
    A1 -> Shared [label=" "];
    A2 -> Shared [label=" "];
}
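
The intersection described in this scenario can be sketched in a few lines of Python. This is an illustration of the set logic only, not the gratools implementation; the function name `shared_segments` and the toy segment IDs are made up for the example.

```python
def shared_segments(segments_by_sample, group):
    """Return the segment IDs present in every sample of `group`."""
    sets = [segments_by_sample[name] for name in group]
    if not sets:
        return set()  # an empty group shares nothing
    return set.intersection(*sets)


# Toy data: segment IDs observed in each sample (hypothetical values).
segments_by_sample = {
    "A1": {1, 2, 3, 4},
    "A2": {2, 3, 4, 5},
}

shared = shared_segments(segments_by_sample, ["A1", "A2"])
print(sorted(shared))  # [2, 3, 4]
```

Segments 1 and 5 are dropped because each occurs in only one of the two samples.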

Scenario 2: Specific Segments

The command first calculates the set of segments shared by Group A. Then, it subtracts from that set any segment found in at least one sample from Group B. This is a set difference operation.

digraph specific_segments {
    // --- Global settings ---
    graph [
        rankdir=LR, // Left-to-right flow is better for a process
        labelloc="t",
        label="Specific Segments (Set Difference)",
        fontsize=14,
        fontname="Helvetica"
    ];
    node [
        shape=box,
        style=filled,
        fontname="Helvetica"
    ];

    // --- Group A (Input) ---
    subgraph cluster_group_a {
        label = "Group A";
        style = "rounded";
        bgcolor = "#e0f0ff"; // Light blueish

        A1 [label="Sample A1", fillcolor=lightblue];
        A2 [label="Sample A2", fillcolor=lightblue];
    }

    // --- Intermediate step node ---
    SharedA [
        label="Shared Segments\nin Group A",
        shape=ellipse,
        fillcolor=white,
        style="filled,dashed"
    ];

    // --- Group B (Exclusion Filter) ---
    subgraph cluster_group_b {
        label = "Group B (Exclusion Filter)";
        style = "rounded";
        bgcolor = "#ffe0e0"; // Light reddish

        B1 [label="Sample B1", fillcolor=lightcoral];
        B2 [label="Sample B2", fillcolor=lightcoral];
    }

    // --- Final Result Node ---
    SpecificResult [
        label="Segments Specific\nto Group A",
        fillcolor=lightgreen
    ];

    // --- Define the logical flow with arrows ---
    // Every Group A sample feeds the shared set
    {A1 A2} -> SharedA;

    SharedA -> SpecificResult [label=" Keep"];

    // A segment found in any Group B sample is removed from the result
    {B1 B2} -> SpecificResult [
        label=" Remove",
        color=red,
        style=dashed,
        fontcolor=red
    ];
}
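
The two-step logic of this scenario (intersect Group A, then subtract anything seen in Group B) can be sketched as follows. As with any illustration of this kind, the code shows the set operations only and is not taken from gratools; `specific_segments` and the toy segment IDs are hypothetical.

```python
def specific_segments(segments_by_sample, group_a, group_b):
    """Segments present in all of group_a and absent from all of group_b."""
    sets_a = [segments_by_sample[name] for name in group_a]
    if not sets_a:
        return set()
    shared_a = set.intersection(*sets_a)
    # A segment is excluded as soon as one Group B sample contains it.
    in_any_b = set().union(*(segments_by_sample[name] for name in group_b))
    return shared_a - in_any_b


# Toy data: segment IDs observed in each sample (hypothetical values).
segments_by_sample = {
    "A1": {1, 2, 3, 4},
    "A2": {2, 3, 4, 5},
    "B1": {3, 9},
    "B2": {4, 7},
}

specific = specific_segments(segments_by_sample, ["A1", "A2"], ["B1", "B2"])
print(sorted(specific))  # [2]
```

Here the shared set for Group A is {2, 3, 4}; segment 3 is removed because it occurs in B1 and segment 4 because it occurs in B2, leaving only segment 2. With an empty Group B the function returns the shared set unchanged.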