gratools specific_groups_sample
This command identifies the segments shared by every sample in one group (Group A) and, when an optional second group (Group B) is supplied, the subset of those segments that is specific to Group A. It uses strict presence/absence: a “shared” segment must be present in all samples of Group A, and a “specific” segment must be present in all samples of Group A and absent from all samples of Group B.
Options
Complete Usage Example
This example identifies segments shared by samples CG14 and Og20 (Group A), and among that shared set, those that are absent from sample Tog5681 (Group B). The results are displayed in a table, and the segment lists are saved to CSV files.
Input Files:
File: `list_A.txt`
```text
CG14
Og20
```
File: `list_B.txt`
```text
Tog5681
```
Command:
$ gratools specific_groups_sample --gfa Og_cactus.gfa.gz \
--samples-list-A list_A.txt \
--samples-list-B list_B.txt \
--output-csv --suffix test
Terminal Output:
The terminal displays a summary of the shared and specific results.
╭────────────────────────────────── 📊 Shared & Specific Segment Analysis ───────────────────────────────────╮
│ Segments shared by 2 in list ['CG14', 'Og20'] │
│ ├─ Count 980,985 / 2,354,995 │
│ ├─ Percentage 41.66% │
│ └─ Total Length 48,986,534 bp │
│ │
│ Segments specific to 2 in list ['CG14', 'Og20'] and absent in 1 in list ['Tog5681'] │
│ ├─ Count 214,326 / 2,354,995 │
│ ├─ Percentage 9.10% │
│ └─ Total Length 1,020,689 bp │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
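The percentages in the summary can be rechecked from the reported counts. The short sketch below does only that arithmetic; the variable names are illustrative, and the mean segment lengths are derived here from the printed totals rather than reported by the tool.

```python
# Values copied from the terminal summary above.
total_segments = 2_354_995
shared_count, shared_bp = 980_985, 48_986_534
specific_count, specific_bp = 214_326, 1_020_689

# Share of all graph segments, formatted as in the summary table.
shared_pct = f"{shared_count / total_segments:.2%}"
specific_pct = f"{specific_count / total_segments:.2%}"
print(shared_pct, specific_pct)

# Derived figures (not printed by the tool): mean length per segment in bp.
mean_shared_bp = shared_bp / shared_count
mean_specific_bp = specific_bp / specific_count
print(round(mean_shared_bp, 1), round(mean_specific_bp, 1))
```

In this run the specific segments are much shorter on average than the shared ones, which is why they account for about 9% of segments but only about 1 Mb of sequence.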
CSV Output Files:
Two CSV files are generated, containing not only the segment IDs but also their coordinates for each sample in Group A.
Contents of `Og_cactus_segment_shared_test.csv`:
```csv
# Samples in list A: CG14,Og20
NODE_ID,CG14,Og20
10,CG14_Chr07:6-7,Og20_Chr07:13648-13649
100,CG14_Chr07:59-64,Og20_Chr07:14476-14481
...
```
Contents of `Og_cactus_segment_specific_test.csv`:
```csv
# Samples in list A: CG14,Og20
# Samples in list B: Tog5681
NODE_ID,CG14,Og20
1000006,CG14_Chr07:20263860-20263861,Og20_Chr07:20222839-20222840
1000023,CG14_Chr07:20263916-20263917,Og20_Chr07:20222895-20222896
...
```
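For downstream analysis, these CSV files can be read with Python's standard `csv` module. The sketch below is not part of gratools. It assumes the layout shown in the excerpts: comment lines start with `#`, the header row starts with `NODE_ID`, and each cell holds a single `path:start-end` coordinate. The helper name `read_segment_csv` is hypothetical.

```python
import csv
import io


def read_segment_csv(handle):
    """Yield (node_id, {sample: (path, start, end)}) for each data row.

    Assumes one 'path:start-end' coordinate per cell, as in the excerpts.
    """
    # Drop the '# Samples in list ...' comment lines before parsing.
    rows = (line for line in handle if not line.startswith("#"))
    for row in csv.DictReader(rows):
        node_id = row.pop("NODE_ID")
        coords = {}
        for sample, value in row.items():
            path, _, span = value.rpartition(":")
            start, _, end = span.partition("-")
            coords[sample] = (path, int(start), int(end))
        yield node_id, coords


# Inline copy of the first rows of the shared-segment file shown above.
sample_text = """\
# Samples in list A: CG14,Og20
NODE_ID,CG14,Og20
10,CG14_Chr07:6-7,Og20_Chr07:13648-13649
100,CG14_Chr07:59-64,Og20_Chr07:14476-14481
"""
records = dict(read_segment_csv(io.StringIO(sample_text)))
print(records["100"]["Og20"])
```

For a real file, pass an open file handle (for example `open("Og_cactus_segment_shared_test.csv", newline="")`) in place of the `io.StringIO` object.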
Illustrated Example
Scenario 1: Shared Segments
The command finds the intersection of the sets of segments present in each sample of Group A. A segment must be in Sample A1 AND Sample A2.
![Graphviz diagram "Shared Segments (Intersection)": Sample A1 and Sample A2 in Group A both point to a result box, "Shared Segments (Present in A1 AND A2)".](../_images/graphviz-f7904f18408f815eded1d2407308f29ace4a3c82.png)
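The intersection described in this scenario can be sketched with Python sets. This is an illustration of the logic only, not gratools' implementation; the function name, the sample names `A1`/`A2`, and the segment IDs are made up for the example.

```python
def shared_segments(segments_by_sample, group_a):
    """Return the segments present in every sample of group_a."""
    if not group_a:
        raise ValueError("Group A must contain at least one sample")
    sets = [set(segments_by_sample[name]) for name in group_a]
    return set.intersection(*sets)


# Hypothetical toy data: sample name -> IDs of the segments it traverses.
toy = {
    "A1": {"s1", "s2", "s3"},
    "A2": {"s2", "s3", "s4"},
}
shared = shared_segments(toy, ["A1", "A2"])
print(sorted(shared))  # ['s2', 's3']
```

Segments `s1` and `s4` are dropped because each occurs in only one of the two samples.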
Scenario 2: Specific Segments
The command first calculates the set of segments shared by Group A. Then, it subtracts from that set any segment found in at least one sample from Group B. This is a set difference operation.
![Graphviz diagram "Specific Segments (Set Difference)": Sample A1 and Sample A2 in Group A feed "Shared Segments in Group A", which is kept in the result "Segments Specific to Group A"; Sample B1 and Sample B2 in Group B (Exclusion Filter) are removed from that result.](../_images/graphviz-0c1504869b53322901163bce24f88ddfc09ccf38.png)
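The set difference in this scenario can be sketched the same way: intersect over Group A, take the union over Group B, and subtract. Again, this is a minimal illustration under assumed names and toy data, not the tool's actual code.

```python
def specific_segments(segments_by_sample, group_a, group_b):
    """Segments in every Group A sample and in no Group B sample."""
    if not group_a:
        raise ValueError("Group A must contain at least one sample")
    # Step 1: segments shared by all samples of Group A.
    shared_a = set.intersection(*(set(segments_by_sample[s]) for s in group_a))
    # Step 2: any segment seen in at least one Group B sample.
    in_any_b = set().union(*(segments_by_sample[s] for s in group_b))
    # Step 3: set difference.
    return shared_a - in_any_b


# Hypothetical toy data: sample name -> IDs of the segments it traverses.
toy = {
    "A1": {"s1", "s2", "s3"},
    "A2": {"s2", "s3", "s4"},
    "B1": {"s3", "s5"},
    "B2": {"s6"},
}
specific = specific_segments(toy, ["A1", "A2"], ["B1", "B2"])
print(sorted(specific))  # ['s2']
```

Here `s3` is shared by Group A but removed because it also appears in `B1`. With an empty Group B, the function returns the shared set unchanged.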