gratools specific_groups_sample
Identify segments shared by or unique to specific groups of samples.
This command performs set operations (intersections and differences) on pangenome segments:
Shared Segments: Must be present in ALL samples of Group A.
Specific Segments: Must be present in ALL samples of Group A AND absent from ALL samples of Group B.
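The same two definitions can be written as sets. In the block below:
- S_x is the set of segments present in sample x.
- A and B are the two sample groups.
- ℓ is the value passed to --filter-len.

The third line is only one reading of the length filter. It assumes the minimum is inclusive and that the filtered sets replace S_x in the first two lines. With ℓ = 0 every segment is kept.

```latex
\begin{aligned}
\mathrm{Shared}(A) &= \bigcap_{a \in A} S_a \\
\mathrm{Specific}(A, B) &= \mathrm{Shared}(A) \setminus \bigcup_{b \in B} S_b \\
S_x^{(\ell)} &= \{\, s \in S_x : \mathrm{len}(s) \ge \ell \,\}
\end{aligned}
```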
Options
$ gratools specific_groups_sample
Welcome to GraTools version: '1.1.0'
@author: GraTools team's
Please cite our gitlab: https://forge.ird.fr/diade/gratools.git
Usage: gratools specific_groups_sample [OPTIONS]
Aliases: groups
This command compares segments between two (optional) groups of samples (A and
B), both defined by providing files listing sample. It identifies: 1. Segments
shared by ALL samples in group A. 2. Segments specific to group A (i.e.,
present in ALL of A and ABSENT from ALL of B). An optional length filter can
be applied to consider only segment for a minimal length. Results are logged
to the terminal and saved in a CSV file. This command relies on a pre-existing
GraTools import.
For more details, see the full documentation: https://gratools.readthedocs.io/en/latest/commands/specific_and_shared_segments.html
Specific groups sample Options:
-g, --gfa PATH
Path to the input GFA file (e.g., myGraph.gfa or myGraph.gfa.gz).
[required]
-o, --outdir DIRECTORY
Output directory for GraTools results. If not specified, results are
typically placed in a subdirectory within the GFA file's parent directory
(e.g., 'GraTools-output_<gfa_name>').
-sla, --samples-list-A FILE
Path to a file listing sample names for group A (one sample per line).
[required]
-slb, --samples-list-B FILE
Path to a file listing sample names for group B (one sample per line).
Required for specificity analysis.
-fl, --filter-len INTEGER
Minimum segment length (bp) to consider. 0 means no length filter.
[default: 0]
-csv, --output_csv
Save the specific and shared segments in a csv file.
-su, --suffix TEXT
Suffix added to output filename. [required]
Logging Options:
-vv, --verbosity [DEBUG|INFO|ERROR]
Set the logging verbosity level. [default: INFO]
-l, --log-path DIRECTORY
Directory where the log files will be saved. If not specified, logs will be
placed in the main output directory (or in a default GraTools log
location).
Performance Options:
-t, --threads INTEGER
Number of threads to be used for parallelizable operations. [default: 1]
Other options:
-h, --help
Show this message and exit.
Complete Usage Example
In this scenario, we want to find segments shared by CG14 and Og20, but absent from Tog5681.
Step 1: Prepare Input Lists
list_A.txt (Group A, one sample per line):
CG14
Og20
list_B.txt (Group B):
Tog5681
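The two list files can be created directly from the shell, for example with printf. The file names below are the ones passed to the command in Step 2. The sample names must match the names GraTools uses for your graph; here they are copied from this example.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Group A: a segment must be present in ALL of these samples (one name per line)
printf '%s\n' CG14 Og20 > list_A.txt

# Group B: a segment present in ANY of these samples is excluded from the specific set
printf '%s\n' Tog5681 > list_B.txt

# Quick check of how many samples each list contains
wc -l list_A.txt list_B.txt
```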
Step 2: Run Analysis
$ gratools specific_groups_sample --gfa Og_cactus.gfa.gz \
--samples-list-A list_A.txt \
--samples-list-B list_B.txt \
--output_csv --suffix test
Step 3: Review Results
Shared & Specific Segment Analysis
Segments shared by 2 in list ['CG14', 'Og20']
├─ Count          980,985 / 2,354,995
├─ Percentage     41.66%
└─ Total Length   48,986,534 bp

Segments specific to 2 in list ['CG14', 'Og20'] and absent in 1 in list ['Tog5681']
├─ Count          214,326 / 2,354,995
├─ Percentage     9.10%
└─ Total Length   1,020,689 bp
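Each percentage in the summary is the segment count divided by the second number on the Count row. That number is 2,354,995, which appears to be the total number of segments in the graph. The specific percentage is therefore relative to the whole graph, not to the shared set. The small awk helper below (`pct` is a hypothetical name) recomputes both figures.

```bash
#!/usr/bin/env bash
set -euo pipefail

# pct COUNT TOTAL -> COUNT as a percentage of TOTAL, rounded to two decimals
pct() { LC_ALL=C awk -v n="$1" -v t="$2" 'BEGIN { printf "%.2f", 100 * n / t }'; }

total=2354995      # denominator shown on the Count rows
shared=980985      # segments shared by CG14 and Og20
specific=214326    # shared segments that are also absent from Tog5681

echo "shared:   $(pct "$shared" "$total")%"
echo "specific: $(pct "$specific" "$total")%"
```

This prints 41.66% and 9.10%, which match the summary.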
Output Files Detail
GraTools generates CSV files that contain not only the node IDs but also their exact coordinates in each sample.
File: Og_cactus_segment_shared_test.csv
Contains nodes found in every sample of Group A.
Metadata: # Samples in list A: CG14, Og20
| NODE_ID | CG14 | Og20 |
|---|---|---|
| 10 | CG14_Chr07:6-7 | Og20_Chr07:13648-13649 |
| 100 | CG14_Chr07:59-64 | Og20_Chr07:14476-14481 |
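Each coordinate cell packs a contig name and an interval into one string, for example CG14_Chr07:59-64. The awk sketch below splits the cells into tab-separated rows of node ID, sample, contig, start and end. That layout is easier to sort or load into other tools.

The helper name `split_coords` is hypothetical. The sketch assumes the file layout below, so check the real file first:
- rows are comma-separated and unquoted;
- metadata lines start with #;
- the header row begins with NODE_ID;
- each cell holds a single start-end interval.

The sketch does not convert coordinate conventions, because this page does not say whether intervals are 0- or 1-based.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: one output row per (node, sample) coordinate cell.
# Assumes comma-separated rows, '#' metadata lines and a header row starting with NODE_ID.
split_coords() {
  awk -F',' -v OFS='\t' '
    /^#/ { next }                                   # skip metadata comment lines
    $1 == "NODE_ID" { for (i = 2; i <= NF; i++) sample[i] = $i; next }
    {
      for (i = 2; i <= NF; i++) {
        if ($i == "") continue                      # tolerate empty cells
        c = $i
        p = match(c, /:[0-9]+-[0-9]+$/)             # interval is the trailing ":start-end"
        if (!p) continue
        contig = substr(c, 1, p - 1)
        split(substr(c, p + 1), r, "-")             # r[1] = start, r[2] = end
        print $1, sample[i], contig, r[1], r[2]
      }
    }' "$1"
}

# Toy input mirroring the two example rows of Og_cactus_segment_shared_test.csv
cat > demo_shared.csv <<'EOF'
# Samples in list A: CG14, Og20
NODE_ID,CG14,Og20
10,CG14_Chr07:6-7,Og20_Chr07:13648-13649
100,CG14_Chr07:59-64,Og20_Chr07:14476-14481
EOF

split_coords demo_shared.csv
```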
File: Og_cactus_segment_specific_test.csv
Contains nodes found in every sample of Group A and absent from every sample of Group B.
Metadata: # Samples in list A: CG14, Og20 | # Samples in list B: Tog5681
| NODE_ID | CG14 | Og20 |
|---|---|---|
| 1000006 | CG14_Chr07:20263860-20263861 | Og20_Chr07:20222839-20222840 |
| 1000023 | CG14_Chr07:20263916-20263917 | Og20_Chr07:20222895-20222896 |
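The specific segments are a subset of the shared segments. Every NODE_ID in the specific file should therefore also appear in the shared file, which gives a cheap consistency check using sort and comm. The helper names below are hypothetical, and the toy files are abridged illustrations.

The sketch assumes:
- comma-separated rows;
- # metadata lines;
- a header row beginning with NODE_ID.

```bash
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C   # sort and comm must agree on collation

# Hypothetical helper: print the sorted, de-duplicated NODE_ID column of a CSV
node_ids() {
  awk -F',' '!/^#/ && $1 != "NODE_ID" && NF { print $1 }' "$1" | sort -u
}

# check_subset SHARED.csv SPECIFIC.csv : fail if a specific node is not in the shared file
check_subset() {
  local missing
  # comm -13 keeps lines found only in the second input (specific but not shared)
  missing=$(comm -13 <(node_ids "$1") <(node_ids "$2") | wc -l)
  if [ "$missing" -eq 0 ]; then
    echo "OK: every specific node is also in the shared set"
  else
    echo "WARNING: $missing specific node(s) missing from the shared set" >&2
    return 1
  fi
}

# Abridged toy files in the assumed layout
cat > toy_shared.csv <<'EOF'
# Samples in list A: CG14, Og20
NODE_ID,CG14,Og20
10,CG14_Chr07:6-7,Og20_Chr07:13648-13649
100,CG14_Chr07:59-64,Og20_Chr07:14476-14481
1000006,CG14_Chr07:20263860-20263861,Og20_Chr07:20222839-20222840
EOF
cat > toy_specific.csv <<'EOF'
# Samples in list A: CG14, Og20
# Samples in list B: Tog5681
NODE_ID,CG14,Og20
1000006,CG14_Chr07:20263860-20263861,Og20_Chr07:20222839-20222840
EOF

check_subset toy_shared.csv toy_specific.csv
```

On the real outputs of this example, run `check_subset Og_cactus_segment_shared_test.csv Og_cactus_segment_specific_test.csv`.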
Visual Logic
The shared set is the intersection of the segment sets of all Group A samples: a segment must be present in sample A1 AND sample A2 (AND in every other sample of Group A).
GraTools calculates the shared set of Group A, then subtracts any segment found in at least one sample from Group B.
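The two steps described above can be reproduced on plain lists of segment IDs with standard Unix tools. This is a toy sketch of the set logic only, not GraTools' implementation. It assumes one hypothetical file per sample, listing the segment IDs that sample contains, one per line. The `*.ids` files and the `shared_segments` helper are illustrative names.

```bash
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C   # comm expects both inputs sorted with the same collation

# shared_segments FILE... : print the IDs present in EVERY file (set intersection)
shared_segments() {
  local acc f
  acc=$(mktemp)
  sort -u "$1" > "$acc"
  shift
  for f in "$@"; do
    # comm -12 keeps only the lines common to both sorted inputs
    comm -12 "$acc" <(sort -u "$f") > "$acc.next"
    mv "$acc.next" "$acc"
  done
  cat "$acc"
  rm -f "$acc"
}

# Toy data: two Group A samples and two Group B samples (hypothetical IDs)
printf '%s\n' 1 2 3 4 5 > A1.ids
printf '%s\n' 2 3 4 5 6 > A2.ids
printf '%s\n' 4 7 > B1.ids
printf '%s\n' 5 > B2.ids

echo "shared:"
shared_segments A1.ids A2.ids
echo "specific:"
# Subtract the union of Group B: any ID seen in at least one B sample is removed
comm -23 <(shared_segments A1.ids A2.ids) <(sort -u B1.ids B2.ids)
```

Here the shared set is 2, 3, 4, 5. Removing 4 (seen in B1) and 5 (seen in B2) leaves 2 and 3 as the specific set.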
This command is ideal for identifying markers specific to a virulent strain or segments that are conserved within a species but absent in a close relative. The inclusion of coordinates in the CSV files allows you to jump directly to those regions using gratools get_fasta or gratools get_subgraph.
Quick Links
Command Import: gratools import
Filter by depth: gratools get_segments_by_depth
Node statistics: gratools depth_nodes_stat