gratools specific_groups_sample

Identify segments shared by or unique to specific groups of samples.

🧬 Comparative Analysis Logic

This command performs set operations (intersection and difference) on pangenome segments:

Shared Segments: Must be present in ALL samples of Group A.
Specific Segments: Must be present in ALL samples of Group A AND absent from ALL samples of Group B.
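The two definitions above can be written as set expressions. In the sketch below, $S_s$ is notation introduced here (not GraTools terminology) for the set of segments present in sample $s$:

```latex
\mathrm{Shared}(A) = \bigcap_{a \in A} S_a
\qquad
\mathrm{Specific}(A, B) = \mathrm{Shared}(A) \setminus \bigcup_{b \in B} S_b
```

A segment is absent from ALL samples of Group B exactly when it does not belong to the union of their segment sets. As a result, every specific segment is also a shared segment.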

Options


$ gratools specific_groups_sample
Welcome to GraTools version: '1.1.0'
@author: GraTools team's
        ____                 __________               ____          
      6MMMMMb/               MMMMMMMMMM               `MM          
     8P    YM               /   MM     \               MM          
    6M      Y ___  __    ___    MM   _____     _____   MM   ____   
    MM        `MM 6MM  6MMMMb   MM  6MMMMMb   6MMMMMb  MM  6MMMMb\ 
    MM         MM69 " 8M'  `Mb  MM 6M'   `Mb 6M'   `Mb MM MM'    ` 
    MM     ___ MM'        ,oMM  MM MM     MM MM     MM MM YM.      
    MM     `M' MM     ,6MM9'MM  MM MM     MM MM     MM MM  YMMMMb  
    YM      M  MM     MM'   MM  MM MM     MM MM     MM MM      `Mb 
     8b    d9  MM     MM.  ,MM  MM YM.   ,M9 YM.   ,M9 MM L    ,MM 
      YMMMMM9  _MM_   `YMMM9'Yb_MM_ YMMMMM9   YMMMMM9 _MM_MYMMMM9 
        \                                    /                /
        /''A''\          /''''''\           /     /''''A'''''\
  ...GC|       |..ATG...C...CG...T....TAG..'..GC.|            |...
        \..C../      \.............../            \...TATA.../
 
Please cite our gitlab: https://forge.ird.fr/diade/gratools.git

Usage: gratools specific_groups_sample [OPTIONS]
Aliases: groups

  This command compares segments between two (optional) groups of samples (A and
  B), both defined by providing files listing sample. It identifies: 1. Segments
  shared by ALL samples in group A. 2. Segments specific to group A (i.e.,
  present in ALL of A and ABSENT from ALL of B). An optional length filter can
  be applied to consider only segment for a minimal length.  Results are logged
  to the terminal and saved in a CSV file. This command relies on a pre-existing
  GraTools import.
  
  For more details, see the full documentation: https://gratools.readthedocs.io/
  en/latest/commands/specific_and_shared_segments.html

Specific groups sample Options:
  -g, --gfa PATH
     Path to the input GFA file (e.g., myGraph.gfa or myGraph.gfa.gz).
     [required]

  -o, --outdir DIRECTORY
     Output directory for GraTools results. If not specified, results are
     typically placed in a subdirectory within the GFA file's parent directory
     (e.g., 'GraTools-output_<gfa_name>').

  -sla, --samples-list-A FILE
     Path to a file listing sample names for group A (one sample per line).
     [required]

  -slb, --samples-list-B FILE
     Path to a file listing sample names for group B (one sample per line).
     Required for specificity analysis.

  -fl, --filter-len INTEGER
     Minimum segment length (bp) to consider. 0 means no length filter.
     [default: 0]

  -csv, --output_csv
     Save the specific and shared segments in a csv file.

  -su, --suffix TEXT
     Suffix added to output filename.  [required]

Logging Options:
  -vv, --verbosity [DEBUG|INFO|ERROR]
     Set the logging verbosity level.  [default: INFO]

  -l, --log-path DIRECTORY
     Directory where the log files will be saved. If not specified, logs will be
     placed in the main output directory (or in a default GraTools log
     location).

Performance Options:
  -t, --threads INTEGER
     Number of threads to be used for parallelizable operations.  [default: 1]

Other options:
  -h, --help
     Show this message and exit.
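The help text describes `--filter-len` only as a minimum length. The sketch below assumes that a segment is kept when its length is greater than or equal to the threshold, and that `0` disables the filter. The segment IDs and lengths are made up. Because the filter tests each segment independently, applying it before or after the set operations gives the same result.

```python
# Hypothetical sketch: --filter-len is assumed to mean "keep if length >= min_len".
def length_filter(segments: set[str], lengths: dict[str, int], min_len: int) -> set[str]:
    """Keep segments at least min_len bp long; min_len == 0 keeps everything."""
    if min_len == 0:
        return set(segments)
    return {seg for seg in segments if lengths[seg] >= min_len}


# Made-up segment lengths (bp) and per-group segment sets.
lengths = {"s1": 1, "s2": 12, "s3": 40, "s4": 3}
shared_a = {"s1", "s2", "s3"}
present_in_b = {"s3"}

# Filtering first, then subtracting ...
before = length_filter(shared_a, lengths, 10) - length_filter(present_in_b, lengths, 10)
# ... gives the same set as subtracting first, then filtering.
after = length_filter(shared_a - present_in_b, lengths, 10)
print(sorted(before), sorted(after))  # ['s2'] ['s2']
```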

Complete Usage Example

In this scenario, we want to find segments shared by CG14 and Og20, but absent from Tog5681.

Step 1: Prepare Input Lists

📂 list_A.txt (Target Group)

CG14
Og20

📂 list_B.txt (Filter Group)

Tog5681
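Both list files are plain text with one sample name per line. You can use a small reader like the one below to check the lists before running the command. It is a hypothetical helper, not part of GraTools, and it assumes that blank lines and surrounding whitespace should be ignored. One useful check is for a sample listed in both groups.

```python
import tempfile
from pathlib import Path


def read_sample_list(path: Path) -> list[str]:
    """Return sample names, one per non-blank line, with whitespace stripped."""
    return [line.strip() for line in path.read_text().splitlines() if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    list_a = Path(tmp) / "list_A.txt"
    list_b = Path(tmp) / "list_B.txt"
    list_a.write_text("CG14\nOg20\n")
    list_b.write_text("Tog5681\n")

    group_a = read_sample_list(list_a)
    group_b = read_sample_list(list_b)
    # A sample listed in both groups would make the "specific" set empty,
    # since everything shared by A is then also present in B.
    overlap = set(group_a) & set(group_b)
    print(group_a, group_b, overlap)  # ['CG14', 'Og20'] ['Tog5681'] set()
```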

Step 2: Run Analysis

$ gratools specific_groups_sample --gfa Og_cactus.gfa.gz \
    --samples-list-A list_A.txt \
    --samples-list-B list_B.txt \
    --output_csv --suffix test
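If you need to repeat the same comparison for several groups, you can assemble the command line programmatically. The sketch below only builds the argument list. The option names are copied from the help output above, including `--output_csv` as spelled there. The helper name and its default values are hypothetical.

```python
import shlex


def build_groups_command(gfa, list_a, list_b=None, suffix="test",
                         filter_len=0, output_csv=True):
    """Assemble a `gratools specific_groups_sample` call as an argv list."""
    cmd = ["gratools", "specific_groups_sample",
           "--gfa", gfa,
           "--samples-list-A", list_a,
           "--suffix", suffix,
           "--filter-len", str(filter_len)]
    # Group B is optional, but the help text says it is required for specificity analysis.
    if list_b is not None:
        cmd += ["--samples-list-B", list_b]
    if output_csv:
        cmd.append("--output_csv")
    return cmd


cmd = build_groups_command("Og_cactus.gfa.gz", "list_A.txt", "list_B.txt")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would then execute it, provided `gratools` is on the `PATH`.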

Step 3: Review Results

🖥️ Terminal Output Summary

╭────────────────────────────────── 📊 Shared & Specific Segment Analysis ───────────────────────────────────╮
│  Segments shared by 2 in list ['CG14', 'Og20']                                                             │
│    ├─ Count                                                                           980,985 / 2,354,995  │
│    ├─ Percentage                                                                                   41.66%  │
│    └─ Total Length                                                                          48,986,534 bp  │
│                                                                                                            │
│  Segments specific to 2 in list ['CG14', 'Og20'] and absent in 1 in list ['Tog5681']                       │
│    ├─ Count                                                                           214,326 / 2,354,995  │
│    ├─ Percentage                                                                                    9.10%  │
│    └─ Total Length                                                                           1,020,689 bp  │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
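Each percentage in the summary is the count divided by the total number of segments in the graph (2,354,995 here). The short check below reproduces those percentages from the figures shown. It also derives the mean segment length in each category, which is a derived figure and not something GraTools prints.

```python
total_segments = 2_354_995

# Figures copied from the summary above: (count, total length in bp).
summary = {
    "shared": (980_985, 48_986_534),
    "specific": (214_326, 1_020_689),
}

for label, (count, total_bp) in summary.items():
    percentage = count / total_segments
    mean_len = total_bp / count
    print(f"{label}: {percentage:.2%} of segments, mean length {mean_len:.1f} bp")
# shared: 41.66% of segments, mean length 49.9 bp
# specific: 9.10% of segments, mean length 4.8 bp
```

In this run, specific segments average under 5 bp. Setting `--filter-len` may therefore be worthwhile when only longer segments are of interest.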

📂 Output Files Detail

When the CSV option (-csv) is enabled, GraTools writes CSV files containing not just the node IDs but also their exact coordinates in each sample.

Shared CSV

File: Og_cactus_segment_shared_test.csv

Contains nodes found in every sample of Group A.

Metadata: # Samples in list A: CG14, Og20

NODE_ID	CG14	Og20
10	CG14_Chr07:6-7	Og20_Chr07:13648-13649
100	CG14_Chr07:59-64	Og20_Chr07:14476-14481
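The excerpt above suggests the following layout: metadata lines starting with `#`, then a header row (`NODE_ID` followed by one column per sample), then one row per node. The reader below is a sketch built on that assumption. It also assumes tab-separated columns, as displayed here; pass `delimiter=","` if your file is comma-separated.

```python
import csv
import io


def read_segment_table(handle, delimiter="\t"):
    """Map NODE_ID -> {sample: region}, skipping blank and '#' metadata lines."""
    rows = (line for line in handle if line.strip() and not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter=delimiter)
    return {row["NODE_ID"]: {k: v for k, v in row.items() if k != "NODE_ID"}
            for row in reader}


# Excerpt of Og_cactus_segment_shared_test.csv as shown above.
example = io.StringIO(
    "# Samples in list A: CG14, Og20\n"
    "NODE_ID\tCG14\tOg20\n"
    "10\tCG14_Chr07:6-7\tOg20_Chr07:13648-13649\n"
    "100\tCG14_Chr07:59-64\tOg20_Chr07:14476-14481\n"
)
shared = read_segment_table(example)
print(shared["100"]["Og20"])  # Og20_Chr07:14476-14481
```

For a real file, open it with `open(path, newline="")` and pass the handle to the same function.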

Specific CSV

File: Og_cactus_segment_specific_test.csv

Contains nodes found in every sample of Group A and absent from every sample of Group B.

Metadata: # Samples in list A: CG14, Og20 | # Samples in list B: Tog5681

NODE_ID	CG14	Og20
1000006	CG14_Chr07:20263860-20263861	Og20_Chr07:20222839-20222840
1000023	CG14_Chr07:20263916-20263917	Og20_Chr07:20222895-20222896
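A specific segment must also be present in every sample of Group A. Therefore every NODE_ID in the specific file should also appear in the shared file from the same run. A plain subset test is enough as a consistency check. The sketch below uses made-up node IDs in place of the two files' NODE_ID columns.

```python
def missing_from_shared(shared_ids: set[str], specific_ids: set[str]) -> set[str]:
    """Return specific node IDs that are absent from the shared set (expected: none)."""
    return specific_ids - shared_ids


# Made-up node IDs standing in for the NODE_ID columns of the two files.
shared_ids = {"10", "100", "1000006", "1000023"}
specific_ids = {"1000006", "1000023"}
print(missing_from_shared(shared_ids, specific_ids))  # set()
```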

Visual Logic

🤝 Scenario 1: Shared

The command intersects the segment sets of all Group A samples: a segment must be present in Sample A1 AND Sample A2 (AND in any other sample listed in Group A).

🎯 Scenario 2: Specific

GraTools calculates the shared set of Group A, then subtracts any segment found in at least one sample of Group B.
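The two-step procedure described above can be sketched with Python sets: first intersect Group A, then remove anything seen in Group B. The per-sample segment sets here are made up. In GraTools they would come from the imported graph, and this sketch is not the tool's actual implementation.

```python
def shared_segments(segments_by_sample: dict[str, set[str]], group_a: list[str]) -> set[str]:
    """Segments present in ALL samples of group A (set intersection, returns a new set)."""
    return set.intersection(*(segments_by_sample[s] for s in group_a))


def specific_segments(segments_by_sample: dict[str, set[str]],
                      group_a: list[str], group_b: list[str]) -> set[str]:
    """Shared segments of A minus any segment found in at least one sample of B."""
    seen_in_b = set().union(*(segments_by_sample[s] for s in group_b))
    return shared_segments(segments_by_sample, group_a) - seen_in_b


# Hypothetical segment IDs per sample.
segments_by_sample = {
    "CG14": {"s1", "s2", "s3", "s4"},
    "Og20": {"s2", "s3", "s4", "s5"},
    "Tog5681": {"s3", "s6"},
}
group_a, group_b = ["CG14", "Og20"], ["Tog5681"]
print(sorted(shared_segments(segments_by_sample, group_a)))             # ['s2', 's3', 's4']
print(sorted(specific_segments(segments_by_sample, group_a, group_b)))  # ['s2', 's4']
```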

💡 Pro Tip: Comparative Genomics

This command is ideal for identifying markers specific to a virulent strain, or segments that are conserved within a species but absent from a close relative. Because the CSV files include coordinates, you can jump directly to those regions using gratools get_fasta or gratools get_subgraph.
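To pass the reported coordinates on to other tools, you can split each CSV cell into its sequence name and its start and end positions. The helper below is a hypothetical sketch. It assumes only the `<sequence>:<start>-<end>` layout visible in the CSV excerpts. It does not decide whether the coordinates are 0- or 1-based, so check the GraTools documentation before converting them to another format such as BED.

```python
def parse_region(region: str) -> tuple[str, int, int]:
    """Split 'CG14_Chr07:59-64' into ('CG14_Chr07', 59, 64)."""
    # rpartition splits on the last ':' so sequence names containing ':' still work.
    name, sep, span = region.rpartition(":")
    if not sep:
        raise ValueError(f"no ':' in region {region!r}")
    start, end = span.split("-")
    return name, int(start), int(end)


print(parse_region("CG14_Chr07:20263860-20263861"))
# ('CG14_Chr07', 20263860, 20263861)
```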

📑 Quick Links

Command Import: gratools import
Filter by depth: gratools get_segments_by_depth
Node statistics: gratools depth_nodes_stat