gratools specific_groups_sample

Identify segments shared by or unique to specific groups of samples.

🧬 Comparative Analysis Logic

This command performs set operations (Intersections and Differences) on pangenome segments:

  • Shared Segments: Must be present in ALL samples of Group A.

  • Specific Segments: Must be present in ALL samples of Group A AND absent from ALL samples of Group B.
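The two rules above map directly onto set intersection and set difference. The following is a minimal sketch on toy data, not GraTools' actual implementation; the sample names (`A1`, `A2`, `B1`), the segment IDs, and both helper functions are hypothetical.

```python
# Toy data (hypothetical): sample name -> set of segment IDs present in that sample.
segments_by_sample = {
    "A1": {1, 2, 3, 4},
    "A2": {2, 3, 4, 5},
    "B1": {3, 9},
}


def shared_segments(samples, presence):
    """Segments present in ALL of the given samples (set intersection)."""
    sets = [presence[s] for s in samples]
    return set.intersection(*sets) if sets else set()


def specific_segments(group_a, group_b, presence):
    """Segments shared by all of group A and absent from every sample of group B."""
    # A segment seen in at least one B sample is not "absent from ALL of B".
    in_any_b = set().union(*(presence[s] for s in group_b))
    return shared_segments(group_a, presence) - in_any_b


shared = shared_segments(["A1", "A2"], segments_by_sample)
specific = specific_segments(["A1", "A2"], ["B1"], segments_by_sample)
print(sorted(shared))    # [2, 3, 4]
print(sorted(specific))  # [2, 4]
```

Segment 3 is shared by both A samples but also occurs in `B1`, so it is kept in the shared set and dropped from the specific set.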

Options

🛠️ View Command Line Options
$ gratools specific_groups_sample
Welcome to GraTools version: '1.1.0'
@author: GraTools team's
        ____                 __________               ____          
      6MMMMMb/               MMMMMMMMMM               `MM          
     8P    YM               /   MM     \               MM          
    6M      Y ___  __    ___    MM   _____     _____   MM   ____   
    MM        `MM 6MM  6MMMMb   MM  6MMMMMb   6MMMMMb  MM  6MMMMb\ 
    MM         MM69 " 8M'  `Mb  MM 6M'   `Mb 6M'   `Mb MM MM'    ` 
    MM     ___ MM'        ,oMM  MM MM     MM MM     MM MM YM.      
    MM     `M' MM     ,6MM9'MM  MM MM     MM MM     MM MM  YMMMMb  
    YM      M  MM     MM'   MM  MM MM     MM MM     MM MM      `Mb 
     8b    d9  MM     MM.  ,MM  MM YM.   ,M9 YM.   ,M9 MM L    ,MM 
      YMMMMM9  _MM_   `YMMM9'Yb_MM_ YMMMMM9   YMMMMM9 _MM_MYMMMM9 
        \                                    /                /
        /''A''\          /''''''\           /     /''''A'''''\
  ...GC|       |..ATG...C...CG...T....TAG..'..GC.|            |...
        \..C../      \.............../            \...TATA.../
 
Please cite our gitlab: https://forge.ird.fr/diade/gratools.git\

Usage: gratools specific_groups_sample [OPTIONS]
Aliases: groups

  This command compares segments between two (optional) groups of samples (A and
  B), both defined by providing files listing sample. It identifies: 1. Segments
  shared by ALL samples in group A. 2. Segments specific to group A (i.e.,
  present in ALL of A and ABSENT from ALL of B). An optional length filter can
  be applied to consider only segment for a minimal length.  Results are logged
  to the terminal and saved in a CSV file. This command relies on a pre-existing
  GraTools import.
  
  For more details, see the full documentation: https://gratools.readthedocs.io/
  en/latest/commands/specific_and_shared_segments.html

Specific groups sample Options:
  -g, --gfa PATH
     Path to the input GFA file (e.g., myGraph.gfa or myGraph.gfa.gz).
     [required]

  -o, --outdir DIRECTORY
     Output directory for GraTools results. If not specified, results are
     typically placed in a subdirectory within the GFA file's parent directory
     (e.g., 'GraTools-output_<gfa_name>').

  -sla, --samples-list-A FILE
     Path to a file listing sample names for group A (one sample per line).
     [required]

  -slb, --samples-list-B FILE
     Path to a file listing sample names for group B (one sample per line).
     Required for specificity analysis.

  -fl, --filter-len INTEGER
     Minimum segment length (bp) to consider. 0 means no length filter.
     [default: 0]

  -csv, --output_csv
     Save the specific and shared segments in a csv file.

  -su, --suffix TEXT
     Suffix added to output filename.  [required]

Logging Options:
  -vv, --verbosity [DEBUG|INFO|ERROR]
     Set the logging verbosity level.  [default: INFO]

  -l, --log-path DIRECTORY
     Directory where the log files will be saved. If not specified, logs will be
     placed in the main output directory (or in a default GraTools log
     location).

Performance Options:
  -t, --threads INTEGER
     Number of threads to be used for parallelizable operations.  [default: 1]

Other options:
  -h, --help
     Show this message and exit.

Complete Usage Example

In this scenario, we want to find segments shared by CG14 and Og20, but absent from Tog5681.

Step 1: Prepare Input Lists

📂 list_A.txt (Target Group)
CG14
Og20

📂 list_B.txt (Filter Group)
Tog5681
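Before running the analysis, it can help to check that the two lists are well formed. The sketch below writes the two example files into a temporary directory, reads them back, and looks for samples listed in both groups. By the definition of specificity, a sample present in both A and B would make the specific set empty. The helper `read_sample_list` is hypothetical, and how GraTools itself handles blank lines or overlapping lists is not stated here.

```python
import tempfile
from pathlib import Path


def read_sample_list(path):
    """Return sample names from a one-name-per-line file, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    list_a = Path(tmp) / "list_A.txt"
    list_b = Path(tmp) / "list_B.txt"
    list_a.write_text("CG14\nOg20\n", encoding="utf-8")
    list_b.write_text("Tog5681\n", encoding="utf-8")
    group_a = read_sample_list(list_a)
    group_b = read_sample_list(list_b)

# Samples appearing in both groups would contradict "present in A, absent from B".
overlap = set(group_a) & set(group_b)
print(group_a, group_b, overlap)
```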

Step 2: Run Analysis

$ gratools specific_groups_sample --gfa Og_cactus.gfa.gz \
    --samples-list-A list_A.txt \
    --samples-list-B list_B.txt \
    --output_csv --suffix test

Step 3: Review Results

🖥️ Terminal Output Summary
╭────────────────────────────────── 📊 Shared & Specific Segment Analysis ───────────────────────────────────╮
│  Segments shared by 2 in list ['CG14', 'Og20']                                                             │
│    ├─ Count                                                                           980,985 / 2,354,995  │
│    ├─ Percentage                                                                                   41.66%  │
│    └─ Total Length                                                                          48,986,534 bp  │
│                                                                                                            │
│  Segments specific to 2 in list ['CG14', 'Og20'] and absent in 1 in list ['Tog5681']                       │
│    ├─ Count                                                                           214,326 / 2,354,995  │
│    ├─ Percentage                                                                                    9.10%  │
│    └─ Total Length                                                                           1,020,689 bp  │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
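The percentages in the summary follow from the counts: each is the number of matching segments divided by the total number of segments in the graph (2,354,995 in this example). A small sketch of that arithmetic, using only the figures shown above; the `percentage` helper is illustrative, not part of GraTools.

```python
# Figures copied from the terminal summary above.
total_segments = 2_354_995
shared_count = 980_985
specific_count = 214_326


def percentage(count, total):
    """Share of `count` in `total`, as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


shared_pct = percentage(shared_count, total_segments)
specific_pct = percentage(specific_count, total_segments)
print(f"shared: {shared_pct:.2f}%  specific: {specific_pct:.2f}%")
# shared: 41.66%  specific: 9.10%
```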

📂 Output Files Detail

GraTools generates CSV files containing not just the Node IDs, but also their exact coordinates in each sample.

File: Og_cactus_segment_shared_test.csv

Contains nodes found in every sample of Group A.

Metadata: # Samples in list A: CG14, Og20

NODE_ID  CG14              Og20
10       CG14_Chr07:6-7    Og20_Chr07:13648-13649
100      CG14_Chr07:59-64  Og20_Chr07:14476-14481

File: Og_cactus_segment_specific_test.csv

Contains nodes found in every sample of Group A and absent from every sample of Group B.

Metadata: # Samples in list A: CG14, Og20 | # Samples in list B: Tog5681

NODE_ID  CG14                          Og20
1000006  CG14_Chr07:20263860-20263861  Og20_Chr07:20222839-20222840
1000023  CG14_Chr07:20263916-20263917  Og20_Chr07:20222895-20222896
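For downstream scripting, the per-sample coordinates can be split into a sequence name, a start, and an end. The sketch below makes several assumptions that should be checked against a real output file: that the CSV is comma-delimited, that metadata lines start with `#`, and that each cell holds exactly one `SEQUENCE:start-end` coordinate. The names `parse_coordinate` and `read_segments` are hypothetical helpers, not part of GraTools.

```python
import csv
import io
import re

# Hypothetical file content modelled on the excerpt above; the comma delimiter
# and the leading '#' metadata line are assumptions.
example_csv = """# Samples in list A: CG14, Og20 | # Samples in list B: Tog5681
NODE_ID,CG14,Og20
1000006,CG14_Chr07:20263860-20263861,Og20_Chr07:20222839-20222840
1000023,CG14_Chr07:20263916-20263917,Og20_Chr07:20222895-20222896
"""

# Matches e.g. "CG14_Chr07:20263860-20263861".
COORD = re.compile(r"^(?P<seq>.+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_coordinate(text):
    """Split 'SEQ:start-end' into (sequence name, start, end)."""
    match = COORD.match(text)
    if match is None:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    return match["seq"], int(match["start"]), int(match["end"])


def read_segments(handle):
    """Map each node ID to {sample name: (sequence, start, end)}."""
    data_lines = (line for line in handle if not line.startswith("#"))
    rows = {}
    for row in csv.DictReader(data_lines):
        node_id = row.pop("NODE_ID")
        rows[node_id] = {sample: parse_coordinate(c) for sample, c in row.items()}
    return rows


segments = read_segments(io.StringIO(example_csv))
print(segments["1000006"]["CG14"])  # ('CG14_Chr07', 20263860, 20263861)
```

To read a real file, pass an open file handle instead of the `io.StringIO` wrapper.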

Visual Logic

The command finds the intersection of the sets of segments. A segment must be in Sample A1 AND Sample A2.

digraph shared_segments {
    // --- Global settings ---
    graph [
        labelloc="t",
        label="Shared Segments (Intersection)",
        fontsize=14,
        fontname="Helvetica",
        bgcolor="white"
    ];
    node [shape=ellipse, style=filled, fontname="Helvetica"];

    // --- Subgraph to group the input samples ---
    subgraph cluster_group_a {
        label = "Group A";
        labelloc = "t";
        style = "dotted";
        bgcolor = "#e0f0ff"; // Light blue background

        // Sample nodes
        A1 [label="Sample A1", fillcolor=lightblue];
        A2 [label="Sample A2", fillcolor=lightblue];
    }

    // --- Result node ---
    Shared [
        label="Shared Segments\n(Present in A1 AND A2)",
        style="filled",      // Force the style attribute here
        fillcolor="#90EE90", // Use the hex code (more reliable than the name)
        fontcolor="#000000", // Force black text to be safe
        shape=box
    ];

    // --- Connections showing the flow ---
    A1 -> Shared [label=" "];
    A2 -> Shared [label=" "];
}

GraTools calculates the shared set of Group A, then subtracts any segment found in at least one sample from Group B.

digraph specific_segments {
    // --- Global settings ---
    graph [
        rankdir=LR, // Left-to-right flow is better for a process
        labelloc="t",
        label="Specific Segments (Set Difference)",
        fontsize=14,
        fontname="Helvetica",
        bgcolor="white"
    ];
    node [shape=box, style=filled, fontname="Helvetica"];

    // --- Group A (Input) ---
    subgraph cluster_group_a {
        label = "Group A";
        style = "rounded";
        bgcolor = "#e0f0ff"; // Light blueish
        A1 [label="Sample A1", fillcolor=lightblue];
        A2 [label="Sample A2", fillcolor=lightblue];
    }

    // --- Intermediate step node ---
    SharedA [
        label="Shared Segments\nin Group A",
        shape=ellipse,
        fillcolor=white,
        style="filled,dashed"
    ];

    // --- Group B (Exclusion Filter) ---
    subgraph cluster_group_b {
        label = "Group B (Exclusion Filter)";
        style = "rounded";
        bgcolor = "#ffe0e0"; // Light reddish
        B1 [label="Sample B1", fillcolor=lightcoral];
        B2 [label="Sample B2", fillcolor=lightcoral];
    }

    // --- Final result node ---
    SpecificResult [
        label="Segments Specific\nto Group A",
        style="filled",      // Force the style attribute here
        fillcolor="#90EE90", // Use the hex code (more reliable than the name)
        fontcolor="#000000"  // Force black text to be safe
    ];

    // --- Logical flow ---
    // Node groups are space-separated, with no comma between members
    {A1 A2} -> SharedA;
    SharedA -> SpecificResult [label=" Keep"];
    {B1 B2} -> SpecificResult [
        label=" Remove",
        color=red,
        style=dashed,
        fontcolor=red
    ];
}

💡 Pro Tip: Comparative Genomics

This command is ideal for identifying markers specific to a virulent strain or segments that are conserved within a species but absent in a close relative. The inclusion of coordinates in the CSV files allows you to jump directly to those regions using gratools get_fasta or gratools get_subgraph.
