gratools depth_nodes_stat

Summarize how segments are shared across samples by calculating their 'depth'.

Understanding Node Depth

This command reports how the segments of your graph are distributed across depth levels. The depth of a segment is the number of unique samples whose paths pass through it.

  • Depth = 1: Private segments (specific to one individual).

  • Depth = N: Core segments (shared by all N individuals).
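As a rough illustration of this definition, depth can be derived from a GFA by collecting, for each segment, the set of samples whose paths visit it. The sketch below is not GraTools' implementation. It assumes that paths are stored as P lines and that the sample name is the part of the path name before the first '#' (PanSN-style naming); segment_depths and sample_of are hypothetical helper names.

```python
def sample_of(path_name):
    # Assumption: PanSN-style path names such as "sampleA#1#chr01".
    return path_name.split("#")[0]


def segment_depths(gfa_lines):
    """Map each segment ID to the number of unique samples traversing it."""
    samples = {}
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            # Register the segment even if no path uses it (depth 0).
            samples.setdefault(fields[1], set())
        elif fields[0] == "P":
            sample = sample_of(fields[1])
            for step in fields[2].split(","):
                # Each step is a segment ID followed by '+' or '-'.
                samples.setdefault(step[:-1], set()).add(sample)
    return {seg: len(names) for seg, names in samples.items()}


# Toy graph: two haplotypes of sampleA and one of sampleB.
toy_gfa = [
    "S\t1\tACGT",
    "S\t2\tA",
    "S\t3\tGG",
    "P\tsampleA#1#chr1\t1+,2+,3+\t*",
    "P\tsampleA#2#chr1\t1+,3+\t*",
    "P\tsampleB#1#chr1\t1+,3-\t*",
]
depths = segment_depths(toy_gfa)
print(depths)  # {'1': 2, '2': 1, '3': 2}
```

Segment 2 is private (Depth = 1) because only sampleA carries it, while segments 1 and 3 are core for this two-sample toy graph. Both haplotypes of sampleA count once, since depth counts unique samples rather than paths.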

Options

View Command Line Options
$ gratools depth_nodes_stat
Welcome to GraTools version: '1.1.0'
@author: GraTools team's
        ____                 __________               ____          
      6MMMMMb/               MMMMMMMMMM               `MM          
     8P    YM               /   MM     \               MM          
    6M      Y ___  __    ___    MM   _____     _____   MM   ____   
    MM        `MM 6MM  6MMMMb   MM  6MMMMMb   6MMMMMb  MM  6MMMMb\ 
    MM         MM69 " 8M'  `Mb  MM 6M'   `Mb 6M'   `Mb MM MM'    ` 
    MM     ___ MM'        ,oMM  MM MM     MM MM     MM MM YM.      
    MM     `M' MM     ,6MM9'MM  MM MM     MM MM     MM MM  YMMMMb  
    YM      M  MM     MM'   MM  MM MM     MM MM     MM MM      `Mb 
     8b    d9  MM     MM.  ,MM  MM YM.   ,M9 YM.   ,M9 MM L    ,MM 
      YMMMMM9  _MM_   `YMMM9'Yb_MM_ YMMMMM9   YMMMMM9 _MM_MYMMMM9 
        \                                    /                /
        /''A''\          /''''''\           /     /''''A'''''\
  ...GC|       |..ATG...C...CG...T....TAG..'..GC.|            |...
        \..C../      \.............../            \...TATA.../
 
Please cite our gitlab: https://forge.ird.fr/diade/gratools.git\

Usage: gratools depth_nodes_stat [OPTIONS]
Aliases: dns

  This command computes how segments are shared across different samples by
  calculating the 'depth' of each segment (i.e., the number of unique samples
  encompassing it). It outputs a table showing the count of segments for each
  depth level. An optional length filter can be applied to consider only
  segments above a certain size for a 'filtered' count. Results are displayed in
  the terminal and saved to a CSV file. This command relies on a pre-existing
  GraTools import.
  
  For more details, see the full documentation:
  https://gratools.readthedocs.io/en/latest/commands/depth_nodes_stat.html

Node Depth Statistics Options:
  -g, --gfa PATH
     Path to the input GFA file (e.g., myGraph.gfa or myGraph.gfa.gz).
     [required]

  -o, --outdir DIRECTORY
     Output directory for GraTools results. If not specified, results are
     typically placed in a subdirectory within the GFA file's parent directory
     (e.g., 'GraTools-output_<gfa_name>').

  -su, --suffix TEXT
     Custom suffix to append to output filenames. If not provided, a default
     suffix will be generated based on the command line parameters.

  -fl, --filter-len INTEGER
     Minimal segment length (bp) to be considered in the depth count. A value of
     0 means no length filter.  [default: 0]

Logging Options:
  -vv, --verbosity [DEBUG|INFO|ERROR]
     Set the logging verbosity level.  [default: INFO]

  -l, --log-path DIRECTORY
     Directory where the log files will be saved. If not specified, logs will be
     placed in the main output directory (or in a default GraTools log
     location).

Performance Options:
  -t, --threads INTEGER
     Number of threads to be used for parallelizable operations.  [default: 1]

Other options:
  -h, --help
     Show this message and exit.

Usage Example

Node Depth Distribution

Analyze how segments are shared while filtering out very short segments (e.g., < 50 bp) to focus on structural content.

$ gratools depth_nodes_stat --gfa Og_cactus.gfa.gz --filter-len 50 --threads 4

Output Summary:

────────────────  Summary ────────────────────────────────────────
Total segments analyzed: 2,354,995
Total segments passing length filter: 210,046 (8.92%)

          Node Depth Statistics - Og_cactus (Len ≥ 50bp)
╭───────┬──────────┬────────────┬───────────────────┬────────────╮
│ Depth │ Segments │ Percentage │ Filtered Segments │ Filtered % │
├───────┼──────────┼────────────┼───────────────────┼────────────┤
│   1   │ 575,292  │   24.43%   │       1,257       │   0.60%    │
│   2   │ 316,053  │   13.42%   │       7,345       │   3.50%    │
│   3   │ 301,588  │   12.81%   │       5,604       │   2.67%    │
│   4   │ 501,321  │   21.29%   │      11,500       │   5.47%    │
│   5   │ 660,741  │   28.06%   │      184,340      │   87.76%   │
╰───────┴──────────┴────────────┴───────────────────┴────────────╯

Understanding the Table

Table Columns
  • Depth: Number of unique samples containing the segment.

  • Segments: Raw count of segments at this depth.

  • Percentage: Proportion relative to the total GFA segments.

  • Filtered Segments: Count of segments at this depth that are at least --filter-len bp long.

  • Filtered %: Proportion relative to the total filtered segments.
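The column definitions above can be sketched as a small function that turns per-segment (depth, length) pairs into table rows. This is only an illustration under stated assumptions: depth_table and its row keys are hypothetical names, and the filter is taken to keep segments whose length is greater than or equal to the threshold, as the "Len ≥ 50bp" table title suggests.

```python
from collections import Counter


def depth_table(segments, min_len=0):
    """Build table rows from (depth, length_bp) pairs, one pair per segment."""
    segments = list(segments)
    raw = Counter(depth for depth, _ in segments)
    kept = Counter(depth for depth, length in segments if length >= min_len)
    total = sum(raw.values())
    total_kept = sum(kept.values())
    rows = []
    for depth in sorted(raw):
        rows.append({
            "depth": depth,
            "segments": raw[depth],
            # Share of all segments in the graph.
            "percentage": 100 * raw[depth] / total,
            "filtered_segments": kept[depth],
            # Share of the segments that pass the length filter.
            "filtered_pct": 100 * kept[depth] / total_kept if total_kept else 0.0,
        })
    return rows


# Toy input: (depth, length in bp) for six hypothetical segments.
toy = [(1, 3), (1, 1), (2, 8), (3, 120), (3, 75), (3, 2)]
rows = depth_table(toy, min_len=50)
for row in rows:
    print(row)
```

Note that the two percentage columns use different denominators, so each sums to 100% on its own.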

Filter Logic

The --filter-len (or -fl) option sets the minimum segment length (bp) for the filtered columns.

In many graphs, a high percentage of segments are very short (1-10 bp) and correspond to small polymorphisms. Filtering by length shows the distribution of larger genomic blocks, which are often far more conserved (core genome).

Insight

Compare the Percentage column with the Filtered % column. If the Filtered % for the maximum depth is significantly higher than the raw Percentage, it indicates that your "Core Genome" is mostly composed of longer, well-conserved sequences, while "Specific" regions are often made of smaller segments.
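To make this comparison concrete, the short check below recomputes both percentages for the maximum depth from the counts in the Og_cactus example table. Only the counts come from the example output; the variable names are illustrative.

```python
# Counts copied from the example table (depth -> number of segments).
segments = {1: 575_292, 2: 316_053, 3: 301_588, 4: 501_321, 5: 660_741}
filtered = {1: 1_257, 2: 7_345, 3: 5_604, 4: 11_500, 5: 184_340}

total = sum(segments.values())           # 2,354,995 segments analyzed
total_filtered = sum(filtered.values())  # 210,046 segments passing the filter

max_depth = max(segments)
raw_pct = 100 * segments[max_depth] / total
filtered_pct = 100 * filtered[max_depth] / total_filtered
print(f"depth {max_depth}: {raw_pct:.2f}% of all segments, "
      f"{filtered_pct:.2f}% of filtered segments")
# depth 5: 28.06% of all segments, 87.76% of filtered segments
```

Here the depth-5 share roughly triples once segments shorter than 50 bp are excluded, which is the pattern described above.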
