gratools depth_nodes_stat
Summarize how segments are shared across samples by calculating their "depth".
This command reports the depth distribution of all segments in your graph. The depth of a segment is the number of unique samples whose paths pass through it.
Depth = 1: Private segments (specific to one individual).
Depth = N: Core segments (shared by all N individuals).
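The depth definition above can be illustrated with a minimal Python sketch. It is not GraTools' implementation: the function name `segment_depths`, the sample names, and the segment IDs are hypothetical, and the graph is reduced to one list of traversed segments per sample.

```python
from collections import defaultdict


def segment_depths(paths_by_sample):
    """Return {segment_id: depth}; depth = number of distinct samples traversing it."""
    samples_per_segment = defaultdict(set)
    for sample, segments in paths_by_sample.items():
        for seg in segments:
            # A set ensures a sample is counted once even if it revisits a segment.
            samples_per_segment[seg].add(sample)
    return {seg: len(samples) for seg, samples in samples_per_segment.items()}


# Hypothetical toy graph: three samples, four segments.
toy_paths = {
    "sampleA": ["s1", "s2", "s4", "s2"],  # s2 is visited twice but counts once
    "sampleB": ["s1", "s3", "s4"],
    "sampleC": ["s1", "s4"],
}

depths = segment_depths(toy_paths)
n_samples = len(toy_paths)

private = sorted(s for s, d in depths.items() if d == 1)        # Depth = 1
core = sorted(s for s, d in depths.items() if d == n_samples)   # Depth = N

print("depths:", depths)
print("private:", private)
print("core:", core)
```

In this toy graph, `s2` and `s3` are private (depth 1), while `s1` and `s4` are core (depth 3).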
Options
View Command Line Options
$ gratools depth_nodes_stat
Welcome to GraTools version: '1.2.0.dev19'
@author: GraTools team's
____ __________ ____
6MMMMMb/ MMMMMMMMMM `MM
8P YM / MM \ MM
6M Y ___ __ ___ MM _____ _____ MM ____
MM `MM 6MM 6MMMMb MM 6MMMMMb 6MMMMMb MM 6MMMMb\
MM MM69 " 8M' `Mb MM 6M' `Mb 6M' `Mb MM MM' `
MM ___ MM' ,oMM MM MM MM MM MM MM YM.
MM `M' MM ,6MM9'MM MM MM MM MM MM MM YMMMMb
YM M MM MM' MM MM MM MM MM MM MM `Mb
8b d9 MM MM. ,MM MM YM. ,M9 YM. ,M9 MM L ,MM
YMMMMM9 _MM_ `YMMM9'Yb_MM_ YMMMMM9 YMMMMM9 _MM_MYMMMM9
\ / /
/''A''\ /''''''\ / /''''A'''''\
...GC| |..ATG...C...CG...T....TAG..'..GC.| |...
\..C../ \.............../ \...TATA.../
Please cite our gitlab: https://forge.ird.fr/diade/gratools.git\
Usage: gratools depth_nodes_stat [OPTIONS]
Aliases: dns
This command computes how segments are shared across different samples by
calculating the 'depth' of each segment (i.e., the number of unique samples
encompassing it). It outputs a table showing the count of segments for each
depth level. An optional length filter can be applied to consider only
segments above a certain size for a 'filtered' count. Results are displayed in
the terminal and saved to a CSV file. This command relies on a pre-existing
GraTools import.
For more details, see the full documentation:
https://gratools.readthedocs.io/en/latest/commands/depth_nodes_stat.html
Node Depth Statistics Options:
-g, --gfa PATH
Path to the input GFA file (e.g., myGraph.gfa or myGraph.gfa.gz).
[required]
-o, --outdir DIRECTORY
Output directory for GraTools results. If not specified, results are
typically placed in a subdirectory within the GFA file's parent directory
(e.g., 'GraTools-output_<gfa_name>').
-su, --suffix TEXT
Custom suffix to append to output filenames. If not provided, a default
suffix will be generated based on the command line parameters.
-fl, --filter-len INTEGER
Minimal segment length (bp) to be considered in the depth count. A value of
0 means no length filter. [default: 0]
Logging Options:
-vv, --verbosity [DEBUG|INFO|ERROR]
Set the logging verbosity level. [default: INFO]
-l, --log-path DIRECTORY
Directory where the log files will be saved. If not specified, logs will be
placed in the main output directory (or in a default GraTools log
location).
Performance Options:
-t, --threads INTEGER
Number of threads to be used for parallelizable operations. [default: 1]
Other options:
-h, --help
Show this message and exit.
Usage Examples
Analyze how segments are shared while filtering out very short segments (e.g., < 50 bp) to focus on structural content.
$ gratools depth_nodes_stat --gfa Og_cactus.gfa.gz --filter-len 50 --threads 4
Output Summary:
──────────────── Summary ────────────────────────────────────────
Total segments analyzed: 2,354,995
Total segments passing length filter: 210,046 (8.92%)
Node Depth Statistics - Og_cactus (Len ≥ 50bp)
╭───────┬──────────┬────────────┬───────────────────┬────────────╮
│ Depth │ Segments │ Percentage │ Filtered Segments │ Filtered % │
├───────┼──────────┼────────────┼───────────────────┼────────────┤
│     1 │  575,292 │     24.43% │             1,257 │      0.60% │
│     2 │  316,053 │     13.42% │             7,345 │      3.50% │
│     3 │  301,588 │     12.81% │             5,604 │      2.67% │
│     4 │  501,321 │     21.29% │            11,500 │      5.47% │
│     5 │  660,741 │     28.06% │           184,340 │     87.76% │
╰───────┴──────────┴────────────┴───────────────────┴────────────╯
Understanding the Output
Depth: Number of unique samples containing the segment.
Segments: Raw count of segments at this depth.
Percentage: Proportion relative to the total GFA segments.
Filtered Segments: Count of segments at this depth whose length is at least --filter-len.
Filtered %: Proportion relative to the total filtered segments.
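As a rough check of how the two percentage columns relate to the counts, the sketch below recomputes them from the per-depth counts in the example table. The function name `depth_table` is hypothetical, and the script only reproduces the arithmetic described above, not GraTools' own code.

```python
def depth_table(raw_counts, filtered_counts):
    """Build rows of (depth, segments, percentage, filtered segments, filtered %)."""
    total_raw = sum(raw_counts.values())
    total_filtered = sum(filtered_counts.values())
    rows = []
    for depth in sorted(raw_counts):
        raw = raw_counts[depth]
        filt = filtered_counts.get(depth, 0)
        pct = 100 * raw / total_raw
        # Guard against an empty filtered set (filter longer than every segment).
        filt_pct = 100 * filt / total_filtered if total_filtered else 0.0
        rows.append((depth, raw, pct, filt, filt_pct))
    return rows


# Per-depth counts copied from the example output (Og_cactus, --filter-len 50).
raw_counts = {1: 575_292, 2: 316_053, 3: 301_588, 4: 501_321, 5: 660_741}
filtered_counts = {1: 1_257, 2: 7_345, 3: 5_604, 4: 11_500, 5: 184_340}

rows = depth_table(raw_counts, filtered_counts)
for depth, raw, pct, filt, filt_pct in rows:
    print(f"{depth:>5} {raw:>9,} {pct:>6.2f}% {filt:>9,} {filt_pct:>6.2f}%")
```

Each percentage is taken relative to its own total: 2,354,995 segments for Percentage and 210,046 segments for Filtered %.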
The --filter-len (or -fl) option is crucial for pangenome analysis.
In many graphs, a high percentage of segments are very short (1-10 bp) and correspond to small polymorphisms. Filtering by length reveals the distribution of larger genomic blocks, which are often far more conserved (core genome).
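The effect of the length filter can be sketched on a toy example. The segment IDs, depths, and lengths below are invented for illustration, and `depth_histogram` is a hypothetical helper. The `>=` comparison is an assumption based on the help text ("Minimal segment length") and the "Len ≥ 50bp" table title.

```python
from collections import Counter


def depth_histogram(depths, lengths, filter_len=0):
    """Count segments per depth, keeping segments with length >= filter_len.

    With filter_len=0 every segment is kept, mirroring 'no length filter'.
    """
    return Counter(
        depth for seg, depth in depths.items() if lengths[seg] >= filter_len
    )


# Hypothetical data: short segments (SNP-like) tend to be private here,
# while the long blocks are shared by all three samples.
depths = {"s1": 3, "s2": 1, "s3": 1, "s4": 3, "s5": 2, "s6": 1}
lengths = {"s1": 1200, "s2": 1, "s3": 3, "s4": 850, "s5": 60, "s6": 2}

raw_hist = depth_histogram(depths, lengths)
filtered_hist = depth_histogram(depths, lengths, filter_len=50)

print("raw:     ", dict(sorted(raw_hist.items())))
print("filtered:", dict(sorted(filtered_hist.items())))
```

In this toy data the three private segments are all shorter than 50 bp, so they disappear from the filtered histogram and the share of depth-3 segments rises from 2 of 6 to 2 of 3.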
Compare the Percentage column with the Filtered % column. If the Filtered % at the maximum depth is significantly higher than the raw Percentage, your "core genome" consists mostly of longer, well-conserved sequences, while "specific" regions are mostly made of shorter segments. In the example above, depth 5 accounts for 28.06% of all segments but 87.76% of the segments of at least 50 bp.
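This comparison can be made explicit with a few lines of Python. The sketch below uses the counts from the example output. The helper name `core_shift` is hypothetical, and it assumes that the highest depth present in the table equals the number of samples.

```python
def core_shift(raw_counts, filtered_counts):
    """Return (max_depth, raw %, filtered %, difference in percentage points)."""
    max_depth = max(raw_counts)  # assumption: top depth present == number of samples
    raw_pct = 100 * raw_counts[max_depth] / sum(raw_counts.values())
    filt_pct = 100 * filtered_counts[max_depth] / sum(filtered_counts.values())
    return max_depth, raw_pct, filt_pct, filt_pct - raw_pct


# Per-depth counts copied from the example output (Og_cactus, --filter-len 50).
raw_counts = {1: 575_292, 2: 316_053, 3: 301_588, 4: 501_321, 5: 660_741}
filtered_counts = {1: 1_257, 2: 7_345, 3: 5_604, 4: 11_500, 5: 184_340}

max_depth, raw_pct, filt_pct, gain = core_shift(raw_counts, filtered_counts)
print(f"depth {max_depth}: {raw_pct:.2f}% of all segments, "
      f"{filt_pct:.2f}% of filtered segments ({gain:+.2f} points)")
```

A large positive difference, as in this example, is consistent with a core genome made mostly of longer segments.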
Quick Links
Command Import: gratools import
Related Tool: gratools pan_ratio
Explore stats: gratools stats