gratools stats
Comprehensive statistics for pangenome graph size, connectivity, and complexity.
The stats command parses the GFA file (or uses an existing GraTools index) to calculate essential metrics. It gives a high-level view of the graph's properties, which is crucial for quality control and structural analysis of the pangenome.
Options
$ gratools stats
Welcome to GraTools version: '1.1.0.dev7'
@author: GraTools team's
Please cite our gitlab: https://forge.ird.fr/diade/gratools.git
Usage: gratools stats [OPTIONS]
Parses a GFA file (or uses an existing GraTools index) to compute various
statistics about its structure, including segment counts, link properties,
walk characteristics, and connectivity. Results are displayed in the terminal
and saved to a file within the GraTools index directory.
For more details, see the full documentation:
https://gratools.readthedocs.io/en/latest/commands/stats.html
Statistics Options:
-g, --gfa PATH
Path to the input GFA file (e.g., myGraph.gfa or myGraph.gfa.gz).
[required]
-o, --outdir DIRECTORY
Output directory for GraTools results. If not specified, results are
typically placed in a subdirectory within the GFA file's parent directory
(e.g., 'GraTools-output_<gfa_name>').
--by-category / --single-table
Display statistics in separate tables per category instead of one
consolidated table. [default: single-table]
--save / --no-save
Save the statistics to a CSV file. [default: no-save]
Logging Options:
-vv, --verbosity [DEBUG|INFO|ERROR]
Set the logging verbosity level. [default: INFO]
-l, --log-path DIRECTORY
Directory where the log files will be saved. If not specified, logs will be
placed in the main output directory (or in a default GraTools log
location).
Performance Options:
-t, --threads INTEGER
Number of threads to be used for parallelizable operations. [default: 1]
Other options:
-h, --help
Show this message and exit.
Usage Examples
Calculates all statistics and displays them in a single, comprehensive table.
$ gratools stats -g Og_cactus.gfa.gz
╭────────────────────┬───────────────────────────┬────────────╮
│ Category           │ Metric                    │ Value      │
├────────────────────┼───────────────────────────┼────────────┤
│ Graph Overview     │ GFA File Name             │ Og_cactus  │
│ Graph Overview     │ GFA Version               │ 1.1        │
│ Graph Overview     │ Total Segments (S lines)  │ 2,354,995  │
│ Graph Overview     │ Total Links (L lines)     │ 6,670,282  │
│ Graph Overview     │ Total Walks (W lines)     │ 23         │
│ Graph Overview     │ Unique Samples in Walks   │ 5          │
│ Segment Statistics │ Total Segment Length (bp) │ 57,119,496 │
│ ...                │ ...                       │ ...        │
╰────────────────────┴───────────────────────────┴────────────╯
For better readability on large screens, display statistics in separate tables for each category.
$ gratools stats -g Og_cactus.gfa.gz --by-category
--- GFA Statistics for: Og_cactus ---
Graph Overview Statistics
╭───────────────────────────┬───────────╮
│ Metric                    │ Value     │
├───────────────────────────┼───────────┤
│ GFA File Name             │ Og_cactus │
│ ...                       │ ...       │
╰───────────────────────────┴───────────╯
Segment Statistics Statistics
╭───────────────────────────┬────────────╮
│ Metric                    │ Value      │
├───────────────────────────┼────────────┤
│ Total Segment Length (bp) │ 57,119,496 │
│ ...                       │ ...        │
╰───────────────────────────┴────────────╯
Save the results to a CSV file for downstream analysis, for example in Excel or R. The file is written directly to the GraTools index directory.
$ gratools stats -g Og_cactus.gfa.gz --save
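For downstream work in Python, a saved file could be loaded along these lines. This is a minimal sketch only: the column names `Metric` and `Value`, and the possibility that numbers carry thousands separators, are assumptions rather than the documented GraTools CSV layout, so check the header of your own file first. The function name `load_stats_csv` is hypothetical.

```python
import csv
import io


def load_stats_csv(handle):
    """Read a saved statistics CSV into {metric: value}.

    ASSUMPTION: the file has 'Metric' and 'Value' columns; adjust the names
    to match the header of the file GraTools actually writes.
    """
    result = {}
    for row in csv.DictReader(handle):
        text = row["Value"]
        raw = text.replace(",", "")  # drop thousands separators, if any
        try:
            value = float(raw) if "." in raw else int(raw)
        except ValueError:
            value = text  # non-numeric values (e.g. the file name) stay as text
        result[row["Metric"]] = value
    return result


# Hypothetical file content, for illustration only
example = io.StringIO(
    "Metric,Value\n"
    "GFA File Name,Og_cactus\n"
    "Total Walks (W lines),23\n"
    '"Total Segment Length (bp)","57,119,496"\n'
)
stats = load_stats_csv(example)
print(stats)
```

With a real file, open it with `newline=""` (as the `csv` module recommends) and pass the handle to `load_stats_csv`.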
Metrics Explained
The metrics reported by GraTools are described below, grouped by category.
General information about the GFA file and its primary components.
| Metric | Description |
|---|---|
| GFA File Name | The base name of the input GFA file. |
| GFA Version | Version declared in the file header (e.g., 1.1). |
| Total Segments (S lines) | Count of all segment lines (nodes) in the graph. |
| Total Links (L lines) | Count of all link lines (edges) in the graph. |
| Total Walks (W lines) | Count of all walk lines, representing paths for samples. |
| Unique Samples in Walks | Number of distinct sample names found in the walk lines. |
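To illustrate how the overview counts relate to the GFA records, the sketch below tallies `S`, `L` and `W` lines, reads the `VN:Z:` header tag, and collects sample names from the first field of each W line. It is a minimal, hypothetical reimplementation for explanation (the helper names `open_gfa` and `overview` are made up), not GraTools' own code.

```python
import gzip
from collections import Counter


def open_gfa(path):
    """Open a plain or gzip-compressed GFA file as text."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def overview(lines):
    """Count S/L/W records, read the header version, collect W-line sample names."""
    counts = Counter()
    version = None
    samples = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "H":
            # The GFA version is stored in the VN:Z: tag of the header line
            for tag in fields[1:]:
                if tag.startswith("VN:Z:"):
                    version = tag[5:]
        elif fields[0] in ("S", "L", "W"):
            counts[fields[0]] += 1
            if fields[0] == "W":
                samples.add(fields[1])  # first field after "W" is the sample ID
    return {
        "GFA Version": version,
        "Total Segments (S lines)": counts["S"],
        "Total Links (L lines)": counts["L"],
        "Total Walks (W lines)": counts["W"],
        "Unique Samples in Walks": len(samples),
    }


# Tiny hypothetical graph: 3 segments, 3 links, 3 walks from 2 samples
TOY_GFA = "\n".join([
    "H\tVN:Z:1.1",
    "S\ts1\tACGT",
    "S\ts2\tGG",
    "S\ts3\tTTTA",
    "L\ts1\t+\ts2\t+\t0M",
    "L\ts2\t+\ts3\t+\t0M",
    "L\ts1\t+\ts3\t+\t0M",
    "W\tA\t1\tchr1\t0\t10\t>s1>s2>s3",
    "W\tB\t1\tchr1\t0\t8\t>s1>s3",
    "W\tB\t2\tchr1\t0\t8\t>s1>s3",
])

stats = overview(TOY_GFA.splitlines())
print(stats)
```

On a real file, pass an open handle instead: `with open_gfa("Og_cactus.gfa.gz") as fh: print(overview(fh))`.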
Statistics regarding the physical length of the segments (nodes).
| Metric | Description |
|---|---|
| Total Segment Length (bp) | Sum of the lengths of all segments, each segment counted once. Also known as Graph Size. |
| Average Segment Length (bp) | Mean length: Total Segment Length / Total Segments. |
| Median Segment Length (bp) | Median length across all segments. |
| Top 5% Stats (Avg/Med) | Average and median lengths of the longest 5% of segments. |
| Length Distribution | Binned counts of segments by length (bp). |
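The segment-length metrics can be sketched in the same way. The snippet below takes each length from the inline sequence or, when the sequence is `*`, from the `LN:i:` tag. The 5% cut-off is rounded down while keeping at least one segment, and the histogram bin edges are purely illustrative; GraTools may round or bin differently. The function names are hypothetical.

```python
import statistics


def segment_lengths(lines):
    """Map segment name -> length, using the LN:i: tag when the sequence is '*'."""
    lengths = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "S":
            continue
        name, seq = fields[1], fields[2]
        if seq == "*":
            tags = [t for t in fields[3:] if t.startswith("LN:i:")]
            lengths[name] = int(tags[0][5:]) if tags else 0
        else:
            lengths[name] = len(seq)
    return lengths


def length_stats(lengths, bin_edges=(10, 100, 1000)):
    """Summarise segment lengths; bin_edges are illustrative, not GraTools' bins."""
    values = sorted(lengths.values())
    # ASSUMPTION: round the 5% cut-off down, but keep at least one segment
    top = values[-max(1, int(len(values) * 0.05)):]
    distribution, lower = {}, 0
    for upper in bin_edges:
        distribution[f"{lower}-{upper - 1}"] = sum(lower <= v < upper for v in values)
        lower = upper
    distribution[f">={lower}"] = sum(v >= lower for v in values)
    return {
        "Total Segment Length (bp)": sum(values),
        "Average Segment Length (bp)": statistics.mean(values),
        "Median Segment Length (bp)": statistics.median(values),
        "Top 5% Avg (bp)": statistics.mean(top),
        "Top 5% Median (bp)": statistics.median(top),
        "Length Distribution": distribution,
    }


gfa_lines = ["S\ts1\tACGT", "S\ts2\t*\tLN:i:7", "L\ts1\t+\ts2\t+\t0M"]
print(segment_lengths(gfa_lines))  # {'s1': 4, 's2': 7}

# 100 hypothetical segments of length 1..100 bp
stats = length_stats({f"s{i}": i for i in range(1, 101)})
print(stats)
```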
Connectivity metrics between segments.
| Metric | Description |
|---|---|
| Max Segment Degree | Highest number of links connected to any single segment. |
| Average Segment Degree | Mean connectivity: average number of links per segment (sum of segment degrees / Total Segments). |
| Self-Links (S1 -> S1) | Links connecting a segment to itself. |
| Inverted Links | Links connecting segments with opposite orientations (e.g., + to -). |
| Both Negative Links | Links connecting two segments that are both in reverse-complement orientation (- to -). |
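The connectivity counts can be illustrated with a small sketch that reads only S and L lines. Each link adds one to the degree of each of its ends, so a self-link adds two to its segment. That counting convention, like the function name `link_stats`, is an assumption for illustration and may not match GraTools exactly.

```python
from collections import Counter


def link_stats(lines):
    """Degree and orientation counts computed from S and L lines."""
    segments, degree = [], Counter()
    self_links = inverted = both_negative = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            segments.append(fields[1])
        elif fields[0] == "L":
            src, src_orient, dst, dst_orient = fields[1:5]
            # ASSUMPTION: each link adds 1 to the degree of both ends,
            # so a self-link adds 2 to a single segment
            degree[src] += 1
            degree[dst] += 1
            if src == dst:
                self_links += 1
            if src_orient != dst_orient:
                inverted += 1        # + to - or - to +
            elif src_orient == "-":
                both_negative += 1   # - to -
    n = len(segments)
    return {
        "Max Segment Degree": max((degree[s] for s in segments), default=0),
        # Average degree = sum of segment degrees / number of segments
        "Average Segment Degree": sum(degree[s] for s in segments) / n if n else 0.0,
        "Self-Links (S1 -> S1)": self_links,
        "Inverted Links": inverted,
        "Both Negative Links": both_negative,
    }


TOY = [
    "S\ts1\tACGT", "S\ts2\tGG", "S\ts3\tTTTA",
    "L\ts1\t+\ts2\t+\t0M",  # plain forward link
    "L\ts2\t+\ts3\t-\t0M",  # inverted link
    "L\ts3\t-\ts1\t-\t0M",  # both negative
    "L\ts2\t+\ts2\t+\t0M",  # self-link
]
stats = link_stats(TOY)
print(stats)
```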
How genomic sequences traverse the graph.
| Metric | Description |
|---|---|
| Total Path Length (bp) | Sum of all walk lengths. Represents the total sequence content. |
| Graph Compression Ratio | |
| Max Segments in a Walk | Highest count of nodes in a single walk (W line). |
| Sum of First Segments | Sum of the lengths of the starting node of each walk. |
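A sketch of the walk metrics follows. It assumes segment sequences are stored inline and ignores link overlaps when summing walk lengths. As a labelled assumption, it also takes the compression ratio to be Total Path Length divided by Total Segment Length, because this page does not give GraTools' exact formula. The function name `walk_stats` is hypothetical.

```python
import re

# One ">name" or "<name" step of a W-line walk; the group captures the name
STEP = re.compile(r"[<>]([^<>]+)")


def walk_stats(lines):
    """Walk-level metrics; assumes inline sequences and no segment overlaps."""
    lengths, walks = {}, []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            lengths[fields[1]] = len(fields[2])
        elif fields[0] == "W":
            walks.append(STEP.findall(fields[6]))  # 7th column holds the walk
    # Walks are resolved after the loop, so S lines may appear after W lines
    walk_lengths = [sum(lengths[s] for s in walk) for walk in walks]
    total_path = sum(walk_lengths)
    graph_size = sum(lengths.values())
    return {
        "Total Path Length (bp)": total_path,
        # ASSUMPTION: compression ratio = total path length / graph size
        "Graph Compression Ratio": total_path / graph_size if graph_size else 0.0,
        "Max Segments in a Walk": max((len(w) for w in walks), default=0),
        "Sum of First Segments": sum(lengths[w[0]] for w in walks if w),
    }


TOY = [
    "S\ts1\tACGT", "S\ts2\tGG", "S\ts3\tTTTA",
    "W\tA\t1\tchr1\t0\t10\t>s1>s2>s3",
    "W\tB\t1\tchr1\t0\t8\t>s1>s3",
    "W\tB\t2\tchr1\t0\t8\t>s1>s3",
]
stats = walk_stats(TOY)
print(stats)
```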
How samples are distributed across segments.
| Metric | Description |
|---|---|
| Avg Unique Samples / Seg | Similarity mean: mean number of samples passing through a segment. |
| Median Unique Samples / Seg | Similarity median: median number of samples per segment. |
| Avg Occurrences / Seg | Depth mean: average number of times a segment is traversed across all walks. |
| StdDev Unique Samples | Standard deviation of the number of samples per segment. |
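Per-segment sample counts can be sketched by reading each W line and recording which samples traverse every segment, and how many times. Using the population standard deviation (`statistics.pstdev`) is an assumption; GraTools might use the sample standard deviation instead. The function name `sample_stats` is hypothetical.

```python
import re
import statistics
from collections import Counter, defaultdict

STEP = re.compile(r"[<>]([^<>]+)")  # captures the segment name of each walk step


def sample_stats(lines):
    """Per-segment sample and traversal counts, summarised over all segments."""
    segments = []
    samples_on = defaultdict(set)  # segment -> sample IDs that traverse it
    occurrences = Counter()        # segment -> traversals across all walks
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            segments.append(fields[1])
        elif fields[0] == "W":
            for seg in STEP.findall(fields[6]):
                samples_on[seg].add(fields[1])
                occurrences[seg] += 1
    per_segment = [len(samples_on[s]) for s in segments]
    depth = [occurrences[s] for s in segments]
    return {
        "Avg Unique Samples / Seg": statistics.mean(per_segment),
        "Median Unique Samples / Seg": statistics.median(per_segment),
        "Avg Occurrences / Seg": statistics.mean(depth),
        # ASSUMPTION: population standard deviation
        "StdDev Unique Samples": statistics.pstdev(per_segment),
    }


TOY = [
    "S\ts1\tACGT", "S\ts2\tGG", "S\ts3\tTTTA",
    "W\tA\t1\tchr1\t0\t10\t>s1>s2>s3",
    "W\tB\t1\tchr1\t0\t8\t>s1>s3",
    "W\tB\t2\tchr1\t0\t8\t>s1>s3",
]
stats = sample_stats(TOY)
print(stats)
```

Here `s1` and `s3` are shared by both samples while `s2` is private to sample `A`, so the per-segment sample counts are 2, 1 and 2.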
Topology and connectivity of the whole graph.
| Metric | Description |
|---|---|
| Graph Density | How close the graph is to being complete (fully connected). |
| Segments/Links Ratio | Number of segments divided by number of links; the balance between nodes and edges. |
| Dead-End Segments | Segments with only one link (degree 1). |
| Isolated Segments | Segments with no links at all (degree 0). |
| Connected Components (CCs) | Number of separate subgraphs. |
| Largest CC Size (bp) | Sequence length of the largest connected subgraph. |
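The whole-graph metrics can be sketched with an orientation-agnostic adjacency list and an iterative depth-first search for connected components. The density formula used here, 2E / (V(V - 1)) for a simple undirected graph, is an assumption because this page does not give GraTools' definition. Self-links or duplicate links could push this value above 1. The function name `topology_stats` is hypothetical.

```python
from collections import defaultdict


def topology_stats(lines):
    """Whole-graph metrics; orientations are ignored when building adjacency."""
    lengths = {}
    adjacency = defaultdict(set)
    degree = defaultdict(int)
    n_links = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            lengths[fields[1]] = len(fields[2])
        elif fields[0] == "L":
            a, b = fields[1], fields[3]
            n_links += 1
            degree[a] += 1
            degree[b] += 1
            adjacency[a].add(b)
            adjacency[b].add(a)
    n = len(lengths)
    # Connected components via iterative depth-first search
    seen, components = set(), []
    for start in lengths:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return {
        # ASSUMPTION: simple undirected density 2E / (V(V - 1))
        "Graph Density": 2 * n_links / (n * (n - 1)) if n > 1 else 0.0,
        "Segments/Links Ratio": n / n_links if n_links else float("inf"),
        "Dead-End Segments": sum(1 for s in lengths if degree[s] == 1),
        "Isolated Segments": sum(1 for s in lengths if degree[s] == 0),
        "Connected Components (CCs)": len(components),
        "Largest CC Size (bp)": max(
            (sum(lengths[s] for s in c) for c in components), default=0
        ),
    }


TOY = [
    "S\ts1\tACGT", "S\ts2\tGG", "S\ts3\tTTTA", "S\ts4\tA", "S\ts5\tCC",
    "L\ts1\t+\ts2\t+\t0M",
    "L\ts2\t+\ts3\t+\t0M",
]
stats = topology_stats(TOY)
print(stats)
```

In this toy graph `s1` and `s3` are dead ends, `s4` and `s5` are isolated, and the chain `s1`-`s2`-`s3` forms the largest component.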
Computing statistics on very large graphs can be CPU-intensive. If you have already run gratools index, stats uses the existing index instead of re-parsing the GFA file, which is much faster. Saving the results with --save also lets you reuse them in later reports without recalculating them.
Quick Links
Command Index: gratools index
Pangenome Analysis: gratools core_dispensable_ratio