1 - Graph Description
Biological question: Which individuals are embedded in the pangenome variation graph (PVG)? What are their chromosomes/contigs, and what are the basic structural statistics of the graph?
This use case illustrates how GraTools can be used to quickly explore a GFA file and retrieve essential information about its content, without any prior knowledge of its internal structure.
The examples below are based on the Asian Rice graph built from 13 accessions [Zhou et al., 2020]
using MGC [Marthe et al., 2025], available as NewRiceGraph_MGC.gfa.gz.
Note
The stats command will automatically trigger the import step if it has not been
performed yet. You do not need to run gratools import separately.
Step 1: Basic Statistics
Use the stats command to obtain an overview of the graph structure:
gratools stats --gfa NewRiceGraph_MGC.gfa.gz
This command reports metrics such as:
Total number of segments (nodes) and links
Mean, median, minimum and maximum segment lengths
Total graph size in base pairs and compression ratio
Average and median node depth (number of haplotypes sharing a node)
Graph density, degree distribution, and connected components
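To make the first few metrics concrete, here is a minimal Python sketch that derives segment and link counts and segment-length statistics directly from GFA records. It is not GraTools code and does not reproduce how `gratools stats` is implemented; the function name `gfa_basic_stats` and the tiny `demo` graph are hypothetical. It relies only on the GFA convention that `S` lines carry a segment name and sequence (or `*` plus an optional `LN:i:` length tag) and that `L` lines describe links.

```python
import statistics


def gfa_basic_stats(lines):
    """Summarise S (segment) and L (link) records from an iterable of GFA lines."""
    seg_lengths = []
    n_links = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            seq = fields[2]
            if seq == "*":
                # Sequence omitted: fall back to the optional LN:i:<length> tag.
                ln_tags = [f for f in fields[3:] if f.startswith("LN:i:")]
                seg_lengths.append(int(ln_tags[0][5:]) if ln_tags else 0)
            else:
                seg_lengths.append(len(seq))
        elif fields[0] == "L":
            n_links += 1
    return {
        "segments": len(seg_lengths),
        "links": n_links,
        "total_bp": sum(seg_lengths),
        "mean_bp": statistics.mean(seg_lengths) if seg_lengths else 0,
        "median_bp": statistics.median(seg_lengths) if seg_lengths else 0,
        "min_bp": min(seg_lengths, default=0),
        "max_bp": max(seg_lengths, default=0),
    }


# Tiny in-memory graph, used purely for illustration (hypothetical data).
demo = [
    "H\tVN:Z:1.1\n",
    "S\ts1\tACGT\n",
    "S\ts2\tAC\n",
    "S\ts3\t*\tLN:i:6\n",
    "L\ts1\t+\ts2\t+\t0M\n",
    "L\ts2\t+\ts3\t-\t0M\n",
]
print(gfa_basic_stats(demo))
```

For a compressed file, the same function could be fed from `gzip.open(path, "rt")`. On a graph with tens of millions of segments this single pass is slow and memory-hungry compared with a dedicated tool, which is one reason to prefer the `stats` command.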
Example output (Asian Rice graph):
| Category | Metric | Value |
|---|---|---|
| Graph Overview | Total Segments | 26,461,214 |
| Graph Overview | Total Links | 36,276,920 |
| Graph Overview | Unique Samples in Walks | 13 |
| Segment Statistics | Total Segment Length (bp) | 858,417,045 |
| Segment Statistics | Average Segment Length (bp) | 32.44 |
| Path Statistics | Graph Compression Ratio | 5.95 |
| Segment Sharing | Avg Unique Samples per Segment | 7.36 |
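As a quick sanity check on how the reported values relate to each other, the average segment length can be recomputed from two other rows of the example output. The variable names below are illustrative only; the numbers are copied from the table.

```python
# Values copied from the example output for the Asian Rice graph.
total_segment_length_bp = 858_417_045  # "Total Segment Length (bp)"
total_segments = 26_461_214            # "Total Segments"

# Average segment length = total segment length / number of segments.
mean_segment_length = total_segment_length_bp / total_segments
print(f"{mean_segment_length:.2f}")  # 32.44
```

The result matches the "Average Segment Length (bp)" row, which confirms that this metric is a plain mean over all segments.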
Step 2: List Embedded Samples
Use the list_samples command to retrieve all sample (haplotype) names embedded in the graph:
gratools list_samples --gfa NewRiceGraph_MGC.gfa.gz
Example output (Asian Rice graph):
Total samples in GFA: 13
AzucenaRS1
IRGSP
Os117425RS1
Os125619RS1
Os125827RS1
Os127518RS1
Os127564RS1
Os127652RS1
Os127742RS1
Os128077RS1
Os132278RS1
Os132424RS1
OsIR64RS1
The output is also written to a CSV file for downstream use.
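For downstream scripting, the listing can also be read back into a Python list. The sketch below assumes the text layout shown in the example output (one `Total samples in GFA:` line followed by one sample name per line); the helper name `parse_sample_listing` is hypothetical and not part of GraTools, and the layout of the CSV file is not covered here.

```python
def parse_sample_listing(text):
    """Return (declared_count, names) from list_samples-style text output."""
    declared = None
    names = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Total samples in GFA:"):
            # Header line carrying the number of samples.
            declared = int(line.split(":", 1)[1])
        else:
            names.append(line)
    return declared, names


# Example output copied from above.
listing = """Total samples in GFA: 13
AzucenaRS1
IRGSP
Os117425RS1
Os125619RS1
Os125827RS1
Os127518RS1
Os127564RS1
Os127652RS1
Os127742RS1
Os128077RS1
Os132278RS1
Os132424RS1
OsIR64RS1
"""

declared, names = parse_sample_listing(listing)
# The declared count should agree with the number of names listed.
print(declared, len(names), names[0])
```

Comparing the declared count with the number of parsed names is a cheap way to detect a truncated listing before passing the names to later steps.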
Step 3: List Chromosomes per Sample
Use the list_chr command to display all chromosomes or contigs for each sample, along with
their coordinates:
gratools list_chr --gfa NewRiceGraph_MGC.gfa.gz --full
The --full option outputs the start and end position of each fragment (chromosome/contig).
Without it, only names and counts are reported.
Example output (partial, Asian Rice graph):
Total samples: 13
Total unique chromosomes: 743
Min fragment length: 28,006 bp
Max fragment length: 44,959,450 bp
Average fragment length: 6,878,547.43 bp
Sample | Chromosome Fragment Name | Start | End
AzucenaRS1 | CM020633.1_Azucena_chromosome1 | 0 | 44,011,168
AzucenaRS1 | CM020634.1_Azucena_chromosome2 | 0 | 36,468,344
...
IRGSP | Chr1 | 0 | 43,270,923
IRGSP | Chr9 | 0 | 23,012,720
...
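The per-fragment coordinates lend themselves to simple summaries such as the total sequence length per sample. The following sketch assumes the pipe-separated layout shown in the example output and takes `End - Start` as the fragment length, which is an assumption about the coordinate convention rather than something stated by GraTools. The function name `fragment_lengths` is hypothetical.

```python
from collections import defaultdict


def fragment_lengths(rows):
    """Sum (End - Start) per sample from 'Sample | Name | Start | End' rows."""
    totals = defaultdict(int)
    for row in rows:
        cells = [c.strip() for c in row.split("|")]
        if len(cells) != 4 or cells[2] == "Start":
            continue  # skip the header, "..." placeholders and summary lines
        sample, _name, start, end = cells
        # Coordinates use thousands separators in the example output.
        totals[sample] += int(end.replace(",", "")) - int(start.replace(",", ""))
    return dict(totals)


# Rows copied from the partial example output above.
rows = [
    "Sample | Chromosome Fragment Name | Start | End",
    "AzucenaRS1 | CM020633.1_Azucena_chromosome1 | 0 | 44,011,168",
    "AzucenaRS1 | CM020634.1_Azucena_chromosome2 | 0 | 36,468,344",
    "...",
    "IRGSP | Chr1 | 0 | 43,270,923",
    "IRGSP | Chr9 | 0 | 23,012,720",
]
print(fragment_lengths(rows))
```

Because the example output is partial, the totals printed here cover only the four fragments shown, not the full assemblies.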
Summary
| Command | Description |
|---|---|
| gratools stats | Compute graph-level statistics |
| gratools list_samples | List all sample names embedded in the graph |
| gratools list_chr --full | List all chromosomes/contigs with coordinates |
See also
2 - Subgraph & FASTA Extraction: Subgraph and FASTA extraction
3 - Core/Dispensable & Groups: Core/Dispensable genome analysis
4 - Advanced Pangenome Size Analysis: Advanced pangenome size analysis