1 - Graph Description

Biological question: Which individuals are embedded in the pangenome variation graph (PVG)? What are their chromosomes/contigs, and what are the basic structural statistics of the graph?

This use case illustrates how GraTools can be used to quickly explore a GFA file and retrieve essential information about its content, without any prior knowledge of its internal structure.

The examples below are based on the Asian Rice graph built from 13 accessions [Zhou et al., 2020] using MGC [Marthe et al., 2025], available as NewRiceGraph_MGC.gfa.gz.

Note

The stats command will automatically trigger the import step if it has not been performed yet. You do not need to run gratools import separately.


Step 1: Basic Statistics

Use the stats command to obtain an overview of the graph structure:

gratools stats --gfa NewRiceGraph_MGC.gfa.gz

This command reports metrics such as:

  • Total number of segments (nodes) and links

  • Mean, median, minimum and maximum segment lengths

  • Total graph size in base pairs and compression ratio

  • Average and median node depth (number of haplotypes sharing a node)

  • Graph density, degree distribution, and connected components
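The core of these metrics can be sketched directly from the GFA records, which helps clarify what each number means. This is a minimal sketch, not GraTools' implementation. It assumes haplotypes are stored as GFA 1.1 `W` lines (as in Minigraph-Cactus-style graphs) and ignores `P` lines. Node depth is counted as the number of distinct samples whose walks visit a segment. The helper names `open_gfa`, `segment_length` and `gfa_stats` are hypothetical.

```python
import gzip
import re
import statistics
from collections import defaultdict

# Walk strings in GFA 1.1 W lines look like ">s1<s2>s3": an orientation, then a segment name.
WALK_STEP = re.compile(r"[<>]([^<>]+)")


def open_gfa(path):
    """Hypothetical helper: open a plain or gzip-compressed GFA file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def segment_length(fields):
    """Length of an S line: the sequence length, or the LN:i: tag when the sequence is '*'."""
    if fields[2] != "*":
        return len(fields[2])
    for tag in fields[3:]:
        if tag.startswith("LN:i:"):
            return int(tag[5:])
    return 0


def gfa_stats(lines):
    """Hypothetical sketch: basic graph statistics from an iterable of GFA lines."""
    lengths = {}                              # segment name -> length in bp
    n_links = 0
    samples = set()                           # sample IDs seen in W lines
    samples_per_segment = defaultdict(set)    # segment name -> samples whose walks visit it
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            lengths[fields[1]] = segment_length(fields)
        elif fields[0] == "L":
            n_links += 1
        elif fields[0] == "W":
            samples.add(fields[1])
            for segment in WALK_STEP.findall(fields[6]):
                samples_per_segment[segment].add(fields[1])
    values = list(lengths.values())
    return {
        "segments": len(values),
        "links": n_links,
        "samples": len(samples),
        "total_length": sum(values),
        "mean_length": statistics.mean(values),
        "median_length": statistics.median(values),
        "min_length": min(values),
        "max_length": max(values),
        # Segments never visited by any walk count as depth 0.
        "avg_samples_per_segment": sum(len(samples_per_segment.get(s, ())) for s in lengths) / len(lengths),
    }
```

Usage: `with open_gfa("NewRiceGraph_MGC.gfa.gz") as fh: print(gfa_stats(fh))`. Keeping one set of sample names per segment is simple but memory-hungry on a graph with tens of millions of segments, which is one reason to rely on `gratools stats` for real data.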

Example output (Asian Rice graph):

Category           | Metric                          | Value
Graph Overview     | Total Segments                  | 26,461,214
Graph Overview     | Total Links                     | 36,276,920
Graph Overview     | Unique Samples in Walks         | 13
Segment Statistics | Total Segment Length (bp)       | 858,417,045
Segment Statistics | Average Segment Length (bp)     | 32.44
Path Statistics    | Graph Compression Ratio         | 5.95
Segment Sharing    | Avg Unique Samples per Segment  | 7.36
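As a quick consistency check, the average segment length follows from the two other segment metrics reported by `stats`. The calculation below uses only values from this output:

```latex
\bar{\ell} = \frac{\text{Total Segment Length}}{\text{Total Segments}}
           = \frac{858\,417\,045\ \text{bp}}{26\,461\,214} \approx 32.44\ \text{bp}
```

Similarly, 7.36 / 13 ≈ 0.57, so on average a segment is traversed by a little over half of the 13 accessions.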


Step 2: List Embedded Samples

Use the list_samples command to retrieve all sample (haplotype) names embedded in the graph:

gratools list_samples --gfa NewRiceGraph_MGC.gfa.gz

Example output (Asian Rice graph):

Total samples in GFA: 13

AzucenaRS1
IRGSP
Os117425RS1
Os125619RS1
Os125827RS1
Os127518RS1
Os127564RS1
Os127652RS1
Os127742RS1
Os128077RS1
Os132278RS1
Os132424RS1
OsIR64RS1

The output is also written to a CSV file for downstream use.
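The sample list can be approximated by collecting the sample ID (second column) of every GFA 1.1 `W` line. This is a minimal sketch that assumes haplotypes are stored as `W` lines. The function names `sample_names` and `write_samples_csv` are hypothetical, and the single `sample` column is an illustrative layout, not GraTools' actual CSV format.

```python
import csv


def sample_names(lines):
    """Hypothetical sketch: sorted, de-duplicated sample IDs from GFA W lines."""
    names = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "W":
            names.add(fields[1])  # W <SampleId> <HapIndex> <SeqId> <SeqStart> <SeqEnd> <Walk>
    return sorted(names)


def write_samples_csv(names, out):
    """Write one sample name per row, under an illustrative 'sample' header, to an open text handle."""
    writer = csv.writer(out)
    writer.writerow(["sample"])
    for name in names:
        writer.writerow([name])
```

Usage: read the compressed graph with `gzip.open("NewRiceGraph_MGC.gfa.gz", "rt")`, pass the handle to `sample_names`, and write the result to a file opened with `newline=""`, as the `csv` module recommends. Plain sorting reproduces the ordering shown above (for example, `OsIR64RS1` sorts after `Os132424RS1`).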


Step 3: List Chromosomes per Sample

Use the list_chr command to display all chromosomes or contigs for each sample, along with their coordinates:

gratools list_chr --gfa NewRiceGraph_MGC.gfa.gz --full

The --full option adds the start and end positions of each fragment (chromosome or contig). Without it, only the names and counts are reported.
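Per-sample fragments and their coordinates map naturally onto the `SeqId`, `SeqStart` and `SeqEnd` columns of GFA 1.1 `W` lines. The sketch below assumes that mapping. It is an illustration, not GraTools' code, and GraTools' rule for counting "unique chromosomes" may differ from the simple count of distinct names used here. The function names are hypothetical.

```python
from collections import defaultdict


def fragments_by_sample(lines):
    """Hypothetical sketch: map each sample to its (seq_id, start, end) fragments from W lines."""
    frags = defaultdict(list)
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if f[0] != "W":
            continue
        sample, seq_id, start, end = f[1], f[3], f[4], f[5]
        # SeqStart/SeqEnd may be '*' (unknown) in GFA 1.1: keep the name, drop the coordinates.
        coords = (None, None) if "*" in (start, end) else (int(start), int(end))
        frags[sample].append((seq_id, *coords))
    return dict(frags)


def fragment_summary(frags):
    """Counts, plus length statistics over fragments whose coordinates are known."""
    lengths = [e - s for items in frags.values() for _, s, e in items if s is not None]
    names = {seq_id for items in frags.values() for seq_id, _, _ in items}
    return {
        "samples": len(frags),
        "unique_fragment_names": len(names),
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
        "avg_length": sum(lengths) / len(lengths) if lengths else None,
    }


def print_fragments(frags):
    """Print a 'Sample | Fragment | Start | End' table in the spirit of the --full output."""
    print("Sample | Chromosome Fragment Name | Start | End")
    for sample in sorted(frags):
        for seq_id, start, end in frags[sample]:
            print(f"{sample} | {seq_id} | {start} | {end}")
```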

Example output (partial, Asian Rice graph):

Total samples: 13
Total unique chromosomes: 743
Min fragment length: 28,006 bp
Max fragment length: 44,959,450 bp
Average fragment length: 6,878,547.43 bp

Sample         | Chromosome Fragment Name                     | Start | End
AzucenaRS1     | CM020633.1_Azucena_chromosome1               | 0     | 44,011,168
AzucenaRS1     | CM020634.1_Azucena_chromosome2               | 0     | 36,468,344
...
IRGSP          | Chr1                                         | 0     | 43,270,923
IRGSP          | Chr9                                         | 0     | 23,012,720
...
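The `list_chr` summary can be cross-checked against the `stats` output. This relies on two interpretations that this page does not state explicitly. First, the average fragment length is taken over the 743 reported fragments. Second, the graph compression ratio is the total sequence length spelled out by all walks divided by the total segment length. Under those assumptions, the two commands agree:

```latex
\begin{aligned}
L_{\text{walks}} &\approx 743 \times 6\,878\,547.43\ \text{bp} \approx 5.111 \times 10^{9}\ \text{bp},\\
\text{compression ratio} &\approx \frac{L_{\text{walks}}}{L_{\text{segments}}}
  = \frac{5.111 \times 10^{9}\ \text{bp}}{858\,417\,045\ \text{bp}} \approx 5.95.
\end{aligned}
```

In other words, each base pair stored in the graph's segments is reused about six times across the 13 haplotypes.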

Summary

Command                                       | Description
gratools stats --gfa <file.gfa.gz>            | Compute graph-level statistics
gratools list_samples --gfa <file.gfa.gz>     | List all sample names embedded in the graph
gratools list_chr --gfa <file.gfa.gz> --full  | List all chromosomes/contigs with coordinates
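To run the three steps of this use case in one go, a small wrapper can call the commands from the summary table in order. This is a minimal sketch that uses only the commands and flags shown on this page. It assumes `gratools` is on the `PATH`, and the function names `description_commands` and `run_description` are hypothetical.

```python
import subprocess


def description_commands(gfa, full=True):
    """Build the three GraTools commands of this use case for one GFA file."""
    list_chr = ["gratools", "list_chr", "--gfa", gfa]
    if full:
        list_chr.append("--full")  # add start/end coordinates for each fragment
    return [
        ["gratools", "stats", "--gfa", gfa],  # also triggers the import step if needed
        ["gratools", "list_samples", "--gfa", gfa],
        list_chr,
    ]


def run_description(gfa, full=True):
    """Run each command in turn, echoing it first and stopping at the first failure."""
    for cmd in description_commands(gfa, full):
        print("$", " ".join(cmd), flush=True)
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```

Usage: `run_description("NewRiceGraph_MGC.gfa.gz")`. Building the argument lists separately from running them makes it easy to inspect or log the exact commands before executing anything.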

See also