Quick Start Guide
Welcome to GraTools! This guide will help you install the software and run your first pangenome graph analyses in minutes.
Installation
System Tools: Bedtools must be installed and accessible in your PATH.
From PyPI. Recommended for most users.
# Update build tools
python3 -m pip install -U pip setuptools build
# Install GraTools
python3 -m pip install gratools
From the Git repository. Test the latest features from the main branch.
python3 -m pip install GraTools@git+https://forge.ird.fr/diade/gratools.git@main
As an Apptainer container image.
apptainer pull gratools.sif oras://registry.forge.ird.fr/diade/gratools/gratools:1.0.0
After installation, ensure GraTools is working correctly:
gratools --version
gratools --help
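The two requirements above (the gratools command itself and Bedtools on the PATH) can be checked in one pass. The following is a minimal sketch; the helper names have_tool and check_gratools_env are hypothetical and are not part of GraTools.

```shell
#!/bin/sh
# Return 0 if the named program is found on PATH, 1 otherwise.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report the status of each tool this guide relies on.
# (have_tool and check_gratools_env are illustrative helpers, not GraTools commands.)
check_gratools_env() {
    status=0
    for tool in gratools bedtools; do
        if have_tool "$tool"; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}

check_gratools_env || echo "Install the missing tools before continuing."
```

A non-zero return value from check_gratools_env makes it easy to stop a larger script early when a dependency is absent.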
Command Overview
GraTools provides a rich set of subcommands. You can see them all by running gratools --help.
$ gratools
Welcome to GraTools version: '1.1.0.dev7'
@author: GraTools team's
____ __________ ____
6MMMMMb/ MMMMMMMMMM `MM
8P YM / MM \ MM
6M Y ___ __ ___ MM _____ _____ MM ____
MM `MM 6MM 6MMMMb MM 6MMMMMb 6MMMMMb MM 6MMMMb\
MM MM69 " 8M' `Mb MM 6M' `Mb 6M' `Mb MM MM' `
MM ___ MM' ,oMM MM MM MM MM MM MM YM.
MM `M' MM ,6MM9'MM MM MM MM MM MM MM YMMMMb
YM M MM MM' MM MM MM MM MM MM MM `Mb
8b d9 MM MM. ,MM MM YM. ,M9 YM. ,M9 MM L ,MM
YMMMMM9 _MM_ `YMMM9'Yb_MM_ YMMMMM9 YMMMMM9 _MM_MYMMMM9
\ / /
/''A''\ /''''''\ / /''''A'''''\
...GC| |..ATG...C...CG...T....TAG..'..GC.| |...
\..C../ \.............../ \...TATA.../
Please cite our gitlab: https://forge.ird.fr/diade/gratools.git\
Usage: gratools [OPTIONS] COMMAND [ARGS]...
A toolkit for analyzing, manipulating, and extracting information from
pangenome graphs in GFA format.
Options:
-v, --version Show the version and exit.
-h, --help Show this message and exit.
GFA Content Information:
list_samples
List the samples embedded in the indexed GFA file.
list_chr
List the embedded chromosomes and their fragments if relevant from the
indexed GFA file.
stats
Compute and display various statistics for a GFA file.
GFA Data Extraction:
extract_subgraph
Extracts a subgraph from a GFA file based on a query region.
get_fasta
Extracts sequences for a specific genomic region in FASTA format.
to_bandage
Generates a CSV file for the Bandage graph visualizer.
GFA Analysis:
core_dispensable_ratio
Compute and display the ratio of core and dispensable segments.
depth_nodes_stat
Display various statistics about segment depth (number of embedded
samples).
specific_groups_sample
Identify segments shared by or specific to defined sample groups.
get_segments_by_depth
List segments within a specified depth range (number of encompassing
samples).
Other commands:
index
Pre-processes a GFA file for faster GraTools operations.
shell_completion
Generates shell completion scripts (Bash, Zsh, Fish).
Command Examples
The examples below show a typical invocation of each major command.
stats: Compute graph statistics
Compute a range of statistics on the graph structure (segments, links, walks, connectivity).
$ gratools stats --gfa Og_cactus.gfa.gz
For more details, see the complete documentation for gratools stats.
list_samples & list_chr: Explore graph content
List all unique sample names, or the chromosomes of each sample, found in a GFA file.
$ gratools list_samples --gfa Og_cactus.gfa.gz
$ gratools list_chr --gfa Og_cactus.gfa.gz
For more details, see gratools list_samples and gratools list_chr.
extract_subgraph & get_fasta: Data extraction
Extract specific regions defined by sample, chromosome, and positions.
# Extract Subgraph
$ gratools extract_subgraph --gfa Og_cactus.gfa.gz \
--sample-query CG14 --chrom-query CG14_Chr07 \
--start-query 100000 --stop-query 150000 \
--all-samples
# Get FASTA
$ gratools get_fasta --gfa Og_cactus.gfa.gz \
--sample-query CG14 --chrom-query CG14_Chr07 \
--start-query 10000 --stop-query 15000 \
--all-samples
For more details, see gratools extract_subgraph and gratools get_fasta.
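When several regions are needed, the get_fasta call shown above can be repeated from a small region list. This is a sketch under stated assumptions: the helper names run and fasta_for_regions, the DRY_RUN variable, and the whitespace-separated "sample chrom start stop" input layout are conventions invented for this example, not GraTools features. Only flags that appear in this guide are used, and where GraTools writes its output is not covered here.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1.
# (DRY_RUN is a convention of this sketch, not a GraTools option.)
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Read "sample chrom start stop" lines from stdin and call get_fasta once per line.
fasta_for_regions() {
    gfa=$1
    while read -r sample chrom start stop; do
        [ -n "$sample" ] || continue   # skip blank lines
        # </dev/null keeps the command from consuming the region list on stdin.
        run gratools get_fasta --gfa "$gfa" \
            --sample-query "$sample" --chrom-query "$chrom" \
            --start-query "$start" --stop-query "$stop" \
            --all-samples </dev/null
    done
}

# Preview the commands for the two regions used in this guide.
DRY_RUN=1 fasta_for_regions Og_cactus.gfa.gz <<'EOF'
CG14 CG14_Chr07 10000 15000
CG14 CG14_Chr07 100000 150000
EOF
```

Unset DRY_RUN (or set it to 0) to execute the commands instead of printing them.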
core_dispensable_ratio & depth_nodes_stat: Pangenome analysis
Analyze how segments are shared across samples (core vs dispensable).
# Ratio Core/Dispensable
$ gratools core_dispensable_ratio -g Og_cactus.gfa.gz --input-as-number \
--shared-min 4 --specific-max 2 --filter-len 50
# Node Depth Summary
$ gratools depth_nodes_stat --gfa Og_cactus.gfa.gz --filter-len 50 --threads 4
For more details, see gratools core_dispensable_ratio and gratools depth_nodes_stat.
specific_groups_sample: Group comparisons
Identify segments shared by or specific to defined sample groups.
$ gratools specific_groups_sample --gfa Og_cactus.gfa.gz \
--samples-list-A list_A.txt \
--samples-list-B list_B.txt \
--output-csv
For more details, see gratools specific_groups_sample.
Typical Workflow
Crucial first step. Index your GFA file to speed up all future operations.
gratools index --gfa my_graph.gfa.gz
Get an overview of your samples and graph properties.
gratools list_samples --gfa my_graph.gfa.gz
gratools stats --gfa my_graph.gfa.gz
Run in-depth analyses or extract sequences and subgraphs.
gratools get_fasta --gfa my_graph.gfa.gz ...
gratools core_dispensable_ratio ...
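The first two steps of this workflow (index, then inspect) can be chained so that a failing step stops the rest. The sketch below assumes hypothetical helper names (run, gratools_overview) and a DRY_RUN variable that only exist in this example; the gratools subcommands and the --gfa flag are the ones shown in this guide.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1.
# (DRY_RUN is a convention of this sketch, not a GraTools option.)
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Index first, then list samples and compute statistics.
# The && chain stops at the first step that fails.
gratools_overview() {
    gfa=$1
    run gratools index --gfa "$gfa" &&
    run gratools list_samples --gfa "$gfa" &&
    run gratools stats --gfa "$gfa"
}

# Preview the commands without executing them.
DRY_RUN=1 gratools_overview my_graph.gfa.gz
```

Running gratools_overview without DRY_RUN executes the three commands in order on the given file.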
Testing GraTools
Download our curated test dataset to explore functionalities immediately.
wget http://itrop.ird.fr/GraTools/data-gratools.tar.gz
tar -zxvf data-gratools.tar.gz
Dataset Structure:
data-gratools/
├── Bacteria/ (ecoli_MGC_graph.full.gfa)
├── Bathyprasinos/ (Bathyprasinos_graph.full.gfa.gz)
└── Rice/ (Og_cactus.gfa.gz, NewRiceGraph_MGC.gfa.gz, ...)
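Since indexing is the first step for any graph, one way to prepare the whole test dataset is to index every GFA file found under the extracted directory. This is a sketch only: run, index_all, and DRY_RUN are hypothetical names introduced for illustration, and it assumes gratools index accepts both the plain .gfa and the compressed .gfa.gz files listed above.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1.
# (DRY_RUN is a convention of this sketch, not a GraTools option.)
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Index every *.gfa and *.gfa.gz file below the given directory.
index_all() {
    root=${1:-data-gratools}
    find "$root" -type f \( -name '*.gfa' -o -name '*.gfa.gz' \) | sort |
    while IFS= read -r gfa; do
        # </dev/null keeps the command from consuming the file list on stdin.
        run gratools index --gfa "$gfa" </dev/null
    done
}

# Example (prints the commands only):
#   DRY_RUN=1 index_all data-gratools
```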
Troubleshooting
Dependencies: Ensure bedtools is in your system's PATH.
Logs: Check GraTools log files in the output directory for precise error messages.
Index Mismatch: If you modify your GFA, delete the index directory (*_GraTools_INDEX/) and run gratools index again.
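The index reset described above can be scripted as follows. This is a sketch, assuming the index directory sits in the same folder as the GFA file; the helper names remove_stale_index and reindex are hypothetical. Note that it removes every directory matching *_GraTools_INDEX in that folder, including indexes of other graphs stored there.

```shell
#!/bin/sh
# Delete every directory named *_GraTools_INDEX directly inside the given folder.
# ASSUMPTION: the index lives in that folder; adjust the path if yours is elsewhere.
remove_stale_index() {
    dir=${1:-.}
    for d in "$dir"/*_GraTools_INDEX; do
        [ -d "$d" ] || continue   # no match, or not a directory
        echo "removing $d"
        rm -rf -- "$d"
    done
}

# Remove the old index next to the GFA file, then build a fresh one.
reindex() {
    gfa=$1
    remove_stale_index "$(dirname "$gfa")"
    gratools index --gfa "$gfa"
}

# Example:
#   reindex my_graph.gfa.gz
```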
Further Assistance
Documentation: Browse the sidebar for in-depth command references.
Issues: Open a ticket on the project's Git repository.
Mailing List: Contact us at gratools@ird.fr.