🚀 Quick Start Guide

Welcome to GraTools! This guide will help you install the software and run your first pangenome graph analyses in minutes.

📦 Installation

📋 Requirements
  • Python: version 3.11 or later

  • System Tools: Bedtools must be installed and accessible in your PATH.
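
Both requirements can be checked from a shell before installing. The following is a minimal sketch, not part of GraTools: the function names `python_version_ok` and `check_requirements` are hypothetical, and it assumes the interpreter is invoked as `python3`.

```shell
# Succeed when a version string such as "3.11" or "3.12.1" is at least 3.11.
python_version_ok() {
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 11 ]; }
}

# Report every missing requirement on stderr; succeed only if all are met.
check_requirements() {
    status=0
    if command -v python3 >/dev/null 2>&1; then
        version=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
        python_version_ok "$version" || {
            echo "Python $version found, but 3.11 or later is required" >&2
            status=1
        }
    else
        echo "python3 not found in PATH" >&2
        status=1
    fi
    command -v bedtools >/dev/null 2>&1 || {
        echo "bedtools not found in PATH" >&2
        status=1
    }
    return "$status"
}
```

Running `check_requirements && echo "requirements satisfied"` prints the confirmation only when both checks pass.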

Install from PyPI (recommended for most users):

# Update build tools
python3 -m pip install -U pip setuptools build
# Install GraTools
python3 -m pip install gratools

To test the latest features, install from the main branch of the Git repository:

python3 -m pip install GraTools@git+https://forge.ird.fr/diade/gratools.git@main

Alternatively, pull the Apptainer container image:

apptainer pull gratools.sif oras://registry.forge.ird.fr/diade/gratools/gratools:1.0.0

✅ Verification

After installation, ensure GraTools is working correctly:

gratools --version
gratools --help

๐Ÿ› ๏ธ Command Overview๏ƒ

GraTools provides a rich set of subcommands. You can see them all by running gratools --help.

$ gratools
Welcome to GraTools version: '1.1.0.dev7'
@author: GraTools team's
        ____                 __________               ____          
      6MMMMMb/               MMMMMMMMMM               `MM          
     8P    YM               /   MM     \               MM          
    6M      Y ___  __    ___    MM   _____     _____   MM   ____   
    MM        `MM 6MM  6MMMMb   MM  6MMMMMb   6MMMMMb  MM  6MMMMb\ 
    MM         MM69 " 8M'  `Mb  MM 6M'   `Mb 6M'   `Mb MM MM'    ` 
    MM     ___ MM'        ,oMM  MM MM     MM MM     MM MM YM.      
    MM     `M' MM     ,6MM9'MM  MM MM     MM MM     MM MM  YMMMMb  
    YM      M  MM     MM'   MM  MM MM     MM MM     MM MM      `Mb 
     8b    d9  MM     MM.  ,MM  MM YM.   ,M9 YM.   ,M9 MM L    ,MM 
      YMMMMM9  _MM_   `YMMM9'Yb_MM_ YMMMMM9   YMMMMM9 _MM_MYMMMM9 
        \                                    /                /
        /''A''\          /''''''\           /     /''''A'''''\
  ...GC|       |..ATG...C...CG...T....TAG..'..GC.|            |...
        \..C../      \.............../            \...TATA.../
 
Please cite our gitlab: https://forge.ird.fr/diade/gratools.git\

Usage: gratools [OPTIONS] COMMAND [ARGS]...

  A toolkit for analyzing, manipulating, and extracting information from
  pangenome graphs in GFA format.

Options:
  -v, --version Show the version and exit.
  -h, --help    Show this message and exit.

GFA Content Information:
  list_samples
     List the samples embedded in the indexed GFA file.

  list_chr
     List the embedded chromosomes and their fragments if relevant from the
     indexed GFA file.

  stats
     Compute and display various statistics for a GFA file.

GFA Data Extraction:
  extract_subgraph
     Extracts a subgraph from a GFA file based on a query region.

  get_fasta
     Extracts sequences for a specific genomic region in FASTA format.

  to_bandage
     Generates a CSV file for the Bandage graph visualizer.

GFA Analysis:
  core_dispensable_ratio
     Compute and display the ratio of core and dispensable segments.

  depth_nodes_stat
     Display various statistics about segment depth (number of embedded
     samples).

  specific_groups_sample
     Identify segments shared by or specific to defined sample groups.

  get_segments_by_depth
     List segments within a specified depth range (number of encompassing
     samples).

Other commands:
  index
     Pre-processes a GFA file for faster GraTools operations.

  shell_completion
     Generates shell completion scripts (Bash, Zsh, Fish).


📖 Command Examples

The examples below show a typical invocation of each major command.

📊 stats: Compute graph statistics

Compute a range of statistics on the graph structure (segments, links, walks, connectivity).

$ gratools stats --gfa Og_cactus.gfa.gz

For more details, see the complete documentation for gratools stats.

👥 list_samples & list_chr: Explore graph content

List the unique sample names in a GFA file, or the chromosomes found for each sample.

$ gratools list_samples --gfa Og_cactus.gfa.gz
$ gratools list_chr --gfa Og_cactus.gfa.gz

For more details, see gratools list_samples and gratools list_chr.

โœ‚๏ธ extract_subgraph & get_fasta: Data extraction

Extract specific regions defined by sample, chromosome, and positions.

# Extract Subgraph
$ gratools extract_subgraph --gfa Og_cactus.gfa.gz \
    --sample-query CG14 --chrom-query CG14_Chr07 \
    --start-query 100000 --stop-query 150000 \
    --all-samples

# Get FASTA
$ gratools get_fasta --gfa Og_cactus.gfa.gz \
    --sample-query CG14 --chrom-query CG14_Chr07 \
    --start-query 10000 --stop-query 15000 \
    --all-samples

For more details, see gratools extract_subgraph and gratools get_fasta.
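
When several regions are needed, the same get_fasta call can be driven from a small regions file. This is a sketch under stated assumptions: the function `extract_regions`, the `GRATOOLS` override variable, and the four-column regions layout are all hypothetical conveniences, and only the flags shown in the example above are used.

```shell
# Command to run; override GRATOOLS (e.g. GRATOOLS=echo) for a dry run.
GRATOOLS=${GRATOOLS:-gratools}

# Usage: extract_regions GRAPH.gfa.gz REGIONS.tsv
# REGIONS.tsv (hypothetical layout): sample, chromosome, start, stop,
# separated by whitespace; blank lines and lines starting with '#' are skipped.
extract_regions() {
    gfa=$1
    regions=$2
    while read -r sample chrom start stop; do
        case $sample in
            '' | '#'*) continue ;;
        esac
        # stdin is redirected so the command cannot consume the regions file.
        $GRATOOLS get_fasta --gfa "$gfa" \
            --sample-query "$sample" --chrom-query "$chrom" \
            --start-query "$start" --stop-query "$stop" \
            --all-samples < /dev/null || return 1
    done < "$regions"
}
```

Setting `GRATOOLS=echo` before calling `extract_regions` prints the commands instead of running them, which is a cheap way to review the regions file first.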

โš–๏ธ core_dispensable_ratio & depth_nodes_stat: Pangenome analysis

Analyze how segments are shared across samples (core vs dispensable).

# Ratio Core/Dispensable
$ gratools core_dispensable_ratio -g Og_cactus.gfa.gz --input-as-number \
    --shared-min 4 --specific-max 2 --filter-len 50

# Node Depth Summary
$ gratools depth_nodes_stat --gfa Og_cactus.gfa.gz --filter-len 50 --threads 4

For more details, see gratools core_dispensable_ratio and gratools depth_nodes_stat.
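
To make the core versus dispensable idea concrete, the sketch below classifies segments by depth (the number of samples containing them) using two thresholds. It illustrates the general concept only and is not GraTools's implementation: the function `classify_by_depth`, the two-column input, and the reading of the thresholds (shared when depth is at least the first value, specific when depth is at most the second) are assumptions.

```shell
# Usage: classify_by_depth SHARED_MIN SPECIFIC_MAX < depths.txt
# Input (hypothetical layout): segment ID and depth, separated by whitespace.
# The threshold semantics are an assumption made for illustration.
classify_by_depth() {
    shared_min=$1
    specific_max=$2
    awk -v smin="$shared_min" -v smax="$specific_max" '
        $2 + 0 >= smin + 0 { shared++; next }    # present in many samples
        $2 + 0 <= smax + 0 { specific++; next }  # present in few samples
        { other++ }                              # in between the two thresholds
        END { printf "shared=%d specific=%d other=%d\n", shared, specific, other }
    '
}
```

With thresholds 4 and 2, five segments of depth 5, 1, 3, 4, and 2 are counted as `shared=2 specific=2 other=1`.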

🔍 specific_groups_sample: Group comparisons

Identify segments shared by or specific to defined sample groups.

$ gratools specific_groups_sample --gfa Og_cactus.gfa.gz \
    --samples-list-A list_A.txt \
    --samples-list-B list_B.txt \
    --output-csv

For more details, see gratools specific_groups_sample.


🔄 Typical Workflow

1๏ธโƒฃ Indexing

Indexing is the crucial first step: it pre-processes your GFA file to speed up all subsequent operations.

gratools index --gfa my.gfa.gz

2️⃣ Exploration

Get an overview of your samples and graph properties.

gratools list_samples --gfa my.gfa.gz
gratools stats --gfa my.gfa.gz

3️⃣ Analysis

Perform deep analysis or extract sequences/subgraphs.

gratools get_fasta --gfa my.gfa.gz ...
gratools core_dispensable_ratio ...
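
The indexing and exploration steps can be chained in a small wrapper that stops at the first failure. This is a minimal sketch: the function `run_workflow` and the `GRATOOLS` override variable are hypothetical, and only the subcommands and the `--gfa` flag shown above are used. Analysis commands are left out because their arguments depend on your data.

```shell
# Command to run; override GRATOOLS (e.g. GRATOOLS=echo) for a dry run.
GRATOOLS=${GRATOOLS:-gratools}

# Usage: run_workflow GRAPH.gfa.gz
run_workflow() {
    gfa=${1:-}
    if [ -z "$gfa" ]; then
        echo "usage: run_workflow GRAPH.gfa.gz" >&2
        return 2
    fi
    # Each step must succeed before the next one runs.
    $GRATOOLS index --gfa "$gfa" || return 1
    $GRATOOLS list_samples --gfa "$gfa" || return 1
    $GRATOOLS stats --gfa "$gfa" || return 1
}
```

For example, `run_workflow my.gfa.gz` indexes the graph and then prints its samples and statistics.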


🧪 Testing GraTools

Download our curated test dataset to explore functionalities immediately.

📥 Download Dataset

wget http://itrop.ird.fr/GraTools/data-gratools.tar.gz
tar -zxvf data-gratools.tar.gz

Dataset Structure:

data-gratools/
├── Bacteria/ (ecoli_MGC_graph.full.gfa)
├── Bathyprasinos/ (Bathyprasinos_graph.full.gfa.gz)
└── Rice/ (Og_cactus.gfa.gz, NewRiceGraph_MGC.gfa.gz, ...)


โš ๏ธ Troubleshooting๏ƒ

๐Ÿ› ๏ธ Setup Issues
  • Dependencies: Ensure bedtools is in your systemโ€™s PATH.

  • Logs: Check GraTools log files in the output directory for precise error messages.

🔄 Data Changes
  • Index Mismatch: If you modify your GFA, delete the index directory (*_GraTools_INDEX/) and run gratools index again.
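
The delete-and-reindex step can be wrapped in a guarded helper so that only an index directory is ever removed. This is a sketch, not a GraTools feature: the function `reindex` and the `GRATOOLS` override variable are hypothetical, and the index directory must be passed explicitly because its exact name depends on your GFA file.

```shell
# Command to run; override GRATOOLS (e.g. GRATOOLS=echo) for a dry run.
GRATOOLS=${GRATOOLS:-gratools}

# Usage: reindex GRAPH.gfa.gz INDEX_DIR
reindex() {
    gfa=${1:-}
    index_dir=${2:-}
    index_dir=${index_dir%/}   # drop one trailing slash, if any
    case $index_dir in
        *_GraTools_INDEX) ;;
        *)
            echo "refusing to delete '$index_dir': name does not end in _GraTools_INDEX" >&2
            return 2
            ;;
    esac
    # Remove the stale index, then rebuild it from the modified GFA.
    rm -rf -- "$index_dir" || return 1
    $GRATOOLS index --gfa "$gfa"
}
```

The name check is a deliberate safety net: a mistyped second argument results in an error message rather than a deleted directory.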


🆘 Further Assistance

  • Documentation: Browse the sidebar for in-depth command references.

  • Issues: Open a ticket on the project's Git repository.

  • Mailing List: Contact us at gratools@ird.fr.