Welcome to the Quick Start Guide for GraTools!
This guide will help you quickly install, configure and start using GraTools.
Install GraTools
Make sure you have Python 3 and Bedtools installed and accessible from the command line.
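A quick way to confirm that the prerequisites are visible on your PATH is sketched below. The `require_tool` helper is a hypothetical name introduced for this illustration and is not part of GraTools; it relies only on the standard `command -v` shell built-in.

```shell
# require_tool is a hypothetical helper, not part of GraTools.
# It returns 0 when the named executable is found on PATH, 1 otherwise.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
        return 0
    fi
    echo "missing: $1 (install it or add it to PATH)" >&2
    return 1
}

# Check the tools this guide relies on; report problems without aborting the shell.
for tool in python3 bedtools; do
    require_tool "$tool" || true
done
```

A "missing" message for either tool means the installation steps that follow are likely to fail or GraTools may not run correctly.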
Install the latest GraTools release (Recommended)
Install the latest release from PyPI using pip:
# Update pip and build tools if needed
python3 -m pip install -U pip setuptools build
python3 -m pip install gratools
Install the development version from the Git repository
You can test new features not yet available in the latest release:
python3 -m pip install GraTools@git+https://forge.ird.fr/diade/gratools.git@main
Replace main with the name of the branch you want to test, if needed.
Verify the installation
gratools --version
gratools --help
Command List and Help
To see all available commands and their descriptions, run gratools --help.
Using GraTools
This section provides a quick overview of some of the key functionalities of GraTools. Each subsection includes a brief description and an example command to help you get started.
stats
Compute a range of statistics on the graph structure, such as segment counts, link properties, walk characteristics, and connectivity.
$ gratools stats --gfa NewRiceGraph_MGC.gfa.gz
--- GFA Statistics for: NewRiceGraph_MGC ---
Graph Overview Statistics
╭────────────────────┬───────────────────────────────────────────────┬────────────────────╮
│ Category           │ Metric                                        │ Value              │
├────────────────────┼───────────────────────────────────────────────┼────────────────────┤
│ Graph Overview     │ GFA File Name                                 │ NewRiceGraph_MGC   │
│ Graph Overview     │ GFA Version                                   │ 1.1                │
│ Graph Overview     │ Total Segments (S lines)                      │ 26,461,214         │
│ Graph Overview     │ Total Links (L lines)                         │ 36,276,920         │
│ Graph Overview     │ Total Walks (W lines)                         │ 743                │
│ Graph Overview     │ Unique Samples in Walks                       │ 13                 │
│ Segment Statistics │ Total Segment Length (bp)                     │ 858,417,045        │
│ Segment Statistics │ Average Segment Length (bp)                   │ 32.44              │
│ Segment Statistics │ Median Segment Length (bp)                    │ 1.00               │
│ Segment Statistics │ Avg Length of Top 5% Longest Segments (bp)    │ 508.68             │
│ Segment Statistics │ Median Length of Top 5% Longest Segments (bp) │ 155.00             │
│ Segment Statistics │ Segment Length Distribution                   │ 0-499bp: 26325942, │
│ ...                │ ...                                           │ ...                │
╰────────────────────┴───────────────────────────────────────────────┴────────────────────╯
For more details, see the complete documentation for `stats`.
list_samples
Lists all unique sample names found in a GFA file.
$ gratools list_samples --gfa Og_cactus.gfa.gz
─────────────────────────────── Summary ────────────────────────────────
Total samples in GFA: 5
   Available
Samples in GFA:
   Og_cactus
╭─────────────╮
│ Sample Name │
├─────────────┤
│ CG14        │
│ Og20        │
│ Og103       │
│ Og182       │
│ Tog5681     │
╰─────────────╯
For more details, see the complete documentation for `list_samples`.
list_chr
Lists the chromosomes of each sample in a GFA file, in short or full format.
$ gratools list_chr --gfa Og_cactus.gfa.gz
────────────────────────────────── Summary ─────────────────────────────────────────
Total samples: 5
Min chromosomes per sample: 2
Max chromosomes per sample: 2
              5 Samples, 2-2 Unique Chromosomes/Sample (GFA: Og_cactus)
╭─────────────┬──────────────────────────────────────┬──────────────────────────────╮
│ Sample Name │ Unique Chromosomes (Comma-separated) │ Number of Unique Chromosomes │
├─────────────┼──────────────────────────────────────┼──────────────────────────────┤
│ CG14        │ CG14_Chr07, CG14_Chr08               │ 2                            │
├─────────────┼──────────────────────────────────────┼──────────────────────────────┤
│ Og103       │ Og103_Chr07, Og103_Chr08             │ 2                            │
├─────────────┼──────────────────────────────────────┼──────────────────────────────┤
│ Og182       │ Og182_Chr07, Og182_Chr08             │ 2                            │
├─────────────┼──────────────────────────────────────┼──────────────────────────────┤
│ Og20        │ Og20_Chr07, Og20_Chr08               │ 2                            │
├─────────────┼──────────────────────────────────────┼──────────────────────────────┤
│ Tog5681     │ Tog5681_Chr07, Tog5681_Chr08         │ 2                            │
╰─────────────┴──────────────────────────────────────┴──────────────────────────────╯
For more details, see the complete documentation for `list_chr`.
extract_subgraph
Extracts a subgraph for a specific region of the GFA, defined by a query sample, a chromosome, and start/stop positions.
$ gratools extract_subgraph --gfa Og_cactus.gfa.gz \
--sample-query CG14 --chrom-query CG14_Chr07 \
--start-query 100000 --stop-query 150000 \
--all-samples --num-threads 4
01-13 14:25 | INFO | Generated GFA file Og_cactus_subgraph_CG14-CG14_Chr07-100000-150000.gfa.gz
For more details, see the complete documentation for `extract_subgraph`.
get_fasta
Extracts the sequences of a specific region, defined by sample, chromosome, and start/stop positions, for the query sample and any other specified samples.
$ gratools get_fasta --gfa Og_cactus.gfa.gz \
--sample-query CG14 --chrom-query CG14_Chr07 \
--start-query 10000 --stop-query 15000 \
--all-samples --num-threads 8
For more details, see the complete documentation for `get_fasta`.
core_dispensable_ratio
Calculates the ratio of core segments (shared by almost all samples) to dispensable segments (present in a smaller subset of samples).
$ gratools core_dispensable_ratio -g Og_cactus.gfa.gz --input-as-number \
--shared-min 4 --specific-max 2 --filter-len 50
───────────────── Summary ──────────────────
Total segments in GFA: 2,354,995
...
╭─────────────────────┬───────────┬───────────┬────────────╮
│ Category            │ Count     │ Total     │ Percentage │
├─────────────────────┼───────────┼───────────┼────────────┤
│ Shared (Core) - Raw │ 1,162,062 │ 2,354,995 │ 49.34%     │
│ ...                 │ ...       │ ...       │ ...        │
╰─────────────────────┴───────────┴───────────┴────────────╯
For more details, see the complete documentation for `core_dispensable_ratio`.
get_segments_by_depth
Lists segments that are present in a number of samples falling within a specified range.
$ gratools get_segments_by_depth --gfa Og_cactus.gfa.gz \
--input-as-number --lower-bound 0 --upper-bound 2
| INFO | Parameters: lower=0; upper=2; filter_len=0
| INFO | Generate CSV file Og_cactus_segment_by_depth_between_0-2_individuals.csv
| INFO | Number of segments found: 891345 between 0 and 2 individuals
For more details, see the complete documentation for `get_segments_by_depth`.
depth_nodes_stat
Displays a table summarizing how segments are shared across different samples by calculating the ‘depth’ of each segment (i.e., the number of unique samples encompassing it).
$ gratools depth_nodes_stat --gfa Og_cactus.gfa.gz --filter-len 50 --threads 4
──────────────── Summary ────────────────────────────────────────
Total segments analyzed: 2,354,995
Total segments passing length filter: 210,046 (8.92%)
          Node Depth Statistics — Og_cactus (Len ≥ 50bp)
╭───────┬──────────┬────────────┬───────────────────┬────────────╮
│ Depth │ Segments │ Percentage │ Filtered Segments │ Filtered % │
├───────┼──────────┼────────────┼───────────────────┼────────────┤
│ 1     │ 575,292  │ 24.43%     │ 1,257             │ 0.60%      │
│ 2     │ 316,053  │ 13.42%     │ 7,345             │ 3.50%      │
│ 3     │ 301,588  │ 12.81%     │ 5,604             │ 2.67%      │
│ 4     │ 501,321  │ 21.29%     │ 11,500            │ 5.47%      │
│ 5     │ 660,741  │ 28.06%     │ 184,340           │ 87.76%     │
╰───────┴──────────┴────────────┴───────────────────┴────────────╯
For more details, see the complete documentation for `depth_nodes_stat`.
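The percentage columns of the depth table can be recomputed from the counts. The sketch below re-derives a few values with awk and sums the per-depth segment counts as a sanity check. The `pct` helper is a hypothetical name for this illustration, the numbers are copied from the example output, and reading each percentage as a count divided by the corresponding total is an assumption that happens to match the figures shown.

```shell
# pct is a hypothetical helper: prints count/total as a percentage with two decimals.
pct() {
    awk -v c="$1" -v t="$2" 'BEGIN { printf "%.2f\n", 100 * c / t }'
}

# Values copied from the depth_nodes_stat example output.
total=2354995        # Total segments analyzed
filtered=210046      # Segments passing the 50 bp length filter

pct 575292 "$total"       # depth 1, all segments
pct 1257 "$filtered"      # depth 1, filtered segments
pct "$filtered" "$total"  # share of segments passing the filter

# The per-depth counts should add up to the total number of segments.
echo $((575292 + 316053 + 301588 + 501321 + 660741))
```

The same arithmetic links the example outputs in this guide: depths 4 and 5 together give 501,321 + 660,741 = 1,162,062 segments, which equals the Shared (Core) - Raw count shown for core_dispensable_ratio with --shared-min 4, and depths 1 and 2 together give 891,345, the number reported by get_segments_by_depth for the 0 to 2 range.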
specific_groups_sample
Identifies segments shared by or specific to defined sample groups.
$ gratools specific_groups_sample --gfa Og_cactus.gfa.gz \
--samples-list-A list_A.txt \
--samples-list-B list_B.txt \
--output-csv --suffix test
╭─────────────────────── 📊 Shared & Specific Segment Analysis ───────────────────────╮
│ Segments shared by 2 in list ['CG14', 'Og20']                                       │
│ ├─ Count 980,985 / 2,354,995                                                        │
│ ├─ Percentage 41.66%                                                                │
│ └─ Total Length 48,986,534 bp                                                       │
│                                                                                     │
│ Segments specific to 2 in list ['CG14', 'Og20'] and absent in 1 in list ['Tog5681'] │
│ ├─ Count 214,326 / 2,354,995                                                        │
│ ├─ Percentage 9.10%                                                                 │
│ └─ Total Length 1,020,689 bp                                                        │
╰─────────────────────────────────────────────────────────────────────────────────────╯
For more details, see the complete documentation for `specific_groups_sample`.
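The specific_groups_sample example takes two list files, list_A.txt and list_B.txt. Their format is not described in this guide; the sketch below assumes plain text with one sample name per line, which should be checked against the complete documentation before use. The sample names are taken from the example output (CG14 and Og20 in list A, Tog5681 in list B).

```shell
# ASSUMPTION: one sample name per line; verify against the complete documentation.
# Sample names are taken from the example output of specific_groups_sample.
printf '%s\n' CG14 Og20 > list_A.txt
printf '%s\n' Tog5681 > list_B.txt

# Show what was written.
cat list_A.txt list_B.txt
```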
Typical Workflow
For the best performance, a typical GraTools session follows these steps:
Index the GFA File: The index command is the crucial first step. It parses your GFA file once to create index files, which greatly speed up all subsequent commands.
gratools index --gfa my_graph.gfa.gz
Explore the Graph Content: Use the “GFA Content Information” commands to describe your data before running further analysis.
# See what samples are in the graph
gratools list_samples --gfa my_graph.gfa.gz
# Get an overview of the graph's properties
gratools stats --gfa my_graph.gfa.gz
Analyze or Extract Data: After exploring the graph, you can analyze it further or extract specific subgraphs and sequences.
# Extract a specific region for all samples into a FASTA file
gratools get_fasta --gfa my_graph.gfa.gz --sample-query Ref --chrom-query chr1 --all-samples
# Analyze the core vs. dispensable genome
gratools core_dispensable_ratio --gfa my_graph.gfa.gz --input-as-percentage --shared-min 95% --specific-max 5%
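The indexing and exploration steps of this workflow can be chained in a small wrapper so that exploration never runs when indexing fails. The sketch below is one possible arrangement, not an official script: `run`, `workflow`, `GFA` and `DRY_RUN` are hypothetical names introduced here, and only the gratools commands and the --gfa option shown in this guide are used. By default it only prints the commands.

```shell
# Hypothetical wrapper around the index and explore steps of the workflow.
# GFA and DRY_RUN are illustrative variable names, not GraTools options.
GFA="${GFA:-my_graph.gfa.gz}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command, and execute it only when DRY_RUN is 0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

workflow() {
    # Stop at the first failing step so later commands never run without an index.
    run gratools index --gfa "$GFA" &&
    run gratools list_samples --gfa "$GFA" &&
    run gratools stats --gfa "$GFA"
}

workflow
```

Setting DRY_RUN=0 in the environment before running the script executes the commands instead of printing them.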
Testing GraTools
Download a test dataset to explore GraTools’ functionalities.
# Go to the directory where you want to download data
# and download the dataset
wget http://itrop.ird.fr/GraTools/data-gratools.tar.gz
# Extract the dataset
tar -zxvf data-gratools.tar.gz
data-gratools/
├── Bacteria
│   └── ecoli_MGC_graph.full.gfa
├── Bathyprasinos
│   └── Bathyprasinos_graph.full.gfa.gz
├── README
└── Rice
    ├── inversion_duplications_nipponbare.bed
    ├── MGC_RiceGraph_Chr8.gfa.gz
    ├── NewRiceGraph_MGC.gfa.gz
    └── Og_cactus.gfa.gz
3 directories, 7 files
Troubleshooting
Installation Problems: Ensure all dependencies, especially bedtools, are correctly installed and available in your system’s PATH.
Runtime Errors: GraTools creates detailed log files in its output directory. Check these logs for specific error messages.
Index Issues: If you modify or replace your GFA file, remember to delete the old index directory (e.g., my_graph.gfa.gz_GraTools_INDEX/) and run gratools index again.
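A stale index can be detected by comparing modification times. The sketch below is a minimal illustration: `index_dir_for` and `index_is_stale` are hypothetical helper names, and the `<gfa>_GraTools_INDEX` directory naming is assumed from the single example given above. It only prints the suggested commands and deletes nothing itself.

```shell
# Hypothetical helpers; the "<gfa>_GraTools_INDEX" naming follows the example
# in this guide and is assumed to hold for other file names too.
index_dir_for() {
    printf '%s_GraTools_INDEX\n' "$1"
}

# Returns 0 when an index directory exists and the GFA file was modified
# more recently than that directory (-nt is supported by bash, dash and ksh).
index_is_stale() {
    gfa="$1"
    idx="$(index_dir_for "$gfa")"
    [ -d "$idx" ] && [ "$gfa" -nt "$idx" ]
}

GFA="${1:-my_graph.gfa.gz}"
if index_is_stale "$GFA"; then
    echo "Index looks older than $GFA; rebuild with:"
    echo "  rm -r \"$(index_dir_for "$GFA")\" && gratools index --gfa \"$GFA\""
fi
```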
Further Assistance
For more help, please refer to the complete official documentation or open an issue on the project’s Git repository.