gratools index
Pre-processes your GFA to allow near-instant access to segments and walks. Essential for large-scale pangenome graphs.
Creates an optimized index directory, which other GraTools commands automatically detect and use.
The index command is a critical first step for using GraTools efficiently. It pre-processes a GFA file to create several auxiliary files that allow other commands to access graph data much faster. It is highly recommended to run index on your GFA file before using other analysis or extraction commands, especially with large graphs.
Options
$ gratools index
Welcome to GraTools version: '1.1.0.dev7'
@author: GraTools team's
Please cite our gitlab: https://forge.ird.fr/diade/gratools.git
Usage: gratools index [OPTIONS]
The 'index' command parses a GFA file and creates several auxiliary files
(e.g., a BAM representation of segments, BED files for walks per sample, and a
statistics summary). These indexed files allow subsequent GraTools commands to
operate much more quickly on the GFA data.
It is highly recommended to run 'index' on your GFA file before using other
analysis or extraction commands for optimal performance, especially with large
GFA files. If an index already exists, GraTools will typically use it; this
command can be used to explicitly (re)generate the index.
For more details, see the full documentation:
https://gratools.readthedocs.io/en/latest/commands/index.html
Index Generation Options:
-g, --gfa PATH
Path to the input GFA file (e.g., myGraph.gfa or myGraph.gfa.gz).
[required]
--index-links / --no-index-links
index links on DB [default: no-index-links]
--disable-progress
Disable progress bars, which may improve performance for large GFA files.
Logging Options:
-vv, --verbosity [DEBUG|INFO|ERROR]
Set the logging verbosity level. [default: INFO]
-l, --log-path DIRECTORY
Directory where the log files will be saved. If not specified, logs will be
placed in the main output directory (or in a default GraTools log
location).
Performance Options:
-t, --threads INTEGER
Number of threads to be used for parallelizable operations. [default: 1]
Other options:
-h, --help
Show this message and exit.
Usage Examples
The simplest way to prepare your graph. GraTools creates a folder named [GFA_NAME].Gratools_index/.
$ gratools index --gfa Og_cactus.gfa.gz
Speed up the process with multiple threads.
$ gratools index --gfa Og_cactus.gfa.gz --threads 8
Include connectivity (links) for deeper analysis. Indexing takes longer, but the stats command will report more values.
$ gratools index --gfa Og_cactus.gfa.gz --index-links --threads 8
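If you index many graphs, a small Python wrapper can keep the invocations consistent. The sketch below assumes `gratools` is on your `PATH` and uses only the flags listed in the help output above; `build_index_command` and `index_many` are hypothetical helper names, not part of GraTools.

```python
import shutil
import subprocess
from pathlib import Path


def build_index_command(gfa, threads=1, index_links=False, disable_progress=False):
    """Return the argument list for `gratools index` (hypothetical helper).

    Only flags documented in the `gratools index` help output are used.
    """
    cmd = ["gratools", "index", "--gfa", str(gfa), "--threads", str(threads)]
    if index_links:
        # Also store segment links: slower indexing, richer `stats` output.
        cmd.append("--index-links")
    if disable_progress:
        # Progress bars off, which may help on very large GFA files.
        cmd.append("--disable-progress")
    return cmd


def index_many(gfa_files, **options):
    """Index each GFA file in turn; raise on the first failing run."""
    if shutil.which("gratools") is None:
        raise RuntimeError("gratools executable not found on PATH")
    for gfa in gfa_files:
        subprocess.run(build_index_command(gfa, **options), check=True)


if __name__ == "__main__":
    # Only prints the command; nothing is executed here.
    print(build_index_command(Path("Og_cactus.gfa.gz"), threads=8, index_links=True))
```

Calling `index_many(["Og_cactus.gfa.gz"], threads=8, index_links=True)` runs the same command as the last example above, once per file.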
What happens during indexing?
The indexing process parses the GFA file once and stores the information in optimized formats. This avoids re-parsing the entire GFA for every subsequent command, leading to significant performance gains. When you run index, GraTools performs the following actions:
BAM Conversion
Segments are converted into a specialized BAM format. This allows GraTools to perform random access and retrieve specific sequence fragments without reading the whole file.
BED Mapping
Walks (paths of samples through the graph) are mapped into individual BED files per sample. This enables extremely fast coordinate-based queries.
Connectivity Database
If --index-links is used, all edges (links) between segments are stored in a database. These are needed for the topology-based values reported by commands such as stats.
Summary Generation
A baseline summary of the graph properties is calculated once, preventing redundant calculations in future sessions.
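To make the BED mapping, link storage, and summary steps concrete, here is a toy in-memory sketch. It is not GraTools' implementation: it skips the BAM step entirely, assumes GFA 1.1 `W` lines with numeric start coordinates and sequences stored inline in `S` lines, and keeps links in a hypothetical SQLite table. `toy_index`, `TOY_GFA`, and the `links` schema are illustrative names only.

```python
import re
import sqlite3
from collections import defaultdict

# Tiny GFA 1.1 example: 3 segments, 3 links, 2 sample walks (assumed input).
TOY_GFA = """\
H\tVN:Z:1.1
S\ts1\tACGT
S\ts2\tGG
S\ts3\tTTTAA
L\ts1\t+\ts2\t+\t0M
L\ts2\t+\ts3\t+\t0M
L\ts1\t+\ts3\t+\t0M
W\tsampleA\t0\tchr1\t0\t11\t>s1>s2>s3
W\tsampleB\t0\tchr1\t100\t109\t>s1>s3
"""

# One walk step: orientation ('>' forward, '<' reverse) followed by a segment id.
STEP = re.compile(r"([><])([^><]+)")


def toy_index(gfa_text):
    """Parse the GFA once; return (per-sample BED rows, link DB, summary)."""
    seg_len = {}
    walks = []
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE links (src TEXT, src_orient TEXT, dst TEXT, dst_orient TEXT)")
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":
            seg_len[fields[1]] = len(fields[2])  # inline sequence assumed
        elif fields[0] == "L":
            db.execute("INSERT INTO links VALUES (?, ?, ?, ?)", fields[1:5])
        elif fields[0] == "W":
            walks.append(fields[1:])  # sample, hap, seq_id, start, end, walk

    # BED mapping: each step covers [pos, pos + segment length) on the sample sequence.
    bed = defaultdict(list)
    for sample, _hap, seq_id, seq_start, _seq_end, walk in walks:
        pos = int(seq_start)
        for orient, seg in STEP.findall(walk):
            end = pos + seg_len[seg]
            bed[sample].append((seq_id, pos, end, seg, "+" if orient == ">" else "-"))
            pos = end

    # Summary computed once, so later queries need not re-parse the graph.
    summary = {
        "segments": len(seg_len),
        "total_bp": sum(seg_len.values()),
        "walks": len(walks),
        "samples": len({w[0] for w in walks}),
        "links": db.execute("SELECT COUNT(*) FROM links").fetchone()[0],
    }
    return dict(bed), db, summary


if __name__ == "__main__":
    bed, db, summary = toy_index(TOY_GFA)
    for sample, rows in bed.items():
        for row in rows:
            print(sample, *row, sep="\t")
    print(summary)
```

Each walk step becomes one interval whose length equals the segment length, so a coordinate lookup for a sample becomes an interval query rather than a re-walk of the graph, and neighbour lookups become a single `SELECT` on the links table.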
All these generated files are stored together in the GraTools index directory. If an index already exists, other GraTools commands will automatically detect and use it.
The index directory can be quite large (especially the BAM and Link files). Ensure you have enough disk space before indexing massive graphs. If you update your GFA file, remember to delete the existing index directory and re-run the command to avoid data mismatch.
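A quick pre-flight check can catch both problems mentioned above. The sketch below is a heuristic of our own, not a GraTools feature: it treats an index as stale when it is missing or when any file in it is older than the GFA (modification times only), and it reports free disk space with `shutil.disk_usage`. You pass the index directory path explicitly; `index_is_stale`, `reset_index`, `dir_size_bytes`, and `free_bytes` are hypothetical names.

```python
import shutil
from pathlib import Path


def dir_size_bytes(path):
    """Total size of all regular files under `path`."""
    return sum(p.stat().st_size for p in Path(path).rglob("*") if p.is_file())


def index_is_stale(gfa, index_dir):
    """True if the index is missing/empty or any of its files predates the GFA.

    Heuristic based on modification times only (assumption, not a GraTools check).
    """
    index_dir = Path(index_dir)
    files = [p for p in index_dir.rglob("*") if p.is_file()] if index_dir.is_dir() else []
    if not files:
        return True
    gfa_mtime = Path(gfa).stat().st_mtime
    return min(p.stat().st_mtime for p in files) < gfa_mtime


def reset_index(gfa, index_dir):
    """Delete a stale index directory so `gratools index` can rebuild it.

    Returns True if a directory was removed.
    """
    if Path(index_dir).is_dir() and index_is_stale(gfa, index_dir):
        shutil.rmtree(index_dir)
        return True
    return False


def free_bytes(path="."):
    """Free space on the filesystem holding `path`."""
    return shutil.disk_usage(path).free
```

Point `index_dir` at the `[GFA_NAME].Gratools_index/` folder GraTools created, compare `dir_size_bytes` of a previous index with `free_bytes()` before indexing a larger graph, and re-run `gratools index --gfa ...` after `reset_index` returns True.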
Next Steps
Explore your graph: gratools stats
List content: gratools list_samples
Full Workflow: Quick Start Guide