gratools import

🚀 Performance Boost

Pre-processes your GFA to allow near-instant access to segments and walks. Essential for large-scale pangenome graphs.

📂 Smart Persistence

Creates an optimized import directory. Other GraTools commands will automatically detect and use these files.

The import command is a critical first step for using GraTools efficiently. It pre-processes a GFA file to create several auxiliary files that allow other commands to access graph data much faster. It is highly recommended to run import on your GFA file before using other analysis or extraction commands, especially with large graphs.

Options

🛠️ View Command Line Options
$ gratools import
Welcome to GraTools version: '1.1.0'
@author: GraTools team's
        ____                 __________               ____          
      6MMMMMb/               MMMMMMMMMM               `MM          
     8P    YM               /   MM     \               MM          
    6M      Y ___  __    ___    MM   _____     _____   MM   ____   
    MM        `MM 6MM  6MMMMb   MM  6MMMMMb   6MMMMMb  MM  6MMMMb\ 
    MM         MM69 " 8M'  `Mb  MM 6M'   `Mb 6M'   `Mb MM MM'    ` 
    MM     ___ MM'        ,oMM  MM MM     MM MM     MM MM YM.      
    MM     `M' MM     ,6MM9'MM  MM MM     MM MM     MM MM  YMMMMb  
    YM      M  MM     MM'   MM  MM MM     MM MM     MM MM      `Mb 
     8b    d9  MM     MM.  ,MM  MM YM.   ,M9 YM.   ,M9 MM L    ,MM 
      YMMMMM9  _MM_   `YMMM9'Yb_MM_ YMMMMM9   YMMMMM9 _MM_MYMMMM9 
        \                                    /                /
        /''A''\          /''''''\           /     /''''A'''''\
  ...GC|       |..ATG...C...CG...T....TAG..'..GC.|            |...
        \..C../      \.............../            \...TATA.../
 
Please cite our gitlab: https://forge.ird.fr/diade/gratools.git\

Usage: gratools import [OPTIONS]

  The 'import' command parses a GFA file and creates several auxiliary files
  (e.g., a BAM representation of segments, BED files for walks per sample, and a
  statistics summary). These imported files allow subsequent GraTools commands
  to operate much more quickly on the GFA data.
  
  It is highly recommended to run 'import' on your GFA file before using other
  analysis or extraction commands for optimal performance, especially with large
  GFA files. If an import already exists, GraTools will typically use it; this
  command can be used to explicitly (re)generate the import.
  
  For more details, see the full documentation:
  https://gratools.readthedocs.io/en/latest/commands/import.html

import Generation Options:
  -g, --gfa PATH
     Path to the input GFA file (e.g., myGraph.gfa or myGraph.gfa.gz).
     [required]

  --import-links / --no-import-links
     import links on DB  [default: no-import-links]

  --disable-progress
     Disable progress bars, which may improve performance for large GFA files.

Logging Options:
  -vv, --verbosity [DEBUG|INFO|ERROR]
     Set the logging verbosity level.  [default: INFO]

  -l, --log-path DIRECTORY
     Directory where the log files will be saved. If not specified, logs will be
     placed in the main output directory (or in a default GraTools log
     location).

Performance Options:
  -t, --threads INTEGER
     Number of threads to be used for parallelizable operations.  [default: 1]

Other options:
  -h, --help
     Show this message and exit.

Usage Examples

1. Basic importing

The simplest way to prepare your graph. GraTools creates a folder named [GFA_NAME]_Gratools-IMPORT/.

$ gratools import --gfa Og_cactus.gfa.gz

2. Multi-threaded importing

Speed up the process with multiple threads.

$ gratools import --gfa Og_cactus.gfa.gz --threads 8

3. Advanced importing

Include connectivity (links) for deeper analysis. The import takes longer, but the stats command will then report more values.

$ gratools import --gfa Og_cactus.gfa.gz --import-links --threads 8
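When the import is scripted, the thread count can be derived from the machine instead of being hard-coded. The following is a minimal sketch, assuming `nproc` (GNU coreutils) is available. The helper names `pick_threads` and `build_import_cmd` and the cap of 8 are illustrative and are not part of GraTools; only the flags come from the help text above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return the requested thread count, capped at the
# number of cores reported by nproc (falls back to 1 if nproc is missing).
pick_threads() {
    local wanted="${1:-8}"
    local cores
    cores="$(nproc 2>/dev/null || echo 1)"
    if [ "$wanted" -gt "$cores" ]; then
        echo "$cores"
    else
        echo "$wanted"
    fi
}

# Hypothetical helper: assemble the import command line using only the
# documented flags (--gfa, --import-links, --threads).
build_import_cmd() {
    local gfa="$1"
    echo "gratools import --gfa $gfa --import-links --threads $(pick_threads "${2:-8}")"
}

# Print the command that would be run, so it can be reviewed first.
build_import_cmd Og_cactus.gfa.gz 8
```

Printing the command rather than executing it keeps the sketch safe to try on a machine where GraTools is not installed.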

What happens during importing?

The importing process parses the GFA file once and stores the information in optimized formats. This avoids re-parsing the entire GFA for every subsequent command, leading to significant performance gains. When you run import, GraTools performs the following actions:

BAM Conversion

Segments are converted into a specialized BAM format. This allows GraTools to perform random access and retrieve specific sequence fragments without reading the whole file.

BED Mapping

Walks (paths of samples through the graph) are mapped into individual BED files per sample. This enables extremely fast coordinate-based queries.
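To illustrate what a coordinate-based query over a BED file looks like, here is a small sketch. It assumes the standard BED layout (chromosome, 0-based start, exclusive end, name) and that the name column holds a segment identifier; the exact columns GraTools writes may differ, and the sample data and the function name `bed_overlaps` are invented for the example.

```shell
#!/usr/bin/env bash
# Print BED records in file $1 on chromosome $2 that overlap the
# half-open interval [$3, $4).
# ASSUMPTION: standard BED columns (chrom, start, end, name), tab-separated.
bed_overlaps() {
    awk -v c="$2" -v s="$3" -v e="$4" \
        'BEGIN { FS = OFS = "\t" } $1 == c && $2 < e && $3 > s' "$1"
}

# Invented sample data: three consecutive segments along one walk.
demo_bed="$(mktemp)"
printf 'chr01\t0\t120\ts1\nchr01\t120\t300\ts2\nchr01\t300\t450\ts3\n' > "$demo_bed"

# Query positions 100-250: prints the s1 and s2 records, not s3.
bed_overlaps "$demo_bed" chr01 100 250

rm -f "$demo_bed"
```

Two half-open intervals overlap exactly when each one starts before the other ends, which is the condition the awk filter tests.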

Connectivity Database

If --import-links is used, all edges between segments are stored in a database. Topology-heavy commands such as stats use this database to report additional values.

Summary Generation

A baseline summary of the graph properties is calculated once, preventing redundant calculations in future sessions.

All these generated files are stored together in the GraTools import directory. If an import already exists, other GraTools commands will automatically detect and use it.
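In a pipeline, it can be useful to check for an existing import before launching a new one. The sketch below makes two assumptions that the documentation does not confirm: that [GFA_NAME] is the file name with its `.gfa` or `.gfa.gz` suffix removed, and that the import directory is created next to the GFA file. The helper names `import_dir_for` and `ensure_import` are hypothetical.

```shell
#!/usr/bin/env bash
# ASSUMPTION: [GFA_NAME] is the file name without .gfa / .gfa.gz, and the
# import directory sits in the same directory as the GFA file.
import_dir_for() {
    local gfa="$1" base
    base="$(basename "$gfa")"
    base="${base%.gz}"
    base="${base%.gfa}"
    echo "$(dirname "$gfa")/${base}_Gratools-IMPORT"
}

# Run the import only when no import directory is found.
ensure_import() {
    local gfa="$1" dir
    dir="$(import_dir_for "$gfa")"
    if [ -d "$dir" ]; then
        echo "reusing existing import: $dir"
    else
        gratools import --gfa "$gfa"
    fi
}

# Show the directory name the sketch expects for the example graph.
import_dir_for Og_cactus.gfa.gz
```

If the directory name or location on your system differs, adjust `import_dir_for` to match what GraTools actually creates.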

💡 Pro Tip: Disk Space

The import directory can be quite large (especially the BAM and Link files). Ensure you have enough disk space before importing massive graphs. If you update your GFA file, remember to delete the existing import directory and re-run the command to avoid data mismatch.
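Both checks in this tip can be scripted. The sketch below is a heuristic, not a GraTools feature: it treats an import as stale when the GFA file has a newer modification time than the import directory, and it reads free space from POSIX `df -P`. The function names are hypothetical, and the import directory path is passed in explicitly.

```shell
#!/usr/bin/env bash
# Free space, in KiB, on the filesystem holding path $1.
# Relies on the POSIX `df -P` layout, where column 4 is "Available".
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Heuristic: the import is stale if the GFA was modified after the
# import directory. Modification times are only an approximation.
import_is_stale() {
    local gfa="$1" dir="$2"
    [ -d "$dir" ] && [ "$gfa" -nt "$dir" ]
}

# Remove a stale import so the next `gratools import` starts clean,
# then report the free space next to the GFA file.
refresh_import() {
    local gfa="$1" dir="$2"
    if import_is_stale "$gfa" "$dir"; then
        echo "removing stale import: $dir"
        rm -rf -- "$dir"
    fi
    echo "free space: $(free_kb "$(dirname "$gfa")") KiB"
}
```

A typical call would be `refresh_import myGraph.gfa myGraph_Gratools-IMPORT` followed by `gratools import --gfa myGraph.gfa`; review the path carefully before use, since the function deletes the directory it is given.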

📑 Next Steps