Welcome to the Quick Start Guide for GraTools!

This guide will help you quickly install, configure and start using GraTools.

Install GraTools

Make sure you have Python 3.11+ and Bedtools installed and accessible from the command line.
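
Before installing, it can help to confirm that both prerequisites are visible from your shell. The snippet below is a minimal sketch and is not part of GraTools; the helper name `have_cmd` is invented for illustration.

```shell
# Hypothetical helper: succeeds if the given command is found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Check that python3 exists and is version 3.11 or newer.
if have_cmd python3 &&
   python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 11) else 1)'; then
    echo "python3: OK"
else
    echo "python3: missing or older than 3.11"
fi

# Check that bedtools is reachable from the command line.
if have_cmd bedtools; then
    echo "bedtools: OK"
else
    echo "bedtools: not found on PATH"
fi
```

If either check fails, fix your PATH or install the missing tool before continuing.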

Install the development version from the Git repository

You can test new features not yet available in the latest release:

python3 -m pip install GraTools@git+https://forge.ird.fr/diade/gratools.git@main

Replace main with the name of the branch you want to test, if needed.
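
As a sketch of how the branch name fits into the install command, the snippet below builds the pip requirement string from an argument. The helper `gratools_spec` and the branch name `my-feature-branch` are placeholders invented for this example, not real names from the repository. The install line is left commented out so that nothing is installed by accident.

```shell
# Hypothetical helper: print the pip requirement for a given branch
# (defaults to main, as in the command above).
gratools_spec() {
    printf 'GraTools@git+https://forge.ird.fr/diade/gratools.git@%s\n' "${1:-main}"
}

gratools_spec                    # requirement for the main branch
gratools_spec my-feature-branch  # placeholder branch name

# Uncomment to install from the chosen branch:
# python3 -m pip install "$(gratools_spec my-feature-branch)"
```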

Verify the installation

gratools --version
gratools --help

Command List and Help

To see all available commands and their descriptions, simply type gratools --help.

Using GraTools

This section provides a quick overview of some of the key functionalities of GraTools. Each subsection includes a brief description and an example command to help you get started.

stats

Compute a range of statistics on the graph structure, such as segment counts, link properties, walk characteristics, and connectivity.

$ gratools stats --gfa Og_cactus.gfa.gz

--- GFA Statistics for: Og_cactus ---
       Graph Overview Statistics
╭─────────────────────────┬─────────────────────────────────────────────────────────────┬───────────────────────╮
│ Category                 Metric                                                       Value                 │
├─────────────────────────┼─────────────────────────────────────────────────────────────┼───────────────────────┤
│ Graph Overview           GFA File Name                                                NewRiceGraph_MGC      │
│ Graph Overview           GFA Version                                                  1.1                   │
│ Graph Overview           Total Segments (S lines)                                     26,461,214            │
│ Graph Overview           Total Links (L lines)                                        36,276,920            │
│ Graph Overview           Total Walks (W lines)                                        743                   │
│ Graph Overview           Unique Samples in Walks                                      13                    │
│ Segment Statistics       Total Segment Length (bp)                                    858,417,045           │
│ Segment Statistics       Average Segment Length (bp)                                  32.44                 │
│ Segment Statistics       Median Segment Length (bp)                                   1.00                  │
│ Segment Statistics       Avg Length of Top 5% Longest Segments (bp)                   508.68                │
│ Segment Statistics       Median Length of Top 5% Longest Segments (bp)                155.00                │
│ Segment Statistics       Segment Length Distribution                                  0-499bp: 26325942,    │
│ ...                      ...                                                          ...                   │
╰─────────────────────────┴─────────────────────────────────────────────────────────────┴───────────────────────╯

For more details, see the complete documentation for :ref:`stats`.

list_samples

Lists all unique sample names found in a GFA file.

$ gratools list_samples --gfa Og_cactus.gfa.gz
─────────────────────────────── Summary ────────────────────────────────
Total samples in GFA: 5
   Available
Samples in GFA:
   Og_cactus
╭─────────────╮
│ Sample Name │
├─────────────┤
│ CG14        │
│ Og20        │
│ Og103       │
│ Og182       │
│ Tog5681     │
╰─────────────╯

For more details, see the complete documentation for `list_samples`.

list_chr

Lists the chromosomes of each sample in a GFA file, in either a short or a full format.

$ gratools list_chr --gfa Og_cactus.gfa.gz

────────────────────────────────── Summary ─────────────────────────────────────────
Total samples: 5
Min chromosomes per sample: 2
Max chromosomes per sample: 2
              5 Samples, 2-2 Unique Chromosomes/Sample (GFA: Og_cactus)
╭─────────────┬──────────────────────────────────────┬──────────────────────────────╮
│ Sample Name  Unique Chromosomes (Comma-separated)  Number of Unique Chromosomes │
├─────────────┼──────────────────────────────────────┼──────────────────────────────┤
│ CG14         CG14_Chr07, CG14_Chr08                2                            │
├─────────────┼──────────────────────────────────────┼──────────────────────────────┤
│ Og103        Og103_Chr07, Og103_Chr08              2                            │
├─────────────┼──────────────────────────────────────┼──────────────────────────────┤
│ Og182        Og182_Chr07, Og182_Chr08              2                            │
├─────────────┼──────────────────────────────────────┼──────────────────────────────┤
│ Og20         Og20_Chr07, Og20_Chr08                2                            │
├─────────────┼──────────────────────────────────────┼──────────────────────────────┤
│ Tog5681      Tog5681_Chr07, Tog5681_Chr08          2                            │
╰─────────────┴──────────────────────────────────────┴──────────────────────────────╯

For more details, see the complete documentation for `list_chr`.

extract_subgraph

Extracts a subgraph of a specific region from the GFA, defined by a query sample, chromosome, and start/end.

$ gratools extract_subgraph --gfa Og_cactus.gfa.gz \
    --sample-query CG14 --chrom-query CG14_Chr07 \
    --start-query 100000 --stop-query 150000 \
    --all-samples --num-threads 4

01-13 14:25 |  INFO     | Generated GFA file Og_cactus_subgraph_CG14-CG14_Chr07-100000-150000.gfa.gz

For more details, see :ref:`extract_subgraph_doc`.

get_fasta

Extracts the sequences of a specific region, defined by sample, chromosome, and start/stop positions, for the query sample and any other specified samples.

$ gratools get_fasta --gfa Og_cactus.gfa.gz \
    --sample-query CG14 --chrom-query CG14_Chr07 \
    --start-query 10000 --stop-query 15000 \
    --all-samples --num-threads 8

For more details, see the complete documentation for :ref:`get_fasta_doc`.

core_dispensable_ratio

Calculates the ratio of core segments (shared by almost all samples) to dispensable segments (present in a smaller subset of samples).

$ gratools core_dispensable_ratio -g Og_cactus.gfa.gz --input-as-number \
    --shared-min 4 --specific-max 2 --filter-len 50

───────────────── Summary ──────────────────
Total segments in GFA: 2,354,995
...
╭───────────────────────────────────┬───────────┬───────────┬────────────╮
│ Category                               Count      Total  Percentage │
├───────────────────────────────────┼───────────┼───────────┼────────────┤
│ Shared (Core) - Raw                1,162,062  2,354,995      49.34% │
│ ...                                      ...        ...         ... │
╰───────────────────────────────────┴───────────┴───────────┴────────────╯

For more details, see the complete documentation for :ref:`core_dispensable_ratio_doc`.

get_segments_by_depth

Lists segments that are present in a number of samples falling within a specified range.

$ gratools get_segments_by_depth --gfa Og_cactus.gfa.gz \
    --input-as-number --lower-bound 0 --upper-bound 2

|  INFO     | Parameters: lower=0; upper=2; filter_len=0
|  INFO     | Generate CSV file Og_cactus_segment_by_depth_between_0-2_individuals.csv
|  INFO     | Number of segments found: 891345 between 0 and 2 individuals

For more details, see the complete documentation for :ref:`get_segments_by_depth_doc`.

depth_nodes_stat

Displays a table summarizing how segments are shared across different samples by calculating the ‘depth’ of each segment (i.e., the number of unique samples encompassing it).

$ gratools depth_nodes_stat --gfa Og_cactus.gfa.gz --filter-len 50 --threads 4

────────────────  Summary ────────────────────────────────────────
Total segments analyzed: 2,354,995
Total segments passing length filter: 210,046 (8.92%)

          Node Depth Statistics  Og_cactus (Len  50bp)
╭───────┬──────────┬────────────┬───────────────────┬────────────╮
│ Depth  Segments  Percentage  Filtered Segments  Filtered % │
├───────┼──────────┼────────────┼───────────────────┼────────────┤
│   1    575,292     24.43%          1,257          0.60%    │
│   2    316,053     13.42%          7,345          3.50%    │
│   3    301,588     12.81%          5,604          2.67%    │
│   4    501,321     21.29%         11,500          5.47%    │
│   5    660,741     28.06%         184,340         87.76%   │
╰───────┴──────────┴────────────┴───────────────────┴────────────╯

For more details, see the complete documentation for :ref:`depth_nodes_stat_doc`.

specific_groups_sample

Identifies segments shared by or specific to defined sample groups.

$ gratools specific_groups_sample --gfa Og_cactus.gfa.gz \
    --samples-list-A list_A.txt \
    --samples-list-B list_B.txt \
    --output-csv --suffix test


╭────────────────────────────────── 📊 Shared & Specific Segment Analysis ───────────────────────────────────╮
│  Segments shared by 2 in list ['CG14', 'Og20']                                                             │
│    ├─ Count                                       980,985 / 2,354,995                                      │
│    ├─ Percentage                                  41.66%                                                   │
│    └─ Total Length                                48,986,534 bp                                            │
│                                                                                                            │
│  Segments specific to 2 in list ['CG14', 'Og20'] and absent in 1 in list ['Tog5681']                       │
│    ├─ Count                                       214,326 / 2,354,995                                      │
│    ├─ Percentage                                  9.10%                                                    │
│    └─ Total Length                                1,020,689 bp                                             │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

For more details, see the complete documentation for :ref:`specific_groups_sample_doc`.

Typical Workflow

For the best performance, a typical GraTools session follows these steps:

  1. Index the GFA File: The index command is the crucial first step. It parses your GFA file once to create index files, which greatly speed up all subsequent commands.

    gratools index --gfa my_graph.gfa.gz
    
  2. Explore the Graph Content: Use the “GFA Content Information” commands to describe your data before running further analysis.

    # See what samples are in the graph
    gratools list_samples --gfa my_graph.gfa.gz
    
    # Get an overview of the graph's properties
    gratools stats --gfa my_graph.gfa.gz
    
  3. Analyze or Extract Data: After exploring the graph, you can analyze it further or extract specific subgraphs and sequences.

    # Extract a specific region for all samples into a FASTA file
    gratools get_fasta --gfa my_graph.gfa.gz --sample-query Ref --chrom-query chr1 --all-samples
    
    # Analyze the core vs. dispensable genome
    gratools core_dispensable_ratio --gfa my_graph.gfa.gz --input-as-percentage --shared-min 95% --specific-max 5%
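
The first two steps can be chained in a small shell function. This is only a sketch: `run`, `explore_graph`, and the `DRY_RUN` variable are names invented here, and only subcommands and options shown in this guide are used. With `DRY_RUN` set, the commands are printed instead of executed, which is a cheap way to check the sequence first.

```shell
# Hypothetical wrapper: print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Index first, then explore; stop at the first failing step.
explore_graph() {
    gfa="$1"
    run gratools index --gfa "$gfa" &&
    run gratools list_samples --gfa "$gfa" &&
    run gratools stats --gfa "$gfa"
}

# Preview the commands for the example graph; unset DRY_RUN to run them for real.
DRY_RUN=1
explore_graph my_graph.gfa.gz
```

Indexing comes first in the chain because, as noted in step 1, every later command benefits from the index.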
    

Testing GraTools

Download a test dataset to explore GraTools’ functionalities.

# Go to the directory where you want to download data
# and download the dataset
wget http://itrop.ird.fr/GraTools/data-gratools.tar.gz

# Extract the dataset
tar -zxvf data-gratools.tar.gz
data-gratools/
├── Bacteria
│   └── ecoli_MGC_graph.full.gfa
├── Bathyprasinos
│   └── Bathyprasinos_graph.full.gfa.gz
├── README
└── Rice
    ├── inversion_duplications_nipponbare.bed
    ├── MGC_RiceGraph_Chr8.gfa.gz
    ├── NewRiceGraph_MGC.gfa.gz
    └── Og_cactus.gfa.gz

3 directories, 7 files
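
Once the archive is extracted, a quick check that the graph files listed above are present can save a confusing error later. This is a minimal sketch, assuming the archive unpacks to data-gratools/ in the current directory as shown; the function name `missing_graphs` is invented for this example.

```shell
# Hypothetical helper: print each expected graph file that is missing
# under the given directory (defaults to data-gratools).
missing_graphs() {
    dir="${1:-data-gratools}"
    for f in \
        Bacteria/ecoli_MGC_graph.full.gfa \
        Bathyprasinos/Bathyprasinos_graph.full.gfa.gz \
        Rice/MGC_RiceGraph_Chr8.gfa.gz \
        Rice/NewRiceGraph_MGC.gfa.gz \
        Rice/Og_cactus.gfa.gz
    do
        [ -f "$dir/$f" ] || echo "missing: $dir/$f"
    done
}

# Prints nothing when all five graph files are in place.
missing_graphs data-gratools
```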

Troubleshooting

  • Installation Problems: Ensure all dependencies, especially bedtools, are correctly installed and available in your system’s PATH.

  • Runtime Errors: GraTools creates detailed log files in its output directory. Check these logs for specific error messages.

  • Index Issues: If you modify or replace your GFA file, remember to delete the old index directory (e.g., my_graph.gfa.gz_GraTools_INDEX/) and run gratools index again.
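
The index reset can be scripted. This sketch assumes the index directory is named after the GFA file with a `_GraTools_INDEX` suffix, as in the example above; check the actual directory name on your system before deleting anything. The function name `reset_index` is invented here.

```shell
# Hypothetical helper: delete the stale index directory for a GFA file, if present.
# Assumes the "<gfa>_GraTools_INDEX" naming shown in the example above.
reset_index() {
    index_dir="${1}_GraTools_INDEX"
    if [ -d "$index_dir" ]; then
        rm -r -- "$index_dir"
        echo "removed $index_dir"
    fi
}

reset_index my_graph.gfa.gz

# Rebuild the index if the graph exists and gratools is installed.
if [ -f my_graph.gfa.gz ] && command -v gratools >/dev/null 2>&1; then
    gratools index --gfa my_graph.gfa.gz
fi
```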

Further Assistance

For more help, please refer to the complete official documentation or open an issue on the project’s Git repository.