gratools core_dispensable_ratio

Options

Usage Examples

Displays the ratio of core and dispensable segments

The following example calculates the ratio for a GFA containing 5 samples. The thresholds are given as sample counts (--input-as-number): segments are considered “core” if they are shared by at least 4 samples (--shared-min 4) and “dispensable” if they are present in 2 or fewer samples (--specific-max 2). A length filter (--filter-len 50) restricts the “Filtered” rows to segments of 50 bp or longer; the “Raw” rows still count every segment.

$ gratools core_dispensable_ratio -g Og_cactus.gfa.gz --input-as-number \
    --shared-min 4 --specific-max 2 --filter-len 50 --threads 4

─────────────────────────────────────────── Summary ──────────────────────────────────────
Total segments in GFA: 2,354,995
Total segments analyzed: 2,354,995
Total segments passing length filter (>= 50bp): 210,046 (8.92%)

                         Core vs. Dispensable Segments  Og_cactus
╭────────────────────────────────────────────────────┬───────────┬───────────┬────────────╮
│ Category                                                Count      Total  Percentage │
├────────────────────────────────────────────────────┼───────────┼───────────┼────────────┤
│ Shared (Core) - Raw                                 1,162,062  2,354,995      49.34% │
│ Specific (Dispensable) - Raw                          891,345  2,354,995      37.85% │
│ Shared (Core) - Filtered (Length >= 50bp)             195,840    210,046      93.24% │
│ Specific (Dispensable) - Filtered (Length >= 50bp)      8,602    210,046       4.10% │
│ Segments Filtered Out by Length                     2,144,949  2,354,995      91.08% │
╰────────────────────────────────────────────────────┴───────────┴───────────┴────────────╯

Understanding the Output Table

  • Shared (Core) - Raw: The number of core segments, and their share of all segments in the GFA.

  • Specific (Dispensable) - Raw: The number of dispensable segments, and their share of all segments in the GFA.

  • Shared (Core) - Filtered: The number of core segments that also pass the length filter, and their share of the segments that pass the filter. This shows the composition of the longer segments.

  • Specific (Dispensable) - Filtered: The number of dispensable segments that also pass the length filter, and their share of the segments that pass the filter.

  • Segments Filtered Out by Length: The number and percentage of segments excluded from the “Filtered” rows because they are shorter than the --filter-len value.
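As a quick check, the percentages in the table can be recomputed from the Count and Total columns. The Python sketch below does this with the counts from the example output; the variable and function names are illustrative only and are not part of gratools.

```python
# Counts copied from the example output above.
total_segments = 2_354_995   # all segments in the GFA
passing_filter = 210_046     # segments of 50 bp or longer

# (count, total) pairs, one per table row.
rows = {
    "Shared (Core) - Raw": (1_162_062, total_segments),
    "Specific (Dispensable) - Raw": (891_345, total_segments),
    "Shared (Core) - Filtered": (195_840, passing_filter),
    "Specific (Dispensable) - Filtered": (8_602, passing_filter),
    "Segments Filtered Out by Length": (total_segments - passing_filter, total_segments),
}


def percentage(count: int, total: int) -> float:
    """Share of `count` in `total`, as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


percentages = {name: percentage(count, total) for name, (count, total) in rows.items()}
for name, value in percentages.items():
    print(f"{name}: {value:.2f}%")

# Segments that are neither core nor dispensable under these thresholds.
raw_intermediate = total_segments - 1_162_062 - 891_345
filtered_intermediate = passing_filter - 195_840 - 8_602
print(raw_intermediate, filtered_intermediate)
```

The core and dispensable rows do not add up to 100%, because segments that fall between the two thresholds belong to neither category. With 5 samples, --shared-min 4 and --specific-max 2, these would be the segments present in exactly 3 samples.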

Illustrated Example

Understanding the Process

  1. The `shared-min` and `specific-max` options: --shared-min (-sm) sets the minimum number of samples a segment must be found in to be considered part of the “core” genome, and --specific-max (-spm) sets the maximum number of samples a segment may be found in to be considered part of the “dispensable” genome.

  2. Input mode: The user must specify whether the thresholds are absolute numbers or percentages.

  • With --input-as-percentage, the user can specify --shared-min 90% and --specific-max 10%. Segments present in 90% or more of the samples will be considered “core,” and those in 10% or less will be considered “dispensable.”

  • With --input-as-number, the user specifies the exact number of samples.

  3. Example: For the command gratools core_dispensable_ratio --input-as-number --specific-max 2 --shared-min 4, the minimum number of samples for a segment to be “core” is 4, and the maximum number of samples for it to be “dispensable” is 2.

  4. Length filter: The --filter-len (-fl) option applies a filter to exclude segments shorter than the specified value from the “filtered” analysis.
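The rules described above can be sketched in a few lines of Python. This is only an illustration of the thresholds as documented here, not the gratools implementation: the function names, the "intermediate" label for segments that are neither core nor dispensable, and the assumption that --specific-max is lower than --shared-min are all hypothetical.

```python
def classify_segment(n_present: int, n_samples: int,
                     shared_min: float, specific_max: float,
                     as_percentage: bool = False) -> str:
    """Classify one segment from the number of samples that contain it.

    With as_percentage=False the thresholds are sample counts
    (as with --input-as-number); with as_percentage=True they are
    percentages of the samples (as with --input-as-percentage).
    Assumes specific_max < shared_min.
    """
    value = 100.0 * n_present / n_samples if as_percentage else n_present
    if value >= shared_min:
        return "core"
    if value <= specific_max:
        return "dispensable"
    return "intermediate"  # hypothetical label: neither core nor dispensable


def passes_length_filter(length_bp: int, filter_len: int) -> bool:
    """True if the segment is kept in the "filtered" analysis."""
    return length_bp >= filter_len


# Thresholds as sample counts: 5 samples, --shared-min 4, --specific-max 2.
by_number = [classify_segment(n, 5, shared_min=4, specific_max=2) for n in range(1, 6)]
print(by_number)
# ['dispensable', 'dispensable', 'intermediate', 'core', 'core']

# Thresholds as percentages: 10 samples, 90% and 10%.
print(classify_segment(9, 10, 90, 10, as_percentage=True))  # core
print(classify_segment(1, 10, 90, 10, as_percentage=True))  # dispensable

# Length filter with --filter-len 50.
print(passes_length_filter(49, 50), passes_length_filter(50, 50))  # False True
```

Comparing the percentage directly with the thresholds matches the wording above (“90% or more”, “10% or less”); how gratools itself handles rounding when the sample count does not divide evenly is not specified here.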