SubGraph

class gratools.SubGraph(bam_path, bed_path, logger=<factory>, sample_name=None, sample_name_query=None, chromosome_query=None, start_query=None, stop_query=None, offset_first=0, offset_last=0, add_start_bases_first_segment=0, intersect_bed=None, segment_id_set=<factory>, segment_id_first_query=None, segment_id_first_strand=None, segment_id_last_query=None, segment_id_last_strand=None, segment_id_first=None, segment_id_last=None, works_path=None, merge=None, build_fasta_flag=False, gfa_walk_list=<factory>, gfa_link_list=<factory>, gfa_segment_list=<factory>, dict_segments_samples=<factory>, dict_segments_sequence=<factory>, sequences_list=<factory>, progress_dict=None, task_id=None, regions=None, intersected_results_by_regions=None)

Bases: object

A class representing a subgraph in a genomic analysis pipeline. It handles operations like BED file intersection, GFA walk/link/segment building, and FASTA sequence generation for a specific sample or region.

Attributes

bam_path (Path): The file path to the BAM file.
bed_path (Path): The file path to the BED file.
logger (logging.Logger): The logger instance for logging messages. Defaults to a logger named “GraTools”.
sample_name (Optional[str]): The name of the sample, derived from bed_path. Defaults to None.
sample_name_query (Optional[str]): The name of the sample being queried. Defaults to None.
chromosome_query (Optional[str]): The chromosome name for the query. Defaults to None.
start_query (Optional[int]): The start position on the chromosome for the query. Defaults to None.
stop_query (Optional[int]): The stop position on the chromosome for the query. Defaults to None.
offset_first (int): The offset for the first segment in the query region. Defaults to 0.
offset_last (int): The offset for the last segment in the query region. Defaults to 0.
add_start_bases_first_segment (int): Additional start bases for the first segment. Defaults to 0.
intersect_bed (Optional[BedTool]): The BedTool object for intersected BED regions. Defaults to None.
segment_id_set (Set[str]): A set storing unique segment IDs encountered during walk building. Defaults to an empty set.
segment_id_first_query (Optional[str]): The ID of the first segment in the query region. Defaults to None.
segment_id_first_strand (Optional[str]): The strand (‘+’ or ‘-’) of the first segment in the query. Defaults to None.
segment_id_last_query (Optional[str]): The ID of the last segment in the query region. Defaults to None.
segment_id_last_strand (Optional[str]): The strand (‘+’ or ‘-’) of the last segment in the query. Defaults to None.
segment_id_first (Optional[str]): The first segment ID encountered for the current sample (may be the same as the query's). Defaults to None.
segment_id_last (Optional[str]): The last segment ID encountered for the current sample (may be the same as the query's). Defaults to None.
works_path (Optional[Path]): The working directory path. Defaults to None.
merge (Optional[int]): Merge distance parameter for BED operations (e.g., bedtools merge -d). Defaults to None.
build_fasta_flag (Optional[bool]): Flag indicating whether FASTA sequences should be built. Defaults to False.
gfa_walk_list (List[str]): A list of GFA walk strings (W lines). Defaults to an empty list.
gfa_link_list (List[str]): A list of GFA link strings (L lines). Defaults to an empty list.
gfa_segment_list (List[str]): A list of GFA segment strings (S lines). Defaults to an empty list.
dict_segments_samples (defaultdict[str, List[str]]): A dictionary mapping segment IDs to a list of sample identifiers. Defaults to an empty defaultdict.
dict_segments_sequence (defaultdict[str, str]): A dictionary mapping segment IDs to their sequences. Defaults to an empty defaultdict.
sequences_list (List[SeqRecord]): A list of Biopython SeqRecord objects for generated FASTA sequences. Defaults to an empty list.
progress_dict (Optional[Dict]): A dictionary to track progress, typically for multiprocessing. Defaults to None.
task_id (Optional[TaskID]): The task ID for progress tracking with rich.progress. Defaults to None.
regions (Optional[List[Dict[str, Any]]]): A list of regions (dictionaries with ‘chromosome’, ‘start’, ‘stop’ keys). Defaults to None.
intersected_results_by_regions (Optional[BedTool]): A BedTool object containing combined intersected results for all regions. Defaults to None.
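
The `<factory>` defaults in the class signature are how Python displays dataclass fields declared with a default factory, which suggests that SubGraph is a dataclass. The sketch below is a hypothetical stand-in (`SubGraphSketch` is not part of gratools) that mirrors a handful of the fields above to show why the mutable attributes use factories: each instance gets its own set, list, and defaultdict.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional, Set


@dataclass
class SubGraphSketch:
    """Hypothetical stand-in mirroring a few SubGraph fields; not the gratools class."""

    bam_path: Path
    bed_path: Path
    sample_name: Optional[str] = None
    offset_first: int = 0
    # Mutable defaults need a factory so that instances do not share state.
    segment_id_set: Set[str] = field(default_factory=set)
    gfa_walk_list: List[str] = field(default_factory=list)
    dict_segments_samples: defaultdict = field(
        default_factory=lambda: defaultdict(list)
    )


# The file names are placeholders; nothing is read from disk here.
first = SubGraphSketch(Path("sample1.bam"), Path("sample1.bed"))
second = SubGraphSketch(Path("sample2.bam"), Path("sample2.bed"))
first.segment_id_set.add("s1")
first.dict_segments_samples["s1"].append("sample1")
```

Only `bam_path` and `bed_path` are required; every other field can be left at its default and filled in as the pipeline progresses.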

Attributes Summary

`add_start_bases_first_segment`
`build_fasta_flag`
`chromosome_query`
`intersect_bed`
`intersected_results_by_regions`
`merge`
`offset_first`
`offset_last`
`progress_dict`
`regions`
`sample_name`
`sample_name_query`
`segment_id_first`
`segment_id_first_query`
`segment_id_first_strand`
`segment_id_last`
`segment_id_last_query`
`segment_id_last_strand`
`start_query`
`stop_query`
`task_id`
`works_path`

Methods Summary

`build_fasta`()	Build FASTA sequences from the GFA walks (self.gfa_walk_list).
`build_segments`()	Build GFA segments (S lines) from the BAM file for segments in self.segment_id_set.
`build_walks`()	Build GFA walks (W lines) and links (L lines) from the intersected BED regions.
`compute_intersection`()	Compute the intersection of BED regions for each region in self.regions.
`filter_bed_with_awk`()	Filter a BED file using an awk command line to extract only lines containing an ID of interest.
`get_chr_pos`(progress_dict, task_id)	Identify chromosomal regions corresponding to segments in self.segment_id_set, then compute intersections, and finally build walks and segments for these regions.

Attributes Documentation

add_start_bases_first_segment: int = 0

build_fasta_flag: bool | None = False

chromosome_query: str | None = None

intersect_bed: BedTool | None = None

intersected_results_by_regions: BedTool | None = None

merge: int | None = None

offset_first: int = 0

offset_last: int = 0

progress_dict: Dict | None = None

regions: List[Dict[str, Any]] | None = None

sample_name: str | None = None

sample_name_query: str | None = None

segment_id_first: str | None = None

segment_id_first_query: str | None = None

segment_id_first_strand: str | None = None

segment_id_last: str | None = None

segment_id_last_query: str | None = None

segment_id_last_strand: str | None = None

start_query: int | None = None

stop_query: int | None = None

task_id: TaskID | None = None

works_path: Path | None = None

Methods Documentation

build_fasta()

Build FASTA sequences from the GFA walks (self.gfa_walk_list).

This method recovers FASTA sequences by processing each walk, extracting its constituent segments, and applying offsets if the current sample is the query sample (self.sample_name == self.sample_name_query). The offsets trim segment sequences at the beginning or end of the walk so that the result matches the exact query region boundaries (self.start_query, self.stop_query). The resulting sequences are stored as SeqRecord objects in self.sequences_list.

Return type: None
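
The walk-to-sequence step described above can be sketched as follows. This is a minimal illustration and not the gratools implementation: `walk_to_sequence` is a hypothetical helper, it returns a plain string instead of a SeqRecord, and it assumes that `offset_first` and `offset_last` count the bases to drop from the start of the first segment and from the end of the last segment.

```python
import re
from typing import Dict

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def walk_to_sequence(
    walk: str,
    segment_sequences: Dict[str, str],
    offset_first: int = 0,
    offset_last: int = 0,
) -> str:
    """Concatenate the segments of a GFA walk and trim both ends.

    Assumption: offset_first / offset_last count bases to drop from the
    start of the first segment and the end of the last segment.
    """
    parts = []
    # A GFA 1.1 walk looks like ">s1<s2>s3": ">" is forward, "<" is reverse.
    for orientation, segment_id in re.findall(r"([<>])([^<>]+)", walk):
        sequence = segment_sequences[segment_id]
        if orientation == "<":
            # Reverse-complement segments traversed backwards.
            sequence = sequence.translate(_COMPLEMENT)[::-1]
        parts.append(sequence)
    joined = "".join(parts)
    end = len(joined) - offset_last
    return joined[offset_first:end]


sequences = {"s1": "AACC", "s2": "GGT", "s3": "TTA"}
full = walk_to_sequence(">s1<s2>s3", sequences)
trimmed = walk_to_sequence(">s1<s2>s3", sequences, offset_first=2, offset_last=1)
```

Here `full` is the untrimmed concatenation, while `trimmed` loses two bases at the start and one at the end, as would apply to the query sample.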

build_segments()

Build GFA segments (S lines) from the BAM file for segments in self.segment_id_set.

Return type: None
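
For orientation, a GFA S line is a tab-separated record holding the segment ID and its sequence. The sketch below only shows that formatting step. It assumes the sequences have already been retrieved into a dictionary (how gratools reads them from the BAM file is not covered here), and `format_segment_lines` is a hypothetical helper name.

```python
from typing import Dict, Iterable, List


def format_segment_lines(
    segment_ids: Iterable[str], segment_sequences: Dict[str, str]
) -> List[str]:
    """Return one tab-separated GFA S line per segment ID, sorted by ID."""
    lines = []
    for segment_id in sorted(segment_ids):
        # GFA allows "*" when the sequence is not stored in the file.
        sequence = segment_sequences.get(segment_id, "*")
        lines.append(f"S\t{segment_id}\t{sequence}")
    return lines


segment_lines = format_segment_lines({"s2", "s1"}, {"s1": "AACC", "s2": "GGT"})
```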

build_walks()

Build GFA walks (W lines) and links (L lines) from the intersected BED regions.

Return type: None
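
As an illustration of the two record types this method produces, the sketch below turns an ordered list of oriented segments into one GFA 1.1 W line plus the L lines joining consecutive segments. It is a sketch under stated assumptions rather than the gratools logic: `build_walk_and_links` is a hypothetical helper, the haplotype index is fixed to 0, and every link uses a `0M` overlap.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (segment_id, strand), where strand is "+" or "-"


def build_walk_and_links(
    sample: str, chromosome: str, start: int, stop: int, steps: List[Step]
) -> Tuple[str, List[str]]:
    """Return a GFA W line and the L lines joining consecutive steps."""
    # W lines encode orientation as ">" (forward) or "<" (reverse).
    walk = "".join((">" if strand == "+" else "<") + seg for seg, strand in steps)
    # Haplotype index 0 is a placeholder assumption.
    walk_line = "\t".join(["W", sample, "0", chromosome, str(start), str(stop), walk])
    # One link per pair of adjacent steps; "0M" (no overlap) is assumed.
    link_lines = [
        f"L\t{seg_a}\t{strand_a}\t{seg_b}\t{strand_b}\t0M"
        for (seg_a, strand_a), (seg_b, strand_b) in zip(steps, steps[1:])
    ]
    return walk_line, link_lines


walk_line, link_lines = build_walk_and_links(
    "sample1", "chr1", 0, 10, [("s1", "+"), ("s2", "-"), ("s3", "+")]
)
```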

compute_intersection()

Compute the intersection of BED regions for each region in self.regions. Store the results in self.intersected_results_by_regions.

Return type: None
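
Conceptually, the intersection keeps the BED intervals that overlap at least one entry of self.regions. The pure-Python sketch below illustrates that idea without BedTool objects. It is a stand-in for illustration, not the method itself: `intersect_regions` is a hypothetical helper, the four-column row layout is an assumption, and intervals are treated as 0-based and half-open, as in BED.

```python
from typing import Any, Dict, List, Tuple

BedRow = Tuple[str, int, int, str]  # (chromosome, start, stop, segment_id)


def intersect_regions(
    bed_rows: List[BedRow], regions: List[Dict[str, Any]]
) -> List[BedRow]:
    """Keep BED rows overlapping any region (half-open, 0-based intervals)."""
    kept = []
    for row in bed_rows:
        chromosome, start, stop, _ = row
        for region in regions:
            same_chromosome = chromosome == region["chromosome"]
            if same_chromosome and start < region["stop"] and stop > region["start"]:
                kept.append(row)
                break  # avoid duplicating a row matched by several regions
    return kept


bed_rows = [("chr1", 0, 100, "s1"), ("chr1", 100, 250, "s2"), ("chr2", 0, 50, "s3")]
regions = [{"chromosome": "chr1", "start": 120, "stop": 200}]
hits = intersect_regions(bed_rows, regions)
```

Only the second row overlaps the region, so `hits` contains the interval for segment `s2`.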

filter_bed_with_awk()

Filter a BED file using an awk command line to extract only lines containing an ID of interest.

Return type: BedTool
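
The effect of such a filter can be reproduced in pure Python, which may help when checking its output. The sketch below is an equivalent written for illustration, not the awk command gratools runs: `filter_bed_lines` is a hypothetical helper, and placing the ID in the fourth BED column (the name field) is an assumption.

```python
from typing import Iterable, List, Set


def filter_bed_lines(
    lines: Iterable[str], wanted_ids: Set[str], column: int = 3
) -> List[str]:
    """Keep tab-separated BED lines whose `column` (0-based) is in wanted_ids."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip short lines instead of failing on a missing column.
        if len(fields) > column and fields[column] in wanted_ids:
            kept.append(line)
    return kept


bed_lines = ["chr1\t0\t100\ts1\n", "chr1\t100\t250\ts2\n", "chr2\t0\t50\ts3\n"]
filtered = filter_bed_lines(bed_lines, {"s2", "s3"})
```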

get_chr_pos(progress_dict, task_id)

Identify chromosomal regions corresponding to segments in self.segment_id_set, then compute intersections, and finally build walks and segments for these regions.

Parameters:

progress_dict (Dict | None): A dictionary to track progress for multiprocessing.
task_id (TaskID | None): The task ID for rich.progress tracking.

Return type:

None
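
The two arguments follow a common pattern for reporting progress from worker processes: each worker writes its state into a shared dictionary under its own task ID, and the parent process reads that dictionary to refresh a rich.progress display. The sketch below shows only the worker side with a plain dict. The `{"progress": ..., "total": ...}` entry shape and the `process_regions` helper are assumptions made for illustration, not the gratools internals.

```python
from typing import Dict, Iterable, Optional


def process_regions(
    regions: Iterable[str],
    progress_dict: Optional[Dict[int, Dict[str, int]]] = None,
    task_id: Optional[int] = None,
) -> int:
    """Handle each region and report progress when tracking is enabled."""
    regions = list(regions)
    done = 0
    for _region in regions:
        done += 1  # placeholder for the real per-region work
        if progress_dict is not None and task_id is not None:
            # Reassign the whole entry so a multiprocessing.Manager dict
            # would propagate the change to the parent process.
            progress_dict[task_id] = {"progress": done, "total": len(regions)}
    return done


shared_progress: Dict[int, Dict[str, int]] = {}
completed = process_regions(["chr1:0-100", "chr1:200-300"], shared_progress, task_id=7)
```

Passing None for either argument disables tracking, which matches the optional types in the signature.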

Parameters (SubGraph constructor):

bam_path (Path)
bed_path (Path)
logger (Logger)
sample_name (str | None)
sample_name_query (str | None)
chromosome_query (str | None)
start_query (int | None)
stop_query (int | None)
offset_first (int)
offset_last (int)
add_start_bases_first_segment (int)
intersect_bed (BedTool | None)
segment_id_set (Set[str])
segment_id_first_query (str | None)
segment_id_first_strand (str | None)
segment_id_last_query (str | None)
segment_id_last_strand (str | None)
segment_id_first (str | None)
segment_id_last (str | None)
works_path (Path | None)
merge (int | None)
build_fasta_flag (bool | None)
gfa_walk_list (List[str])
gfa_link_list (List[str])
gfa_segment_list (List[str])
dict_segments_samples (defaultdict)
dict_segments_sequence (defaultdict)
sequences_list (List[SeqRecord])
progress_dict (Dict | None)
task_id (TaskID | None)
regions (List[Dict[str, Any]] | None)
intersected_results_by_regions (BedTool | None)