GenomeUtils.Genome
- Genome: Top-level container that manages chromosomes, genes, transcripts, and exons with indexing utilities.
- Gene: Locus-bound gene that owns transcripts and provides access to the genomic sequence.
- Transcript: Transcript with ordered exons, canonical sequence, and helpers for coordinate conversion.
- Exon: Exon segment attached to a transcript and able to derive its nucleotide sequence.
- Chromosome: Lazy-loaded chromosome wrapper that exposes sequence slices via loci and tracks genes.
- Locus: Immutable utility for 1-based inclusive genomic coordinates with overlap/containment helpers.
- GenomeElement: Abstract base class that unifies shared behavior for loci-based genome entities.
- GenomeBuilder: Fluent builder that assembles a
Classes
- class GenomeUtils.Genome.Chromosome(id, seq_index, genome=None, length=None, **kwargs)[source]
Bases: GenomeElement
Represents a chromosome, with sequence data loaded from file on demand.
- property genes: List['Gene']
- get_subsequence_by_locus(locus)[source]
Returns a subsequence of the chromosome for a given Locus.
- Parameters:
locus (Locus)
- Return type:
Bio.Seq.Seq
- property sequence: Bio.Seq.Seq
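The on-demand loading and locus-based slicing described for Chromosome can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: `LazyChromosome`, `SketchLocus`, and the `loader` callable are hypothetical names, plain `str` stands in for `Bio.Seq.Seq`, and reverse-complementing minus-strand loci is an assumption about the intended behavior.

```python
from dataclasses import dataclass
from functools import cached_property
from typing import Callable


@dataclass(frozen=True)
class SketchLocus:
    """Hypothetical stand-in for Locus: 1-based, inclusive coordinates."""
    chr: str
    start: int
    end: int
    strand: str = "+"


# Translation table used to complement DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


class LazyChromosome:
    """Hypothetical chromosome that loads its sequence on first access."""

    def __init__(self, id: str, loader: Callable[[], str]):
        self.id = id
        self._loader = loader   # e.g. a function that reads a FASTA record
        self.load_count = 0     # counts how often the loader actually ran

    @cached_property
    def sequence(self) -> str:
        # Runs once; later accesses reuse the cached value.
        self.load_count += 1
        return self._loader()

    def get_subsequence_by_locus(self, locus: SketchLocus) -> str:
        if locus.chr != self.id:
            raise ValueError(f"Locus is on {locus.chr!r}, not {self.id!r}")
        # 1-based inclusive [start, end] maps to the Python slice [start-1:end].
        sub = self.sequence[locus.start - 1 : locus.end]
        if locus.strand == "-":
            # Assumption: minus-strand loci return the reverse complement.
            sub = sub.translate(_COMPLEMENT)[::-1]
        return sub
```

With a loader that returns "ACGTACGTAC", the locus chr1:2-4 on the plus strand yields "CGT", and the same locus on the minus strand yields "ACG"; the loader runs only once across both calls.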
- class GenomeUtils.Genome.Exon(id, chr, start, end, strand, gene=None, transcripts=None, genome=None, sequence=None, **kwargs)[source]
Bases: GenomeElement
Represents an exon.
- Parameters:
- add_to_transcript(transcript)[source]
Add the exon to the transcript.
- Parameters:
transcript (Transcript)
- get_transcripts()[source]
Returns the transcripts that the exon belongs to.
- Return type:
List['Transcript']
- property sequence: Bio.Seq.Seq
- class GenomeUtils.Genome.Gene(id, name, chr, start, end, strand, chromosome=None, genome=None, **kwargs)[source]
Bases: GenomeElement
Represents a gene.
- Parameters:
- add_transcript(transcript)[source]
Add a transcript to the gene.
- Parameters:
transcript (Transcript)
- property sequence: Bio.Seq.Seq
- property transcripts: List['Transcript']
- class GenomeUtils.Genome.Genome(id, species, name, **kwargs)[source]
Bases: object
Represents a genome, including a collection of chromosomes, genes, transcripts, and exons.
- add_chromosome(chromosome)[source]
Add a chromosome to the genome.
- Parameters:
chromosome (Chromosome)
- chromosome_by_id(chromosome_id)[source]
Get a chromosome by its ID using the index. Raises ValueError if not found.
- Parameters:
chromosome_id (str)
- Return type:
- property chromosomes: List[Chromosome]
Get all chromosomes in the genome.
- get_sequence_by_locus(locus)[source]
Get a sequence by its locus.
- Parameters:
locus (Locus)
- Return type:
Bio.Seq.Seq
- index()[source]
Creates an index of all genes, transcripts, and exons for fast lookup. This method MUST be called after all genomic features have been added.
- transcript_by_id(transcript_id)[source]
Get a transcript by its ID using the index. Raises ValueError if not found.
- Parameters:
transcript_id (str)
- Return type:
- property transcripts: List[Transcript]
Get all transcripts in the genome.
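The requirement to call index() after adding features, and the ValueError raised by the lookup methods, can be sketched as follows. This is an illustration under assumptions, not the library's code: `TinyGenome` is a hypothetical class, and transcripts are represented as plain dictionaries with an "id" key.

```python
from typing import Dict, List


class TinyGenome:
    """Hypothetical container showing an explicit index step."""

    def __init__(self) -> None:
        self._transcripts: List[dict] = []
        self._transcript_index: Dict[str, dict] = {}

    def add_transcript(self, transcript: dict) -> None:
        # Adding a feature does not update the index.
        self._transcripts.append(transcript)

    def index(self) -> None:
        # Rebuild the ID lookup table from everything added so far.
        self._transcript_index = {t["id"]: t for t in self._transcripts}

    def transcript_by_id(self, transcript_id: str) -> dict:
        try:
            return self._transcript_index[transcript_id]
        except KeyError:
            raise ValueError(f"Transcript {transcript_id!r} not found") from None
```

In this sketch a lookup made before index() fails with ValueError even though the transcript has been added, which is why the index must be built last.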
- class GenomeUtils.Genome.GenomeBuilder(id, species, name, main_chromosomes=None, separate_scaffolds=True, **kwargs)[source]
Bases: object
Constructs a Genome object from various file formats.
This builder simplifies the process of assembling a complete Genome object by handling the parsing and integration of DNA sequences, cDNA sequences, and gene annotations from standard bioinformatics files.
The correct order of operations is:
with_dna_fasta()
with_cdna_fasta()
with_gtf_file()
build()
Example:
    from pathlib import Path

    from GenomeUtils.Genome import GenomeBuilder

    builder = GenomeBuilder(id="hg38", species="homo_sapiens", name="Human Reference Genome")
    genome = (
        builder.with_dna_fasta(Path("path/to/dna.fa"))
        .with_cdna_fasta(Path("path/to/cdna.fa"))
        .with_gtf_file(Path("path/to/annotations.gtf"))
        .build()
    )
- Parameters:
- set_chromosome_filter(chromosomes)[source]
Set a filter to only include specified chromosomes.
- Parameters:
- Return type:
- with_cdna_fasta(cdna_fasta_path)[source]
Loads transcript sequences from a cDNA FASTA file.
- Parameters:
cdna_fasta_path (Path)
- Return type:
- with_dna_fasta(dna_fasta_path)[source]
Loads chromosome sequences from a genomic DNA FASTA file. This must be the first step in the build process.
- Parameters:
dna_fasta_path (Path)
- Return type:
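The fluent style and the required call order can be illustrated with a small standalone sketch. `OrderedBuilder` is hypothetical, and whether the real GenomeBuilder raises on out-of-order calls is an assumption; the sketch only shows one way such an order could be enforced while each step returns the builder for chaining.

```python
from pathlib import Path
from typing import List, Tuple


class OrderedBuilder:
    """Hypothetical fluent builder that enforces dna -> cdna -> gtf -> build."""

    def __init__(self) -> None:
        self._steps: List[Tuple[str, Path]] = []

    def _require_previous(self, previous: str, method: str) -> None:
        # The most recent step must be the expected predecessor.
        if not self._steps or self._steps[-1][0] != previous:
            raise RuntimeError(f"{method}() must follow the {previous!r} step")

    def with_dna_fasta(self, path: Path) -> "OrderedBuilder":
        if self._steps:
            raise RuntimeError("with_dna_fasta() must be the first step")
        self._steps.append(("dna", path))
        return self  # returning self is what allows method chaining

    def with_cdna_fasta(self, path: Path) -> "OrderedBuilder":
        self._require_previous("dna", "with_cdna_fasta")
        self._steps.append(("cdna", path))
        return self

    def with_gtf_file(self, path: Path) -> "OrderedBuilder":
        self._require_previous("cdna", "with_gtf_file")
        self._steps.append(("gtf", path))
        return self

    def build(self) -> List[str]:
        self._require_previous("gtf", "build")
        # A real builder would return a Genome; this returns the step names.
        return [name for name, _ in self._steps]
```

Chaining the three steps in the documented order and calling build() returns ["dna", "cdna", "gtf"] here; starting with with_gtf_file() raises RuntimeError.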
- class GenomeUtils.Genome.GenomeElement(id, locus, parent=None, genome=None, **kwargs)[source]
Bases: ABC
Abstract base class for genomic elements (e.g. chromosomes, genes, transcripts, and exons).
- Parameters:
id (str)
locus (Locus)
parent (Optional[GenomeElement])
genome (Genome)
- property parent: GenomeElement
Returns the parent of the genome element.
- abstract property sequence: Bio.Seq.Seq
- class GenomeUtils.Genome.Locus(chr, start, end, strand='+')[source]
Bases: object
Represents 1-based, inclusive genomic coordinates on a chromosome.
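The 1-based inclusive convention and the overlap/containment helpers mentioned in the summary can be sketched as below. `SimpleLocus` and its method names `overlaps` and `contains` are hypothetical; the sketch also assumes that strand is ignored when comparing intervals, which may differ from the actual Locus class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleLocus:
    """Hypothetical immutable locus with 1-based, inclusive coordinates."""
    chr: str
    start: int
    end: int
    strand: str = "+"

    def __post_init__(self) -> None:
        if not 1 <= self.start <= self.end:
            raise ValueError("expected 1 <= start <= end")

    def __len__(self) -> int:
        # Both endpoints are included, hence the +1.
        return self.end - self.start + 1

    def overlaps(self, other: "SimpleLocus") -> bool:
        # Inclusive intervals overlap even when they share only one base.
        return (
            self.chr == other.chr
            and self.start <= other.end
            and other.start <= self.end
        )

    def contains(self, other: "SimpleLocus") -> bool:
        return (
            self.chr == other.chr
            and self.start <= other.start
            and other.end <= self.end
        )
```

Under these assumptions chr1:10-20 has length 11 and overlaps chr1:20-30 (the shared base is position 20), but not chr1:21-30.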
- class GenomeUtils.Genome.Transcript(id, chr, start, end, strand, sequence, gene=None, genome=None, **kwargs)[source]
Bases: GenomeElement
Represents a transcript.
- Parameters:
- property exons: List['Exon']
- property sequence: Bio.Seq.Seq
- transcript_to_genomic_pos(start, end=None)[source]
Converts a 0-based, half-open transcript coordinate (or range) to a 1-based, inclusive genomic coordinate (or list of Locus objects).
- Parameters:
- Returns:
A Locus object for a single point or for a range within a single exon.
A list of Locus objects if the range spans multiple exons.
None if a single point maps to no location; an empty list for a range.
- Return type:
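The conversion from 0-based, half-open transcript coordinates to 1-based, inclusive genomic coordinates can be worked through with a standalone sketch. The function `transcript_to_genomic` is hypothetical and returns `(start, end)` tuples instead of Locus objects; it assumes exons are given in transcript (5' to 3') order and that minus-strand transcripts are walked from the high genomic end of each exon.

```python
from typing import List, Optional, Tuple, Union

Interval = Tuple[int, int]  # 1-based, inclusive genomic (start, end)


def transcript_to_genomic(
    exons: List[Interval],
    strand: str,
    start: int,
    end: Optional[int] = None,
) -> Union[Interval, List[Interval], None]:
    """Map transcript range [start, end) onto genomic intervals.

    `exons` must be listed in transcript order (an assumption of this sketch).
    """
    is_point = end is None
    if is_point:
        end = start + 1  # a point is a half-open range of length 1

    pieces: List[Interval] = []
    offset = 0  # transcript position of the current exon's first base
    for ex_start, ex_end in exons:
        ex_len = ex_end - ex_start + 1
        lo = max(start, offset)          # clip the query to this exon
        hi = min(end, offset + ex_len)
        if lo < hi:
            if strand == "+":
                g_start = ex_start + (lo - offset)
                g_end = ex_start + (hi - offset) - 1
            else:
                # Minus strand: transcript position 0 is the exon's last base.
                g_end = ex_end - (lo - offset)
                g_start = ex_end - (hi - offset) + 1
            pieces.append((g_start, g_end))
        offset += ex_len

    if is_point:
        return pieces[0] if pieces else None
    # Mirror the documented return shape: one interval, or a list otherwise.
    return pieces[0] if len(pieces) == 1 else pieces
```

For two plus-strand exons at 100-109 and 200-209, transcript position 0 maps to (100, 100), the range 2 to 5 maps to (102, 104), and the range 8 to 12 spans both exons and maps to [(108, 109), (200, 201)]. A point past the transcript end returns None, and a range past the end returns an empty list.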