GenomeUtils.Downloaders

Class Descriptions

Downloader

Abstract base class for downloaders, providing caching, logging, and cleanup helpers for fetched files.

EnsemblGenomeDownloader

Ensembl-specific implementation that locates genome resources with gget and downloads DNA, cDNA, and GTF assets.

Classes

class GenomeUtils.Downloaders.Downloader(download_dir=None)

Bases: ABC

Abstract base class for all downloaders.

Parameters:

download_dir (Path | None)

cleanup()

Clean up created files.
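
The reference does not show how created files are tracked. As a minimal sketch, assuming the base class records each path it writes and that subclasses supply an abstract download() method (both are assumptions; SketchDownloader, DummyDownloader, and _created are hypothetical names, not part of GenomeUtils), cleanup could work along these lines:

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, List, Optional


class SketchDownloader(ABC):
    """Hypothetical stand-in for Downloader; not the library's actual code."""

    def __init__(self, download_dir: Optional[Path] = None) -> None:
        # Assumption: fall back to a fresh temporary directory when none is given.
        self.download_dir = Path(download_dir) if download_dir else Path(tempfile.mkdtemp())
        self.download_dir.mkdir(parents=True, exist_ok=True)
        self._created: List[Path] = []  # hypothetical record of files written

    @abstractmethod
    def download(self) -> Dict[str, Path]:
        """Subclasses decide what to fetch."""

    def cleanup(self) -> None:
        # Remove every file this instance created; skip ones already gone.
        for path in self._created:
            path.unlink(missing_ok=True)
        self._created.clear()


class DummyDownloader(SketchDownloader):
    def download(self) -> Dict[str, Path]:
        # Writes a placeholder file instead of fetching anything.
        target = self.download_dir / "example.txt"
        target.write_text("placeholder")
        self._created.append(target)
        return {"example": target}
```

Tracking paths explicitly keeps cleanup() from deleting files in the directory that the instance did not create.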

download_file(url, filename=None, force=False)

Download a single file from a URL and save it in the cache directory.

Parameters:
  • url (str) – The URL of the file to download.

  • filename (str | None) – The name under which the file is saved in the cache directory.

  • force (bool) – If True, redownload the file even if it exists. Defaults to False.

Returns:

The path to the downloaded file.

Return type:

Path
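
The caching behaviour described for download_file can be illustrated with a standalone sketch. It is not the library's implementation: cached_download is a hypothetical helper, and deriving the file name from the URL when filename is omitted is an assumption, since the reference does not say what the None default does.

```python
import shutil
import urllib.request
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse


def cached_download(url: str, cache_dir: Path, filename: Optional[str] = None,
                    force: bool = False) -> Path:
    """Hypothetical helper mirroring the documented download_file parameters."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: with no filename, use the last component of the URL path.
    name = filename or Path(urlparse(url).path).name
    target = cache_dir / name
    # Reuse the cached copy unless the caller forces a fresh download.
    if target.exists() and not force:
        return target
    # Stream the response to disk so large files are not held in memory.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Calling the helper twice with the same URL returns the cached path without a second fetch; passing force=True overwrites the cached copy.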

class GenomeUtils.Downloaders.EnsemblGenomeDownloader(assembly_id, ensembl_release, species, genomes_root_dir=PosixPath('data/genomes'))

Bases: Downloader

Downloads genome data from Ensembl.

This downloader fetches the download URLs for genomic data using gget, downloads the files, and stores them in genomes_root_dir/ensembl/{assembly_id}/{ensembl_release}.

Parameters:
  • assembly_id (str)

  • ensembl_release (int)

  • species (str)

  • genomes_root_dir (Path | str)
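
The storage layout described above can be sketched with pathlib. The function name ensembl_target_dir is hypothetical, and the values GRCh38 and 110 are illustrative only:

```python
from pathlib import Path
from typing import Union


def ensembl_target_dir(genomes_root_dir: Union[Path, str], assembly_id: str,
                       ensembl_release: int) -> Path:
    # Layout from the class description:
    # genomes_root_dir/ensembl/{assembly_id}/{ensembl_release}
    return Path(genomes_root_dir) / "ensembl" / assembly_id / str(ensembl_release)


target = ensembl_target_dir("data/genomes", "GRCh38", 110)
print(target.as_posix())  # data/genomes/ensembl/GRCh38/110
```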

download()

Downloads all necessary genome files using gget to retrieve the URLs.

Returns:

A dictionary mapping each file type to the local path of the downloaded file. Keys are dna, cdna, and annotation.

Return type:

dict[str, Path]
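
A caller may want to confirm that the returned mapping is complete before using it. The following sketch checks the three documented keys; missing_genome_files is a hypothetical helper, and the commented-out constructor arguments are illustrative values, not taken from the reference.

```python
from pathlib import Path
from typing import Dict, List

# Keys documented for the mapping returned by download().
EXPECTED_KEYS = ("dna", "cdna", "annotation")


def missing_genome_files(files: Dict[str, Path]) -> List[str]:
    """Return the documented keys that are absent or point to a missing file."""
    return [
        key for key in EXPECTED_KEYS
        if key not in files or not Path(files[key]).is_file()
    ]


# Assumed usage with the documented class (requires GenomeUtils and network access):
# downloader = EnsemblGenomeDownloader(
#     assembly_id="GRCh38", ensembl_release=110, species="homo_sapiens")
# files = downloader.download()
# problems = missing_genome_files(files)
```

An empty list means all three files are present on disk.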