GATK GRCh38 genome version for biomaRt
5 months ago

Hello, I am using the iGenomes GATK.GRCh38 reference with sarek. Now I need to use the R package biomaRt to fetch sequences and other information. Which Ensembl version does it correspond to? For example, for GRCh37 I would do:

  genome = useMart(biomart="ENSEMBL_MART_ENSEMBL",
                   host="https://grch37.ensembl.org",
                   path="/biomart/martservice",
                   dataset="hsapiens_gene_ensembl")

and for TCGA I do:

  grch38 = useMart(biomart="ENSEMBL_MART_ENSEMBL",
                   host="https://nov2020.archive.ensembl.org",
                   dataset="hsapiens_gene_ensembl")

Now I need the equivalent for the GATK bundle version of GRCh38. Would it be as simple as:

  grch38 = useMart(biomart="ENSEMBL_MART_ENSEMBL",
                   dataset="hsapiens_gene_ensembl")

Thank you in advance.

biomart • 292 views
5 months ago
GenoMax 147k

The current Ensembl release is based on the GRCh38 assembly. See the genome versions and their relationship to Ensembl releases in this table: https://www.ensembl.org/info/website/archives/assembly.html

The past thread "Difference between ENSEMBL releases" offers more details.
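If you want the annotation to stay reproducible rather than follow whatever release is current, one option is to pin the Ensembl release explicitly. The sketch below uses biomaRt's useEnsembl(), which accepts a version argument. The wrapper name connect_grch38 is hypothetical, and release 102 is only an example (it should correspond to the nov2020 archive used in the question; check the table above for the release you actually need).

```r
# Minimal sketch: connect to the human gene dataset on a GRCh38-based
# Ensembl release. `connect_grch38` is a hypothetical helper name.
connect_grch38 <- function(version = NULL) {
  if (!requireNamespace("biomaRt", quietly = TRUE)) {
    stop("The biomaRt package is required (install it from Bioconductor).")
  }
  # version = NULL connects to the current release; a number pins an archive.
  biomaRt::useEnsembl(biomart = "genes",
                      dataset = "hsapiens_gene_ensembl",
                      version = version)
}

# Usage (needs network access, so it is not run here):
# grch38 <- connect_grch38()      # current release
# grch38 <- connect_grch38(102)   # pinned release, assumed to match nov2020
```

Calling listEnsemblArchives() from biomaRt shows which releases are available as archives.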

That said, if you need sequences from GATK's version of GRCh38, you should use the reference FASTA they provide in their resource bundle here: https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0/ You can extract the sequences you need with samtools faidx (or similar tools) from the indexed FASTA file included there.
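If you would rather stay in R, the same kind of lookup can be sketched in base R by reading the .fai index directly. This is only an illustration of how a faidx-style index allows random access, not a replacement for samtools or Rsamtools. The function name fetch_region and the toy FASTA are made up for the example, and it assumes an uncompressed FASTA with a standard five-column .fai next to it.

```r
# Fetch a 1-based, inclusive region from an uncompressed FASTA via its .fai index.
# .fai columns: name, length, byte offset of first base, bases per line, bytes per line.
fetch_region <- function(fasta, chrom, start, end, fai = paste0(fasta, ".fai")) {
  idx <- read.table(fai, sep = "\t", header = FALSE, quote = "", comment.char = "",
                    col.names = c("name", "length", "offset", "linebases", "linewidth"),
                    colClasses = c("character", rep("numeric", 4)))
  rec <- idx[idx$name == chrom, ]
  stopifnot(nrow(rec) == 1, start >= 1, end >= start, end <= rec$length)
  # Byte position in the file of a 0-based base position on this contig.
  byte_of <- function(pos0) {
    rec$offset + (pos0 %/% rec$linebases) * rec$linewidth + pos0 %% rec$linebases
  }
  first <- byte_of(start - 1)
  last <- byte_of(end - 1)
  con <- file(fasta, "rb")
  on.exit(close(con))
  seek(con, where = first)
  bytes <- readBin(con, what = "raw", n = last - first + 1)
  # Drop the line breaks that fall inside the requested span.
  gsub("[\r\n]", "", rawToChar(bytes))
}

# Toy data so the sketch runs without the real bundle.
fa <- tempfile(fileext = ".fasta")
con <- file(fa, "wb")
writeLines(c(">chrA", "ACGTACGTAC", "GTACGTACGT", "ACGT",
             ">chrB", "TTTTGGGGCC", "CCAA"), con, sep = "\n")
close(con)
con <- file(paste0(fa, ".fai"), "wb")
writeLines(c("chrA\t24\t6\t10\t11", "chrB\t14\t39\t10\t11"), con, sep = "\n")
close(con)

fetch_region(fa, "chrA", 9, 12)  # returns "ACGT"
```

With the real bundle you would pass the path to the downloaded FASTA instead of the toy file. The contig name has to match the .fai exactly, which in the hg38 bundle means "chr"-prefixed names such as chr1, whereas Ensembl uses 1.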
