Question

Mapping between gene IDs and transcript IDs in C. elegans

8.2 years ago

biomagician ▴ 410

Dear Community,

I am looking to build a mapping file between the gene names of C. elegans and its transcript names. To do so, I use the Bioconductor package biomaRt, which I freshly reinstalled. I have also freshly downloaded the latest C. elegans transcriptome from Ensembl here: ftp://ftp.ensembl.org/pub/release-86/fasta/caenorhabditis_elegans/cdna/

Here is the code:

library(biomaRt)

# Download the C. elegans cDNA file from www.ensembl.org
ensemblRelease <- 86  # release matching the FTP URL above
download.file(paste0('ftp://ftp.ensembl.org/pub/release-', ensemblRelease, '/fasta/caenorhabditis_elegans/cdna/Caenorhabditis_elegans.WBcel235.cdna.all.fa.gz'), 'output/transcriptome/sequence/celegans.fa.gz')
system('gunzip output/transcriptome/sequence/celegans.fa.gz')

# Create a mapping file containing gene names in the first
# column and the associated transcript name in the second
# column. There should be only one name in each cell. Gene
# names can occur more than once and be associated with more
# than one associated transcript name but only one transcript
# name per line.
martWorm <- biomaRt::useMart(biomart = "ENSEMBL_MART_ENSEMBL", dataset = "celegans_gene_ensembl", host = 'ensembl.org')
g2t <- biomaRt::getBM(attributes = c('ensembl_gene_id', 'ensembl_transcript_id'), mart = martWorm)
write.table(g2t, 'output/counts/rsem/ref/geneToTxMapping.txt', quote = FALSE, row.names = FALSE)

However, there is a problem. My FASTA transcriptome (cDNA) file contains the transcript ID F52H2.2, which is not found in my mapping table, although F52H2.2a and F52H2.2b are. Conversely, F52H2.2a is not found in the FASTA file. This causes problems in my downstream analysis. Does anybody know what causes this? Is there a way to download the transcriptome from within R using the biomaRt package, so that it would be compatible with the biomaRt database?
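To make the mismatch concrete, here is a minimal base-R sketch of the comparison I am doing. The toy inputs are hypothetical stand-ins built only from the IDs mentioned above (the gene ID is a placeholder, and the header description text is made up). In practice the lines would come from readLines() on the unzipped FASTA file and the table would be g2t. It assumes the transcript ID is the first whitespace-delimited token of each FASTA header line.

```r
# Extract the ID (first whitespace-delimited token) from FASTA header lines.
fastaIds <- function(lines) {
  headers <- lines[startsWith(lines, ">")]
  sub("^>([^[:space:]]+).*$", "\\1", headers)
}

# Hypothetical toy inputs: one FASTA record and a two-row mapping table.
fastaLines <- c(">F52H2.2 description text", "ACGT")
g2tToy <- data.frame(
  ensembl_gene_id       = c("GENE_PLACEHOLDER", "GENE_PLACEHOLDER"),
  ensembl_transcript_id = c("F52H2.2a", "F52H2.2b"),
  stringsAsFactors      = FALSE
)

# IDs present on one side but missing on the other.
inFastaOnly   <- setdiff(fastaIds(fastaLines), g2tToy$ensembl_transcript_id)
inMappingOnly <- setdiff(g2tToy$ensembl_transcript_id, fastaIds(fastaLines))

print(inFastaOnly)
print(inMappingOnly)
```

On the real data, both sets are non-empty, which is exactly the incompatibility described above.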

Thank you.

mapping biomart ensembl gene id transcript id • 2.6k views
