No XIST in Ensembl cdna.fa
7.3 years ago

I noticed that Salmon didn't count XIST in my RNA-seq data, and I tracked the problem down to the Ensembl transcriptome FASTA I used to build the index. I'm not sure if I'm being stupid or something is wrong, but I can't find XIST (ENSG00000229807) in the transcriptome file Homo_sapiens.GRCh38.cdna.all.fa.gz, even though it is present in the GTF annotation Homo_sapiens.GRCh38.89.gtf.gz.

FTP link to Homo_sapiens.GRCh38.cdna.all.fa.gz
FTP link to Homo_sapiens.GRCh38.89.gtf.gz
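One way to double-check this is to scan the FASTA headers for the gene ID instead of grepping for the symbol. This is a minimal sketch that assumes Ensembl-style headers carrying a `gene:ENSG...` field, where the ID may have a `.version` suffix; `transcripts_for_gene` and `transcripts_for_gene_in_file` are hypothetical helpers, not part of Salmon or any Ensembl tool.

```python
import gzip
from typing import Iterable, List


def transcripts_for_gene(lines: Iterable[str], gene_id: str) -> List[str]:
    """Return transcript IDs whose FASTA header names the given gene.

    Assumes (hypothetically) Ensembl-style headers such as
    '>ENST... cdna chromosome:... gene:ENSG....N gene_biotype:...'.
    """
    hits = []
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split()
        for field in fields[1:]:
            if field.startswith("gene:"):
                # Drop the optional '.version' suffix before comparing IDs.
                if field[len("gene:"):].split(".")[0] == gene_id:
                    hits.append(fields[0])
                break  # only one gene: field per header
    return hits


def transcripts_for_gene_in_file(path: str, gene_id: str) -> List[str]:
    """Same check, reading a gzipped FASTA file in text mode."""
    with gzip.open(path, "rt") as fh:
        return transcripts_for_gene(fh, gene_id)
```

For example, `transcripts_for_gene_in_file("Homo_sapiens.GRCh38.cdna.all.fa.gz", "ENSG00000229807")` returns an empty list if no XIST transcript is in the file.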

Does anyone have an idea what I'm missing or what's wrong?

Thanks!

7.3 years ago
Satyajeet Khare ★ 1.6k

One page says the cDNA FASTA file contains "cDNA sequences for protein-coding genes", whereas another page says it is a file containing "cDNA sequences for Ensembl or ab initio predicted genes". Can you check whether all non-coding RNAs are absent, or only XIST?
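To check whether non-coding RNAs are missing as a class, you could tally the biotypes recorded in the FASTA headers. This is a sketch under the assumption that each header carries `transcript_biotype:` and `gene_biotype:` fields, as Ensembl headers usually do; `count_biotypes` is a hypothetical helper, and headers without the requested field are counted under `<missing>`.

```python
import gzip
from collections import Counter
from typing import Iterable


def count_biotypes(lines: Iterable[str], key: str = "transcript_biotype") -> Counter:
    """Count FASTA records per biotype, read from 'key:value' header fields."""
    counts: Counter = Counter()
    prefix = key + ":"
    for line in lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        for field in line[1:].split():
            if field.startswith(prefix):
                counts[field[len(prefix):]] += 1
                break
        else:
            # No field with the requested key in this header.
            counts["<missing>"] += 1
    return counts


def count_biotypes_in_file(path: str, key: str = "transcript_biotype") -> Counter:
    """Same tally over a gzipped FASTA file."""
    with gzip.open(path, "rt") as fh:
        return count_biotypes(fh, key)
```

Running `count_biotypes_in_file` on both the cdna and ncrna files and comparing the `lncRNA` counts would show whether long non-coding RNAs in general, not just XIST, live only in the ncrna file.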


Thanks! That seems to be it... HOTAIR is also not present. Not what I expected!
But XIST is present in Homo_sapiens.GRCh38.ncrna.fa.gz. I wonder whether it would be appropriate to just concatenate the ncrna and cdna FASTA files to get a more complete reference.

To me, "cDNA" meant "transcript-coding DNA", not necessarily "protein-coding DNA".
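Concatenating the two files could be sketched as below. As a precaution, this hypothetical `merge_fasta` keeps only the first record for any transcript ID seen in more than one input; whether the cdna and ncrna files actually share any IDs is an assumption not checked here. `merge_fasta_gz` is likewise a hypothetical wrapper that reads and writes gzipped files with Python's standard `gzip` module.

```python
import gzip
from typing import Iterable, Iterator, List, Set


def merge_fasta(*sources: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines from several sources, keeping the first record per ID."""
    seen: Set[str] = set()
    for source in sources:
        keep = False  # lines before the first header are dropped
        for line in source:
            if line.startswith(">"):
                parts = line[1:].split()
                record_id = parts[0] if parts else ""
                keep = record_id not in seen
                seen.add(record_id)
            if keep:
                yield line


def merge_fasta_gz(in_paths: List[str], out_path: str) -> None:
    """Merge gzipped FASTA files into one gzipped output file."""
    handles = [gzip.open(p, "rt") for p in in_paths]
    try:
        with gzip.open(out_path, "wt") as out:
            # Lines read in text mode keep their '\n', so writelines is enough.
            out.writelines(merge_fasta(*handles))
    finally:
        for h in handles:
            h.close()
```

For example, `merge_fasta_gz(["Homo_sapiens.GRCh38.cdna.all.fa.gz", "Homo_sapiens.GRCh38.ncrna.fa.gz"], "combined.fa.gz")` would write one combined reference (the output name is only an example).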

"I'm not sure if I'm being stupid"

Hypothesis confirmed...
