I noticed that Salmon reported no counts for XIST in my RNA-seq data, and tracked it down to the Ensembl transcriptome FASTA that was used to build the index. I'm not sure if I'm being stupid or something is wrong, but I can't find XIST (ENSG00000229807) in the transcriptome file Homo_sapiens.GRCh38.cdna.all.fa.gz, while it is present in the GTF Homo_sapiens.GRCh38.89.gtf.gz.
FTP link to Homo_sapiens.GRCh38.cdna.all.fa.gz
FTP link to Homo_sapiens.GRCh38.89.gtf.gz
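For anyone who wants to repeat the check, here is a minimal sketch of how one might look for a gene in the FASTA headers. It assumes each header line carries a `gene:<ID>` field, possibly with a version suffix such as `.5`, which is how Ensembl cDNA headers usually look; the function names `open_fasta` and `transcripts_for_gene` are made up for this example.

```python
import gzip


def open_fasta(path):
    """Open a FASTA file as text, transparently handling .gz files."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def transcripts_for_gene(fasta_handle, gene_id):
    """Return the transcript IDs whose header has a gene:<gene_id> field.

    Assumption: headers look like
    '>ENST... cdna chromosome:... gene:ENSG....N gene_biotype:...'
    """
    hits = []
    for line in fasta_handle:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split()
        for field in fields[1:]:
            # Strip an optional version suffix before comparing.
            if field.startswith("gene:") and field[5:].split(".")[0] == gene_id:
                hits.append(fields[0])
                break
    return hits


# Possible usage (paths are whatever you downloaded):
# with open_fasta("Homo_sapiens.GRCh38.cdna.all.fa.gz") as fh:
#     print(transcripts_for_gene(fh, "ENSG00000229807"))
```

An empty list means no transcript of that gene is in the file, which is what I see for XIST in the cDNA FASTA.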
Does anyone have an idea what I'm missing or what's wrong?
Thanks!
Thanks! That seems to be it... HOTAIR is also not present. Not what I expected!
But XIST is present in Homo_sapiens.GRCh38.ncrna.fa.gz. I wonder if it would be appropriate to just concatenate the ncRNA and cDNA FASTA files to get a more complete reference.
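If concatenating turns out to be reasonable, a rough sketch of the merge could look like the following. It assumes both inputs are gzipped FASTA files that end with a newline, and it skips any record whose transcript ID was already written, in case the two files overlap. `merge_fastas` is a name invented for this example, not part of Salmon or Ensembl tooling.

```python
import gzip


def merge_fastas(in_paths, out_path):
    """Concatenate gzipped FASTA files, keeping the first copy of each ID.

    Returns the number of records written.
    """
    seen = set()
    written = 0
    with gzip.open(out_path, "wt") as out:
        for path in in_paths:
            keep = False  # whether the current record should be copied
            with gzip.open(path, "rt") as fh:
                for line in fh:
                    if line.startswith(">"):
                        fields = line[1:].split()
                        record_id = fields[0] if fields else ""
                        keep = record_id not in seen
                        if keep:
                            seen.add(record_id)
                            written += 1
                    if keep:
                        out.write(line)
    return written


# Possible usage (output file name is arbitrary):
# merge_fastas(
#     ["Homo_sapiens.GRCh38.cdna.all.fa.gz", "Homo_sapiens.GRCh38.ncrna.fa.gz"],
#     "Homo_sapiens.GRCh38.cdna_ncrna.fa.gz",
# )
```

The merged file would then be the input for building the Salmon index instead of the cDNA file alone.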
I had always read cDNA as "transcript-coding DNA", not necessarily "protein-coding DNA".
Hypothesis confirmed...