4.4 years ago
dioscorea.bulbifera
I am trying to import quant.sf files output by salmon into R using tximport and a transcript-to-gene (tx2g) file from BUSpaRse::tr2g_ensembl(). The transcript IDs are all Ensembl IDs and the gene IDs are gene names, but the tximport output has MGI gene names as rownames, which don't seem to be present in any of the files used.
It isn't too much hassle to subsequently change the gene names, but it would be nice to understand why this is happening. I have attached images below (quant.sf, txi output, tx2g).
You mean rownames(txi$abundance) are different from tx2g$GENEID? Are you sure? I do not see anything unusual; the order of tx2g is simply different from that of quant.sf. You should check things systematically by comparing the lists with code, not by eye. tximport is not inventing new names or pulling data from databases, though; it simply takes what you give it. You're fine.
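A minimal sketch of such a check in base R. The gene names below are toy stand-ins (an assumption, not taken from the poster's files); in a real session you would use rownames(txi$abundance) and tx2g$GENEID instead.

```r
# Toy stand-ins (hypothetical values). In practice:
#   txi_genes  <- rownames(txi$abundance)
#   tx2g_genes <- unique(tx2g$GENEID)
txi_genes  <- c("Xkr4", "Rp1", "Sox17")
tx2g_genes <- c("Sox17", "Xkr4", "Rp1", "Mrpl15")

# Do both vectors contain the same names, ignoring order?
same_set <- setequal(txi_genes, tx2g_genes)

# Names in the tximport output that are absent from the mapping
missing_from_tx2g <- setdiff(txi_genes, tx2g_genes)

# Names in the mapping that never reached the tximport output
unused_in_txi <- setdiff(tx2g_genes, txi_genes)

print(same_set)
print(missing_from_tx2g)
print(unused_in_txi)
```

If missing_from_tx2g is empty, every rowname in the output came from the mapping, which is what you would expect given that tximport only uses the names it is handed.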
Hi,
Where did you get the tx2g? When you submit a list of Ensembl gene IDs to Ensembl, it returns the same gene IDs with their common gene names, but not necessarily in the order submitted. That's why the gene IDs don't line up between files: they are ordered differently. You might also have queried genes without a common gene name, i.e. without annotation, in which case Ensembl will not return anything for them. So be careful: the sizes of your gene lists probably differ, i.e. the number of Ensembl gene IDs > the number of common gene names retrieved.
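To illustrate, here is a rough base R sketch of joining by ID rather than by position, so that a different order or missing names cannot silently misalign rows. The two data frames and their column names (gene_id, gene_name) are made up for the example.

```r
# Hypothetical submitted IDs and a shorter, reordered result from the lookup
submitted <- data.frame(
  gene_id = c("ENSMUSG00000000001", "ENSMUSG00000000028", "ENSMUSG00000000031"),
  stringsAsFactors = FALSE
)
returned <- data.frame(
  gene_id   = c("ENSMUSG00000000031", "ENSMUSG00000000001"),
  gene_name = c("H19", "Gnai3"),
  stringsAsFactors = FALSE
)

# Left join on the ID column; all.x = TRUE keeps IDs that got no name back
annotated <- merge(submitted, returned, by = "gene_id", all.x = TRUE)

# Count IDs for which no gene name was returned
n_unnamed <- sum(is.na(annotated$gene_name))

print(annotated)
print(n_unnamed)
```

IDs without a name show up as NA in gene_name instead of shifting the rows below them.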
I hope this helps.
António
Sorry, I think my previous comment is not related to the problem that you're facing.
As far as I know, Salmon quant.sf files quantify transcripts, so their identifiers are transcript IDs such as ENSMUST00000193812.1. To use tximport, you need to provide the Salmon files (quant.sf) as well as the tx2gene parameter. This file, according to the documentation:
In your tx2g file (assuming that you're passing it as the tx2gene parameter) you have Ensembl gene IDs mapped to common gene names, not Ensembl transcript IDs mapped to gene IDs. I believe that's why it is not working as you expect.
I hope this helps.
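A quick way to test this, sketched in base R with toy values: check how many of the transcript names in quant.sf (its Name column) appear in the first column of the mapping. The column names TXNAME and GENEID and all values below are assumptions for illustration; the version-stripping helper is hypothetical, not part of tximport.

```r
# Toy stand-ins: transcript names as they might appear in quant.sf
quant_names <- c("ENSMUST00000193812.1", "ENSMUST00000082908.1")

# Toy mapping with transcript IDs in column 1 and gene names in column 2
tx2g <- data.frame(
  TXNAME = c("ENSMUST00000193812", "ENSMUST00000082908"),
  GENEID = c("4933401J01Rik", "Gm26206"),
  stringsAsFactors = FALSE
)

# Fraction of quantified transcripts found in the mapping as-is
frac_exact <- mean(quant_names %in% tx2g$TXNAME)

# Hypothetical helper: drop a trailing ".<digits>" version suffix
strip_version <- function(x) sub("\\.[0-9]+$", "", x)

# Fraction found once version suffixes are ignored on both sides
frac_stripped <- mean(strip_version(quant_names) %in% strip_version(tx2g$TXNAME))

print(frac_exact)
print(frac_stripped)
```

A low exact fraction combined with a high stripped fraction points to a version-suffix mismatch rather than a wrong mapping; tximport has an ignoreTxVersion argument intended for that situation. If both fractions are low, the first column probably does not hold transcript IDs at all.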
António