I have raw read counts from mouse RNA-seq. To calculate FPKM, I obtained the transcript lengths (CDS + UTR) of the genes through BioMart, and for each gene I selected the longest transcript. However, I later found that the reads were aligned to an old version of the genome (mm9), whereas the information I got from BioMart is for a newer version (mm10). On the BioMart web site, I can only get the CDS length (not the transcript length) for the old Ensembl release corresponding to mm9.
So I want to know
(1) Is there a relatively easy way to get the transcript lengths for mm9, given the Ensembl gene IDs?
(2) If there is no easy way, is it good enough to use the transcript lengths from mm10 to calculate the FPKM? I think the gene lengths from different versions of the genome should be the same or quite similar, right?
Thanks
You can find the mm9 genome in Ensembl release 67 here. BioMart there should give you the information you need.
Was your data mapped to the mm9 genome or to the transcriptome? The two would have different lengths and can't be used interchangeably.
Thank you. Yes, I can find Ensembl release 67, but for this release I can only get the CDS length, not the transcript length.
Are you sure about that? I am seeing the usual options under "Attributes"/"Sequence".
Do you mean "Attributes" -> "Sequence" -> "cDNA sequences", and then calculating the transcript lengths from those sequences? I think that works. In the latest release, I can get this information directly by choosing "Attributes" -> "Features" -> "Transcript length (including UTRs and CDS)". Thanks for your suggestion.
The raw read counts were obtained with HTSeq. I think it must have used the transcript annotation for the alignment. Thanks.
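For anyone following the same route, here is a minimal Python sketch of getting the longest transcript length per gene from a BioMart cDNA FASTA export. It assumes the FASTA headers were exported with the gene ID first and the transcript ID second, separated by "|" (that depends on which header attributes you tick in BioMart), and that missing sequences appear as a "Sequence unavailable" line. The function name and the toy IDs are made up for illustration.

```python
from io import StringIO

def longest_transcript_lengths(fasta_handle):
    """Return {gene_id: length of its longest cDNA sequence}.

    Assumes headers look like ">GENE_ID|TRANSCRIPT_ID" (an assumption
    about how the BioMart export was configured).
    """
    lengths = {}  # (gene_id, transcript_id) -> sequence length
    key = None
    for line in fasta_handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            gene_id, transcript_id = line[1:].split("|")[:2]
            key = (gene_id, transcript_id)
            lengths[key] = 0
        elif line == "Sequence unavailable":
            # Assumed placeholder for records without sequence; not counted.
            continue
        elif key is not None:
            lengths[key] += len(line)

    # Keep the longest transcript for each gene.
    longest = {}
    for (gene_id, _), n in lengths.items():
        longest[gene_id] = max(longest.get(gene_id, 0), n)
    return longest

# Toy input with hypothetical IDs; a real file would be opened with open(path).
example = StringIO(
    ">GENE_A|TX_A1\nACGTACGTAC\nACGT\n"
    ">GENE_A|TX_A2\nACGT\n"
    ">GENE_B|TX_B1\nACGTAC\n"
)
result = longest_transcript_lengths(example)
print(result)
```

Sequences wrapped over several lines are summed, so the line width of the export does not matter.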
Not many people use FPKM any more. Are you looking to do DE analysis? If so, you can use the raw counts you have with a package like DESeq2.
Yes, I will use DESeq2 to find DE genes. However, I also want to generate a heat map showing the expression levels of genes under different conditions, and I think FPKM shows the relative expression level of genes. Besides, before I do the DE analysis I want to filter out genes that are unexpressed (i.e., FPKM < 1) under both conditions. Thanks for your help.
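To make the filtering step concrete, here is a rough Python sketch of what I have in mind. It uses the usual definition FPKM = count * 1e9 / (transcript length in bp * total counts in the sample). Taking the library size as the sum of the counts assigned to genes is an assumption (some people use the total number of mapped reads instead), and the function names, sample names, and numbers are invented for illustration.

```python
def fpkm(counts, lengths):
    """FPKM per gene for one sample.

    counts:  {gene_id: raw count}
    lengths: {gene_id: transcript length in bp}
    The library size is taken as the sum of the gene counts (an assumption).
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no counted reads")
    return {g: c * 1e9 / (lengths[g] * total) for g, c in counts.items()}

def expressed_genes(fpkm_by_sample, threshold=1.0):
    """Genes with FPKM >= threshold in at least one sample.

    Equivalently, drops genes that are below the threshold in every sample.
    """
    genes = set()
    for values in fpkm_by_sample.values():
        genes.update(g for g, v in values.items() if v >= threshold)
    return sorted(genes)

# Toy data with hypothetical gene IDs, lengths, and counts.
lengths = {"GENE_A": 1000, "GENE_B": 2000, "GENE_C": 1500}
control = {"GENE_A": 500, "GENE_B": 0, "GENE_C": 0}
treated = {"GENE_A": 300, "GENE_B": 200, "GENE_C": 0}

fpkm_by_sample = {
    "control": fpkm(control, lengths),
    "treated": fpkm(treated, lengths),
}
kept = expressed_genes(fpkm_by_sample)
print(kept)
```

The raw counts for the kept genes would then go to DESeq2 unchanged; the FPKM values are only used for the filter and the heat map.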
Please do not send identical questions to BioStars and Ensembl helpdesk (twice).
I see. Sorry for that!