Question

Length of transcripts of mm9

0

Entering edit mode

8.5 years ago

qwzhang0601 ▴ 80

I have got raw count of mouse RNA reads. In order to calculate the FPKM, I obtained the length of transcripts (CDS + UTR) of the genes through BioMart. And then for each gene I selected the longest one. However, later I found the alignment was mapped to an old version of genome (mm9). But the information I got from BioMart is on a new version of genome (i.e., mm10). And I found from the the web site of Biomart, I can only achieve CDS length (but not transcript length) for a old ensemble corresponding to mm9. So I want to know (1) is there a relatively easy way to get the length of transcripts for mm9, given the gene ensemble ID?
(2) If there is no easy ways, is it good enough to use the transcript length from mm10 to calculated the FPKM? I think the gene length from different version of genome should be the same or quite similar, right?

Thanks

RNA-Seq Biomart FPKM • 6.1k views

ADD COMMENT • link updated 8.5 years ago by EagleEye 7.6k • written 8.5 years ago by qwzhang0601 ▴ 80

0

Entering edit mode

You can find mm9 genome in Ensembl release 67 here. BioMart from there should get you the information you need.

Was your data mapped to mm9 genome or transcriptome. Either would have different lengths and can't be used interchangeably.

ADD REPLY • link 8.5 years ago by GenoMax 148k

0

Entering edit mode

Thank you. Yes, I can find the Ensemble release 67, but for this version of genome I can only get the Length of CDS not the transcripts.

ADD REPLY • link 8.5 years ago by qwzhang0601 ▴ 80

0

Entering edit mode

Are you sure about that? I am seeing the usual options under "Attributes"/"Sequence".

ADD REPLY • link 8.5 years ago by GenoMax 148k

0

Entering edit mode

Do you mean "Attributes"->"sequence"-> "cDNA sequences". And then calculate the length of the transcripts? I think it works. In the latest version of genome, I can directly get such information by choosing "Attributes"-> "Features"->"Transcript length (including UTRs and CDS)". But Thanks for your suggestions.

ADD REPLY • link 8.5 years ago by qwzhang0601 ▴ 80

0

Entering edit mode

the raw count of reads were obtained by HTSeq. I think it must use the transcript annotation for the alignment. Thanks

ADD REPLY • link 8.5 years ago by qwzhang0601 ▴ 80

0

Entering edit mode

Not many use FPKM any more. Are you looking to do DE analysis? If so you can use the raw counts you have with a package like DESeq2.

ADD REPLY • link 8.5 years ago by GenoMax 148k

0

Entering edit mode

Yes. I will use DESeq2 to do find DE genes. However, I want to generate a heat map showing expression levels of genes under different conditions. I think the FPKM show relative expression level of genes. Besides, before I do the DE analysis I want to filter out unexpressed genes (i.e., FPKM<1) under both conditions. Thanks for your help.

ADD REPLY • link 8.5 years ago by qwzhang0601 ▴ 80

0

Entering edit mode

Please do not send identical questions to BioStars and Ensembl helpdesk (twice).

ADD REPLY • link 8.5 years ago by Emily 24k

0

Entering edit mode

I see. Sorry for that!

ADD REPLY • link 8.5 years ago by qwzhang0601 ▴ 80

GenoMax · Answer 1 · 2016-06-14

0

Entering edit mode

8.5 years ago

ATRX ★ 1.1k

You may use this code; however you need GTF mm10 as an input file.

## Transcript Length
library(GenomicRanges)
library(rtracklayer)

## reading your mm10 gtf file and read only exons
GTFfile <- "mm10,gtf" ## your mm10 gtf file
GTF <- import.gff(GTFfile, format="gtf", genome="mm10", asRangedData=F, feature.type="exon") 
grl <- reduce(split(GTF, elementMetadata(GTF)$gene_id))
reducedGTF <- unlist(grl, use.names=T)
elementMetadata(reducedGTF)$gene_id <- rep(names(grl), elementLengths(grl))
elementMetadata(reducedGTF)$widths <- width(reducedGTF)

## calculation of length
calc_length <- function(x) {
  sum(elementMetadata(x)$widths)
}
output <- t(sapply(split(reducedGTF, elementMetadata(reducedGTF)$gene_id), calc_length))
output <- data.frame(t(output))
output$id = rownames(output)
output$id = gsub(pattern = "(.*)[.](.*)", replacement = "\\1", x = output$id)
output = output[,c(2,1)]
colnames(output)[2] = "Transcript.Length"
head(output)

ADD COMMENT • link 8.5 years ago by ATRX ★ 1.1k

0

Entering edit mode

Thank you. But would you please briefly explain the code? Does this code return length of each transcript of a gene? Or it returns a length for each gene (the longest or the union of transcripts)?

Thanks

ADD REPLY • link 8.5 years ago by qwzhang0601 ▴ 80

0

Entering edit mode

This will give you the length of the transcript using GTF/GFF file. Here is what the code does:

import.gff reads the file using the exon regions and creates IRange object
reduce is used for combining all the exons based on a particular gene id
calculate the length of the entire exon region using the width that is obtained using reduce function

Try running the code and check the results and you will see what I am doing.

ADD REPLY • link 8.5 years ago by ATRX ★ 1.1k

0

Entering edit mode

I downloaded the mmg.gtf from "ftp://ftp.ensembl.org/pub/release-67/gtf/mus_musculus/". But when I run the following code, I got an error. And there are warnings? Do you have any suggestions? Thanks.

GTFfile <- "mm9.gtf" 

GTF <- import.gff(GTFfile, format="gtf", genome="mm9", asRangedData=F, feature.type="exon")

Error in validObject(.Object) : 
  invalid class “GRanges” object: 'seqnames' contains missing values
In addition: There were 16 warnings (use warnings() to see them)
> warnings() 
Warning messages:
1: 'BiocGenerics:::updateS4' is deprecated.
Use 'replaceSlots' instead.
See help("Deprecated")
2: 'BiocGenerics:::updateS4' is deprecated.
3: ......

ADD REPLY • link updated 8.5 years ago by GenoMax 148k • written 8.5 years ago by qwzhang0601 ▴ 80

score 0 · Answer 2 · 2016-06-14

0

Entering edit mode

8.5 years ago

EagleEye 7.6k

This might be helpful

A: Converting gtf format to bed format (gene & transcript level)

A: How to normalise read count per gene

https://github.com/santhilalsubhash/rpkm_rnaseq_count

ADD COMMENT • link 8.5 years ago by EagleEye 7.6k