Gene length from a GFF file to count RPKM for edgeR-RNA-seq analysis
7.6 years ago
pixie@bioinfo ★ 1.5k

Hello, I am new to NGS analysis. I am working on rice and am interested in gene-level analysis. I have counted the reads using HTSeq and need to estimate the gene lengths. I have a locus.gff file with the format:

chr01 irgsp1_locus gene 2983 10815 . + . ID=Os01g0100100;Name=Os01g0100100;Note=RabGAP/TBC domain containing protein. (Os01t0100100-01);Transcript variants=Os01t0100100-01
chr01 irgsp1_locus gene 11218 12435 . + . ID=Os01g0100200;Name=Os01g0100200;Note=Conserved hypothetical protein. (Os01t0100200-01);Transcript variants=Os01t0100200-01
chr01 irgsp1_locus gene 11372 12284 . - . ID=Os01g0100300;Name=Os01g0100300;Note=Cytochrome P450 domain containing protein. (Os01t0100300-00);Transcript variants=Os01t0100300-00

I also have the usual transcript.gff file. How can I extract the gene length from this file? Kindly suggest.

RNA-Seq • 3.8k views
7.6 years ago

A gene-level analysis typically treats each gene as a single transcript formed by concatenating all of its exons (even if no such transcript exists in reality). The RPKM calculation requires the length of this transcript, which can be computed as the sum of the lengths of all exons of the gene, counting bases shared by overlapping exons only once.
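To make it concrete where the length enters, here is a minimal sketch of the usual RPKM definition (reads per kilobase of transcript per million mapped reads). The function name and the numbers in the example are made up for illustration only.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    read_count         -- reads assigned to the gene
    gene_length_bp     -- summed exon length of the gene, in bases
    total_mapped_reads -- total mapped reads in the library
    """
    # Dividing by (length / 1e3) and by (total reads / 1e6) gives the 1e9 factor.
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Illustrative numbers only: 500 reads on a 2,000 bp gene, 20 million mapped reads.
example_value = rpkm(500, 2000, 20_000_000)
```

Because the length is in the denominator, using the genomic span of a gene (end minus start) instead of its summed exon length would understate the RPKM of genes with long introns.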

The start and end coordinates of the gene are not enough to determine the length of this transcript, because the span between them also includes the introns. If your file contains exons matched to genes, you can sum them up with a simple custom program (there are probably tools that can do this for you, though nothing comes to mind at this point).
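A custom program along these lines could look like the sketch below. It is only a sketch under assumptions: it expects tab-separated GFF lines, rows whose feature type is `exon`, and an attribute (here assumed to be `Parent`) that identifies the gene. In your transcript_exon.gff the attribute may point to a transcript ID rather than a gene ID, so `gene_key` and the hypothetical `to_gene` hook would need to be adapted to the actual file.

```python
from collections import defaultdict


def parse_attributes(field):
    """Turn 'ID=x;Parent=y' into {'ID': 'x', 'Parent': 'y'}."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def merged_length(intervals):
    """Bases covered by 1-based inclusive intervals, overlaps counted once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            # Close the previous block (if any) and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent exon: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def gene_lengths(gff_lines, gene_key="Parent", feature="exon",
                 to_gene=lambda value: value):
    """Sum merged exon lengths per gene.

    gene_key and to_gene are assumptions about the file layout: gene_key is
    the attribute holding the gene (or transcript) ID, and to_gene converts
    that value into a gene ID when needed.
    """
    exons = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature:
            continue
        attrs = parse_attributes(cols[8])
        if gene_key not in attrs:
            continue
        exons[to_gene(attrs[gene_key])].append((int(cols[3]), int(cols[4])))
    return {gene: merged_length(iv) for gene, iv in exons.items()}


# Made-up rows for illustration; geneA has two overlapping exons.
example = [
    "chr01\tsrc\texon\t100\t199\t.\t+\t.\tParent=geneA",
    "chr01\tsrc\texon\t150\t249\t.\t+\t.\tParent=geneA",
    "chr01\tsrc\texon\t300\t349\t.\t+\t.\tParent=geneA",
    "chr01\tsrc\texon\t500\t599\t.\t-\t.\tParent=geneB",
]
lengths = gene_lengths(example)
```

With a real file you would pass an open file handle, e.g. `gene_lengths(open("transcript_exon.gff"))`. Merging the intervals before summing matters when a gene has several transcript variants that share exons.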

My recommendation is to use the featureCounts tool, which computes this information for you: its output table includes a Length column for every gene. I consider featureCounts a superior replacement for HTSeq.

http://bioinf.wehi.edu.au/featureCounts/
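If you go the featureCounts route, the gene lengths can be pulled out of its output table with a few lines of Python. This is a sketch that assumes the standard tab-separated layout (a leading comment line starting with `#`, then a header with `Geneid` and `Length` columns); the function name and the example rows are made up for illustration.

```python
import csv


def read_featurecounts_lengths(lines):
    """Return {gene_id: length} from a featureCounts output table."""
    # Skip the comment line(s) that precede the header row.
    rows = (line for line in lines if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    return {row["Geneid"]: int(row["Length"]) for row in reader}


# Made-up table for illustration; a real one would be read from the output file.
example_table = [
    "# comment line written by featureCounts\n",
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample.bam\n",
    "geneA\tchr01;chr01\t100;300\t249;349\t+;+\t200\t42\n",
    "geneB\tchr01\t500\t599\t-\t100\t7\n",
]
fc_lengths = read_featurecounts_lengths(example_table)
```

With a real run you would pass the open output file instead of the list, e.g. `read_featurecounts_lengths(open("counts.txt"))`, where `counts.txt` is whatever output name you chose.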

PS: If you are using edgeR you shouldn't need to compute RPKMs yourself - take the normalized matrix computed by edgeR instead and operate on that.


Hello Istvan, thanks for your quick reply. I have a transcript.gff file and a transcript_exon.gff file. I will try what you said.
