I posted this question a few days ago on SEQanswers but wasn't able to get an answer that solved my problem.
Here is my use case:
I aligned RNA-Seq data to a genome using tophat2. Now I have the BAM file and I am looking for a straightforward way to produce mapping statistics: how many reads map to exons, introns and intergenic regions.
So my input consists of a BAM file and the GTF file which I used with tophat2. What's the easiest way to get from this input to the output I need?
The suggestions I have received so far involve tools that either do not take a GTF file as input (CollectRnaSeqMetrics from the Picard toolset) or do not compute statistics for intergenic intervals (RSeQC). CollectRnaSeqMetrics seems a good candidate, but so far I could not find any information or manual on converting GTF to refFlat format.
So I would welcome suggestions that specify the processing steps that start with a BAM file and a GTF file and lead to statistics on the number of reads mapping to exons, introns and intergenic intervals.
I just replied on your other SEQanswers thread on how to use gtfToGenePred. That'll be the simplest solution.
Thanks, Devon - I just replied to your post. It seems I am not able to get gtfToGenePred running.
I was able to get gtfToGenePred running. It took the Ensembl GTF as input and produced a refFlat file which I tried to use with CollectRnaSeqMetrics. However, the latter produced an error message:
Apparently, the refFlat file produced is not valid. Have you actually used gtfToGenePred successfully?
Apparently gtfToGenePred leaves out the first (or second) column. You can fix that with:
Yay! This seems to have fixed it. Thanks!
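For anyone following along: refFlat has the same columns as genePred, with a gene name column added in front. Here is a minimal Python sketch of that kind of column fix. It is not the command from the reply above. It assumes the file was written with gtfToGenePred's -genePredExt option, so the gene name (name2) sits in column 12; for plain 10-column genePred rows it falls back to repeating the transcript name. The function names and file paths are made up for illustration.

```python
import sys


def genepred_to_refflat(fields):
    """Turn one genePred or genePredExt row into the 11-column refFlat layout."""
    if len(fields) < 10:
        raise ValueError("expected at least 10 genePred columns, got %d" % len(fields))
    # Assumption: genePredExt rows carry the gene name (name2) in column 12.
    # Plain genePred rows have no gene name, so the transcript name is reused.
    gene_name = fields[11] if len(fields) >= 12 else fields[0]
    # refFlat = geneName followed by the first 10 genePred columns.
    return [gene_name] + fields[:10]


def convert(in_path, out_path):
    """Rewrite a tab-separated genePred file as refFlat, one row at a time."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            line = line.rstrip("\n")
            if not line:
                continue
            dst.write("\t".join(genepred_to_refflat(line.split("\t"))) + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python fix_refflat.py annotation.genePred annotation.refFlat
    convert(sys.argv[1], sys.argv[2])
```

The resulting file should have 11 tab-separated columns per line, which is what CollectRnaSeqMetrics expects for REF_FLAT.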
I have the same problem.
The fastq files are paired-end, aligned using STAR and a GRCh38 index.
I need a REF_FLAT file compatible with GRCh38. I know there is an inconsistency in chromosome naming between the Ensembl build and Picard tools.
Please don't provide an answer unless you actually know how to solve this problem. I have read >100 answers to this problem and most are not useful at all.
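On the chromosome naming point: the chromosome names in the REF_FLAT file have to match the sequence names in the BAM header. Below is a rough Python sketch of renaming the chromosome column of a refFlat row, assuming the mismatch is only the usual Ensembl style ("1", "MT") versus UCSC style ("chr1", "chrM"). The helper names are hypothetical. Unplaced scaffolds and alternate contigs are named differently in the two schemes and would need a proper lookup table, which this sketch does not attempt.

```python
def to_ucsc(name):
    """Ensembl-style primary chromosome name -> UCSC-style (assumed convention)."""
    if name.startswith("chr"):
        return name
    if name == "MT":
        return "chrM"
    return "chr" + name


def to_ensembl(name):
    """UCSC-style primary chromosome name -> Ensembl-style (assumed convention)."""
    if name == "chrM":
        return "MT"
    return name[3:] if name.startswith("chr") else name


def rename_refflat_row(fields, rename):
    """Return a copy of a refFlat row with the chromosome column renamed."""
    out = list(fields)
    # In refFlat the chromosome is the third column (geneName, name, chrom, ...).
    out[2] = rename(out[2])
    return out
```

Which direction to convert depends on what the BAM header uses; checking the @SQ lines of the header first avoids guessing.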
I moved this to a comment because it's not an answer to this thread
Okay, won't try to help then. Good luck.