I want to get mapping statistics from a mapped BAM file and want to use CollectRnaSeqMetrics.
I have a GTF file that I downloaded from http://genome.ucsc.edu/cgi-bin/hgTables, and I am using it as the refFlat.
Head of the GTF:
1 hg19_refFlat exon 11874 12227 0.000000 + . gene_id "DDX11L1"; transcript_id "DDX11L1";
1 hg19_refFlat exon 12613 12721 0.000000 + . gene_id "DDX11L1"; transcript_id "DDX11L1";
1 hg19_refFlat exon 13221 14409 0.000000 + . gene_id "DDX11L1"; transcript_id "DDX11L1";
I think the problem is this GTF file. Does anyone know how to get the correct refFlat file for human?
The error from CollectRnaSeqMetrics is:
Exception in thread "main" net.sf.picard.annotation.AnnotationException: Wrong number of fields in refFlat file /home/JCheng/UCSC_RefSeqGenes_GRCh37_hg19_withoutChr.gtf at line 1
at net.sf.picard.annotation.RefFlatReader.load(RefFlatReader.java:80)
at net.sf.picard.annotation.RefFlatReader.load(RefFlatReader.java:66)
at net.sf.picard.annotation.GeneAnnotationReader.loadRefFlat(GeneAnnotationReader.java:37)
at net.sf.picard.analysis.CollectRnaSeqMetrics.setup(CollectRnaSeqMetrics.java:96)
at net.sf.picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:102)
at net.sf.picard.analysis.SinglePassSamProgram.doWork(SinglePassSamProgram.java:55)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
at net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:119)
at net.sf.picard.analysis.CollectRnaSeqMetrics.main(CollectRnaSeqMetrics.java:88)
Hello,
I am getting the same error. Could anyone please help me? I used gtfToGenePred to convert my GTF file to a refFlat file, and my refFlat looks like
and when I try to run
I end up getting
Could anyone help me figure out what is wrong with my refFlat file?
You are missing the gene name in the first column. See solinvicta's comment for the fix. Basically, you have to run
gtfToGenePred -genePredExt
and then move the gene name (column 12) to the first column.
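In case it helps, here is a minimal sketch of that column move in Python. It assumes the input is the tab-separated output of gtfToGenePred -genePredExt, where column 12 holds the gene name, and that a refFlat line is the gene name followed by the first 10 genePred columns (11 columns in total). The function names and the file names in the usage note are made up for illustration.

```python
def genepredext_to_refflat(line):
    """Convert one tab-separated genePredExt line into a refFlat line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError(
            "expected at least 12 genePredExt columns, got %d" % len(fields)
        )
    # refFlat layout assumed here: gene name (genePredExt column 12)
    # followed by genePred columns 1-10 (name, chrom, strand, txStart,
    # txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds).
    return "\t".join([fields[11]] + fields[:10])


def convert_file(in_path, out_path):
    """Rewrite a whole genePredExt file as refFlat, skipping blank lines."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(genepredext_to_refflat(line) + "\n")
```

For example, convert_file("genes.genePredExt", "genes.refFlat") would write the reordered file (both file names are hypothetical). It is worth checking afterwards that every line has 11 tab-separated fields before passing the file to CollectRnaSeqMetrics.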