Cuffdiff terminated after GffObj::getSpliced() error
9.0 years ago
nalandaatmi

Dear All,

I am currently analyzing mouse RNA-Seq samples. I used TopHat 2.1.1 and Cufflinks 2.2.1, and I downloaded the latest mm10 build (genome and GTF) from iGenomes. I am facing issues at the following steps:

Cuffmerge step:

I noticed the following error messages in the cuffmerge log file, although the merged file (merged.gtf) is still generated in the output directory:

Error (GFaSeqGet): end coordinate (117274415) cannot be larger than sequence length 115169878
Error (GFaSeqGet): end coordinate (117981028) cannot be larger than sequence length 115169878
...
Error (GFaSeqGet): end coordinate (85529519) cannot be larger than sequence length 59373566

Cuffdiff step:

All the output files in the cuffdiff directory are empty. When I checked the cuffdiff log file, I found the following error messages:

Error (GFaSeqGet): end coordinate (117135884) cannot be larger than sequence length 115169878
.....
Error (GFaSeqGet): end coordinate (61176309) cannot be larger than sequence length 59128983
Error (GFaSeqGet): end coordinate (61228418) cannot be larger than sequence length 59128983

This contig will not be bias corrected.
Warning: couldn't find fasta record for 'chrUn_JH584304'!
This contig will not be bias corrected.
GffObj::getSpliced() error: improper genomic coordinate 3078823 on chrX for TCONS_00034613
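The `Warning: couldn't find fasta record` lines suggest the GTF refers to contigs (such as `chrUn_JH584304`) that the FASTA does not contain. A minimal sketch for listing such names, assuming a `.fai` index made with `samtools faidx`; the function name `gtf_missing_contigs` is hypothetical and the file names are placeholders.

```shell
# Sketch: list sequence names used in a GTF that have no record in the
# FASTA index (.fai). Each missing name is printed once.
# Usage (hypothetical helper): gtf_missing_contigs <genome.fa.fai> <annotation.gtf>
gtf_missing_contigs() {
    local fai="$1" gtf="$2"
    awk -F'\t' '
        NR == FNR { seen[$1] = 1; next }      # first file: names present in the .fai
        /^#/ { next }                         # skip GTF comment lines
        !($1 in seen) && !($1 in reported) {  # GTF seqname absent from the .fai
            print $1
            reported[$1] = 1
        }
    ' "$fai" "$gtf"
}

# Example usage:
#   samtools faidx mousegenome.fa
#   gtf_missing_contigs mousegenome.fa.fai cuffmerge_out/merged.gtf
```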
Tags: cuffmerge cuffdiff RNASeq
9.0 years ago

The error message is surprisingly informative :)

Somehow the merged GTF file is invalid, or at least inconsistent with what you're feeding into cuffdiff. Figure out which chromosome has a length of 115169878 (look either in a BAM header or in the .fai file made by "samtools faidx") and use awk to confirm that the merged GTF file contains entries beyond that end position. You might then check the GTF files made by cufflinks to see whether the problem occurs there as well. I suspect you used multiple FASTA files whose chromosome lengths differ.
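A minimal sketch of the awk check described above, assuming the `.fai` produced by `samtools faidx` and the merged GTF from cuffmerge. The function name `gtf_out_of_bounds` is hypothetical. For every GTF record whose end coordinate runs past the indexed length of its sequence, it prints the sequence name, the end coordinate and the indexed length.

```shell
# Sketch: report GTF records whose end coordinate (column 5) exceeds the
# length of their sequence in a samtools faidx index (.fai).
# Usage (hypothetical helper): gtf_out_of_bounds <genome.fa.fai> <annotation.gtf>
gtf_out_of_bounds() {
    local fai="$1" gtf="$2"
    awk -F'\t' '
        # First file (.fai): column 1 = sequence name, column 2 = length
        NR == FNR { len[$1] = $2; next }
        # Skip GTF comment lines
        /^#/ { next }
        # Second file (GTF): column 1 = seqname, column 5 = end coordinate
        ($1 in len) && ($5 + 0 > len[$1] + 0) {
            print $1 "\t" $5 "\t" len[$1]
        }
    ' "$fai" "$gtf"
}

# Example usage with the file names from this thread:
#   awk -F'\t' '$2 == 115169878 { print $1 }' mousegenome.fa.fai   # which sequence has this length?
#   gtf_out_of_bounds mousegenome.fa.fai cuffmerge_out/merged.gtf | head
```

Running the same function on each cufflinks `transcripts.gtf` would show whether the out-of-range records already exist before merging.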


Dear Devon, thanks for getting back to me. I really appreciate it.

Sure, I will check the file as you suggested. I used the following commands in my analysis:

Tophat command:

$ tophat -p 12 -G mousegenes.gtf --library-type fr-unstranded -o tophat_out mousegenome R1.fastq R2.fastq

Cufflinks command:

$ cufflinks -p 12 -G mousegenes.gtf --library-type fr-unstranded -o cufflinks_out tophat_out/accepted_hits.bam

Cuffmerge command:

$ cuffmerge -p 12 -g mousegenes.gtf -o cuffmerge_out -s mousegenome.fa assemblylist.txt
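One way to test the suspicion that more than one FASTA was involved is to compare the sequence lengths TopHat recorded in the BAM header (`@SQ` lines with `SN:` and `LN:` tags) against the index of the FASTA passed to cuffmerge with `-s`. A minimal sketch, assuming the file names from the commands above; the function name `compare_seq_lengths` is hypothetical, and it only reports sequences present in both files.

```shell
# Sketch: print sequences whose length differs between a SAM header
# (from samtools view -H) and a FASTA index (.fai).
# Usage (hypothetical helper): compare_seq_lengths <header.sam> <genome.fa.fai>
compare_seq_lengths() {
    local header="$1" fai="$2"
    awk -F'\t' '
        # First file: keep only @SQ lines and read their SN and LN tags
        NR == FNR {
            if ($1 != "@SQ") next
            name = ""; len = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)
                else if ($i ~ /^LN:/) len = substr($i, 4)
            }
            if (name != "") bam[name] = len
            next
        }
        # Second file (.fai): column 1 = name, column 2 = length
        ($1 in bam) && (bam[$1] + 0 != $2 + 0) {
            print $1 "\tBAM=" bam[$1] "\tFASTA=" $2
        }
    ' "$header" "$fai"
}

# Example usage:
#   samtools view -H tophat_out/accepted_hits.bam > bam_header.sam
#   samtools faidx mousegenome.fa
#   compare_seq_lengths bam_header.sam mousegenome.fa.fai
```

Empty output means the two references agree on the length of every shared sequence.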

Dear Devon,

When I used the Ensembl mouse GTF file, I didn't encounter this error. With the UCSC mouse GTF file, however, I got the same kind of error in a different mouse project.

A colleague on my team did the RNA-Seq analysis using the UCSC mouse GTF file. His gene_exp.diff output has 1400 significant genes, but when I redid the analysis using the Ensembl mouse GTF file, I got only 600. I can't check with him now because he has moved elsewhere. Is the difference in gene counts due to the Ensembl GTF file?
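Before comparing 1400 with 600, it may help to make sure both numbers were counted the same way. A minimal sketch, assuming the usual tab-separated gene_exp.diff layout whose header row contains a `significant` column with `yes`/`no` values; the function name `count_significant` is hypothetical. Each row is one test (one gene for one pair of samples), so with more than two conditions a gene can be counted more than once.

```shell
# Sketch: count rows marked significant in a cuffdiff gene_exp.diff file.
# The column is located by its header name rather than a fixed position.
# Usage (hypothetical helper): count_significant <gene_exp.diff>
count_significant() {
    awk -F'\t' '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "significant") col = i; next }
        col && $col == "yes" { n++ }
        END { print n + 0 }
    ' "$1"
}

# Example usage:
#   count_significant cuffdiff_out/gene_exp.diff
```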


Possibly, but it's impossible to say without knowing exactly what was done before.
