Cuffmerge error: "duplicate/invalid 'transcript' feature ID"
9.3 years ago
Jingyue ▴ 70

Hi, all,

I ran cuffmerge on all the transcripts.gtf files produced by tophat2 and cufflinks, and this error shows up:

[12:50:00] Loading reference annotation.
GFF Error: duplicate/invalid 'transcript' feature ID=id234068
        [FAILED]
Error: could not execute gtf_to_sam

I found this post via Google: https://lists.galaxyproject.org/pipermail/galaxy-user/2013-July/006332.html. It suggests editing the file to remove the duplicates, so I ran grep -v "id234068" on all the transcripts.gtf files. But when I tried again, I got:

Loading reference annotation.
GFF Error: duplicate/invalid 'transcript' feature ID=id350455
        [FAILED]
Error: could not execute gtf_to_sam

It seems that more than one ID is duplicated, so I tried sort -u, but that doesn't work because the transcript ID sits in field 9, separated by ; from the gene ID, FPKM, etc. Most of the affected transcript IDs also have more than one duplicate.
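Rather than removing IDs one error at a time, every duplicated ID can be listed in a single pass. The following is a minimal sketch, not a tested fix for this dataset: it assumes a GTF-style file whose column 9 carries transcript_id "..." attributes, and the function name duplicated_transcript_ids is made up for illustration. If the reference annotation is GFF3 (attributes written as ID=...), the regular expression would need to be adapted.

```python
import re
from collections import Counter

# Assumption: GTF-style attributes in column 9, e.g. transcript_id "CUFF.1.1";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def duplicated_transcript_ids(lines, feature="transcript"):
    """Return {transcript_id: count} for IDs found on more than one `feature` row."""
    counts = Counter()
    for line in lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF row has 9 tab-separated columns; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != feature:
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            counts[match.group(1)] += 1
    return {tid: n for tid, n in counts.items() if n > 1}
```

It can be called on an open file, for example with open("transcripts.gtf") as fh: print(duplicated_transcript_ids(fh)). Only rows of the chosen feature type are counted, because exon rows legitimately repeat the transcript ID of their parent transcript.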

Have you encountered this problem before? I am stuck here and need your help.

Would you guys please help me out?

Best,
Ellie

RNA-Seq software-error • 5.2k views
9.2 years ago
Kanne ▴ 450

For the record, I had the same error ("GFF Error: duplicate/invalid 'transcript' feature ID=") and searched all over the internet without finding an answer.

In my case, it turned out that there was nothing wrong with my reference GTF (iGenomes UCSC hg19).

I am fairly sure the problem was that I was running multiple cuffmerge runs at the same time in the same working directory, which I am guessing means that they were all writing to the same temporary files and this caused problems. When I ran them in different working directories, the problem disappeared.
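One way to keep concurrent runs apart is to give each one its own working directory. This is a rough sketch under stated assumptions: the run names, file names and the helper functions build_cuffmerge_cmd and run_isolated are hypothetical, and only cuffmerge's -o (output directory) and -g (reference GTF) options are used. It does not prove that shared temporary files were the cause; it just applies the workaround described above.

```python
import subprocess
from pathlib import Path


def build_cuffmerge_cmd(assembly_list, out_dir, ref_gtf=None):
    """Build the cuffmerge argument list: -o output dir, optional -g reference GTF."""
    cmd = ["cuffmerge", "-o", str(out_dir)]
    if ref_gtf is not None:
        cmd += ["-g", str(ref_gtf)]
    cmd.append(str(assembly_list))
    return cmd


def run_isolated(name, assembly_list, base_dir, ref_gtf=None, runner=subprocess.run):
    """Run one cuffmerge job with its own working directory under base_dir/name."""
    work_dir = (Path(base_dir) / name).resolve()
    work_dir.mkdir(parents=True, exist_ok=True)
    # Use absolute paths so inputs still resolve after changing the working directory.
    cmd = build_cuffmerge_cmd(
        Path(assembly_list).resolve(),
        work_dir / "merged_asm",
        Path(ref_gtf).resolve() if ref_gtf is not None else None,
    )
    # cwd=work_dir: any files the tool writes relative to its working directory
    # end up in this run's folder instead of a directory shared with other runs.
    return runner(cmd, cwd=work_dir, check=True)
```

For example, run_isolated("runA", "assemblies_A.txt", "merge_runs") and run_isolated("runB", "assemblies_B.txt", "merge_runs") would execute in merge_runs/runA and merge_runs/runB respectively.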


Hi, Kanne,

Thanks a lot for your reply. Yes, exactly. I tried all kinds of things, but it seems that every round of cufflinks (I was using -G with the newest bovine genome, UMD3.1.1) generates some duplicated transcript IDs: the records share the same transcript ID, but their exons are all marked as "exon1" when they should run from "exon1" to "exon10". So I went back to genome version UMD3.1, and cufflinks runs well. That's just the weirdest thing ever.
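The symptom described here (several exon rows of one transcript carrying the same exon label) could be checked with something like the sketch below. It assumes the label is stored as an exon_number attribute in column 9 of a GTF file, which may not match how the files in question record it, and the function name is made up for illustration.

```python
import re
from collections import defaultdict

# Assumptions: GTF column 9 holds transcript_id "..." and exon_number "N" (or N).
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')
EXON_NUMBER = re.compile(r'exon_number "?(\d+)"?')


def transcripts_with_repeated_exon_numbers(lines):
    """Return sorted transcript IDs whose exon rows reuse an exon_number."""
    exon_numbers = defaultdict(list)
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Only exon rows are relevant; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        tid = TRANSCRIPT_ID.search(fields[8])
        num = EXON_NUMBER.search(fields[8])
        if tid and num:
            exon_numbers[tid.group(1)].append(int(num.group(1)))
    # A transcript is suspicious if any exon number occurs more than once.
    return sorted(
        tid for tid, nums in exon_numbers.items() if len(nums) != len(set(nums))
    )
```

A transcript listed by this check has at least two exon rows with the same number, which matches the "all exons marked as exon1" pattern.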

Best,
Ellie

9.1 years ago

Dear Ellie,

I have been struggling with the same problem.

I wanted to use a pipeline on Galaxy main that had previously worked on other samples, but I ran into the same GFF Error: duplicate/invalid 'transcript' that you describe, at the Cuffmerge/Cuffcompare step.

I got around the problem by updating all the tools in the pipeline. Specifically, it appears that the problem occurs when the latest Cuffmerge version is used on data generated with older versions of Bowtie.

I hope this solves your problem too.

Best,
-Per
