duplicate/invalid 'transcript' feature in cuffmerge
6.7 years ago
bioinfo_ga ▴ 70

Hi! I am aligning reads against the banana genome, downloaded from ( http://banana-genome-hub.southgreen.fr/ ), using HISAT. I then used Cufflinks (2.2.1) for expression estimation, which runs fine, but the cuffmerge step fails with the following error: "duplicate/invalid 'transcript' feature ID=Ma03_t01040.3". I also converted the GFF to GTF, but the same error persists, and removing this ID from the GFF results in the same error with another ID. Kindly give your inputs.

RNA-Seq • 2.5k views

Would you please run grep "Ma03_t01040.3" on your GFF file? The error is "duplicate/invalid 'transcript'", so you need to investigate that first. There are also some threads on this subject: https://biostar.usegalaxy.org/p/17359/ https://github.com/cole-trapnell-lab/cufflinks/issues/77
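When a per-feature summary is more useful than raw grep output, the same check can be sketched in a few lines of Python. This is only an illustration: the function name `count_features_for_id` is made up here, and it assumes a tab-delimited GFF/GTF file whose third column holds the feature type. Like grep, it matches the ID as a plain substring.

```python
from collections import Counter

def count_features_for_id(lines, transcript_id):
    """Count feature types (column 3) on lines that mention transcript_id."""
    counts = Counter()
    for line in lines:
        # Skip comment lines and lines that do not mention the ID at all.
        if line.startswith("#") or transcript_id not in line:
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete 9-column annotation line
        counts[fields[2]] += 1
    return counts

# Three lines taken from the annotation discussed in this thread,
# written here with explicit tab separators.
sample = [
    'chr03\tmanual_curation\texon\t836456\t836913\t.\t-\t.\ttranscript_id "Ma03_t01040.3";',
    'chr03\tmanual_curation\texon\t837103\t837214\t.\t-\t.\ttranscript_id "Ma03_t01040.3";',
    'chr03\tmanual_curation\tCDS\t836665\t836913\t.\t-\t0\ttranscript_id "Ma03_t01040.3";',
]

summary = count_features_for_id(sample, "Ma03_t01040.3")
print(dict(summary))
```

For a real file, pass an open file handle instead of the list, for example `count_features_for_id(open(path), "Ma03_t01040.3")`, where `path` is wherever the annotation was saved.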


Running grep "Ma03_t01040.3" gives the following result:

> chr03 manual_curation exon    836456  836913  .   -   .   transcript_id
    > "Ma03_t01040.3";
    > chr03 manual_curation exon    837103  837214  .   -   .   transcript_id
    > "Ma03_t01040.3";
    > chr03 manual_curation exon    837626  837723  .   -   .   transcript_id
    > "Ma03_t01040.3";
    > chr03 manual_curation exon    837832  837939  .   -   .   transcript_id
    > "Ma03_t01040.3";
    > chr03 manual_curation exon    838029  838067  .   -   .   transcript_id
    > "Ma03_t01040.3";
    > chr03 manual_curation exon    838163  838234  .   -   .   transcript_id
    > "Ma03_t01040.3";
    > chr03 manual_curation exon    838316  838579  .   -   .   transcript_id
    > "Ma03_t01040.3";
    > chr03 manual_curation exon    839379  839646  .   -   .   transcript_id
    > "Ma03_t01040.3";
    > chr03 manual_curation CDS 836665  836913  .   -   0   transcript_id
    > "Ma03_t01040.3";
    > chr03 manual_curation CDS 837103  837214  .   -   1   transcript_id
    > "Ma03_t01040.3";
    > chr03 manual_curation CDS 837626  837723  .   -   0   transcript_id
    > "Ma03_t01040.3";
    > chr03 manual_curation CDS 837832  837939  .   -   0   transcript_id
    > "Ma03_t01040.3";
    > chr03 manual_curation CDS 838029  838067  .   -   0   transcript_id
    > "Ma03_t01040.3";
    > chr03 manual_curation CDS 838163  838234  .   -   0   transcript_id
    > "Ma03_t01040.3";
    > chr03 manual_curation CDS 838316  838579  .   -   0   transcript_id
    > "Ma03_t01040.3";
    > chr03 manual_curation CDS 839379  839570  .   -   0   transcript_id
    > "Ma03_t01040.3";

Removing these gives the same error for some other ID.


This indentation is hard to read; why is "Ma03_t01040.3" on a new line? Do you have a link to your GFF, maybe this one ( http://banana-genome-hub.southgreen.fr/sites/banana-genome-hub.southgreen.fr/files/data/gff3/version2/musa_acuminata_v2.gff3 )?


Yes, the same GFF was used for the analysis.


Yes, because all your transcript names are duplicated, not only this one.


For all the features of a given transcript we have the same name.
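Exon and CDS lines sharing one transcript name is normal, so one way to narrow things down is to check whether any transcript-level record itself appears more than once with the same `ID=`. The sketch below does that for a GFF3 file. It is not taken from the Cufflinks code: the function names, the choice of `mRNA` and `transcript` as transcript-level feature types, and the sample records are all assumptions made for illustration.

```python
from collections import Counter

# Assumption: transcript-level records use one of these feature types.
TRANSCRIPT_TYPES = {"mRNA", "transcript"}

def parse_gff3_attributes(column9):
    """Turn 'ID=tx1;Parent=gene1' into {'ID': 'tx1', 'Parent': 'gene1'}."""
    attrs = {}
    for item in column9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs

def duplicated_transcript_ids(lines):
    """Return IDs carried by more than one transcript-level record."""
    seen = Counter()
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] not in TRANSCRIPT_TYPES:
            continue
        feature_id = parse_gff3_attributes(fields[8]).get("ID")
        if feature_id:
            seen[feature_id] += 1
    return sorted(fid for fid, n in seen.items() if n > 1)

# Hypothetical records, not taken from the banana annotation.
sample = [
    "chr01\tsrc\tgene\t1\t500\t.\t+\t.\tID=gene1",
    "chr01\tsrc\tmRNA\t1\t100\t.\t+\t.\tID=tx1;Parent=gene1",
    "chr01\tsrc\texon\t1\t50\t.\t+\t.\tParent=tx1",
    "chr01\tsrc\texon\t60\t100\t.\t+\t.\tParent=tx1",
    "chr01\tsrc\tmRNA\t200\t300\t.\t+\t.\tID=tx1;Parent=gene1",
    "chr01\tsrc\tmRNA\t400\t500\t.\t+\t.\tID=tx2;Parent=gene1",
]

print(duplicated_transcript_ids(sample))
```

An empty result would suggest the shared names are only the usual exon/CDS grouping, and that the cause of the error lies elsewhere.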


From what I can read elsewhere, everyone picks up a GFF or GTF from Ensembl and it works well.

So, let's try with the Ensembl Plants database:

ftp://ftp.ensemblgenomes.org/pub/plants/release-38/gff3/musa_acuminata/Musa_acuminata.MA1.38.gff3.gz

This should work, because I think the lines in your GFF file whose 9th column starts with "Parent=" are what upset cuffmerge. In the GFF from Ensembl these lines are removed.
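To compare the two annotation files on this point, a small sketch can tally which attribute key comes first in the 9th column of each line. The function name `leading_attribute_keys` and the sample records are illustrative only, and the code assumes tab-delimited GFF3 or GTF input.

```python
import re
from collections import Counter

def leading_attribute_keys(lines):
    """Count the first attribute key found in column 9 of each record."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete 9-column annotation line
        # The key ends at '=' (GFF3) or at whitespace (GTF).
        match = re.match(r"\s*([^=\s;]+)", fields[8])
        if match:
            counts[match.group(1)] += 1
    return counts

# Hypothetical records, not taken from either banana annotation.
sample = [
    "chr01\tsrc\tgene\t1\t100\t.\t+\t.\tID=gene1",
    "chr01\tsrc\tmRNA\t1\t100\t.\t+\t.\tID=tx1;Parent=gene1",
    "chr01\tsrc\texon\t1\t50\t.\t+\t.\tParent=tx1",
]

print(dict(leading_attribute_keys(sample)))
```

For a gzipped file such as the Ensembl download, the lines can be read with `gzip.open(path, "rt")` from the standard library and passed to the same function.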
