6.7 years ago
bioinfo_ga
Hi! I am aligning reads with HISAT against the banana genome (downloaded from http://banana-genome-hub.southgreen.fr/). I then used Cufflinks (2.2.1) for expression estimation, which runs fine, but the cuffmerge step gives the following error: "duplicate/invalid 'transcript' feature ID=Ma03_t01040.3". I also converted the GFF to GTF, but the same error persists, and removing this ID from the GFF results in the same error with another ID. Kindly give your inputs.
Would you please run grep "Ma03_t01040.3" on your GFF file? The error is "duplicate/invalid 'transcript'", so you need to investigate that first. There are also some threads on this subject: https://biostar.usegalaxy.org/p/17359/ https://github.com/cole-trapnell-lab/cufflinks/issues/77
grep "Ma03_t01040.3" gives the following result
Removing these gives the same error for some other ID.
This indentation is hard to read. Why is "Ma03_t01040.3" on a new line? Do you have a link to your GFF, maybe this one ( http://banana-genome-hub.southgreen.fr/sites/banana-genome-hub.southgreen.fr/files/data/gff3/version2/musa_acuminata_v2.gff3 )?
Yes, the same GFF was used for the analysis.
Yes, because all your transcript names are duplicated, not only this one.
All the features of a given transcript have the same name.
From what I have read, everyone picks up a GFF or GTF from Ensembl and it works well.
So let's try with the Ensembl Plants database:
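To see how widespread the duplication is without grepping one ID at a time, something like the following could be used. It is only a minimal sketch, assuming a standard nine-column, tab-separated GFF3; the function name `duplicate_ids` and the sample lines are made up for illustration and are not taken from the actual file.

```python
from collections import Counter


def duplicate_ids(lines):
    """Return {ID: count} for every ID= value found on more than one GFF3 line."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a feature line
        # Column 9 holds semicolon-separated key=value attributes
        for field in cols[8].split(";"):
            key, _, value = field.strip().partition("=")
            if key == "ID" and value:
                counts[value] += 1
    return {name: n for name, n in counts.items() if n > 1}


# Hypothetical sample lines, invented for illustration only
sample = [
    "##gff-version 3",
    "chr03\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=Ma03_t01040.3;Parent=Ma03_g01040",
    "chr03\tsrc\texon\t100\t300\t.\t+\t.\tID=Ma03_t01040.3;Parent=Ma03_t01040.3",
    "chr03\tsrc\tmRNA\t1000\t1900\t.\t+\t.\tID=Ma03_t01050.1;Parent=Ma03_g01050",
]
print(duplicate_ids(sample))  # {'Ma03_t01040.3': 2}
```

On the real annotation this would be called as `duplicate_ids(open("musa_acuminata_v2.gff3"))`. Keep in mind that GFF3 allows a discontinuous feature such as a CDS to share one ID across several lines, so not every hit is necessarily an error.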
ftp://ftp.ensemblgenomes.org/pub/plants/release-38/gff3/musa_acuminata/Musa_acuminata.MA1.38.gff3.gz
This should work, because I think the lines in your GFF file whose 9th column starts with "Parent=" upset cuffmerge. In the GFF from Ensembl these lines are removed.
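If you want to check that guess on either file, a rough sketch along these lines would list the feature lines whose 9th column begins with "Parent=". It assumes tab-separated GFF3; the function name `parent_first_lines` and the sample lines are hypothetical, and whether such lines are really what cuffmerge objects to is only my assumption.

```python
def parent_first_lines(lines):
    """Return 1-based line numbers whose 9th column starts with 'Parent='."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue  # comments and directives have no attribute column
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[8].startswith("Parent="):
            hits.append(number)
    return hits


# Hypothetical sample lines, invented for illustration only
example = [
    "##gff-version 3",
    "chr03\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=t1;Parent=g1",
    "chr03\tsrc\texon\t100\t300\t.\t+\t.\tParent=t1",
]
print(parent_first_lines(example))  # [3]
```

An empty list for the Ensembl file and a long one for the original file would be consistent with the explanation above, though it would not prove it.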