Hello,
I am currently trying to run an RNA-seq analysis using public data for Brassica juncea.
To build a count table with htseq-count, I have to convert the GFF file I downloaded from the Brassica database to a GTF file.
So I converted the GFF file with gffread using the following command:
gffread Bju.genome.gff -T -o Bju.genome.gtf
However, this command printed a lot of error messages like these:
Error: discarding overlapping duplicate mRNA feature (24650-31472) with ID=BjuO004617
Error: discarding overlapping duplicate mRNA feature (39979-40535) with ID=BjuO004618
Error: discarding overlapping duplicate mRNA feature (30824-34680) with ID=BjuO001651
Error: discarding overlapping duplicate mRNA feature (56651-57957) with ID=BjuO001652
Error: discarding overlapping duplicate mRNA feature (78160-80198) with ID=BjuO001654
Error: discarding overlapping duplicate mRNA feature (74204-77329) with ID=BjuO001653
Error: discarding overlapping duplicate mRNA feature (4453-4737) with ID=BjuO006559
Error: discarding overlapping duplicate mRNA feature (10094-10618) with ID=BjuO009958
Error: discarding overlapping duplicate mRNA feature (15520-17470) with ID=BjuO010812
Error: discarding overlapping duplicate mRNA feature (22118-22816) with ID=BjuO010813
Error: discarding overlapping duplicate mRNA feature (5722-6432) with ID=BjuO010811
Error: discarding overlapping duplicate mRNA feature (3944-4429) with ID=BjuO007439
Error: discarding overlapping duplicate mRNA feature (790-2307) with ID=BjuO007438
Error: discarding overlapping duplicate mRNA feature (6457-6978) with ID=BjuO005411
To check whether the file was created properly, I checked the file sizes, and they look OK:
40M Bju.genome.gff
56M Bju.genome.gtf
I also looked at the output of the 'head' command, and there seems to be no problem:
A01 GeneWise gene 3352 5985 1195.88 + . ID=BjuA000594;
A01 GeneWise mRNA 3352 5985 1195.88 + . ID=BjuA000594;Parent=BjuA000594;
A01 GeneWise CDS 3352 3675 1195.88 + 0 ID=BjuA000594.cds;Parent=BjuA000594;
A01 GeneWise CDS 4225 4447 1195.88 + 0 ID=BjuA000594.cds;Parent=BjuA000594;
A01 GeneWise CDS 4579 4949 1195.88 + 2 ID=BjuA000594.cds;Parent=BjuA000594;
A01 GeneWise CDS 5091 5326 1195.88 + 0 ID=BjuA000594.cds;Parent=BjuA000594;
A01 GeneWise CDS 5421 5691 1195.88 + 1 ID=BjuA000594.cds;Parent=BjuA000594;
A01 GeneWise CDS 5785 5985 1195.88 + 0 ID=BjuA000594.cds;Parent=BjuA000594;
A01 GeneWise gene 8012 14242 1731.69 + . ID=BjuA002557;
A01 GeneWise mRNA 8012 14242 1731.69 + . ID=BjuA002557;Parent=BjuA002557;
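File size and the first few lines cannot show whether anything went missing. By default, htseq-count counts reads on 'exon' features and groups them by the 'gene_id' attribute, so one extra check is to confirm that the GTF contains exon lines that carry a gene_id. Below is a minimal Python sketch built on that assumption. The function name summarize_gtf is hypothetical, and the code assumes tab-separated, 9-column GTF lines.

```python
import re
import sys
from collections import Counter

# Assumption: htseq-count is run with its defaults, i.e. it counts reads on
# "exon" features (-t exon) grouped by the "gene_id" attribute (-i gene_id).
GENE_ID = re.compile(r'gene_id "([^"]+)"')


def summarize_gtf(lines):
    """Count feature types, exon lines lacking gene_id, and distinct gene_ids."""
    feature_types = Counter()
    exons_without_gene_id = 0
    gene_ids = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a 9-column GTF line
        feature_types[cols[2]] += 1
        if cols[2] == "exon":
            match = GENE_ID.search(cols[8])
            if match:
                gene_ids.add(match.group(1))
            else:
                exons_without_gene_id += 1
    return feature_types, exons_without_gene_id, len(gene_ids)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:  # e.g. Bju.genome.gtf
        types, missing, n_genes = summarize_gtf(handle)
    print("feature types:", dict(types))
    print("exon lines without gene_id:", missing)
    print("distinct gene_id values on exons:", n_genes)
```

To use it, run something like python summarize_gtf.py Bju.genome.gtf (the script name is hypothetical). Comparing the number of distinct gene_id values with the number of gene lines in the GFF gives a rough idea of whether whole genes were lost during conversion.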
So I wonder why these error messages appeared and whether it is OK to use the resulting file. I would be really grateful for any reply.
I just wanted to note that the input GFF is not a valid GFF3 file, because the ID attribute must be unique (see http://gmod.org/wiki/GFF3) and this is not the case here: the gene and its mRNA share the same ID (e.g. ID=BjuA000594 appears on both the gene line and the mRNA line). That might be the reason for the errors thrown by gffread.
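To see which IDs are affected, one can list (a) IDs used by more than one feature type, such as a gene and an mRNA both named BjuA000594, and (b) mRNA IDs that occur on more than one line. Below is a minimal Python sketch that assumes a 9-column GFF3 file; the function names are hypothetical. CDS lines that share a single ID are not flagged, because GFF3 allows one multi-line feature to reuse its ID across lines.

```python
import sys
from collections import defaultdict


def parse_attributes(column9):
    """Split a GFF3 attribute column such as 'ID=x;Parent=y;' into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def find_id_problems(lines):
    """Return IDs shared by different feature types and mRNA IDs on >1 line."""
    types_by_id = defaultdict(set)
    mrna_lines = defaultdict(list)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            # fallback for copies whose tabs were turned into spaces
            cols = line.rstrip("\n").split(None, 8)
        if len(cols) < 9:
            continue
        seqid, ftype, start, end = cols[0], cols[2], cols[3], cols[4]
        fid = parse_attributes(cols[8]).get("ID")
        if fid is None:
            continue
        types_by_id[fid].add(ftype)
        if ftype == "mRNA":
            mrna_lines[fid].append((seqid, int(start), int(end)))
    shared = {i: t for i, t in types_by_id.items() if len(t) > 1}
    duplicated_mrna = {i: locs for i, locs in mrna_lines.items() if len(locs) > 1}
    return shared, duplicated_mrna


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:  # e.g. Bju.genome.gff
        shared, duplicated_mrna = find_id_problems(handle)
    print(len(shared), "IDs are used by more than one feature type")
    print(len(duplicated_mrna), "mRNA IDs appear on more than one line")
    for fid, locs in list(duplicated_mrna.items())[:10]:
        print(fid, locs)
```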
Fortunately, AGAT handles this type of problem well.
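If AGAT is not an option, one manual workaround is to make the colliding IDs unique before running gffread. The sketch below is a hypothetical alternative and does not reproduce what AGAT does. It appends an arbitrary suffix (".mRNA") to every mRNA ID that equals a gene ID, then re-points the Parent of non-gene, non-mRNA lines (CDS, exon, ...) to the renamed mRNA. It makes three assumptions: the input is tab-separated, those child lines belong to the mRNA, and the suffix does not create an ID that already exists. It leaves alone any mRNA IDs that are duplicated among mRNA lines themselves, so it is not guaranteed to remove every gffread message.

```python
import sys

SUFFIX = ".mRNA"  # hypothetical suffix; must not create an ID already in use


def split_attrs(column9):
    """Turn 'ID=x;Parent=y;' into [['ID', 'x'], ['Parent', 'y']], keeping order."""
    return [f.split("=", 1) for f in column9.strip().split(";") if "=" in f]


def join_attrs(pairs):
    return "".join(f"{key}={value};" for key, value in pairs)


def rename_colliding_mrnas(lines):
    """Suffix mRNA IDs that equal a gene ID and re-point their child features."""
    rows = [line.rstrip("\n").split("\t") for line in lines]
    # Only tab-separated 9-column feature rows are edited; others pass through.
    features = [r for r in rows if len(r) == 9 and not r[0].startswith("#")]

    def ids_of(ftype):
        return {dict(split_attrs(r[8])).get("ID") for r in features if r[2] == ftype}

    colliding = ids_of("gene") & ids_of("mRNA")
    colliding.discard(None)
    for r in features:  # these lists are shared with rows, so edits stick
        pairs = split_attrs(r[8])
        for pair in pairs:
            if r[2] == "mRNA" and pair[0] == "ID" and pair[1] in colliding:
                pair[1] += SUFFIX
            elif r[2] not in ("gene", "mRNA") and pair[0] == "Parent":
                # Parent may list several comma-separated IDs
                pair[1] = ",".join(p + SUFFIX if p in colliding else p
                                   for p in pair[1].split(","))
        r[8] = join_attrs(pairs)
    return ["\t".join(r) for r in rows]


if __name__ == "__main__" and len(sys.argv) > 2:
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        for row in rename_colliding_mrnas(src):
            dst.write(row + "\n")
```

For example, python fix_ids.py Bju.genome.gff Bju.fixed.gff (hypothetical script and output names) writes a copy in which each mRNA keeps Parent=<gene ID>, and the gene lines themselves keep their original IDs.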