I have genomic sequences (~30 ~200 bp) and ran BLAST against the full genome sequence (http://plants.ensembl.org/Triticum_aestivum/Info/Index). Now I want to know which features the sequences fall in; I'm particularly interested in whether they are in UTRs or introns. Ensembl's feature file appears to be in GFF3 format. What can I do to find out whether my BLAST hits are intronic or in UTRs?
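One thing to keep in mind: Ensembl GFF3 files list `exon`, `CDS`, `five_prime_UTR` and `three_prime_UTR` features, but typically not introns. Introns are the gaps between consecutive exons of the same transcript, so they have to be derived. A minimal sketch, assuming you already have the exon coordinates of one transcript (the coordinates below are made up):

```python
def introns_from_exons(exons):
    """Return intron intervals between consecutive exons of ONE transcript.

    `exons` is a list of (start, end) tuples, 1-based and inclusive as in GFF3.
    The returned introns use the same convention.
    """
    ordered = sorted(exons)
    introns = []
    for (s1, e1), (s2, e2) in zip(ordered, ordered[1:]):
        # Only a gap of at least one base between exons is an intron
        if s2 > e1 + 1:
            introns.append((e1 + 1, s2 - 1))
    return introns


# Hypothetical exon coordinates for a single transcript (order does not matter)
print(introns_from_exons([(500, 650), (100, 300), (900, 1000)]))
```

Sorting first matters because exons on the minus strand are often listed in transcript order, i.e. with decreasing coordinates.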
Traceback (most recent call last):
  File "scripts/aux/gff_parse.py", line 15, in <module>
    db = gffutils.create_db(args.gff, "db")
  File "/usr/local/lib/python2.7/dist-packages/gffutils/create.py", line 1273, in create_db
    c.create()
  File "/usr/local/lib/python2.7/dist-packages/gffutils/create.py", line 488, in create
    self._populate_from_lines(self.iterator)
  File "/usr/local/lib/python2.7/dist-packages/gffutils/create.py", line 571, in _populate_from_lines
    fixed, final_strategy = self._do_merge(f, self.merge_strategy)
  File "/usr/local/lib/python2.7/dist-packages/gffutils/create.py", line 218, in _do_merge
    raise ValueError("Duplicate ID {0.id}".format(f))
ValueError: Duplicate ID CDS:TRIAE_CS42_1AL_TGACv1_000002_AA0000030.1
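The duplicate ID is actually valid GFF3: a CDS that spans several exons is written as several lines sharing one `ID`. gffutils' `create_db` refuses this under its default `merge_strategy="error"`; passing `merge_strategy="create_unique"` (e.g. `gffutils.create_db(args.gff, "db", merge_strategy="create_unique")`) is the usual workaround. If you would rather skip the database, a minimal stdlib reader that groups features by `Parent` instead of `ID` sidesteps the problem entirely. This is a sketch; the GFF3 lines below are hypothetical, but they mimic the Ensembl pattern of repeated `ID=CDS:...`.

```python
def parse_attributes(field):
    """Split a GFF3 column-9 string like 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def features_by_parent(lines, wanted=("exon", "CDS", "five_prime_UTR", "three_prime_UTR")):
    """Group features of the wanted types by their Parent (usually a transcript).

    Returns {parent_id: {"seqid": str, "strand": str,
                         "features": {type: [(start, end), ...]}}}.
    IDs are ignored, so multi-line CDS features with a shared ID are harmless.
    """
    groups = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in wanted:
            continue
        seqid, _, ftype, start, end, _, strand, _, attrs = cols
        # A feature may list several comma-separated parents
        for parent in parse_attributes(attrs).get("Parent", "").split(","):
            if not parent:
                continue
            entry = groups.setdefault(parent, {"seqid": seqid, "strand": strand, "features": {}})
            entry["features"].setdefault(ftype, []).append((int(start), int(end)))
    return groups


# Hypothetical GFF3 lines: both CDS lines share ID=CDS:P1, as in the traceback
gff = [
    "##gff-version 3",
    "chr1\tsrc\tmRNA\t100\t600\t.\t+\t.\tID=transcript:T1;Parent=gene:G1",
    "chr1\tsrc\texon\t100\t300\t.\t+\t.\tParent=transcript:T1",
    "chr1\tsrc\texon\t500\t600\t.\t+\t.\tParent=transcript:T1",
    "chr1\tsrc\tCDS\t150\t300\t.\t+\t0\tID=CDS:P1;Parent=transcript:T1",
    "chr1\tsrc\tCDS\t500\t600\t.\t+\t2\tID=CDS:P1;Parent=transcript:T1",
]
print(features_by_parent(gff)["transcript:T1"]["features"])
```

For a real file, pass an open file handle instead of the list (`with open(path) as fh: features_by_parent(fh)`). If you do stay with gffutils, it also offers `FeatureDB.create_introns()` for deriving introns, though check its arguments against the gffutils docs for your version.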
The GFF file from Ensembl does not seem to be well-formatted. I just need to know whether my BLAST hits fall in introns or UTRs, but it has been difficult so far.
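Once the features of a transcript are grouped, labelling a hit is an interval-overlap check. A minimal sketch, assuming the hit and the transcript are on the same sequence and that `features` uses the GFF3 type names as keys (the coordinates below are hypothetical). Introns are again derived as gaps between exons:

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two 1-based inclusive intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def classify_hit(hit_start, hit_end, features):
    """Return the set of region types a BLAST hit overlaps in ONE transcript.

    `features` maps 'exon', 'CDS', 'five_prime_UTR', 'three_prime_UTR' to
    lists of (start, end). An empty set means the hit misses the transcript.
    """
    # BLAST reports sstart > send for minus-strand hits, so normalise the order
    lo, hi = sorted((hit_start, hit_end))
    labels = set()
    for ftype in ("five_prime_UTR", "three_prime_UTR", "CDS"):
        for start, end in features.get(ftype, []):
            if overlaps(lo, hi, start, end):
                labels.add(ftype)
    # Introns are not listed in the GFF3: derive them as gaps between exons
    exons = sorted(features.get("exon", []))
    for (_, e1), (s2, _) in zip(exons, exons[1:]):
        if s2 > e1 + 1 and overlaps(lo, hi, e1 + 1, s2 - 1):
            labels.add("intron")
    return labels


# Hypothetical transcript with two exons and an intron at 301-499
features = {
    "exon": [(100, 300), (500, 700)],
    "five_prime_UTR": [(100, 149)],
    "CDS": [(150, 300), (500, 650)],
    "three_prime_UTR": [(651, 700)],
}
for hit in [(320, 480), (120, 160), (690, 660)]:
    print(hit, classify_hit(*hit, features))
```

Returning a set rather than a single label is deliberate: a ~200 bp hit can easily straddle a UTR/CDS boundary or an exon/intron junction.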
Yeah, I've been there. What I mean is: is there a recommended tool that simplifies the job and does it efficiently?
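A commonly used tool for this kind of overlap is `bedtools intersect`, which reads GFF3 directly, e.g. `bedtools intersect -a hits.bed -b annotation.gff3 -wa -wb` to report each hit next to every feature it overlaps. Introns still won't appear unless you add them to the annotation first. bedtools needs the hits as BED, which is 0-based and half-open, whereas BLAST tabular output (`-outfmt 6`) is 1-based inclusive with reversed coordinates on the minus strand. A sketch of that conversion, assuming the default 12-column `-outfmt 6` layout (the example rows are hypothetical):

```python
def blast_tab_to_bed(lines):
    """Convert BLAST -outfmt 6 rows (default 12 columns) to BED6 lines.

    Uses column 0 (qseqid), 1 (sseqid), 8 (sstart) and 9 (send).
    The BED score column is set to a placeholder 0.
    """
    bed = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue
        qseqid, sseqid = cols[0], cols[1]
        sstart, send = int(cols[8]), int(cols[9])
        # sstart > send means the hit is on the minus strand of the subject
        strand = "+" if sstart <= send else "-"
        # 1-based inclusive -> 0-based half-open: shift only the start
        start, end = min(sstart, send) - 1, max(sstart, send)
        bed.append("\t".join([sseqid, str(start), str(end), qseqid, "0", strand]))
    return bed


# Hypothetical -outfmt 6 rows: one plus-strand and one minus-strand hit
hits = [
    "seq1\tchr1\t98.5\t200\t3\t0\t1\t200\t1301\t1500\t1e-90\t350",
    "seq2\tchr2\t99.0\t150\t1\t0\t1\t150\t900\t751\t1e-70\t270",
]
print("\n".join(blast_tab_to_bed(hits)))
```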