When converting a GFF3 file into an EMBL file, what should be filled in as locus_tag and ID?
6.2 years ago
Jerryliu

I am using EMBLmyGFF3 to convert a GFF3 file into an EMBL file, but it requires a locus_tag and an ID. Does anyone know where I should look for this information? And what do locus_tag and ID mean in an EMBL file?


After converting the file into EMBL format, I need to use it as the input file for another program. This is the description of the required EMBL input:

Requirement: gene annotation in EMBL format (TriAnnot). Note: only the locus_tag and the id are required to run clariTE.pl; other tags (such as blastp_file...) are not necessary. The EMBL file should look like this:

ID   unknown; SV 1; linear; unassigned DNA; STD; UNC; 1411106 BP.
XX
AC   unknown;
XX
XX
FT   CDS             join(141960..142006,142121..142147,142248..142370,142493..142739,142850..142873)
FT                   /locus_tag="v443_0002_EXONERATE_BLASTX_protOSA_6"
FT                   /blastp_file="v443_0002_EXONERATE_BLASTX_protOSA_141960_142873_Q5JM42_Match_0005_mRNA_CDS.bltp"
FT                   /id="v443_0002_EXONERATE_BLASTX_protOSA_141960_142873_Q5JM42_Match_0005_mRNA_joinedCDS"
FT                   /note="Similar_to: hypothetical_protein"
FT                   /note="BestBlastHit: B9EZI3_ORYSJ TrEMBL databank Putative uncharacterized protein - %25id: 91.67 - hcov: 13.78 - qcov: 100.00"
FT                   /note="Status: High Confidence"
FT   CDS             complement(join(143435..144154,144239..144363,145030..145267))
FT                   /locus_tag="v443_0002_EXONERATE_BLASTX_validated_9"
FT                   /expressed
FT                   /blastp_file="v443_0002_EXONERATE_BLASTX_validated_143435_145267_AFR_02_CAT01_3_Match_0001_mRNA_CDS.bltp"
FT                   /id="v443_0002_EXONERATE_BLASTX_validated_143435_145267_AFR_02_CAT01_3_Match_0001_mRNA_joinedCDS"
FT                   /note="Similar_to: putative_function - F2CSA4_HORVD TrEMBL databank Predicted protein OS Hordeum vulgare var distichum PE 2 SV 1"
FT                   /note="BestBlastHit: F2CSA4_HORVD TrEMBL databank Predicted protein - %25id: 96.12 - hcov: 100.56 - qcov: 100.00"
FT                   /note="Function_coverage: 94.71"
FT                   /note="Function_identity: 97.94"
FT                   /note="Function_target: F2CSA4 22 361"
FT                   /note="Status: High Confidence"
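Since clariTE.pl only needs the locus_tag and the id of each feature, it can help to check a converted file for those two qualifiers before running it. Below is a minimal sketch in bash/awk, assuming the standard EMBL feature-table layout (feature keys on `FT   KEY ...` lines, qualifiers on indented `FT` lines) and that each qualifier value fits on one line, as in the example. `list_locus_ids` is a hypothetical helper, not part of EMBLmyGFF3 or TriAnnot.

```shell
#!/usr/bin/env bash
# list_locus_ids FILE
# Hypothetical helper: for every feature in an EMBL flat file, print
#   feature_key <TAB> locus_tag <TAB> id
# using "-" when a qualifier is missing, so features that lack
# what clariTE.pl needs are easy to spot.
# Assumes each qualifier value sits on a single line.
list_locus_ids() {
  awk -F'"' '
    # Print the feature collected so far, if there is one.
    function print_feature(   t, i) {
      if (!seen) return
      t = (tag == "") ? "-" : tag
      i = (id == "") ? "-" : id
      print key "\t" t "\t" i
    }
    # Feature key lines look like: FT   CDS             join(...)
    /^FT   [^ ]/ {
      print_feature()
      split($0, parts, " ")
      key = parts[2]; tag = ""; id = ""; seen = 1
      next
    }
    # Qualifier lines look like: FT                   /locus_tag="..."
    # With FS set to a double quote, $2 is the quoted value.
    /^FT +\/locus_tag="/ { tag = $2; next }
    /^FT +\/id="/        { id = $2; next }
    END { print_feature() }
  ' "$1"
}
```

For example, `list_locus_ids converted.embl` (file name hypothetical) prints one row per feature; a `-` in the second or third column marks a feature that is missing its locus_tag or id.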


Please see the Parameter section of the tool manual for more info.


If you mean the /id in the qualifier list, it is not an accepted EMBL qualifier. It probably reflects the ID attribute in the 9th column of the GFF3 file. EMBLmyGFF3 puts the ID from your GFF file into a /note qualifier like this:

 /note="ID:g1.t1"
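To see in advance which values will end up in those ID notes, you can list the `ID` attribute from column 9 of the GFF3. This is a minimal sketch, assuming a tab-separated GFF3 file with `;`-separated attributes; `list_gff3_ids` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# list_gff3_ids FILE
# Hypothetical helper: print feature_type <TAB> ID for every GFF3 line
# whose 9th column carries an ID attribute.
list_gff3_ids() {
  awk -F'\t' '
    /^#/ || NF < 9 { next }    # skip directives, comments and FASTA lines
    {
      n = split($9, attrs, ";")
      for (k = 1; k <= n; k++)
        if (attrs[k] ~ /^ID=/) print $3 "\t" substr(attrs[k], 4)
    }
  ' "$1"
}
```

Features without an `ID` (CDS lines often only have a `Parent`) simply produce no output.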

You can then turn those lines into /id qualifiers with a sed command:

sed 's/\/note="ID:/\/id="/g' myFile.embl > myReadyFile.embl
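The sed command above escapes each `/` because `/` is also its delimiter. A minimal variant, assuming a `sed` that supports `-E` (GNU and BSD sed both do), uses `#` as the delimiter and only rewrites qualifier lines that start with `FT`, so a `/note="ID:` string anywhere else is left alone. `note_id_to_id` is a hypothetical name.

```shell
#!/usr/bin/env bash
# note_id_to_id FILE
# Hypothetical helper equivalent to the sed one-liner above, but limited
# to EMBL feature-table (FT) lines:
#   FT                   /note="ID:g1.t1"  ->  FT                   /id="g1.t1"
# Using '#' as the s command delimiter avoids escaping every '/'.
note_id_to_id() {
  sed -E 's#^(FT +)/note="ID:#\1/id="#' "$1"
}
```

Usage mirrors the one-liner: `note_id_to_id myFile.embl > myReadyFile.embl`. Other notes, such as `/note="Status: High Confidence"`, are not touched.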

As for the locus_tag, I think you can use any value you choose. If you mean the accession ID (on the first line and the AC line), I agree with Sed Modha: take the time to read the README and the associated ENA documentation; everything should be explained there.
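If you pick your own locus_tags, one option is to number the genes with a fixed prefix. This is a minimal sketch that writes them into the GFF3 as a `locus_tag` attribute on each `gene` line before conversion. Two assumptions apply. First, whether EMBLmyGFF3 copies such an attribute into /locus_tag, rather than building tags from a prefix you pass it, is something to check in its manual. Second, for an ENA submission the prefix should, as far as I know, be the one registered with your study, so the prefix argument here is only a placeholder. `add_locus_tags` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# add_locus_tags PREFIX FILE
# Hypothetical helper: append locus_tag=PREFIX_00001, PREFIX_00002, ...
# to column 9 of every "gene" line of a GFF3 file; all other lines are
# printed unchanged. Output goes to stdout.
add_locus_tags() {
  awk -F'\t' -v OFS='\t' -v prefix="$1" '
    /^#/ { print; next }                  # keep directives and comments as-is
    NF >= 9 && $3 == "gene" {
      n++
      tag = sprintf("%s_%05d", prefix, n)
      # "." means "no attributes" in GFF3, so replace it instead of appending
      if ($9 == ".") $9 = "locus_tag=" tag
      else           $9 = $9 ";locus_tag=" tag
    }
    { print }
  ' "$2"
}
```

For example, `add_locus_tags MYPREFIX annotation.gff3 > annotation.tagged.gff3` (file names hypothetical). The helper does not check for an existing `locus_tag` attribute, so run it only once per file.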
