Hello all! I am trying to generate an EMBL flat file to submit an annotated assembly to ENA. I am using EMBLmyGFF3 to generate the flat file from the whole-genome FASTA file and the GFF3 file. I am getting two errors and a repeated warning:
Errors:
17:17:17 ERROR feature: >>start_codon<< is not a valid EMBL feature type. You can ignore this message if you don't need the feature.
Otherwise tell me which EMBL feature it corresponds to by adding the information within the json mapping file.
17:17:17 ERROR feature: >>stop_codon<< is not a valid EMBL feature type. You can ignore this message if you don't need the feature.
Otherwise tell me which EMBL feature it corresponds to by adding the information within the json mapping file.
Warnings:
17:17:43 WARNING EMBLmyGFF3: Sequence NODE_446_length_99_cov_479.909091 too short (99 bp)! Minimum accpeted by ENA is 100, we skip it !
17:17:43 WARNING EMBLmyGFF3: Sequence NODE_447_length_99_cov_30.409091 too short (99 bp)! Minimum accpeted by ENA is 100, we skip it !
17:17:43 WARNING EMBLmyGFF3: Sequence NODE_448_length_98_cov_103.285714 too short (98 bp)! Minimum accpeted by ENA is 100, we skip it !
17:17:43 WARNING EMBLmyGFF3: Sequence NODE_449_length_98_cov_59.095238 too short (98 bp)! Minimum accpeted by ENA is 100, we skip it !
17:17:43 WARNING EMBLmyGFF3: Sequence NODE_450_length_98_cov_49.000000 too short (98 bp)! Minimum accpeted by ENA is 100, we skip it !
17:17:43 WARNING EMBLmyGFF3: Sequence NODE_451_length_98_cov_39.142857 too short (98 bp)! Minimum accpeted by ENA is 100, we skip it !
Can someone please help me address these errors? Is there any way to handle the warnings and include the short sequences as well?
Thank you!
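Since the error message says these features can be ignored if they are not needed, one option is to drop them from the GFF3 before running the conversion. Below is a minimal sketch of that idea, not something taken from EMBLmyGFF3 itself. The function name `drop_feature_types` and the example records are hypothetical; the only assumption about the input is the standard tab-separated GFF3 layout, with the feature type in column 3.

```python
def drop_feature_types(gff3_lines, unwanted=("start_codon", "stop_codon")):
    """Yield GFF3 lines whose type column (column 3) is not in `unwanted`.

    Comment/directive lines and lines with fewer than three tab-separated
    columns (for example an embedded FASTA section) are passed through as-is.
    """
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            yield line
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) >= 3 and columns[2] in unwanted:
            continue  # skip the feature types the converter rejects
        yield line


# Hypothetical records, for illustration only.
example = [
    "##gff-version 3\n",
    "chr1\tsrc\tgene\t1\t9\t.\t+\t.\tID=g1\n",
    "chr1\tsrc\tstart_codon\t1\t3\t.\t+\t0\tParent=g1\n",
    "chr1\tsrc\tstop_codon\t7\t9\t.\t+\t0\tParent=g1\n",
]
filtered = list(drop_feature_types(example))
print("".join(filtered), end="")
```

For a real file, the same function can be fed an open file handle and the output written to a new GFF3. The alternative that the message points to is mapping the features in the json mapping file, which avoids editing the annotation at all.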
Thank you. This helps!
Juke34 After a lot of effort, I got the tool to work. Thank you. In my case, I am not submitting the sequences to ENA; I am creating a repeat library to be concatenated with the RepeatMasker library. I need all the repeat models to be represented, so a 100 bp minimum is not acceptable. Is there a workaround for this? I do not want to pad the sequences with Ns, as that wouldn't make any sense. Thanks in advance for your feedback.
I would also like to point out that the documentation for the --accession switch is a little confusing. Describing it as a Boolean implies that a value ({True | False}) must be supplied with the argument. I tried this many times without success until I used -a on its own, without any argument. I think the documentation should state explicitly that no argument is to be supplied to this parameter. Just my thoughts.
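For anyone else who trips over this, here is a small sketch of how such a switch typically behaves in a Python command-line tool. I have not checked how EMBLmyGFF3 defines the option internally, so the `store_true` definition and the `demo` parser below are assumptions; they simply reproduce the behaviour I saw, where the flag works alone and fails when given a value.

```python
import argparse

# Hypothetical parser, assuming the switch is a store_true-style flag.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("-a", "--accession", action="store_true",
                    help="Boolean switch: present means True, absent means False")

with_flag = parser.parse_args(["-a"])     # flag given, no value
without_flag = parser.parse_args([])      # flag omitted

print(with_flag.accession)
print(without_flag.accession)

# Supplying a value after the flag is rejected: argparse reports the extra
# token as an unrecognized argument and exits.
try:
    parser.parse_args(["-a", "True"])
    rejected = False
except SystemExit:
    rejected = True
print(rejected)
```

With a flag defined this way, "Boolean" describes the type of the stored value, not something the user types on the command line, which is probably where the confusion in the documentation comes from.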