Hi everyone, after annotating a sample genome I want to remove all hypothetical proteins from the GenBank file. Can anyone suggest the simplest way to do this? Thank you in advance.
You don't need BioPerl / BioPython for this. If you know how many lines your annotation takes up (assuming it's fixed), then you can do something like this from the command line in Linux (GNU sed):
sed -E -z 's/\n {5}ANNOTATION_NAME[^\n]*(\n[^\n]*){NUMBEROFLINESTOREMOVE}//g' GENBANKFILE.gbk > NEWGENBANKFILE.gbk
At least, that's what worked for me when I ran into this issue.
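If the number of lines per feature is not fixed, a small script can drop whole feature blocks instead. This is only a rough sketch with the Python standard library, not a full GenBank parser. It assumes the usual flat-file layout (feature keys start after 5 spaces, qualifiers after 21 spaces) and that the product qualifier reads exactly /product="hypothetical protein" on a single line. The function name drop_hypothetical and the file names in the usage note are made up for illustration.

```python
import re

# A feature key (CDS, gene, ...) starts in column 6, i.e. after 5 spaces.
FEATURE_START = re.compile(r"^ {5}\S")


def drop_hypothetical(gbk_text, needle='/product="hypothetical protein"'):
    """Return gbk_text without the feature blocks that contain `needle`."""
    out, block, in_features = [], [], False

    def flush():
        # Keep the buffered feature block unless it carries the needle.
        if block and not any(needle in line for line in block):
            out.extend(block)
        block.clear()

    for line in gbk_text.splitlines(keepends=True):
        if line.startswith("FEATURES"):
            in_features = True
            out.append(line)
        elif in_features and FEATURE_START.match(line):
            flush()                 # previous feature ends here
            block.append(line)      # new feature begins
        elif in_features and block and line.startswith(" " * 21):
            block.append(line)      # qualifier line of the current feature
        else:
            flush()
            if in_features and not line.startswith(" "):
                in_features = False  # e.g. ORIGIN or CONTIG ends the table
            out.append(line)
    flush()
    return "".join(out)
```

Usage would be something like reading GENBANKFILE.gbk into a string, calling drop_hypothetical on it and writing the result to NEWGENBANKFILE.gbk. Note that this only removes the feature carrying the product qualifier (typically the CDS); a matching gene feature is left in place.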
Use RefSeq, and stay away from XP_ records.
https://www.ncbi.nlm.nih.gov/refseq/
https://www.ncbi.nlm.nih.gov/books/NBK21091/table/ch18.T.refseq_accession_numbers_and_mole/?report=objectonly
https://ftp.ncbi.nlm.nih.gov/refseq/release/
Available in all formats ...
It would help to see which part of the file (field) contains the annotation and an example.
Do you need to keep the GenBank format? If not, you could simply convert the GBK file to FASTA and filter the sequences by the description in the FASTA header line.
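For the FASTA route, the filtering step could look roughly like this. It is a sketch using only the Python standard library, and it assumes the conversion has already been done with whatever tool you prefer and that the product description ended up in the header line. The function name filter_fasta is made up for illustration.

```python
def filter_fasta(fasta_text, unwanted="hypothetical protein"):
    """Return fasta_text without records whose header mentions `unwanted`."""
    kept, keep = [], True
    for line in fasta_text.splitlines(keepends=True):
        if line.startswith(">"):
            # Decide per record, based on the header line only.
            keep = unwanted.lower() not in line.lower()
        if keep:
            kept.append(line)
    return "".join(kept)
```

The match is case-insensitive and looks only at header lines, so sequence lines are never inspected.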