Question

How To Remove All Hypothetical Proteins From A GenBank File

1

10.7 years ago

HG ★ 1.2k

Hi everyone, after annotating a sample genome I want to remove all hypothetical proteins from the GenBank file. Can anyone suggest the simplest way to do this? Thank you in advance.

genbank perl • 3.4k views

updated 2.7 years ago by francis ▴ 260 • written 10.7 years ago by HG ★ 1.2k

0

It would help to see which part of the file (field) contains the annotation and an example.

10.7 years ago by Neilfws 49k

0

Do you need to keep the GenBank format? If not, you could simply convert the GBK file to FASTA and filter the sequences by the description in the FASTA header line.
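As a rough illustration of the filtering step only, here is a minimal Python sketch. It assumes the file has already been converted to FASTA and that the product description ended up in each header line; the function name `filter_fasta` and the sample records are made up for the example.

```python
def filter_fasta(lines, unwanted="hypothetical protein"):
    """Yield FASTA lines, skipping records whose header mentions `unwanted`."""
    keep = True
    for line in lines:
        if line.startswith(">"):
            # Decide once per record, based on the header text.
            keep = unwanted.lower() not in line.lower()
        if keep:
            yield line


# Invented sample records, for illustration only.
fasta = [
    ">prot_1 hypothetical protein\n",
    "MKT\n",
    ">prot_2 DNA gyrase subunit A\n",
    "MSD\n",
]
filtered = list(filter_fasta(fasta))
print("".join(filtered), end="")
```

For a real file you would pass an open file handle instead of the list and write the yielded lines to a new file.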

10.5 years ago by JC 13k

Ram · Answer 1 · 2014-12-03

0

10.0 years ago

Jonathanjacobs ▴ 280

You don't need BioPerl / Biopython for this. If you know how many lines your annotation takes up (assuming it's fixed), then you can do something like this from the command line in Linux (GNU sed), where NUMBEROFLINESTOREMOVE is the number of lines that follow the matched ANNOTATION_NAME line:

sed -E '/^ {5}ANNOTATION_NAME\s/,+NUMBEROFLINESTOREMOVE d' GENBANKFILE.gbk > NEWGENBANKFILE.gbk

At least, that's what worked for me when I ran into this issue.
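If the annotation blocks are not all the same length, the same idea can be sketched in plain Python without counting lines. This is only a sketch under a few assumptions: the file follows the usual GenBank layout (feature keys start in column 6, qualifiers are indented further), the unwanted entries carry the exact qualifier `/product="hypothetical protein"` on one line, and the function name `drop_hypothetical` and the sample record are invented for the example. It removes only the feature block that contains the qualifier, so a separate `gene` feature for the same locus would be left in place.

```python
import re

FEATURE_START = re.compile(r"^ {5}\S")  # feature key begins in column 6


def drop_hypothetical(lines, needle='/product="hypothetical protein"'):
    """Return lines with every feature block containing `needle` removed."""
    kept, block, in_features = [], [], False

    def flush():
        # Keep the buffered feature unless it carries the unwanted qualifier.
        if not any(needle in entry for entry in block):
            kept.extend(block)
        block.clear()

    for line in lines:
        if in_features and line.startswith(" " * 5):
            if FEATURE_START.match(line):
                flush()  # a new feature begins, so settle the previous one
            block.append(line)
            continue
        if in_features:  # first unindented line ends the feature table
            flush()
            in_features = False
        if line.startswith("FEATURES"):
            in_features = True
        kept.append(line)
    flush()
    return kept


# Invented miniature record, for illustration only.
sample = """\
LOCUS       DEMO                      60 bp    DNA     linear   UNK
FEATURES             Location/Qualifiers
     CDS             1..30
                     /product="hypothetical protein"
     CDS             31..60
                     /product="DNA gyrase subunit A"
ORIGIN
//
""".splitlines(keepends=True)

cleaned = drop_hypothetical(sample)
print("".join(cleaned), end="")
```

For a real file, read the lines with `open("GENBANKFILE.gbk")` and write the returned list to a new file.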

updated 2.7 years ago by Ram 44k • written 10.0 years ago by Jonathanjacobs ▴ 280

score 0 · Answer 2 · 2022-02-26

0

2.7 years ago

francis ▴ 260

Use RefSeq, and stay away from XP_ records.

https://www.ncbi.nlm.nih.gov/refseq/

https://www.ncbi.nlm.nih.gov/books/NBK21091/table/ch18.T.refseq_accession_numbers_and_mole/?report=objectonly

https://ftp.ncbi.nlm.nih.gov/refseq/release/

Available in all formats ...

2.7 years ago by francis ▴ 260