Question

How to extract multi-line protein sequences by IDs present in the headers?

4.7 years ago

sharmatina189059

Hi all, can anyone tell me how to retrieve multi-line protein sequences whose IDs are present in the headers?

>gi|1706522686|gb|QDM68077.1| CraA [Acinetobacter baumannii]
MKNIQTTALNRTTLMFPLALVLFEFAVYIGNDLIQPAMLAITEDFGVSATWAPSSMSFYLLGGASVAWLL
GPLSDRLGRKKVLLSGVLFFALCCFLILLTRQIEHFLTLRFLQGIGLSVISAVGYAAIQENFAERDAIKV
MALM
>gi|1818457412|dbj|BCA98153.1| 1-acyl-sn-glycerol-3-phosphate acyltransferase [Acinetobacter baumannii]
MTQTQSIVNSTLKKFSKIGLYGKKVTSATAAISEGFYLVYRHGLYKDPNNPVNTRYVQYFCRRLCQVFNL
EVQVHGTIPREPALWVSNHISWLDIAVLGSGARVFFLAKAEIEKWPILGNLAKGGGTLFIKRGSGDSIKI

>gi|1818457412|dbj|BCA98158.1| 1-acyl-phosphate acyltransferase [Acinetobacter baumannii]
MTQTQSIVNSTLKKFSKIGLYGKKVTSATAAISEGFYLVYRHGLYKDPNNPVNTRYVQYFCRRLCQVFNL
EVQVHGTIPREPALWVSNHISWLDIAVLGSGARVFFLAKAEIEKWPILGNLAKGGGTLFIKRGSGDSIKI

and I have IDs like:

QDM68077.1
BCA98153.1

Please let me know how to retrieve the sequences for these IDs. I would especially appreciate it if someone could tell me how to do this with seqkit. I have tried seqkit like this:

seqkit grep -nrif remaining_except_core 307_DR_determinats.fasta

but I get nothing from this command.
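For comparison, here is a minimal Python sketch of the extraction asked about above that does not depend on seqkit. It assumes the headers follow the `gi|number|db|accession|` layout shown above and that the IDs file holds exact accessions such as `QDM68077.1`, one per line. All function names (`read_fasta`, `accession_of`, `load_ids`, `extract_by_accession`, `format_fasta`, `filter_fasta_file`) are hypothetical helpers, not part of any library. Multi-line sequences are joined while reading, and blank lines and Windows line endings are stripped.

```python
from typing import Iterable, Iterator, List, Set, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs, joining sequences split over several lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()  # also removes '\r' from Windows line endings
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines before any header are ignored
    if header is not None:
        yield header, "".join(chunks)


def accession_of(header: str) -> str:
    """Return field 4 of a 'gi|number|db|accession|' ID; fall back to the first token."""
    tokens = header.split()
    first = tokens[0] if tokens else ""
    fields = first.split("|")
    return fields[3] if len(fields) > 3 else first


def load_ids(lines: Iterable[str]) -> Set[str]:
    """Read one ID per line, ignoring surrounding whitespace and blank lines."""
    return {line.strip() for line in lines if line.strip()}


def extract_by_accession(lines: Iterable[str], wanted: Set[str]) -> List[Record]:
    """Keep records whose accession is exactly one of the wanted IDs."""
    return [(h, s) for h, s in read_fasta(lines) if accession_of(h) in wanted]


def format_fasta(records: Iterable[Record], width: int = 70) -> str:
    """Render records as FASTA, wrapping sequences at `width` residues per line."""
    out: List[str] = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return ("\n".join(out) + "\n") if out else ""


def filter_fasta_file(ids_path: str, fasta_path: str) -> str:
    """Hypothetical end-to-end helper: IDs file + FASTA file -> filtered FASTA text."""
    with open(ids_path) as fh:
        wanted = load_ids(fh)
    with open(fasta_path) as fh:
        return format_fasta(extract_by_accession(fh, wanted))
```

For example, `print(filter_fasta_file("remaining_except_core", "307_DR_determinats.fasta"), end="")` should print the matching records. Comparing the exact accession field, rather than searching for a substring, keeps an ID like `QDM68077.1` from also pulling in a longer accession such as `QDM68077.12`. The 70-residue wrap mirrors the input above and is otherwise an arbitrary choice.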

Tags: Perl, awk, sed

Take a look at the Similar posts section on the right-hand side of the page for related questions and the corresponding solutions.

For example, the post "Retrieve multi-line fasta sequences using list of locus tag" asks a very similar question and has an accepted solution that you could try.

Also see the seqkit grep documentation: https://bioinf.shenwei.me/seqkit/usage/#grep

Reply by Sej Modha · 4.7 years ago

If you are fine with using a GUI application, take a look at SEDA (https://www.sing-group.org/seda/).

You can apply the Pattern filtering operation (https://www.sing-group.org/seda/manual/operations.html#pattern-filtering) to the headers (select the Header radio button) using the sequence IDs you want. Note: you can use the Import patterns option to load these IDs from a TXT file instead of typing them into the GUI manually.

Reply by Hugo · 4.7 years ago

Answer 1 · 2022-03-11

2.8 years ago

jena

Easy with seqmagick:

# if you have full IDs
seqmagick convert --include-from-file ids.txt input.fa output.fa

# if you have partial IDs, you can match patterns
seqmagick convert --pattern-include REGEX input.fa output.fa
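If you go the `--pattern-include` route with accessions like the ones in the question, keep in mind that the `.` in `QDM68077.1` is a regex wildcard and that a bare accession can also match a longer one. Below is a small sketch, using only Python's standard `re` module, of one way to turn an ID list into a single pattern. `build_accession_regex` is a hypothetical helper name, and the `\|...\|` anchoring assumes the `gi|number|db|accession|` header layout from the question.

```python
import re
from typing import Iterable


def build_accession_regex(ids: Iterable[str]) -> str:
    """Combine IDs into one regex alternation usable as a single REGEX argument."""
    # Strip whitespace/CRLF, drop blanks and duplicates; sort for a stable pattern.
    cleaned = sorted({i.strip() for i in ids if i.strip()})
    if not cleaned:
        raise ValueError("no IDs given")
    # re.escape() makes the version dot literal instead of "any character".
    alternation = "|".join(re.escape(i) for i in cleaned)
    # Requiring a '|' on both sides stops QDM68077.1 from also matching QDM68077.12.
    return r"\|(?:" + alternation + r")\|"
```

With the two IDs from the question this gives `\|(?:BCA98153\.1|QDM68077\.1)\|`. You could try it as the REGEX above, e.g. `seqmagick convert --pattern-include '\|(?:BCA98153\.1|QDM68077\.1)\|' input.fa output.fa` (single quotes keep the shell from interpreting `|`), assuming seqmagick searches for the pattern anywhere in the header rather than anchoring it at the start; `seqmagick convert --help` should confirm. The same pattern should also work with seqkit's regex mode, e.g. `seqkit grep -n -r -p '\|(?:BCA98153\.1|QDM68077\.1)\|' 307_DR_determinats.fasta`.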
