4.7 years ago
sharmatina189059
Hi all, can anyone tell me how to retrieve multi-line protein sequences using the IDs present in their headers? My FASTA file looks like this:
>gi|1706522686|gb|QDM68077.1| CraA [Acinetobacter baumannii]
MKNIQTTALNRTTLMFPLALVLFEFAVYIGNDLIQPAMLAITEDFGVSATWAPSSMSFYLLGGASVAWLL
GPLSDRLGRKKVLLSGVLFFALCCFLILLTRQIEHFLTLRFLQGIGLSVISAVGYAAIQENFAERDAIKV
MALM
>gi|1818457412|dbj|BCA98153.1| 1-acyl-sn-glycerol-3-phosphate acyltransferase [Acinetobacter baumannii]
MTQTQSIVNSTLKKFSKIGLYGKKVTSATAAISEGFYLVYRHGLYKDPNNPVNTRYVQYFCRRLCQVFNL
EVQVHGTIPREPALWVSNHISWLDIAVLGSGARVFFLAKAEIEKWPILGNLAKGGGTLFIKRGSGDSIKI
>gi|1818457412|dbj|BCA98158.1| 1-acyl-phosphate acyltransferase [Acinetobacter baumannii]
MTQTQSIVNSTLKKFSKIGLYGKKVTSATAAISEGFYLVYRHGLYKDPNNPVNTRYVQYFCRRLCQVFNL
EVQVHGTIPREPALWVSNHISWLDIAVLGSGARVFFLAKAEIEKWPILGNLAKGGGTLFIKRGSGDSIKI
and I have a list of IDs like:
QDM68077.1
BCA98153.1
Please let me know how to retrieve the sequences for these IDs. I would appreciate it if someone could tell me how to do this with seqkit. I have tried:
seqkit grep -nrif remaining_except_core 307_DR_determinats.fasta
but this command returns nothing.
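As a minimal sketch of one way to do this without seqkit, the Python script below joins multi-line sequences and keeps only the records whose accession is in the ID list. It assumes the accession is always the fourth `|`-separated field of the header (as in the example headers above) and that the ID file has one accession per line; the script name used in the usage line is hypothetical.

```python
import sys
from typing import Iterable, Iterator, List, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining multi-line sequences."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()  # also drops Windows '\r' line endings
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def accession(header: str) -> str:
    """Return the accession from an NCBI-style 'gi|...|db|ACC|' header.

    Assumption: the accession is the 4th '|'-separated field of the first
    whitespace-delimited token; otherwise the whole first token is returned.
    """
    first_token = header.split()[0]
    fields = first_token.split("|")
    return fields[3] if len(fields) >= 4 else first_token


def extract(fasta_lines: Iterable[str], wanted: Set[str]) -> List[Tuple[str, str]]:
    """Keep records whose accession is exactly one of the wanted IDs."""
    return [(h, s) for h, s in read_fasta(fasta_lines) if accession(h) in wanted]


def write_fasta(records, out, width: int = 70) -> None:
    """Write records as FASTA, wrapping sequences at `width` residues."""
    for header, seq in records:
        out.write(f">{header}\n")
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")


if __name__ == "__main__":
    ids_path, fasta_path = sys.argv[1], sys.argv[2]
    with open(ids_path) as fh:
        wanted_ids = {line.strip() for line in fh if line.strip()}
    with open(fasta_path) as fh:
        write_fasta(extract(fh, wanted_ids), sys.stdout)
```

Usage (hypothetical script name): `python extract_by_accession.py remaining_except_core 307_DR_determinats.fasta > selected.fasta`. Matching the whole accession field exactly means that `BCA98153.1` and `BCA98158.1` are kept or dropped independently, even though their headers share the same gi number.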
Take a look at the Similar posts section on the right-hand side of the page for related questions and their solutions. For example, "Retrieve multi-line fasta sequences using list of locus tag" shows a very similar question with an accepted solution that you could try.
Also take a look at: https://bioinf.shenwei.me/seqkit/usage/#grep
If you prefer to use a GUI application, take a look at SEDA (https://www.sing-group.org/seda/).
You can apply the Pattern filtering operation (https://www.sing-group.org/seda/manual/operations.html#pattern-filtering) to the headers (check the Header radio button) using the sequence IDs you want (Note: you can use the Import patterns option to import these IDs from a TXT file instead of typing them manually into the GUI).
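Whichever tool you use, it helps to know whether the IDs are matched as literal text or as regular expressions. With `seqkit grep -r`, for instance, patterns are treated as regular expressions, so the `.` in `QDM68077.1` matches any character. The Python sketch below only illustrates that difference on a few made-up headers; it is not how SEDA or seqkit are implemented, and the function names are hypothetical.

```python
import re
from typing import Iterable, List


def header_matches(header: str, patterns: Iterable[str],
                   literal: bool = True, ignore_case: bool = True) -> bool:
    """True if any pattern occurs in the header.

    literal=True escapes regex metacharacters (e.g. '.') so each ID is
    matched as plain text; literal=False uses the patterns as regexes.
    """
    flags = re.IGNORECASE if ignore_case else 0
    for p in patterns:
        regex = re.escape(p) if literal else p
        if re.search(regex, header, flags):
            return True
    return False


def filter_headers(headers: Iterable[str], patterns: List[str],
                   literal: bool = True) -> List[str]:
    """Keep only the headers matched by at least one pattern."""
    return [h for h in headers if header_matches(h, patterns, literal)]


if __name__ == "__main__":
    headers = [
        "gi|1706522686|gb|QDM68077.1| CraA",
        "gi|1|gb|QDM6807711| made-up decoy",
    ]
    print(filter_headers(headers, ["QDM68077.1"], literal=True))   # first header only
    print(filter_headers(headers, ["QDM68077.1"], literal=False))  # both headers
```

Note that even a literal substring match of `QDM68077.1` would also hit a header containing `QDM68077.10`, so matching the whole accession field exactly is stricter than any substring or pattern match.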