16 months ago
Giffredo
Hi,
I would like to create a new sub-database from the nr BLAST database that contains all sequences related to biogenic amines. To do this, I need a script that extracts sequences from the nr BLAST database based on partial protein names.
For example, for the search term "phosph" I would like to get FASTA output like this:
>VFG037176(gb|YP_001844723) (plc) phospholipase C [Phospholipase C (VF0470)] [Acinetobacter baumannii ACICU]
MNRREFLLNSTKTMFGTAALASFPLSIQKALAIDAKVESGTIQDVKHIV...
>VFG037177(gb|YP_001846906) (plc) phospholipase C [Phospholipase C (VF0470)] [Acinetobacter baumannii ACICU]
MITRRKFLNYSLNMGFGAAALAAFPSSIQKALAIPANNKTGTIQDVEHV...
>VFG037203(gb|YP_001847849) (plcD) phosphatidylserine/phosphatidylglycerophosphate/cardiolipin synthase [Phospholipase D (VF0469)] [Acinetobacter baumannii ACICU]
MAQSFHSKQLQTHQLANGFLIKASIVVCSSFAVALTGCSTLPKHSPEPI...
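A minimal sketch of one way to do this, assuming nr has first been dumped to FASTA (for example with BLAST+ `blastdbcmd -db nr -entry all`). It streams the records and keeps those whose header line contains any of the given terms as a case-insensitive substring. Note that this matches anywhere in the header, so a term can also hit organism names or other fields. The script name `filter_fasta.py` and the function names are hypothetical.

```python
import sys
from typing import Iterable, Iterator, TextIO


def read_fasta(handle: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs; the header is returned without '>'."""
    header = None
    chunks: list[str] = []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_by_name(records, terms):
    """Keep records whose header contains any term (case-insensitive substring)."""
    lowered = [t.lower() for t in terms]
    for header, seq in records:
        h = header.lower()
        if any(t in h for t in lowered):
            yield header, seq


def write_fasta(records, out: TextIO, width: int = 60) -> int:
    """Write records as FASTA with wrapped sequence lines; return the record count."""
    n = 0
    for header, seq in records:
        out.write(f">{header}\n")
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")
        n += 1
    return n


# Run as a filter only when search terms are given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    count = write_fasta(filter_by_name(read_fasta(sys.stdin), sys.argv[1:]), sys.stdout)
    print(f"{count} matching records", file=sys.stderr)
```

Because records are streamed rather than loaded into memory, this should cope with a full nr dump, e.g. `blastdbcmd -db nr -entry all | python filter_fasta.py phosph > phosph.fasta`. Several partial names can be passed at once. The resulting file can then be turned into its own BLAST database with `makeblastdb -in phosph.fasta -dbtype prot -out phosph`.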