How do I extract FASTA sequences using a specific keyword (**Psychro**)?
2.5 years ago
sunnykevin97 ▴ 990

Hi,

I have a single FASTA file with 60,000 bacterial genomes. I'd like to extract every complete FASTA record whose header contains the keyword Psychro (e.g. Psychrobacter, 18,982 genomes).

I'm aware that seqtk can extract subsequences:

seqtk subseq test.fa test.txt

But I want to extract the entire FASTA sequences by providing the FASTA headers in test.txt.

Here are some example FASTA headers (I want to extract about 18,982 genomes in total):

>NZ_CAJHBU010000049.1 **Psychrobacter vallis** isolate Psychrobacter vallis CMS39, whole genome shotgun sequence
>NZ_CAJHBM010000029.1 **Psychrobacter sp. JCM** 18903 isolate Psychrobacter sp. JCM18903, whole genome shotgun sequence
>NZ_CAJHBB010000047.1 **Psychrobacter sanguinis** isolate Psychrobacter sanguinis 13983, whole genome shotgun sequence

>NC_007204.1 **Psychrobacter arcticus** 273-4, complete sequence
>NC_007969.1 **Psychrobacter cryohalolentis K5**, complete sequence
>NC_007968.1 **Psychrobacter cryohalolentis K5** plasmid 1, complete sequence
>NC_008709.1 **Psychromonas ingrahamii 37**, complete sequence
>NC_020802.1 **Psychromonas sp. CNPT3,** complete sequence
>NC_018721.1 **Psychroflexus torquis ATCC** 700755, complete sequence

Suggestions please!

protein gene genome • 811 views
2.5 years ago
GenoMax 147k

If you have the headers as shown above you can use the answers here: How do I extract Fasta Sequences based on a list of IDs?
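If a separate ID list is not wanted, the records can also be filtered directly on the keyword. The following is only a minimal sketch using the Python standard library, not the method from the linked thread; the function names `filter_fasta` and `filter_fasta_file` and the file names in the usage comment are hypothetical. It assumes a plain-text (uncompressed) FASTA file and a case-sensitive substring match on the header line.

```python
def filter_fasta(lines, keyword):
    """Yield every line of each FASTA record whose header contains `keyword`.

    `lines` is any iterable of text lines (for example an open file handle).
    The match is a case-sensitive substring test on the header line.
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            # A new record starts here: decide whether to keep it.
            keep = keyword in line
        if keep:
            yield line


def filter_fasta_file(in_path, out_path, keyword):
    """Stream matching records from `in_path` to `out_path` (hypothetical helper)."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_fasta(src, keyword))


# Hypothetical usage (file names are placeholders):
# filter_fasta_file("bactgenomes.fasta", "Psychro.fasta", "Psychro")
```

Because the file is read line by line, memory use stays small even with 60,000 genomes. Note that the keyword Psychro also matches Psychromonas and Psychroflexus; pass "Psychrobacter" if only that genus is wanted.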


Well, I tried using

faSomeRecords bactgenomes.fasta Psychro.txt  PSCHYo.fasta

Works fine!
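For anyone who still needs to build the ID list (Psychro.txt above), here is a rough sketch of one way to do it in Python. It is not necessarily how the list in this thread was produced. The function name `matching_ids` is hypothetical, and the sketch assumes the extraction tool expects one identifier per line, taken as the header text up to the first whitespace (e.g. NC_007204.1).

```python
def matching_ids(lines, keyword):
    """Return the IDs of FASTA headers that contain `keyword`.

    The ID is assumed to be the first whitespace-delimited token
    after the leading '>' of the header line.
    """
    ids = []
    for line in lines:
        if line.startswith(">") and keyword in line:
            ids.append(line[1:].split()[0])
    return ids


# Hypothetical usage (file names are placeholders):
# with open("bactgenomes.fasta") as fasta, open("Psychro.txt", "w") as out:
#     out.write("\n".join(matching_ids(fasta, "Psychro")) + "\n")
```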
