How to split a FASTA file by ID, making sure to keep two entries in each file
22 months ago
SaltedPork ▴ 170

Hi, I have a fasta file which looks like:

>reference
AGCT
>entry1
AGCT
>entry2
AGCT

I want to split the fasta file by ID, but also include the reference in each file.

So far I am using seqkit split --by-id file.fasta and then cat reference.fasta file1.fasta > results.fasta

However, this is quite slow. Is there a way of doing this in one command?

fasta split seqkit bash • 594 views
22 months ago

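If you have already split your files, then it is just a question of looping through the list. You can loop in bash or, more elegantly, with GNU parallel. Here {} is each input file name and {.} is the same name without its extension; reference.fa is filtered out of the list so that it is not merged with itself:

ls -1 *.fa | grep -vFx reference.fa | parallel 'cat reference.fa {} > {.}.merged.fasta'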
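
If the files have not been split yet, the split and the merge could probably be done in a single pass with awk instead. This is only a minimal sketch, not something from seqkit: it assumes the reference is the first record in the file (as in the example in the question) and that every ID is safe to use as a file name (no slashes). The function name split_with_reference and the output directory argument are made up for illustration.

```shell
# Write one FASTA file per non-reference record, each prefixed with the
# first record of the input (assumed to be the reference).
# Usage: split_with_reference INPUT.fasta OUTPUT_DIR
split_with_reference() {
    local input=$1 outdir=$2
    mkdir -p "$outdir"
    awk -v outdir="$outdir" '
        /^>/ {
            n++                              # count records by header line
            if (n > 1) {
                if (out != "") close(out)    # avoid running out of file handles
                id  = substr($1, 2)          # header up to first space, minus ">"
                out = outdir "/" id ".fasta"
                printf "%s", ref > out       # start each file with the reference
            }
        }
        n == 1 { ref = ref $0 "\n"; next }   # buffer the reference record
        n > 1  { print > out }               # copy the current record
    ' "$input"
}
```

Calling split_with_reference file.fasta split_out on the example from the question should leave split_out/entry1.fasta and split_out/entry2.fasta, each holding the reference followed by that entry. The input is read once and no intermediate per-ID files are written, which is where most of the time goes in the two-step approach.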