How to sort a FASTA file according to a header file
12 months ago
Nelo ▴ 20

Hi!

I have two files: a protein FASTA file (a.fa) and a header file (header.txt). I want to get my sequences in the same order as the header file. How can I do this?

fasta • 863 views

This is the type of question where the only appropriate response is "Would you like fries with your order?"

You are not showing any previous effort to solve this problem. There is also a search function which you don't seem to have tried.

This is not a service website where you simply come and make an order, and assume that someone will jump right at it and do your work for free.


I wanted to get my sequences in the same order as

That is not a simple sort then? Are you selecting specific sequences or just want to sort based on an external file?

awk 'NR==FNR{a[$0]; next} /^>/{if($1 in a) print $0; flag = $1 in a ? 1 : 0; next} flag' header.txt a.fa >reordered.fasta

This is the one I tried, but it is not giving the desired output.


Your response does not answer GenoMax's question. "Are you selecting specific sequences or just want to sort based on an external file?"

12 months ago

Just use seqkit faidx a.fa -l headers.txt (see the seqkit faidx usage).

$ cat s.fa
>1
a
>2
c
>3
t

$ cat headers.txt 
2
3
1

$ seqkit faidx s.fa -l headers.txt 
[INFO] 3 patterns loaded from file
[INFO] create or read FASTA index ...
[INFO] create FASTA index for s.fa
>2
c
>3
t
>1
a
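
If seqkit is not at hand, the same reordering can be sketched in plain awk. A filter that streams through the FASTA file (like the awk attempt earlier in the thread) can only emit records in FASTA order, so the idea here is to store every record first and then walk the header list. This is a minimal sketch, not a tested replacement for seqkit: it assumes the IDs in headers.txt match the first word of each FASTA header, with or without the leading ">", and that the whole FASTA file fits in memory. The output name reordered.fa is just an illustration.

```shell
# Recreate the small example files from the answer above.
printf '>1\na\n>2\nc\n>3\nt\n' > s.fa
printf '2\n3\n1\n' > headers.txt

# Pass 1 (s.fa): store each record under its ID (first word after ">").
# Pass 2 (headers.txt): print the stored records in the listed order.
awk '
    NR == FNR {
        if (/^>/) { id = substr($1, 2); rec[id] = $0 }   # header line starts a new record
        else      { rec[id] = rec[id] "\n" $0 }          # sequence lines are appended to it
        next
    }
    {
        id = $1
        sub(/^>/, "", id)                # assumption: tolerate a leading ">" in the list
        if (id in rec) print rec[id]     # IDs missing from the FASTA are silently skipped
    }
' s.fa headers.txt > reordered.fa

cat reordered.fa
```

Because multi-line sequences are appended as they are read, wrapped FASTA records should come out with their original line breaks.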
12 months ago
tshtatland ▴ 190

Use seqkit fx2tab and seqkit tab2fx to convert between FASTA and TSV.

Then use awk or Perl to sort the resulting TSV file based on another file.
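
The sorting step of this route might look roughly like the sketch below. To keep it runnable without seqkit, a hand-written a.tsv stands in for the fx2tab output; the layout (name in column 1, sequence in column 2) and the file names a.tsv and sorted.tsv are assumptions for illustration. It also assumes each line of header.txt matches the first TSV column exactly.

```shell
# Stand-in for the TSV that seqkit fx2tab would produce from a.fa
# (assumed layout: name<TAB>sequence).
printf '1\ta\n2\tc\n3\tt\n' > a.tsv
printf '2\n3\n1\n' > header.txt

# Load the TSV rows keyed by name, then emit them in header.txt order.
awk -F '\t' '
    NR == FNR { row[$1] = $0; next }   # first file: remember each row by its name
    ($1 in row) { print row[$1] }      # second file: print rows in the listed order
' a.tsv header.txt > sorted.tsv

cat sorted.tsv
# sorted.tsv would then go back through seqkit tab2fx to become FASTA again.
```

Working on one-line-per-record TSV keeps the awk part trivial, which is the main appeal of converting first.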
