How to sort fasta file numerically
3.7 years ago

Hi,

I was wondering if there is a way to sort a fasta file so that the chromosomes are in numeric order. Currently, the fasta file is sorted lexicographically, so the chromosomes are ordered like this:

>chr1
>chr10
>chr11
....
>chr2
>chr20

I would like to sort the fasta file so that the chromosomes are in numeric order instead, like this:

>chr1
>chr2
>chr3
...
>chr10
>chr11
Assembly genome sequence • 4.0k views
$ seqkit sort -nN test.fa -o test_out.fa
3.7 years ago
ATpoint 85k

Try:

bioawk -c fastx '{print}' in.fa | sort -k1,1V | awk '{print ">"$1;print $2}'

or any of the answers in:

Sort Multiple Fasta Numerically
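
If bioawk is not installed, the same linearize, sort, unfold idea can be sketched with plain awk and GNU sort. This is only a minimal sketch under a few assumptions: GNU coreutils is available (for the V ordering in sort), headers contain no tab characters, and the function name sort_fasta_natural is a hypothetical name chosen for illustration. Each sequence comes out on a single line.

```shell
# Hypothetical helper: sort FASTA records by header in natural (version) order.
# Assumes GNU sort (for -V ordering) and no tab characters in the headers.
sort_fasta_natural() {
    # 1) Linearize: emit one record per line as "header<TAB>sequence".
    awk '
        /^>/ { if (hdr != "") print hdr "\t" seq; hdr = $0; seq = ""; next }
             { seq = seq $0 }
        END  { if (hdr != "") print hdr "\t" seq }
    ' "$1" |
    # 2) Version-sort on the header column only, so chr2 comes before chr10.
    LC_ALL=C sort -t "$(printf '\t')" -k1,1V |
    # 3) Unfold: turn the tab back into a newline between header and sequence.
    tr '\t' '\n'
}
```

Usage would look like: sort_fasta_natural in.fa > sorted.fa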


I saw that post. I tried Frédéric Mahé's answer and it doesn't appear to work. I can't seem to get Ih3's answer to work either.


I edited my answer, please try again.


I believe it is because the original post has a different header structure, i.e. ">chr2_1000-1020".


Yes, to make Frédéric Mahé's answer work with your data, you need to change the -t field separator of the sort command so that it fits your fasta headers.

For instance, use sort -t "r" -k2n instead of sort -t "_" -k2n:

sed -e '/>/s/^/@/' -e '/>/s/$/#/' file.fasta | tr -d "\n" | tr "@" "\n" | sort -t "r" -k2n | tr "#" "\n" | sed -e '/^$/d'
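
To see why the delimiter matters, here is a small illustration of how -t and -k2n pick the part of the header that gets compared. The function names and the example headers are made up for the demo. One caveat worth keeping in mind: with a numeric key, non-numeric names such as chrX or chrM evaluate as 0 and would sort before chr1.

```shell
# Illustrative only: the -t delimiter decides which part of the header is field 2.
# With -t "r", ">chr10" splits into ">ch" and "10", so -k2n compares the number 10.
demo_sort_by_chr_number() {
    printf '%s\n' '>chr10' '>chr2' '>chr1' '>chr20' | sort -t "r" -k2n
}

# With headers like ">chr2_1000-1020", -t "_" makes field 2 the coordinate range,
# and -k2n compares its leading number (the start position) instead.
demo_sort_by_start() {
    printf '%s\n' '>chr2_3000-3020' '>chr2_1000-1020' '>chr2_200-220' | sort -t "_" -k2n
}
```

Running demo_sort_by_chr_number lists chr1, chr2, chr10, chr20 in that order, while demo_sort_by_start orders the chr2 fragments by start coordinate.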

That works as well. Thank you!!


Ahhh I see. Yeah, that works now. Thank you
