Alphabetic Sort In Unix/Perl With A Custom Order Of Letters
13.9 years ago
Monzoor

Sorting DNA sequences in Unix is done in alphabetical order. Is it possible to sort DNA sequences using a specified order of the letters?

unix perl sort dna sequence
13.9 years ago

I'm not sure I understand your question, but if you want to sort using, for example, the order C, A, T, G, I would use 'tr' to replace each letter with a digit that sorts in that order, sort the result, and then translate the digits back. Something like:

cat onesequenceperline.txt |\
tr "C" "0" | tr "A" "1" | tr "T" "2" | tr "G" "3" |\
sort |\
tr "0" "C" | tr "1" "A" | tr "2" "T" | tr "3" "G" > result.txt

You can cut the number of processes down from 10 to 3 by removing the 'useless use of cat' and the redundant tr's:

tr 'CATG' '0123' < onesequenceperline.txt | sort | tr '0123' 'CATG' > result.txt
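A minimal, self-contained sketch of the same trick, wrapped in a hypothetical shell function `custom_sort` that takes the letter order as an argument. It assumes exactly four letters and input that contains no digits, and it sets LC_ALL=C so that sort compares plain bytes rather than applying locale collation rules.

```shell
#!/bin/sh
# Hypothetical helper: sort sequences (one per line, on stdin) using a
# custom four-letter order given as $1, e.g. CATG means C < A < T < G.
# Assumes the input contains no digits.
custom_sort() {
  order=$1
  # Map each letter to its rank (0-3), sort byte-wise, then map back.
  tr "$order" '0123' | LC_ALL=C sort | tr '0123' "$order"
}

# Usage: prints CCGA, ACGT, TTAG, GATC (one per line)
printf '%s\n' GATC ACGT CCGA TTAG | custom_sort CATG
```

Letters outside the given order (N, lowercase) are left untouched by tr and, in byte order, sort after all the digits, so they would not follow the custom order; uppercasing or validating the input first avoids surprises.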

Wow! This is a simple yet effective idea; it somehow never occurred to me. I'll have to check how it scales to huge data sets. Anyway, thanks a lot, PL.

@Monzoot : very nice suggestion, thanks :-)

@Keith, very nice suggestion! Thanks! :-)

13.9 years ago
Rvosa

If you are asking how to sort the nucleotides within each sequence alphabetically, a file like the one Pierre is imagining (one sequence per line) could be processed in Perl like this:

perl -lne 'print sort { $a cmp $b } split //' onesequenceperline.txt

Or sort in reverse order by switching $a and $b around (in normal order the comparison block can be omitted entirely, so you could golf it down further). An advantage of this is that it handles all IUPAC single-nucleotide codes; a disadvantage is that it doesn't let you define a custom ordering, as Pierre's solution does. For that you need a custom sort function, which won't fit neatly in a one-liner, or at least a custom mapping such as the %map hash below. It achieves the same letter order as Pierre's, but also converts every letter to uppercase and checks for unexpected letters (dying if it finds any):

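use strict;
use warnings;

# Rank of each nucleotide in the custom order C < A < T < G
my %map = (
    'C' => 0,
    'A' => 1,
    'T' => 2,
    'G' => 3,
);

while (<>) {
    chomp;
    # Split into letters, uppercase them, die on anything not in %map,
    # then sort the letters by their rank.
    print sort { $map{$a} <=> $map{$b} }
          grep { exists $map{$_} or die "Unexpected letter '$_' on line $.\n" }
          map  { uc } split //;
    print "\n";
}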
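For comparison, the same per-sequence ordering can be sketched with standard Unix tools by splitting each sequence into one letter per line with fold and reusing the tr rank trick from Pierre's answer. The function name `sort_residues` is hypothetical; unlike the Perl script above it does not uppercase or validate the input, and it starts several processes per sequence, so it is only practical for small files.

```shell
#!/bin/sh
# Hypothetical helper: for each sequence on stdin (one per line), sort its
# letters using a custom four-letter order given as $1 (e.g. CATG).
# No uppercasing or validation is done, unlike the Perl version.
sort_residues() {
  order=$1
  while IFS= read -r seq; do
    # One letter per line -> rank digits -> byte-wise sort -> letters -> rejoin
    printf '%s\n' "$seq" | fold -w 1 | tr "$order" '0123' |
      LC_ALL=C sort | tr '0123' "$order" | tr -d '\n'
    printf '\n'
  done
}

# Usage: prints CAAATTG and CATG
printf '%s\n' GATTACA ACGT | sort_residues CATG
```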

This is also a good suggestion. Thank you is all I can say.

13.9 years ago
Spitshine

I am not sure I understand your question either, but it sounds as if you could pass a custom comparison function to Perl's sort (http://perldoc.perl.org/functions/sort.html).