Sorting DNA sequences in Unix is done in alphabetical order. Is it possible to sort DNA sequences with a specified letter order?
I'm not sure I understand your question, but if you want to sort using, for example, the order C, A, T, G, I would use 'tr' to replace each letter with a digit that sorts in that order, sort the lines, and then translate the digits back. Something like:
cat onesequenceperline.txt |\
tr "C" "0" | tr "A" "1" | tr "T" "2" | tr "G" "3" |\
sort |\
tr "0" "C" | tr "1" "A" | tr "2" "T" | tr "3" "G" > result.txt
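The same encode/sort/decode trick can be sketched in Python. This is a minimal illustration, not a replacement for the pipeline. It assumes the example order C < A < T < G, and the function name sort_sequences is hypothetical. Like the tr version, it leaves characters outside the mapping untouched and assumes the sequences contain no digits, since those would be decoded into letters.

```python
# Custom letter order from the example above: C < A < T < G.
ORDER = "CATG"
DIGITS = "0123"

# One translation table per direction, similar to a single
# `tr CATG 0123` and `tr 0123 CATG`; other characters are left unchanged.
ENCODE = str.maketrans(ORDER, DIGITS)
DECODE = str.maketrans(DIGITS, ORDER)


def sort_sequences(seqs):
    """Sort whole sequences in the custom letter order: encode, sort, decode."""
    encoded = sorted(s.translate(ENCODE) for s in seqs)
    return [s.translate(DECODE) for s in encoded]


if __name__ == "__main__":
    print(sort_sequences(["GAT", "CAT", "ACT", "CA"]))
    # prints ['CA', 'CAT', 'ACT', 'GAT']
```

As with the shell version, a sequence that is a prefix of another sorts first (CA before CAT).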
If you are asking how to sort the letters of each sequence alphabetically, a file like the one Pierre is imagining (one sequence per line) could be processed in Perl like this:
perl -lne 'print join("", sort { $a cmp $b } split //)' onesequenceperline.txt
Or sort in reverse order by switching $a and $b around. In normal order the comparison block can be omitted entirely, so you could golf it down some more. An advantage of this approach is that it handles all IUPAC single-nucleotide codes. A disadvantage is that it doesn't let you define a custom ordering, as Pierre's solution does. If you want that, you will have to define a custom sort function, which won't fit neatly in a one-liner, or at the very least a custom mapping such as the %map hash below. It achieves the same ordering as Pierre's, but also converts all letters in the sequence to uppercase and checks that there are no unexpected letters (it dies if there are):
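use strict;
use warnings;

# Custom rank for each nucleotide: C < A < T < G
my %map = (
    'C' => 0,
    'A' => 1,
    'T' => 2,
    'G' => 3,
);

while (<>) {
    chomp;
    # Split into letters, uppercase them, die on any letter not in %map,
    # then sort the remaining letters by their rank.
    print sort { $map{$a} <=> $map{$b} }
          grep { exists $map{$_} or die "Unexpected letter: $_\n" }
          map  { uc } split //;
    print "\n";
}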
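For comparison, here is a rough Python sketch of the same per-sequence letter sort. It assumes the example order C < A < T < G, and the names RANK and sort_letters are hypothetical. It uppercases the input, raises ValueError instead of dying on an unexpected letter, and sorts the letters by their rank.

```python
# Rank of each nucleotide in the custom order C < A < T < G,
# mirroring the %map hash in the Perl script above.
RANK = {"C": 0, "A": 1, "T": 2, "G": 3}


def sort_letters(seq, rank=RANK):
    """Return the letters of seq, uppercased and sorted by rank.

    Raises ValueError on any letter missing from rank, much as the
    Perl version dies on unexpected letters.
    """
    letters = seq.rstrip("\n").upper()  # like chomp + uc
    unexpected = [ch for ch in letters if ch not in rank]
    if unexpected:
        raise ValueError(f"unexpected letter: {unexpected[0]!r}")
    return "".join(sorted(letters, key=lambda ch: rank[ch]))


if __name__ == "__main__":
    print(sort_letters("gattaca"))  # prints CAAATTG
```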
I am not sure I understand your question either, but it sounds as if you could pass a custom comparison function to Perl's sort (http://perldoc.perl.org/functions/sort.html).
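Perl's sort accepts a comparison block or named sub. A rough Python analogue is a comparator function wrapped with functools.cmp_to_key. The sketch below compares whole sequences letter by letter under the example order C < A < T < G. The names RANK, compare_seqs and sort_with_comparator are hypothetical, and a KeyError is raised if two differing letters are not in RANK.

```python
from functools import cmp_to_key

# Assumed custom order: C < A < T < G.
RANK = {"C": 0, "A": 1, "T": 2, "G": 3}


def compare_seqs(a, b):
    """Compare two sequences using RANK, like a Perl sort sub.

    Returns a negative, zero, or positive number (as <=> does);
    if one sequence is a prefix of the other, the shorter sorts first.
    """
    for x, y in zip(a, b):
        if x != y:
            return RANK[x] - RANK[y]
    return len(a) - len(b)


def sort_with_comparator(seqs):
    """Sort whole sequences using the custom comparison function."""
    return sorted(seqs, key=cmp_to_key(compare_seqs))


if __name__ == "__main__":
    print(sort_with_comparator(["GAT", "CAT", "ACT", "CA"]))
    # prints ['CA', 'CAT', 'ACT', 'GAT']
```

In Python a key function is usually simpler and faster. For example, sorted(seqs, key=lambda s: [RANK[c] for c in s]) gives the same order, because lists compare element by element and a shorter prefix sorts first.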
You can cut the number of processes down from 10 to 3 by removing the 'useless use of cat' and collapsing the redundant tr calls, since a single tr can translate several characters at once (e.g. CATG to 0123).
Wow! This is a simple yet effective idea; it somehow never occurred to me. I have to check how it scales for huge data sets. Anyway, thanks a lot, PL.
@Monzoot: very nice suggestion, thanks :-)
@Keith, very nice suggestion! Thanks! :-)