11.4 years ago
HG
Hi everybody, can anyone please tell me how to extract the largest contig from a multi-FASTA file, using awk or grep?
You can use Heng Li's bioawk and samtools:
Then the commands would be:
# print each record's length and name, sort numerically by length (largest first), keep the top line
$ bioawk -c fastx '{ print length($seq), $name }' w.fasta | sort -k1,1rn | head -1
989 HR5V3UP02C00KT
# extract that sequence from the file by its name
$ samtools faidx w.fasta HR5V3UP02C00KT
>HR5V3UP02C00KT
TCGTACTCGTACGTAGAGGTTCGATCCTAGGGTCCTACGACGGAAGTAAAAACGGCCGGT
CCGGGCCCCGGTTCGACGTCGGACCGTAACCAACGAAAATTGGCCGGTAAAGGGGGTTCC
...
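If bioawk cannot be installed, the bioawk length-listing command above can be approximated with the standard awk that ships with the system. This is a minimal sketch, assuming the record name is the first word of the header line (which is how bioawk's $name behaves). The function name fasta_lengths and the example records are hypothetical. Because it adds up every sequence line until the next header, wrapped (multi-line) FASTA is handled.

```shell
#!/bin/sh
# Plain-awk stand-in for: bioawk -c fastx '{ print length($seq), $name }'
# Prints "length<TAB>name" for each record; wrapped sequence lines are summed.
fasta_lengths() {
    awk '
        /^>/ {
            if (name != "") print len "\t" name   # flush the previous record
            name = substr($1, 2)                  # first word, minus the ">"
            len = 0
            next
        }
        { len += length($0) }
        END { if (name != "") print len "\t" name }
    ' "$1"
}

# Hypothetical three-record input with one wrapped sequence.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '>short\nACGT\n>long\nACGTACGT\nACG\n>mid\nACGTACG\n' > "$tmp"

# Same ranking step as above: longest first, keep one line.
fasta_lengths "$tmp" | sort -k1,1rn | head -n 1
```

On the file from the question this would be fasta_lengths w.fasta | sort -k1,1rn | head -1, after which the name goes to samtools faidx as before.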
# requires one-line FASTA: each header line followed by its whole sequence on a single line
perl -e 'while (<>) {$h=$_; $s=<>; $seqs{$h}=$s;} foreach $header (reverse sort {length($seqs{$a}) <=> length($seqs{$b})} keys %seqs) {print $header.$seqs{$header}}' seqs_oneline.fasta | head -2
Source: sort fasta
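The Perl one-liner reads exactly two lines per record (a header, then a sequence), so it only gives correct results on one-line FASTA such as seqs_oneline.fasta. Below is a minimal sketch for producing such a file from a wrapped FASTA with standard awk. The function name linearize_fasta and the example records are hypothetical. A record with no sequence still gets an (empty) sequence line, so the two-lines-per-record layout is preserved.

```shell
#!/bin/sh
# Convert multi-line FASTA to one-line FASTA: each header is followed by
# exactly one line holding its whole sequence (empty if the record has none).
linearize_fasta() {
    awk '
        /^>/ {
            if (n++) print seq   # flush the sequence of the previous record
            print                # header line unchanged
            seq = ""
            next
        }
        { seq = seq $0 }         # join wrapped sequence lines
        END { if (n) print seq }
    ' "$1"
}

# Hypothetical input: record "a" is wrapped over two lines.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '>a\nACGT\nAC\n>b\nGGG\n' > "$tmp"

linearize_fasta "$tmp"
```

For the file in the question: linearize_fasta w.fasta > seqs_oneline.fasta, then run the Perl one-liner on seqs_oneline.fasta.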
Thank you so much, but could you please check the error?
Don't type in the whole thing at first. Build it one step at a time and pipe it through a pager; that way you will notice potential errors.
Dear Istvan, I am doing as you suggested:
Here is the problem:
Could you please tell me what the error is?
Did you install bioawk? That is the version of awk you will need to use.
Hello Istvan, I have downloaded bioawk, but when I try to install it, it shows:
But when I check bioawk like this:
Could you please suggest a solution?
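If installing bioawk keeps failing, the whole task can be done with the system awk, which is also what the original question asked for. This is a minimal sketch, assuming a standard FASTA where every header line starts with ">". It keeps only the current record and the longest one seen so far, prints that record with its original line wrapping, and keeps the first one on ties. The function name longest_contig and the example data are hypothetical.

```shell
#!/bin/sh
# One pass with standard awk: print the longest record (header + sequence).
longest_contig() {
    awk '
        # If the record just finished beats the best so far, remember it.
        function keep() {
            if (hdr != "" && len > best_len) {
                best_len = len; best_hdr = hdr; best_seq = seq
            }
        }
        /^>/ { keep(); hdr = $0; seq = ""; len = 0; next }
        {
            len += length($0)      # total length over wrapped lines
            seq = seq $0 "\n"      # keep the original line wrapping
        }
        END {
            keep()
            if (best_hdr != "") printf "%s\n%s", best_hdr, best_seq
        }
    ' "$1"
}

# Hypothetical input: "long" (11 bases, wrapped) should win.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '>short\nACGT\n>long\nACGTACGT\nACG\n>mid\nACGTACG\n' > "$tmp"

longest_contig "$tmp"
```

No samtools index is needed here; for example, longest_contig w.fasta > largest_contig.fasta.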
There is no space after -k1, in the sort command; the key must be written as one argument: -k1,1rn
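To see what the key specification does, here is a small sketch on made-up "length name" lines: -k1,1 restricts the comparison to field 1, r puts the largest first, and n compares the keys as numbers. Written with a space (-k1, 1rn), sort sees -k1, and 1rn as two separate arguments, so the key is not what was intended. The second command shows why n matters; LC_ALL=C is only there to make the text ordering predictable.

```shell
#!/bin/sh
# Made-up "length name" lines, like the output of the length-listing step.
lengths='99 contigA
100 contigB
7 contigC'

# -k1,1 = compare on field 1 only, r = largest first, n = numeric comparison
printf '%s\n' "$lengths" | sort -k1,1rn | head -n 1           # 100 contigB

# Without n the keys are compared as text, so "99" outranks "100"
printf '%s\n' "$lengths" | LC_ALL=C sort -k1,1r | head -n 1   # 99 contigA
```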
Can you explain this code, please?