Question

How To Grep Largest Contig From A Multi Fasta File

0

Entering edit mode

12.0 years ago

HG ★ 1.2k

Hi everybody, can any one tell me please how to extract a largest contig from a multi-fasta file ?? using awk or grep ??

awk • 13k views

ADD COMMENT • link updated 4.1 years ago by FadyNabil ▴ 20 • written 12.0 years ago by HG ★ 1.2k

Ram · Answer 1 · 2013-08-08

6

Entering edit mode

12.0 years ago

Istvan Albert 102k

You can use Heng Li's bioawk and samtools:

Then the commands would be

# sort the sequences by length
$ cat w.fasta | bioawk -c fastx '{ print length($seq), $name }' | sort -k1,1rn | head -1
989    HR5V3UP02C00KT

# extract the sequence from the file 
$ samtools faidx w.fasta HR5V3UP02C00KT
>HR5V3UP02C00KT
TCGTACTCGTACGTAGAGGTTCGATCCTAGGGTCCTACGACGGAAGTAAAAACGGCCGGT
CCGGGCCCCGGTTCGACGTCGGACCGTAACCAACGAAAATTGGCCGGTAAAGGGGGTTCC
...

ADD COMMENT • link updated 5.8 years ago by Ram 45k • written 12.0 years ago by Istvan Albert 102k

0

Entering edit mode

Than you so much but could you please check the error

$ cat list4a.fasta | awk -c fastx '{ print length($seq), $name }'| sort -k1, 1rn | head -1
sort: invalid number after `,': invalid count at start of `'

ADD REPLY • link updated 5.8 years ago by Ram 45k • written 12.0 years ago by HG ★ 1.2k

2

Entering edit mode

don't type in the whole thing at first, build it one step at a time, and pipe it through a pager, that way you will notice potential errors

ADD REPLY • link 12.0 years ago by Istvan Albert 102k

0

Entering edit mode

Dear Istvan, I am doing according to you suggestion:

cat list4a.fasta           #working fine 
awk -c fastx '{ print length($seq), $name }'

Here is the problem

Usage: awk [POSIX or GNU style options] -f progfile [--] file ...
Usage: awk [POSIX or GNU style options] [--] 'program' file ...

could you please tell me waht is the error ??

ADD REPLY • link updated 5.8 years ago by Ram 45k • written 12.0 years ago by HG ★ 1.2k

1

Entering edit mode

did you install bioawk? that's is the version of awk you will need to use.

ADD REPLY • link 12.0 years ago by Istvan Albert 102k

0

Entering edit mode

Hello Istvan, I have downloaded bioawk but when i am trying to install its showing

hiren@FB11-10207:~/Desktop/spades/bioawk-master $ make 
make: `awk' is up to date.

But once i am checking bioawk like

hiren@FB11-10207:~/Desktop/spades/bioawk-master $ bioawk
bioawk: command not found

Could you please let me know any solution ????

ADD REPLY • link updated 5.8 years ago by Ram 45k • written 12.0 years ago by HG ★ 1.2k

1

Entering edit mode

There is no space after -k1,

ADD REPLY • link updated 5.8 years ago by Ram 45k • written 12.0 years ago by Pierre Lindenbaum 166k

0

Entering edit mode

can you explain this code please?

ADD REPLY • link 4.1 years ago by FadyNabil ▴ 20

Ram · Answer 2 · 2014-02-21

1

Entering edit mode

11.4 years ago

Prakki Rama ★ 2.7k

cat seqs_oneline.fasta | perl -e 'while (<>) {$h=$_; $s=<>; $seqs{$h}=$s;} foreach $header (reverse sort {length($seqs{$a}) <=> length($seqs{$b})} keys %seqs) {print $header.$seqs{$header}}' | head -2

Source: sort fasta

ADD COMMENT • link updated 5.8 years ago by Ram 45k • written 11.4 years ago by Prakki Rama ★ 2.7k