How To Grep A Specific Contig From A List?
13.8 years ago
Bdv ▴ 320

Hi

I have a list of contigs and want to extract one contig (by its title). What is the exact command for that?

Thanks!

contigs fasta • 11k views

If your contig file is in FASTA format, there are already many discussions open on this topic. Try searching the archives.

13.8 years ago
User 59 13k

This has already been addressed, as Giovanni pointed out.

Please try at least a cursory search of the site before posting a new question!

13.8 years ago
Alex ★ 1.5k

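By default, grep matches one line at a time, so it cannot find patterns that span multiple lines.

You can use pcregrep instead (its -M option enables multiline matching).

See this topic on Stack Overflow.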

12.3 years ago

If you want to extract several contigs at different times, I strongly suggest building an index first. Here are the commands:

samtools faidx in.fasta                                 # create an index for the FASTA file with all the contigs
samtools faidx in.fasta contigname > contigname.fasta   # extract the single contig you need
13.8 years ago
Vashar ▴ 20
# To extract sequence names from MS BLAST result (seqtool) files:
grep contigname filename > out.txt

If you also need the lines that follow the matching line, add -A followed by the number of lines, for example for 3 lines:

grep -A 3 contigname filename > out.txt
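The fixed line count for -A only works if you already know how many sequence lines the contig has. As a rough sketch of an alternative, assuming a plain FASTA file where every header starts with ">" and the contig name is the first word of the header, awk can print from the matching header up to the next one. The function name extract_contig is made up for this example:

```shell
# Hypothetical helper: print one FASTA record by its exact name.
# $1 = contig name (first word of the header, without ">")
# $2 = FASTA file
extract_contig() {
    awk -v name="$1" '
        # On every header line, decide whether this record is the one we want.
        # substr($1, 2) drops the leading ">" from the first word.
        /^>/ { keep = (substr($1, 2) == name) }
        # Print the header and all following lines while keep is true.
        keep
    ' "$2"
}

# Usage: extract_contig contigname filename > out.txt
```

Because the name is compared exactly, asking for contig2 does not also return contig20, which a bare grep contig2 would match.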

This is impractical. A better solution would be to use a regex like >CONTIG_NAME[^>]+ but only if the engine supports multi-line matching (i.e., not the Unix grep tool).
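One way to get the same "everything up to the next >" behaviour without a multi-line regex engine is to let awk split the file on ">" so that each whole record is matched as a unit. This is only a sketch: the function name fasta_record is invented here, and it assumes that ">" never appears inside a header description or sequence line.

```shell
# Hypothetical helper: treat ">" as the record separator so each
# FASTA entry (header plus all sequence lines) is one awk record.
# $1 = contig name (first word of the header), $2 = FASTA file
fasta_record() {
    awk -v RS='>' -v name="$1" '
        # $1 is the first word of the record, i.e. the contig name.
        # Appending "" forces a string comparison even for numeric-looking names.
        ($1 "") == name { printf ">%s", $0 }
    ' "$2"
}

# Usage: fasta_record CONTIG_NAME contigs.fasta > contig.fasta
```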


I've used agrep for this (as in: agrep -d '^>' pattern contigs.fasta), but unfortunately it is unreliable and has some ugly limitations. I also implemented my own tool (sgrep, part of the biohaskell stuff).

