Question

Search Motif In Raw Reads

11.5 years ago

Nicolas Rosewick 11k

Hi,

How can I find a specific pattern (~20 nt long) and its reverse complement in my raw reads (paired-end data, so 2 FASTQ files), and then extract the matching reads into 2 new FASTQ files?

Thanks a lot,

N.

motif search read • 5.5k views

And what does this pattern look like? How specific - like no mismatches/indels?

ADD REPLY • link 11.5 years ago by Martin A Hansen 3.0k

score 4 · Answer 1 · 2013-05-22

11.5 years ago

Martin A Hansen 3.0k

Use Biopieces (www.biopieces.org) and try something like this:

read_fastq -i file1.fq,file2.fq | patscan_seq -i -c -p acgtactagctagctactagc[2,1,1] | grab -p PATTERN -K | write_fastq -o matched.fq -x

This allows for 2 mismatches, 1 insertion and 1 deletion (and ambiguity codes).

score 4 · Answer 2 · 2013-05-22

Decompress and paste both FASTQ files on the fly, group the pasted lines four at a time with awk, test whether the pair meets your criterion (here, the first read starts with the regex "^ATGGG[GC]AAAA*"), and then split the output into two files with a second awk call.

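# Each pasted line holds read 1 and read 2 side by side, separated by a tab.
# a[0..3] buffers one 4-line record; a[1] is the pasted sequence line.
paste <(gunzip -c input_1.fastq.gz) <(gunzip -c input_2.fastq.gz)  |\
awk 'BEGIN{i=0} {a[i]=$0;++i;if(i==4){split(a[1],S,"\t"); if(S[1] ~ "^ATGGG[GC]AAAA*") {for(j=0;j<4;++j) printf("%s\n",a[j]);}; i=0;}}' |\
awk -F '\t' '{print $1 >> "select_1.fq"; print $2 >> "select_2.fq";}'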
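
The regex above only tests read 1 on one strand, while the question also asks for the reverse complement. A minimal sketch of one way to cover both strands and both mates, assuming exact matching only (no mismatches or indels): the motif value and the helper names revcomp and filter_pairs are hypothetical, chosen for illustration, and the rev utility is assumed to be installed.

```shell
#!/usr/bin/env bash

# Reverse-complement a DNA string: reverse it, then swap complementary bases.
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

# Read tab-pasted paired FASTQ lines on stdin and keep a 4-line record
# when either mate's sequence matches the regex given as $1.
filter_pairs() {
    awk -F '\t' -v pat="$1" '
        { rec[NR % 4] = $0 }                              # buffer the current record
        NR % 4 == 2 { hit = ($1 ~ pat || $2 ~ pat) }      # sequence line: test both mates
        NR % 4 == 0 && hit { print rec[1]; print rec[2]; print rec[3]; print rec[0] }
    '
}

motif="ATGGGCAAAATTTCCGGAAC"             # hypothetical 20-nt motif, replace with your own
pattern="${motif}|$(revcomp "$motif")"   # alternation: forward strand or reverse complement
echo "$pattern"
```

To use it, pipe the output of the paste command into filter_pairs "$pattern" in place of the first awk call, and keep the final awk step to split the two columns into select_1.fq and select_2.fq. If mismatches or indels must be tolerated, a dedicated pattern matcher is the better fit.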

score 0 · Answer 3 · 2013-05-22

11.5 years ago

k.nirmalraman ★ 1.1k

Well, MEME might not be the right choice for this, but FIMO could be used. It scans sequences for occurrences of a known motif supplied as a position weight matrix (PWM), rather than discovering new motifs.
