Search Motif In Raw Reads
11.6 years ago

Hi,

How can I find a specific pattern (~20 nt long) and its reverse complement in my raw reads (paired-end data, so 2 FASTQ files), and then extract the matching reads into 2 new FASTQ files?

Thanks a lot,

N.


And what does this pattern look like? How specific does the match need to be, e.g. no mismatches or indels at all?

11.6 years ago

Use Biopieces (www.biopieces.org) and try something like this:

read_fastq -i file1.fq,file2.fq | patscan_seq -i -c -p 'acgtactagctagctactagc[2,1,1]' | grab -p PATTERN -K | write_fastq -o matched.fq -x

This allows for 2 mismatches, 1 insertion and 1 deletion, and ambiguity codes are supported as well.
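To make the "[2,1,1]" idea a bit more concrete, here is a minimal sketch of mismatch-tolerant matching in plain awk, wrapped in a bash function. It only handles substitutions, not the insertions, deletions or ambiguity codes that patscan_seq supports, and the function name approx_find is hypothetical (it is not part of Biopieces). To cover the reverse complement, you would run it a second time with the reverse-complemented motif.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Biopieces): print the 0-based offset of the
# first window of SEQ that matches MOTIF with at most MAXMM mismatches
# (substitutions only: no indels, no ambiguity codes), or -1 if none does.
# Usage: approx_find SEQ MOTIF MAXMM
approx_find() {
  awk -v s="$1" -v m="$2" -v k="$3" 'BEGIN {
    s = toupper(s); m = toupper(m)
    n = length(s); L = length(m)
    # slide a window of the motif length along the sequence
    for (start = 1; start <= n - L + 1; start++) {
      mm = 0
      # count mismatches, giving up early once more than k are seen
      for (i = 1; i <= L && mm <= k; i++)
        if (substr(s, start + i - 1, 1) != substr(m, i, 1)) mm++
      if (mm <= k) { print start - 1; exit }
    }
    print "-1"
  }'
}
```

For example, `approx_find TTTTACGTAGCTTTT ACGTACC 1` prints 4 (one G/C mismatch), while the same call with 0 allowed mismatches prints -1. This brute-force scan is fine for a single read but slow over millions of reads, which is where a dedicated tool like patscan pays off.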

11.6 years ago

Decompress and paste both FASTQ files side by side on the fly, then use awk to group the lines into 4-line records, test whether each pair matches your needs (here, whether the first read starts with the regex "^ATGGG[GC]AAAA*"), and finally split the output back into two files with a second awk:

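# paste puts read 1 and read 2 side by side, separated by a tab (needs bash for <(...));
# the first awk buffers one 4-line record (i must start at 0) and keeps the pair if read 1 matches;
# the second awk splits the tab-separated columns back into two FASTQ files (">" overwrites old output)
paste <(gunzip -c input_1.fastq.gz) <(gunzip -c input_2.fastq.gz)  |\
awk 'BEGIN{i=0} {a[i]=$0;++i;if(i==4){split(a[1],S,"\t"); if(S[1] ~ "^ATGGG[GC]AAAA*") {for(j=0;j<4;++j) printf("%s\n",a[j]);}; i=0;}}' |\
awk -F '\t' '{print $1 > "select_1.fq"; print $2 > "select_2.fq";}'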
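The one-liner above only tests read 1 on the forward strand, whereas the question also asks for the reverse complement. A minimal sketch of one way to extend the same paste + awk approach, assuming bash (for `<(...)`), exact matches only, and 4-line FASTQ records; `revcomp` and `filter_pairs` are hypothetical helper names, not existing tools:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reverse-complement a DNA string (ACGT only, no IUPAC codes)
revcomp() {
  printf '%s\n' "$1" | tr 'ACGTacgt' 'TGCAtgca' |
    awk '{ s = ""; for (i = length($0); i > 0; i--) s = s substr($0, i, 1); print s }'
}

# Hypothetical helper: keep read pairs where either mate contains the motif
# or its reverse complement (exact substring match, no mismatches).
# Usage: filter_pairs MOTIF in_1.fastq.gz in_2.fastq.gz out_1.fq out_2.fq
filter_pairs() {
  local motif rc
  motif=$(printf '%s' "$1" | tr 'acgt' 'ACGT')
  rc=$(revcomp "$motif")
  paste <(gunzip -c "$2") <(gunzip -c "$3") |
    awk -F '\t' -v m="$motif" -v rc="$rc" -v o1="$4" -v o2="$5" '
      # store each line of the current record; NR % 4: 1=header, 2=seq, 3=+, 0=qual
      { r1[NR % 4] = $1; r2[NR % 4] = $2 }
      NR % 4 == 0 {
        s1 = toupper(r1[2]); s2 = toupper(r2[2])
        # index() does a literal substring search, so the motif is never treated as a regex
        if (index(s1, m) || index(s1, rc) || index(s2, m) || index(s2, rc)) {
          print r1[1] "\n" r1[2] "\n" r1[3] "\n" r1[0] > o1
          print r2[1] "\n" r2[2] "\n" r2[3] "\n" r2[0] > o2
        }
      }'
}
```

Usage would look like `filter_pairs ACGTACTAGCTAGCTACTAGC input_1.fastq.gz input_2.fastq.gz select_1.fq select_2.fq`. Keeping a pair when either mate matches keeps the two output files in sync, which most paired-end tools downstream expect; if no pair matches, the output files are simply not created.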
11.6 years ago
k.nirmalraman

Well, MEME might not be the right choice for this, but FIMO could be used: it scans sequences for occurrences of a known motif described as a position weight matrix (PWM), rather than discovering new motifs.
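FIMO reads its sequences from a FASTA file and its motif from a motif file (MEME format), so the raw reads would first need converting. A minimal sketch, assuming 4-line FASTQ records; `fastq_to_fasta` is a hypothetical helper, and `motif.meme` / `fimo_out` are placeholder names for a motif file built from your ~20 nt pattern and an output directory:

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a FASTQ file (gzipped or plain) to FASTA on stdout.
# gunzip -cf passes non-gzipped input through unchanged.
fastq_to_fasta() {
  gunzip -cf "$1" |
    # header line: swap the leading "@" for ">"; sequence line: print as-is;
    # the "+" and quality lines are dropped
    awk 'NR % 4 == 1 { print ">" substr($0, 2) } NR % 4 == 2 { print }'
}

# Example (not run here):
# fastq_to_fasta input_1.fastq.gz > reads_1.fa
# fimo --oc fimo_out motif.meme reads_1.fa
```

Keep in mind that FIMO reports hits that pass a p-value threshold against the PWM rather than a fixed number of mismatches, and that with millions of reads both the FASTA file and the FIMO output can get large.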
