Hi,
How can I find a specific pattern (~20 nt long) and its reverse complement in my raw reads (paired-end data, so 2 fastq files), and then extract the matching reads into 2 new fastq files?
Thanks a lot,
N.
Use Biopieces (www.biopieces.org) and try something like this:
read_fastq -i file1.fq,file2.fq | patscan_seq -i -c -p acgtactagctagctactagc[2,1,1] | grab -p PATTERN -K | write_fastq -o matched.fq -x
This allows for 2 mismatches, 1 insertion and 1 deletion (and ambiguity codes).
Gunzip and paste both FASTQ files on the fly. With awk, group the lines by 4 and test whether the pair matches your needs (here, the first read starts with the regex "^ATGGG[GC]AAAA*"), then split the output into two files with a second awk.
paste <(gunzip -c input_1.fastq.gz) <(gunzip -c input_2.fastq.gz) |\
awk 'BEGIN{i=0} {a[i]=$0;++i;if(i==4){split(a[1],S,"\t"); if(S[1] ~ "^ATGGG[GC]AAAA*") {for(j=0;j<4;++j) printf("%s\n",a[j]);}; i=0;}}' |\
awk -F '\t' '{print $1 >> "select_1.fq"; print $2 >> "select_2.fq";}'
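The one-liner above only tests the start of the first read, while the question also asks about the reverse complement. A rough sketch of how that could be added, under a few assumptions: the inputs are uncompressed, each FASTQ record is exactly 4 lines, matching is exact (no mismatches or indels), and the pattern contains only A/C/G/T. The function names `revcomp` and `extract_pairs` are made up for this example.

```shell
#!/usr/bin/env bash
# Reverse-complement a DNA string (plain A/C/G/T only; IUPAC codes are not handled).
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

# Hypothetical helper: keep read pairs in which either mate contains the
# pattern or its reverse complement (exact substring match).
# Usage: extract_pairs PATTERN in_1.fastq in_2.fastq out_1.fq out_2.fq
extract_pairs() {
    local pat="$1" rc
    rc=$(revcomp "$1")
    # Start from empty output files so they exist even when nothing matches.
    : > "$4"
    : > "$5"
    # "paste - - - -" folds each 4-line record onto one tab-separated line;
    # the outer paste puts both mates side by side, so fields 1-4 are the
    # mate-1 record and fields 5-8 are the mate-2 record.
    paste <(paste - - - - < "$2") <(paste - - - - < "$3") |
    awk -F '\t' -v p="$pat" -v r="$rc" -v o1="$4" -v o2="$5" '
        index($2, p) || index($2, r) || index($6, p) || index($6, r) {
            print $1 "\n" $2 "\n" $3 "\n" $4 > o1
            print $5 "\n" $6 "\n" $7 "\n" $8 > o2
        }'
}
```

It would be called as, for example, `extract_pairs ATGGGCAAAA input_1.fastq input_2.fastq select_1.fq select_2.fq`. Both mates of a pair are always written together, so the two output files stay in sync.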
Well, MEME might not be the right choice for this, but FIMO (part of the MEME Suite) could be used. It scans sequences for occurrences of a motif given as a PWM, rather than discovering new motifs.
And what does this pattern look like? How specific - like no mismatches/indels?