Hi all,
I need some help with grep or any other command that will help do the job. I am very new to the command line. Any help is appreciated, thank you.
I recently did some amplicon sequencing of a multiplexed PCR reaction. I used nearly 90 primer pairs to multiplex a PCR reaction to generate amplicons. Sequencing libraries of these amplicons were made and read on a MiSeq instrument. Four such reactions, differing in some primer pairs, were sequenced. I now have the fastq files and want to see the representation of each primer product in them, to decide which primer pool I should proceed with for my actual experiments. The MiSeq run was single-end, so I want to look for the forward primer sequence in the resulting fastq files.
I have been using grep to get answers, but I only know how to do it one primer at a time:
grep -c ^AAAGTGTGTGGGGATGATATGG ./*.fastq
-c for count
^ to search for the string at the beginning of the sequence
The results that I get from this is
./myfastq1.fastq:number
./myfastq2.fastq:number
./myfastq3.fastq:number
./myfastq4.fastq:number
Then I take the number and paste it into an Excel file. I know, terrible!!!
I have been searching for help similar to what I need, but with no positive outcome.
My request here is:
I have a tab-delimited file, forwardprimers.txt, with (col1) primer name and (col2) primer sequence, for 90 primers.
I have 4 fastq files to query these primer sequences.
Is there a way to query the sequences in the primer file against the fastq files and get the counts for each primer name in a new output file? Thank you.
Use a for loop to go through your forwardprimers.txt file and then use bbduk.sh from the BBMap suite in filter mode to simply get the stats for each run. You could provide output files if you actually want to parse out reads that contain that primer. A guide to using bbduk.sh is available.
Looks like there is also openPrimeR (LINK), which should help.
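If you would rather stay with the grep approach from the question, the loop could look roughly like the sketch below. It is not the bbduk.sh method, just plain POSIX shell around grep -c. It assumes forwardprimers.txt has no header row, that each fastq record is exactly four lines with the sequence on the second line, and that an exact match at the start of the read is what you want (no mismatches allowed). The function name count_primers is made up for illustration.

```shell
# Hypothetical helper: count reads that start with each forward primer.
# Usage: count_primers PRIMER_FILE FASTQ [FASTQ ...]
# Assumes PRIMER_FILE is tab-delimited (name<TAB>sequence) with no header,
# and that every fastq record is exactly 4 lines long.
count_primers() {
    primer_file=$1
    shift
    tab=$(printf '\t')
    cr=$(printf '\r')

    # Header row: one count column per fastq file.
    printf 'primer\tsequence'
    for fq in "$@"; do printf '\t%s' "$fq"; done
    printf '\n'

    # One output row per primer; the "|| [ -n ... ]" part keeps a last
    # line that has no trailing newline.
    while IFS="$tab" read -r name seq || [ -n "$name" ]; do
        [ -z "$name" ] && continue
        seq=${seq%"$cr"}    # drop a trailing CR if the file came from Excel/Windows
        printf '%s\t%s' "$name" "$seq"
        for fq in "$@"; do
            # Keep only the sequence line of each record, so a quality
            # line that happens to start with the primer is not counted.
            n=$(awk 'NR % 4 == 2' "$fq" | grep -c "^${seq}" || true)
            printf '\t%s' "$n"
        done
        printf '\n'
    done < "$primer_file"
}
```

Calling it as count_primers forwardprimers.txt ./*.fastq > primer_counts.tsv should give one row per primer and one column per fastq file, which opens directly in Excel. Each primer rescans every fastq file, so 90 primers means 90 passes per file; that is usually acceptable for MiSeq-sized data but is the main reason a dedicated tool can be faster.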