Post the commands you are running. And what are you trying to accomplish? What makes sense to me is assemble reads into contigs, then map primers into contigs.
bwa aln index input1p.fastq > 1p.sai; bwa aln index input2p.fastq > 2p.sai; bwa sampe -n 1000 -N 1000 index 1p.sai 2p.sai input1p.fastq input2p.fastq >1.sam;
Above is the code I used, those are just basic bwa code. After that I used a java script to convert sam file into readable file. What I was trying to do was to check if some of our reads matched to those primers. I don't know if bwa can deal with long contigs. Our reads length has already been much longer than primer length.
ADD REPLY
• link
updated 4.9 years ago by
Ram
44k
•
written 8.9 years ago by
Crystal
▴
70
You should map the primers into your reads, not the opposite. If you have only six primers, you could try just to use grep to find the primers on the reads files.
Anyway, I would assemble the reads and then map the primers into the contigs, not against the reads. I have used bowtie for this, but bwa should work as well - but I think bwa mem is better (certainly simpler) than bwa sampe.
BWA is a software package for mapping reads against a large reference genome. BWA is fast but not as sensitive as other programs, ex., blast or needle. To achieve great speed, BWA requires the reads to show low-divergence from the reference (see program description here) and uses some heuristics (tricks) that assume that there is a long stretch of DNA that matches exactly between the ref and the reads. This is called a seed. Depending on the version of BWA you are using, this can be between 19 to 32 bases by default. In your case, you are most likely not getting any hits before your references are shorter than the seed length.
If you are looking for exact matches, then you could use grep, as suggested by @h.mon. This would require putting the sequence and the id on the same line, so that you can retrieve the hits.
If you are looking for exact matches, then you could use grep, as suggested by @h.mon. This would require putting the sequence and the id on the same line, so that you can retrieve the hits.
No, just use the 'B' (before) and 'A' (after) flags to return the lines that precede and follow the match.
@harold.smith.tarheel: -B (or -A) would be ideal if the sequence if formatted as one line. If the sequences are formatted on 2 lines (60 or 80 bases per line, as is often the case with Fasta files), -B 2 (or -A 2) would return DNA sequence from the previous record.
The code from the OP shows that the input is FASTQ (not FASTA), which specifies a four-line format (identifier, sequence, optional, quality) for each sequence/read. So 'grep -B 1 -A 2' would return the entire entry for the matching sequence.
Note, this assumes that the 15bp primer string is not present in the other lines (unlikely but not impossible), so it might be preferable to add a secondary criterion, such as the initial character(s) of the identifier line (which is usually identical for every read).
Edit: Actually, Sanger PHRED scores do not include 'T', so secondary filtering is unnecessary. The primer string would match only the sequence line (unless the quality scores are old Illumina PHRED+64, or the primer contains only A/C/G).
Post the commands you are running. And what are you trying to accomplish? What makes sense to me is assemble reads into contigs, then map primers into contigs.
Above is the code I used, those are just basic bwa code. After that I used a java script to convert sam file into readable file. What I was trying to do was to check if some of our reads matched to those primers. I don't know if bwa can deal with long contigs. Our reads length has already been much longer than primer length.