Align metagenomic shotgun sequencing reads to primers
2
0
Entering edit mode
8.9 years ago
Crystal ▴ 70

Hi All,

I'm trying to use BWA to align metagenomics reads (~100bp) to six primers (~15bp).

All the steps work fine but when I convert sam file to readable file, there was no results at all.

Is it appropriate to use BWA to align reads to super short primer sequences or is there other way I can do this?

Thanks

bwa metagenomics alignment • 3.7k views
ADD COMMENT
0
Entering edit mode

Post the commands you are running. And what are you trying to accomplish? What makes sense to me is assemble reads into contigs, then map primers into contigs.

ADD REPLY
0
Entering edit mode
​bwa aln index input1p.fastq > 1p.sai; bwa aln index input2p.fastq > 2p.sai; bwa sampe -n 1000 -N 1000 index 1p.sai 2p.sai input1p.fastq input2p.fastq >1.sam;

Above is the code I used, those are just basic bwa code. After that I used a java script to convert sam file into readable file. What I was trying to do was to check if some of our reads matched to those primers. I don't know if bwa can deal with long contigs. Our reads length has already been much longer than primer length.

ADD REPLY
0
Entering edit mode
8.9 years ago
h.mon 35k

You should map the primers into your reads, not the opposite. If you have only six primers, you could try just to use grep to find the primers on the reads files.

Anyway, I would assemble the reads and then map the primers into the contigs, not against the reads. I have used bowtie for this, but bwa should work as well - but I think bwa mem is better (certainly simpler) than bwa sampe.

ADD COMMENT
0
Entering edit mode
8.9 years ago

BWA is a software package for mapping reads against a large reference genome. BWA is fast but not as sensitive as other programs, ex., blast or needle. To achieve great speed, BWA requires the reads to show low-divergence from the reference (see program description here) and uses some heuristics (tricks) that assume that there is a long stretch of DNA that matches exactly between the ref and the reads. This is called a seed. Depending on the version of BWA you are using, this can be between 19 to 32 bases by default. In your case, you are most likely not getting any hits before your references are shorter than the seed length.

If you are looking for exact matches, then you could use grep, as suggested by @h.mon. This would require putting the sequence and the id on the same line, so that you can retrieve the hits.

ADD COMMENT
0
Entering edit mode

If you are looking for exact matches, then you could use grep, as suggested by @h.mon. This would require putting the sequence and the id on the same line, so that you can retrieve the hits.

No, just use the 'B' (before) and 'A' (after) flags to return the lines that precede and follow the match.

ADD REPLY
0
Entering edit mode

@harold.smith.tarheel: -B (or -A) would be ideal if the sequence if formatted as one line. If the sequences are formatted on 2 lines (60 or 80 bases per line, as is often the case with Fasta files), -B 2 (or -A 2) would return DNA sequence from the previous record.

ADD REPLY
0
Entering edit mode

The code from the OP shows that the input is FASTQ (not FASTA), which specifies a four-line format (identifier, sequence, optional, quality) for each sequence/read. So 'grep -B 1 -A 2' would return the entire entry for the matching sequence.

Note, this assumes that the 15bp primer string is not present in the other lines (unlikely but not impossible), so it might be preferable to add a secondary criterion, such as the initial character(s) of the identifier line (which is usually identical for every read).

Edit: Actually, Sanger PHRED scores do not include 'T', so secondary filtering is unnecessary. The primer string would match only the sequence line (unless the quality scores are old Illumina PHRED+64, or the primer contains only A/C/G).

ADD REPLY
0
Entering edit mode

Well, thank you so much for your help. So now I just need to use command like "grep -B 1 -A 2 'primer sequence' file.fastq"?

ADD REPLY

Login before adding your answer.

Traffic: 2761 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6