Question

Mapping Illumina reads using BWA



7.0 years ago

James

Hi, I am trying to get full gene sequences of Avian Influenza virus from Illumina reads. My sequencing guys do the Illumina run, use Velvet to do a de novo assembly, BLAST the resulting contigs to find the best reference, then use BWA to map the reads to that reference, and finally call a new consensus sequence.
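For readers unfamiliar with the last step, calling a consensus from mapped reads can be pictured as a per-position majority vote. The Python below is a toy, not what any particular consensus caller does. It assumes the reads have already been placed on the reference as gapless `(start, sequence)` pairs with 0-based starts, and `majority_consensus` and `min_depth` are hypothetical names. Because it only emits bases at reference positions, anything a read carries past either end of the reference is dropped, which is why a reference-based consensus cannot grow beyond the reference.

```python
from collections import Counter


def majority_consensus(ref_len, placed_reads, min_depth=1):
    """Toy majority-rule consensus over a reference of length ref_len.

    placed_reads: iterable of (start, seq) pairs, where start is the 0-based
    reference position of the read's first base. Reads are assumed gapless.
    Positions covered by fewer than min_depth reads become 'N'.
    """
    columns = [Counter() for _ in range(ref_len)]
    for start, seq in placed_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            # Bases that fall outside the reference have no column and are lost.
            if 0 <= pos < ref_len:
                columns[pos][base] += 1
    consensus = []
    for column in columns:
        if not column or sum(column.values()) < min_depth:
            consensus.append("N")
        else:
            consensus.append(column.most_common(1)[0][0])
    return "".join(consensus)
```

For example, a read placed at `(2, "GTAA")` on a 4 bp reference contributes only its first two bases; the last two are simply discarded.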

My first question: the reference I use is often only the coding region, not the full gene, because that is what is most often published. Is there something I can do with the data to extend the reads beyond the 3' and 5' ends of my reference that was mapped against?

My second question: the reference might have insertions or deletions compared to the isolate I am trying to sequence. Is there a way for BWA to recognise where my data is longer than the reference, given that data? At the moment I think BWA just trims reads to fit the reference.

Thanks for your help,
James

sequencing Assembly mapping BWA BWA mem

updated 7.0 years ago by h.mon • written 7.0 years ago by James

score 2 · Accepted Answer · 2017-11-29

Your mapping approach will only work well if your blasted reference is really close to the strain you sequenced.

Is there something I can do with the data to extend the reads beyond the 3' and 5' ends of my reference that was mapped against?

Select a longer reference, or build the reference from your own sequencing data. However, a single assembly (like your Velvet assembly) will often produce incomplete or fragmented virus genomes. To get more complete genomes, you can perform two (or more) different assemblies and use a meta-assembler to merge them into a final assembly. This is the approach used by MetaViC, although MetaViC does more than that.
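To make the idea of merging several assemblies a little more concrete, here is a toy Python sketch of its simplest ingredient: joining two contigs (say, one from Velvet and one from a second assembler) when the end of one exactly matches the start of the other. This is not how MetaViC or any specific meta-assembler works. `merge_by_overlap`, `longest_overlap` and `min_overlap` are hypothetical names, and the sketch ignores reverse complements, mismatches and contigs contained inside one another.

```python
def longest_overlap(left, right, min_overlap):
    """Length of the longest suffix of left equal to a prefix of right.

    Returns 0 if no such overlap of at least min_overlap bases exists.
    """
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def merge_by_overlap(a, b, min_overlap=20):
    """Join contigs a and b on their best exact end overlap, or return None."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    k_ab = longest_overlap(a, b, min_overlap)  # a followed by b
    k_ba = longest_overlap(b, a, min_overlap)  # b followed by a
    if k_ab == 0 and k_ba == 0:
        return None
    if k_ab >= k_ba:
        return a + b[k_ab:]
    return b + a[k_ba:]
```

Real meta-assemblers also have to resolve conflicting contigs and score competing joins; requiring a fairly long exact overlap (`min_overlap`) is just the crudest guard against joining unrelated fragments.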

Also, Tadpole (from BBTools) and Mapsembler can extend the ends of target sequences; you could use them to create a longer reference.
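The general idea of extending a sequence end from the reads can be illustrated with a greedy k-mer walk. This is a simplified sketch, not Tadpole's or Mapsembler's actual algorithm: it assumes error-free reads on the same strand as the seed, and `extend_right`, `kmer_counts`, `k`, `min_count` and `max_extension` are hypothetical names. Starting from the seed's last k-1 bases, it appends the single base whose k-mer is seen at least `min_count` times, and stops at a dead end or at a branch it cannot resolve.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def extend_right(seed, reads, k=31, min_count=2, max_extension=1000):
    """Greedily extend the 3' end of seed using k-mers found in reads."""
    if k < 2:
        raise ValueError("k must be at least 2")
    if len(seed) < k - 1:
        raise ValueError("seed must be at least k-1 bases long")
    counts = kmer_counts(reads, k)
    contig = seed
    for _ in range(max_extension):
        prefix = contig[-(k - 1):]
        # Candidate next bases whose k-mer is supported by enough reads.
        supported = [b for b in "ACGT" if counts[prefix + b] >= min_count]
        if len(supported) != 1:
            break  # no support, or a branch we cannot resolve
        contig += supported[0]
    return contig
```

The 5' end could be extended the same way by running the walk on the reverse complements of the seed and the reads. The `min_count` threshold is what keeps single sequencing errors from being appended.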

Is there a way for BWA to recognise where my data is longer than the reference, given that data? At the moment I think BWA just trims reads to fit the reference.

If your "longer data" is not part of the reference, then no, BWA cannot recognize it: it is just a mapper. Use a longer, more complete reference.
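Although BWA cannot add sequence that is missing from the reference, it does not silently discard it either: with `bwa mem`, the part of a read that runs off a reference end is typically soft-clipped (`S` in the SAM CIGAR string), and short insertions and deletions relative to the reference are reported as `I` and `D` operations. The Python sketch below, using hypothetical helper names (`parse_cigar`, `overhangs`, `indels`), takes a read's SAM `POS` and CIGAR and reports soft clips sitting exactly at a reference end, which can hint that your isolate extends beyond the reference. Treat this as evidence to check rather than a call, since clipping can also come from adapters or low-quality read ends.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference bases.
REF_CONSUMING = set("MDN=X")


def parse_cigar(cigar):
    """Split a CIGAR string such as '20S80M' into [(20, 'S'), (80, 'M')]."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def overhangs(pos, cigar, ref_len):
    """Return (left, right): soft-clipped bases at the reference start/end.

    pos is the 1-based leftmost mapped position (SAM POS column) and
    ref_len is the reference length. Soft clips away from the reference
    ends are ignored here.
    """
    # Hard clips are not in the read sequence; drop them so an 'S' next to
    # an 'H' is still treated as the outermost operation.
    ops = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]
    if not ops:
        return 0, 0
    ref_span = sum(n for n, op in ops if op in REF_CONSUMING)
    end = pos + ref_span - 1  # last reference base covered (1-based)
    left = ops[0][0] if ops[0][1] == "S" and pos == 1 else 0
    right = ops[-1][0] if ops[-1][1] == "S" and end == ref_len else 0
    return left, right


def indels(cigar):
    """List the insertions ('I') and deletions ('D') relative to the reference."""
    return [(op, n) for n, op in parse_cigar(cigar) if op in "ID"]
```

For example, `overhangs(1, "20S80M", 1000)` gives `(20, 0)`: 20 read bases hang off the 5' end of a 1,000 bp reference. Many such reads at one end would suggest collecting their clipped bases to extend the reference.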