in an effort to map RNA-seq reads directly onto the genome I set the gap opening and gap extension parameters of bwa both to 0 (-O0 and -E0) in order to allow for exon junctions. However, bwa still reports not only perfect matches but also distance 1 matches rather than gapped alignments. Can this behaviour be changes somehow? I want gapped and ungapped alignments (in case both are without mismatches) to be treated equal.
Can you provide an example of a few reads and the reference sequence, so one could test it a bit, please? Find some reads that should give gapped alingments make an example file available.
OPTIONS:
[...]
-o INT Maximum number of gap opens [1]
-e INT Maximum number of gap extensions, -1 for k-difference mode (disallowing long gaps)[-1]
If you look at the defaults for these, it seems like only one short gap would be allowed, so try with increasinf -o and -e. Otherwise I support Ketil's answer.
I think the Augustus genome annotation pipeline suggests using BLAT to map RNAseq data exactly for this reason, BLAT is fairly good at mapping RNA to DNA. This will of course require you to rework the rest of your analysis, but converting BLAT output (PSL format) to SAM shouldn't be very hard.
If your data are too large to handle comfortably with BLAT, perhaps you can first map with BWA, and then remap just the reads that don't map in their (almost) entirety using BLAT.
Have you tried setting gap opening non-zero and gap extension zero? If both are 0, this might lead to unpredictable results.
Dear Michael,
Thank you for the reply! I tried now: ./bwa aln -e1000 -O1 -E0 together with: ./bwa samse -n1000 but still don't get the gapped alignments!?
Can you provide an example of a few reads and the reference sequence, so one could test it a bit, please? Find some reads that should give gapped alingments make an example file available.
Hello Michael,
these are the test genome sequences:
And this is the test read:
So far bwa always reports matches to GENOME_ori1-3 and GENOME_1a but not GENOME_insert. Would be great if you could figure out why that is!