Reporting Gapped Alignments With Bwa
2
2
Entering edit mode
13.7 years ago
Gail ▴ 20

Dear all,

in an effort to map RNA-seq reads directly onto the genome I set the gap opening and gap extension parameters of bwa both to 0 (-O0 and -E0) in order to allow for exon junctions. However, bwa still reports not only perfect matches but also distance 1 matches rather than gapped alignments. Can this behaviour be changes somehow? I want gapped and ungapped alignments (in case both are without mismatches) to be treated equal.

Any help would be greatly appreciated!

Gail

next-gen sequencing rna bwa alignment • 5.5k views
ADD COMMENT
0
Entering edit mode

Have you tried setting gap opening non-zero and gap extension zero? If both are 0, this might lead to unpredictable results.

ADD REPLY
0
Entering edit mode

Dear Michael,

Thank you for the reply! I tried now: ./bwa aln -e1000 -O1 -E0 together with: ./bwa samse -n1000 but still don't get the gapped alignments!?

ADD REPLY
0
Entering edit mode

Can you provide an example of a few reads and the reference sequence, so one could test it a bit, please? Find some reads that should give gapped alingments make an example file available.

ADD REPLY
0
Entering edit mode

Hello Michael,

these are the test genome sequences:

GENOME_ori1 AACCTTGGAAAAAACCTTGGCCCCAACCTTGGTTTTAACCTTGGGGGGG GENOME_ori2 tAACCTTGGAAAAAACCTTGGCCCCAACCTTGGTTTTAACCTTGGGGGGG GENOME_ori3 ttAACCTTGGAAAAAACCTTGGCCCCAACCTTGGTTTTAACCTTGGGGGGG GENOME_insert AACCTTGGAAAAAAaaaCCTTGGCCCCAACCTTGGTTTTAACCTTGGGGGGG GENOME_1a AACCTTGGAAAAAACCTTGGCCCCAACCTTGGTTTTAACCTTGGGGGGa

And this is the test read:

Ori AACCTTGGAAAAAACCTTGGCCCCAACCTTGGTTTTAACCTTGGGGGGG

So far bwa always reports matches to GENOME_ori1-3 and GENOME_1a but not GENOME_insert. Would be great if you could figure out why that is!

ADD REPLY
1
Entering edit mode
13.7 years ago
Michael 55k

Could it be that you have to set other options?

From the documentation of bwa:

OPTIONS:
[...]
-o INT  Maximum number of gap opens [1]
-e INT  Maximum number of gap extensions, -1 for k-difference mode (disallowing long gaps) [-1]

If you look at the defaults for these, it seems like only one short gap would be allowed, so try with increasinf -o and -e. Otherwise I support Ketil's answer.

ADD COMMENT
0
Entering edit mode

I just saw, your comment that you use -e1000 already, but still that allows for only one gap.

ADD REPLY
0
Entering edit mode

Hello Michael,

Many thanks for your help! But it doesn't even report/find the ones which are supposed to have only one gap :)

Best wishes!

ADD REPLY
0
Entering edit mode
13.7 years ago
Ketil 4.1k

I think the Augustus genome annotation pipeline suggests using BLAT to map RNAseq data exactly for this reason, BLAT is fairly good at mapping RNA to DNA. This will of course require you to rework the rest of your analysis, but converting BLAT output (PSL format) to SAM shouldn't be very hard.

If your data are too large to handle comfortably with BLAT, perhaps you can first map with BWA, and then remap just the reads that don't map in their (almost) entirety using BLAT.

ADD COMMENT
0
Entering edit mode

Dear Ketil,

many thanks! sound like a great idea!

ADD REPLY
0
Entering edit mode

samtools has a tool samtools/misc/psl2sam.pl for this

ADD REPLY

Login before adding your answer.

Traffic: 3039 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6