Strand handling of reference sequence for index
7.4 years ago
yifangt86 ▴ 60

Following up on my other post, I was wondering how the short-read mappers (bwa and bowtie2) handle the "-" strand of the reference when building the index. Short reads are always given in the 5'-->3' direction regardless of the strand they come from, yet the resulting SAM file contains reads mapped to both strands.

1) I assume the "-" strand of the reference is concatenated to the "+" strand, so that the length of the indexed sequence is simply doubled;

2) Or each read has to be reverse-complemented at the mapping step, so that mapping is done twice.
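To make the two options concrete, here is a minimal sketch using naive exact substring search instead of a real FM-index. It is not taken from bwa, bowtie2, or STAR; the function names and the "$" separator are assumptions chosen for illustration only. Both strategies should report the same forward-strand coordinates and strand.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_all(text, pattern):
    """Return every start position of pattern in text (overlaps allowed)."""
    hits = []
    start = text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def map_with_doubled_reference(ref, read):
    """Option 1: search the read once against forward + reverse complement."""
    n = len(ref)
    # Hypothetical separator so that no hit can span the two copies.
    doubled = ref + "$" + reverse_complement(ref)
    hits = []
    for pos in find_all(doubled, read):
        if pos + len(read) <= n:
            hits.append((pos, "+"))
        elif pos > n:
            # Hit lies in the reverse-complement copy; mirror it back
            # into forward-strand coordinates.
            rc_pos = pos - (n + 1)
            hits.append((n - (rc_pos + len(read)), "-"))
    return sorted(hits)


def map_with_two_passes(ref, read):
    """Option 2: search the forward reference twice (read, then its revcomp)."""
    hits = [(p, "+") for p in find_all(ref, read)]
    hits += [(p, "-") for p in find_all(ref, reverse_complement(read))]
    return sorted(hits)


ref = "GATTACACCGTA"
plus_read = "TTACA"    # taken directly from the "+" strand
minus_read = "GGTGT"   # reverse complement of ref[4:9] == "ACACC"
for read in (plus_read, minus_read):
    print(read, map_with_doubled_reference(ref, read), map_with_two_passes(ref, read))
```

The trade-off in this toy version is the obvious one: option 1 doubles the text that has to be indexed, while option 2 doubles the number of searches per read.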

The mapper STAR uses the first approach, as its author told me, but I could not figure this out from the source code of either bwa or bowtie2. Can somebody confirm this for me? Thanks.

strand handling reference index short read mapping • 1.5k views

I think bwa and bowtie index both the forward and reverse reference, and map once.

BBMap reverse-complements the read and does mapping twice.


Thanks Brian!

Could you elaborate, technically, on how the index is built for both the forward and reverse strands? My guess is that the "-" strand is concatenated to the "+" strand to form a single string. But then how are the offsets for each chromosome distinguished from one another (2 strands and multiple chromosomes)? Looking at the output of the bwa index step for the mouse genome, I see:

Mouse_genome.fa.amb
Mouse_genome.fa.ann
Mouse_genome.fa.bwt
Mouse_genome.fa.pac
Mouse_genome.fa.rbwt
Mouse_genome.fa.rpac
Mouse_genome.fa.rsa
Mouse_genome.fa.sa

Are the files with an "r" in the extension (*.rbwt, *.rpac, *.rsa) for the reverse strand? However, when I index a small genome such as lambda_virus, I do not see this pattern:

lambda_virus_bwa.amb
lambda_virus_bwa.ann
lambda_virus_bwa.bwt
lambda_virus_bwa.pac
lambda_virus_bwa.sa

Does this mean that if the genome is big (mouse), there will be a separate index for the "-" strand of the genome?
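On the offset question, here is a minimal sketch of one possible bookkeeping scheme: concatenate all chromosomes, append the reverse complement of the whole string, and keep a table of start offsets so that a hit position can be converted back to (chromosome, position, strand). This is an illustration under my own assumptions, not the actual on-disk layout used by bwa or bowtie2; `build_offsets` and `locate` are hypothetical names.

```python
from bisect import bisect_right


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def build_offsets(chromosomes):
    """Concatenate (name, sequence) pairs and record where each one starts."""
    names, starts, lengths, parts = [], [], [], []
    cursor = 0
    for name, seq in chromosomes:
        names.append(name)
        starts.append(cursor)
        lengths.append(len(seq))
        parts.append(seq)
        cursor += len(seq)
    return "".join(parts), names, starts, lengths


def locate(pos, read_len, names, starts, lengths):
    """Convert a hit in forward + revcomp(forward) to (chrom, pos, strand).

    Returns None when the hit spans a chromosome boundary or the junction
    between the forward and reverse-complement halves.
    """
    total = starts[-1] + lengths[-1]
    strand = "+"
    if pos >= total:
        # Mirror a hit in the reverse-complement half back to forward coordinates.
        strand = "-"
        pos = 2 * total - (pos + read_len)
    i = bisect_right(starts, pos) - 1
    if pos + read_len > starts[i] + lengths[i]:
        return None
    return names[i], pos - starts[i], strand


chromosomes = [("chr1", "GATTACA"), ("chr2", "CCGTAAG")]
forward, names, starts, lengths = build_offsets(chromosomes)
doubled = forward + reverse_complement(forward)

for read in ("CGTA", "TGTAA", "CACC"):
    hit = doubled.find(read)
    print(read, hit, locate(hit, len(read), names, starts, lengths))
```

In this toy example "CGTA" resolves to chr2 on the "+" strand, "TGTAA" resolves to chr1 on the "-" strand, and "CACC" is rejected because it only matches across the chr1/chr2 junction.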

From bowtie2-build, I get:

lambda_virus.1.bt2
lambda_virus.2.bt2
lambda_virus.3.bt2
lambda_virus.4.bt2
lambda_virus.rev.1.bt2
lambda_virus.rev.2.bt2

Are the *.rev.1.bt2 and *.rev.2.bt2 files for the reverse strand? If so, why is there no *.rev.3.bt2 or *.rev.4.bt2?

Thanks a lot!


I have not looked at the code, so my assumption about how they operate was indeed based on the file extensions generated when indexing :) I'm not sure why the "reverse" ones are sometimes not present.
