Question

Mapping Inverse Reads On The Same Strand

0

Entering edit mode

12.8 years ago

Vikas Bansal ★ 2.4k

Hello everyone,

I think, I am missing something about how mapping tools work. I am mapping reads using BWA.

My genome file (reference file in fasta) -

>chr1
AACCCCCCCCTCCCCCCGCTTCTGGCCACAGCACTTAAACACATCTCTGC
CAAACCCCAAAAACAAAGAACCCTAACACCAGCCTAACCAGATTTCAAAT
TTTATCTTTAGGCGGTATGCACTTTTAACAAAAAANNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
GCCCATCCTACCCAGCACACACACACCGCTGCTAACCCCATACCCCGAAC
CAACCAAACCCCAAAGACACCCCCCACAGTTTATGTAGCTTACCTCNNNN

>chrMm
GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCAT
TTGGTATTTTCGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACGCTG
GAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATT
CTATTATTTATCGCACCTACGTTCAATATTACAGGCGAACATACCTACTA
AAGTGTGTTAATTAATTAATGCTTGTAGGACATAATAATAACAATTGAAT
GTCTGCACAGCCGCTTTCCACACAGACATCATAACAAAANAATTTCCACC

My reads are in fasta format -

>Read#BetweenN's
TTTTAACAAAAAAGCCCATCCTACCCAGC
>VIK
CAACCAAACCCCAAAGACACCCCCCACAG
>VIK#Inverse
GACACCCCCCACAGAAACCCCAAACCAAC
>VIK#COMPLIMENTARY
GTTGGTTTGGGGTTTCTGTGGGGGGTGTC
>VIK#Inverse_of_COMPLIMENTARY
CTGTGGGGGGTGTCTTTGGGGTTTGGTTG

Commands for BWA -

bwa index -a is genome
bwa aln  genome reads > out.sai
bwa samse genome out.sai reads > out.sam

The output from BWA is (sam file) -

@SQ                           SN:chr1   LN:300
@SQ                           SN:chrMm  LN:300
Read#BetweenN's               4         *       0    0   *    *  0  0  TTTTAACAAAAAAGCCCATCCTACCCAGC  *
VIK                           0         chr1    251  37  29M  *  0  0  CAACCAAACCCCAAAGACACCCCCCACAG  *  XT:A:U  NM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:29
VIK#Inverse                   4         *       0    0   *    *  0  0  GACACCCCCCACAGAAACCCCAAACCAAC  *
VIK#COMPLIMENTARY             4         *       0    0   *    *  0  0  GTTGGTTTGGGGTTTCTGTGGGGGGTGTC  *
VIK#Inverse_of_COMPLIMENTARY  16        chr1    251  37  29M  *  0  0  CAACCAAACCCCAAAGACACCCCCCACAG  *  XT:A:U  NM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:29

So my first read did not mapped because half of it lies on left side of N's and half on right side of N's on chr1 (so no problem with this read). Second read mapped to forward strand (good).Fifth read (inverse of complimentary of second read) mapped to same position but on reverse strand (very good).

Question- Third read(VIK#Inverse) is inverse of second read. So if you will read the reference genome from right hand side at last line of chr1, it is the same sequence but it did not get mapped. So, is there any option with which I can map my read like this? If not, then why it is not a good idea to map reads like this? (same goes for the fourth read but according to reverse strand)

mapping bwa next-gen sequencing read • 3.5k views

ADD COMMENT • link updated 12.8 years ago by lh3 33k • written 12.8 years ago by Vikas Bansal ★ 2.4k

1

Entering edit mode

You ask why it's not a good idea to map such reads -- it's because they're just not the same "thing". Can I put the shoe on the other foot and ask why you think it is a good idea for the VIK#Inverse read to be mapped here? What would it mean to consider, say, "GATACA" to be the same sequence as "ACATAG"?

ADD REPLY • link 12.8 years ago by Steve Lianoglou 5.2k

0

Entering edit mode

I am not saying to map these reads always, I asked if there is any option with which I can map reads like this. In some cases it will be useful - Consider I am looking for SNP's (not looking for inversion) and there is inversion and SNP (within the read) at this particular position, at least if I will map this read at that position, I can find SNP rather than just throwing it away as unmapped read.

ADD REPLY • link 12.8 years ago by Vikas Bansal ★ 2.4k

score 0 · Answer 1 · 2012-03-30

0

Entering edit mode

12.8 years ago

lh3 33k

How to present read sequences is mandated by the SAM spec. There is no way to output otherwise. As to why, because for most applications, working with sequences on the same strand as the reference is easier. Surely having reads on their original strands is sometimes more convenient, but this happens less frequently. When SAM was in design, a colleague of mine argued reads should be put on their original strand, but he later told me that the SAM way is convenient for most of his works.

ADD COMMENT • link 12.8 years ago by lh3 33k

0

Entering edit mode

Thanks for your reply. But I was asking about the third and fourth reads which were not mapped. I will edit my question little bit.

ADD REPLY • link 12.8 years ago by Vikas Bansal ★ 2.4k