Hello everyone,
I think, I am missing something about how mapping tools work. I am mapping reads using BWA.
My genome file (reference file in fasta) -
>chr1 AACCCCCCCCTCCCCCCGCTTCTGGCCACAGCACTTAAACACATCTCTGC CAAACCCCAAAAACAAAGAACCCTAACACCAGCCTAACCAGATTTCAAAT TTTATCTTTAGGCGGTATGCACTTTTAACAAAAAANNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN GCCCATCCTACCCAGCACACACACACCGCTGCTAACCCCATACCCCGAAC CAACCAAACCCCAAAGACACCCCCCACAGTTTATGTAGCTTACCTCNNNN >chrMm GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCAT TTGGTATTTTCGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACGCTG GAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATT CTATTATTTATCGCACCTACGTTCAATATTACAGGCGAACATACCTACTA AAGTGTGTTAATTAATTAATGCTTGTAGGACATAATAATAACAATTGAAT GTCTGCACAGCCGCTTTCCACACAGACATCATAACAAAANAATTTCCACCMy reads are in fasta format -
>Read#BetweenN's TTTTAACAAAAAAGCCCATCCTACCCAGC >VIK CAACCAAACCCCAAAGACACCCCCCACAG >VIK#Inverse GACACCCCCCACAGAAACCCCAAACCAAC >VIK#COMPLIMENTARY GTTGGTTTGGGGTTTCTGTGGGGGGTGTC >VIK#Inverse_of_COMPLIMENTARY CTGTGGGGGGTGTCTTTGGGGTTTGGTTG
Commands for BWA -
bwa index -a is genome bwa aln genome reads > out.sai bwa samse genome out.sai reads > out.sam
The output from BWA is (sam file) -
@SQ SN:chr1 LN:300 @SQ SN:chrMm LN:300 Read#BetweenN's 4 * 0 0 * * 0 0 TTTTAACAAAAAAGCCCATCCTACCCAGC * VIK 0 chr1 251 37 29M * 0 0 CAACCAAACCCCAAAGACACCCCCCACAG * XT:A:U NM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:29 VIK#Inverse 4 * 0 0 * * 0 0 GACACCCCCCACAGAAACCCCAAACCAAC * VIK#COMPLIMENTARY 4 * 0 0 * * 0 0 GTTGGTTTGGGGTTTCTGTGGGGGGTGTC * VIK#Inverse_of_COMPLIMENTARY 16 chr1 251 37 29M * 0 0 CAACCAAACCCCAAAGACACCCCCCACAG * XT:A:U NM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:29
So my first read did not mapped because half of it lies on left side of N's and half on right side of N's on chr1 (so no problem with this read). Second read mapped to forward strand (good).Fifth read (inverse of complimentary of second read) mapped to same position but on reverse strand (very good).
Question- Third read(VIK#Inverse) is inverse of second read. So if you will read the reference genome from right hand side at last line of chr1, it is the same sequence but it did not get mapped. So, is there any option with which I can map my read like this? If not, then why it is not a good idea to map reads like this? (same goes for the fourth read but according to reverse strand)
You ask why it's not a good idea to map such reads -- it's because they're just not the same "thing". Can I put the shoe on the other foot and ask why you think it is a good idea for the VIK#Inverse read to be mapped here? What would it mean to consider, say, "GATACA" to be the same sequence as "ACATAG"?
I am not saying to map these reads always, I asked if there is any option with which I can map reads like this. In some cases it will be useful - Consider I am looking for SNP's (not looking for inversion) and there is inversion and SNP (within the read) at this particular position, at least if I will map this read at that position, I can find SNP rather than just throwing it away as unmapped read.