Entering edit mode
4.4 years ago
haasroni
•
0
Hi Biostars!
I am trying to manipulate a sam file, in the following way:
- Take reads and replace all Ts with Cs (I term these reads manipulated reads)
- Replace all Ts with Cs in the reference genome
- Align the reads (using bowtie if it matters) to create a SAM file
- Replace in the SAM file each manipulated read with the original read (as before the Ts to Cs conversion). Note that if the read was mapped to the reverse strand, I would retrieve the reverse complement sequence of the original read.
After that, I create a pileup file from the resulted SAM file using samtools.
But, I noticed that mismatches are reported in the pileup file only for reads mapped to the forward strand (sam file FLAG == 0) but not when the reads are mapped to the reverse strand (sam file FLAG == 16).
For example, In this case, (= if it is identical to the aligned reference base)
SRR11517744.1453528 16 NC_000005.9 52014181 255 99M * 0 0 CCCCCC=CCC==CC=CCC=CCC=CCC=CCCCCGC=CCC=CCC=CCC=CCC=C=CCC=C=C=C=C=C=CCC=C=C=CCC=CC==CCC=CCC=CCACCCCA E/EA/EE//E<A/<EEE//E/EE/E/AE/AE</A<//EEE<//////E/6///<//EAE///E//EEEE<AA/E//E6E/EE/E//EA/E/E6/E/6AA XA:i:0NM:i:72 MD:Z:0t0g0t0t0t0t1t0t0t2t0t1t0t0t1t0t0t1t0t0t1t0t0t0g0t0t0t1t0t0t1t0t0t1t0t0t1t0t0t1t1t0t0t1t1t1t1t1t1t0t0t1t1t1t0t0t1t0t2t0t0t1t0t0t1t0t0g0t0t0t0t0g0
The pileup file will show for the first read position, 52014181 no mismatch:
NC_000005.9 52014181 t 0 * *
And not as I exacted:
NC_000005.9 52014181 t 1 c E
Can anyone think why is that?
Thank you!!!!!
Don't you need to be turning all the A's to G's in the reverse reads?
Yes, you are right. I do. I didn't get into all details since I wanted to make it as simple as possible for you to read. Thank you!