Reverse Read Direction In A Bam File
2
I have a bam alignment file with SOLiD mate-pair data (R3/F3). I want to reverse the second read so that it points towards the first read instead of in the same direction.
------>R3 ------>F3
to
------>R3 <------F3
What I want to know is if it is enough to identify the F3 reads and change the 0x10 flag which indicates SEQ being reverse complemented or would I need to reverse complement the SEQ in column 10, QUAL in column 11, and the CIGAR string to end up with a valid bam file? Anything that I might be missing in this operation?
bam
• 7.3k views
You won't need to reverse complement the sequence and qual fields, those are always 5'->3' of the + strand. However, you will need to change the flags of the R3 read to add the 0x20 flag (in addition to changing the F3 flag).
OK I wrote a java program for this. I pushed it on github : https://github.com/lindenb/jvarkit#biostar76892
before :
samtools view src.bam | grep "HWI-1KL149:6:C0KTCACXX:5:2107:2283:35906"
HWI-1KL149:6:C0KTCACXX:5:2107:2283:35906 177 3 1264832 37 101M = 1264940 109 AGGTGGTGAAGCATGAGATGTAGGGAGAGCTGCTTTAAAACCCAGCACAAGGCTGGTTGTACTGGCTCACACCTGTAATCCCAGGTCTTTGGGAGGCTGAG """#""""#"""#"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""#""""""#""##"""""""#"#" X0:i:1 X1:i:0 BD:Z:ABBACCABBABAAABABAABAACBBABAA@AAABAABBAABB@AAAAAAABA@ABAA@BAA@@BAAAAAAAA@@BAAAABA@ABAAABAACBBACBAABAA MD:Z:0C0C0C1C0C0G1G0T1T0T0G0C0C0T0T0C0T0G0T0A0C0A2T2C0A0T0G0C1T0G0T0G2G0T0T0T0G1G0T1T1C0C0A0A0G0T0G0C0G0A1T0G1G0C0T1T0T2C0G0T0G0T0G1C1C0A1G1G0T1C0C1T0T2C0A0 RG:Z:idp63088 XG:i:0 BI:Z:ABBAEDCCCBCBBABABAAAA@CBBAB@A@@A@AAAAAAABA@@AAA@A@BA@@BAA@A@@@@BAA@A@@@A@@AAAAABA@@BAAABBACBBACBBABAA AM:i:37 NM:i:75 SM:i:37 XM:i:1 XO:i:0 MQ:i:37 XT:A:U
HWI-1KL149:6:C0KTCACXX:5:2107:2283:35906 113 3 1264940 37 101M = 1264832 -109 GGTATCTCCATGCTCGAAGCCCTGACCTACTGTATTGCCCCGAAAGTCTTCCCTGCTGTGGCTGCATCTTTTCCACGTGGATAATCTTGGTTCATCTCTAG """##"""""""""""""""""""""""""#"""""""""""""""""""""""""""#"""""""""""""""#""""""""""""#""#""##"#"##" X0:i:1 X1:i:0 BD:Z:BBAABBBBAAABBBCBAABCBA@BAAAAAAABAAAAACCCBABAAAAAAACBAAAAABABA@AA@AAABBAAAAACB@BBAAAAAAAABBBAABBBBAAAA MD:Z:0T0T1G0C0A1G0T1C0C0A1G0T0G0C0A0T0G0T0G0T0G2A0T4G0C0A0A0T0G0T0G0C1G0G0T0G1C0A0G0T0T0G0C0A4C1A0T0G0C0G0T0G2G0G1C0G0T0G0A1C0G0T0G1G0C2T2T0C0G0T0G0T0A0T1 RG:Z:idp63088 XG:i:0 BI:Z:BABADDCCBBBCBBCBAABCBA@AABAAA@AAAAA@BBBBBAAA@AA@AABA@@A@@A@BA@@A@AA@AAAAAAABB@BAAAAAAAA@CBAAABBBBAAAA AM:i:37 NM:i:74 SM:i:37 XM:i:0 XO:i:0 MQ:i:37 XT:A:U
Fixing:
java -jar dist/biostar76892.jar ONLYSAVEFIXED=true \
IN=src.bam \
OUT=fix.bam \
VALIDATION_STRINGENCY=LENIENT
result
samtools view fix.bam | grep "HWI-1KL149:6:C0KTCACXX:5:2107:2283:35906"
HWI-1KL149:6:C0KTCACXX:5:2107:2283:35906 163 3 1264832 37 101M = 1264940 109 AGGTGGTGAAGCATGAGATGTAGGGAGAGCTGCTTTAAAACCCAGCACAAGGCTGGTTGTACTGGCTCACACCTGTAATCCCAGGTCTTTGGGAGGCTGAG """#""""#"""#"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""#""""""#""##"""""""#"#" X0:i:1 X1:i:0 BD:Z:ABBACCABBABAAABABAABAACBBABAA@AAABAABBAABB@AAAAAAABA@ABAA@BAA@@BAAAAAAAA@@BAAAABA@ABAAABAACBBACBAABAA MD:Z:0C0C0C1C0C0G1G0T1T0T0G0C0C0T0T0C0T0G0T0A0C0A2T2C0A0T0G0C1T0G0T0G2G0T0T0T0G1G0T1T1C0C0A0A0G0T0G0C0G0A1T0G1G0C0T1T0T2C0G0T0G0T0G1C1C0A1G1G0T1C0C1T0T2C0A0 RG:Z:idp63088 XG:i:0 BI:Z:ABBAEDCCCBCBBABABAAAA@CBBAB@A@@A@AAAAAAABA@@AAA@A@BA@@BAA@A@@@@BAA@A@@@A@@AAAAABA@@BAAABBACBBACBBABAA AM:i:37 NM:i:75 SM:i:37 XM:i:1 XO:i:0 MQ:i:37 XT:A:U rv:i:1
HWI-1KL149:6:C0KTCACXX:5:2107:2283:35906 83 3 1264940 37 101M = 1264832 -109 GGTATCTCCATGCTCGAAGCCCTGACCTACTGTATTGCCCCGAAAGTCTTCCCTGCTGTGGCTGCATCTTTTCCACGTGGATAATCTTGGTTCATCTCTAG """##"""""""""""""""""""""""""#"""""""""""""""""""""""""""#"""""""""""""""#""""""""""""#""#""##"#"##" X0:i:1 X1:i:0 BD:Z:BBAABBBBAAABBBCBAABCBA@BAAAAAAABAAAAACCCBABAAAAAAACBAAAAABABA@AA@AAABBAAAAACB@BBAAAAAAAABBBAABBBBAAAA MD:Z:0T0T1G0C0A1G0T1C0C0A1G0T0G0C0A0T0G0T0G0T0G2A0T4G0C0A0A0T0G0T0G0C1G0G0T0G1C0A0G0T0T0G0C0A4C1A0T0G0C0G0T0G2G0G1C0G0T0G0A1C0G0T0G1G0C2T2T0C0G0T0G0T0A0T1 RG:Z:idp63088 XG:i:0 BI:Z:BABADDCCBBBCBBCBAABCBA@AABAAA@AAAAA@BBBBBAAA@AA@AABA@@A@@A@BA@@A@AA@AAAAAAABB@BAAAAAAAA@CBAAABBBBAAAA AM:i:37 NM:i:74 SM:i:37 XM:i:0 XO:i:0 MQ:i:37 XT:A:U rv:i:1
Login before adding your answer.
Traffic: 1666 users visited in the last hour
Thanks, so would the following gawk script accomplish the reversal of the reads properly?
gawk 'BEGIN{OFS="\t"}{if (and($2, 0x40)){$2=xor($2, 0x20)} else if (and($2, 0x80)){$2=xor($2, 0x10)} print $0}'
I'm not overly familiar with gawk, but that at least looks correct.
you also have to check that both reads are on the same chromosome, the distance between the reads.
Why would the distance between the reads change?
it wont change, but if they're too far (e.g 10Mb) , they cannot be "properly paired"