I have a BAM alignment file with SOLiD mate-pair data (R3/F3). I want to reverse the second read so that it points towards the first read instead of in the same direction.
------>R3 ------>F3
to
------>R3 <------F3
What I want to know is whether it is enough to identify the F3 reads and change the 0x10 flag, which indicates SEQ being reverse complemented, or whether I would need to reverse complement the SEQ in column 10, QUAL in column 11, and the CIGAR string to end up with a valid BAM file. Is there anything I might be missing in this operation?
You won't need to reverse complement the sequence and qual fields; those are always 5'->3' of the + strand. However, you will need to change the flags of the R3 read to add the 0x20 flag (in addition to changing the F3 flag).
Thanks, so would the following gawk script accomplish the reversal of the reads properly?
gawk 'BEGIN{OFS="\t"}{if (and($2, 0x40)){$2=xor($2, 0x20)} else if (and($2, 0x80)){$2=xor($2, 0x10)} print $0}'
I'm not overly familiar with gawk, but that at least looks correct.
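For readers who, like me, don't read gawk fluently, here is a rough Python sketch of the same flag logic. It assumes, as the gawk script does, that the R3 read carries the first-in-pair flag (0x40) and the F3 read carries the second-in-pair flag (0x80); check that against your own file before relying on it. The function names flip_flag and flip_sam_line are made up for illustration. Unlike the one-liner, it splits strictly on tabs and passes header lines through untouched.

```python
# Flag bits as defined in the SAM specification.
READ_REVERSE = 0x10    # SEQ is reverse complemented
MATE_REVERSE = 0x20    # SEQ of the mate is reverse complemented
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80


def flip_flag(flag):
    """Toggle the strand bits so the second read points back at the first.

    Assumption (same as the gawk script): the read to be reversed (F3)
    is the one flagged 0x80, and its mate (R3) is flagged 0x40.
    """
    if flag & FIRST_IN_PAIR:
        # The mate of this read is being reversed, so toggle 0x20.
        return flag ^ MATE_REVERSE
    if flag & SECOND_IN_PAIR:
        # This read itself is being reversed, so toggle 0x10.
        return flag ^ READ_REVERSE
    return flag


def flip_sam_line(line):
    """Return a SAM text line with its FLAG column (column 2) updated."""
    if line.startswith("@"):
        return line  # header lines are passed through unchanged
    body = line.rstrip("\n")
    ending = line[len(body):]
    fields = body.split("\t")
    fields[1] = str(flip_flag(int(fields[1])))
    return "\t".join(fields) + ending
```

To use it, something like samtools view -h in.bam could be piped through a small loop that calls flip_sam_line on every line of standard input and writes the result out, then back into samtools view -b to get a BAM file again.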
You also have to check that both reads are on the same chromosome, and check the distance between the reads.
Why would the distance between the reads change?
It won't change, but if they're too far apart (e.g. 10 Mb), they cannot be "properly paired".
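The check described above could look roughly like this in Python. This is only a sketch: the function name can_be_proper_pair and the max_distance default are hypothetical, the right threshold depends on your library's insert size, and the distance is taken between the leftmost mapping positions (POS and PNEXT). It does not check orientation, which a full "properly paired" test would also consider.

```python
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def can_be_proper_pair(fields, max_distance=10_000):
    """Rough test on one SAM record, given as a list of its tab-separated columns.

    max_distance is an assumed threshold in bases, not a value from the
    thread; pick one that matches the expected mate-pair insert size.
    """
    flag = int(fields[1])
    if flag & (READ_UNMAPPED | MATE_UNMAPPED):
        return False  # both reads must be mapped

    rname, rnext = fields[2], fields[6]
    # RNEXT is "=" when the mate is on the same reference as the read.
    if rname == "*" or rnext not in ("=", rname):
        return False  # different chromosomes, or no reference at all

    pos, pnext = int(fields[3]), int(fields[7])
    return abs(pnext - pos) <= max_distance
```

A record that fails this test should probably not carry the 0x2 (proper pair) flag after the strand bits have been changed.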