Hello Biostars,
I mapped my mate-pair data against inverted chromosome 9 sequence in an attempt to find a few large inversions (1 million base pairs+) that may have resulted from equally large deletions.
Steps I took:
1) Downloaded the human chr9.fa from UCSC Genome Browser.
2) reversed the sequence with a C program that reads the file backwards and writes into a new file.
3) I called the new file chr9inv.fa. I changed the beginning of the file from >chr9 to >chr9inv
4) I then used this file as a reference for BWA-MEM alignment on Galaxy servers.
Very few reads were aligned, and the ones that were were not properly aligned. I am not sure why.
Does BWA algorithm have issues when aligning with very large deletions in the genome? I may have done some error that I am not aware of. Any ideas?
Err, think about what you did biologically...
I drew it out biologically:
Sample Sequence
5' -A T C C G - 3'
3' -T A G G C - 5'
Would the inverted sequence be like -
(rotated 180 degrees)
5' - C G G A T - 3'
3' - G C C T A - 5'
or (flipped over)
5' - G C C T A - 3'
3' - C G G A T - 5'
The former.
I will adjust to reverse-complement the reads. Thank you!
what is the purpose of the step (2) ??
I was thinking that if my case had an inversion, then there will be a few reads that will map successfully against an inverted chromosome 9. What do you think?
BWA mem will align reads across inversions without you needing to do that (further, you're just swapping the strands and the alignment is against both of them anyway). Look for supplemental alignments, which would indicate possible points of inversion.
what Devon said. Furthermore, for a mate-pair (not paired-end), I think you'll have to reverse-complement one the two fastq files.