Hi, I used the GATK germline variant calling pipeline to call short variants on paired-end FASTQ files. After getting the final analysis-ready VCF and applying some extra filters, I inspected the BAM files in IGV for the variants of interest and found something strange in one sample: two variants of interest in this sample are found only on inversion reads.
In the first graph, the alternative allele G is found only on RR and LL reads (blue) in IGV. 13 of the 15 inversion reads carry this G allele.
In the second graph, similarly, the alternative allele T is found only on inversion reads. All of the inversion reads carry this T allele.
Further, I realized that all the inversion reads have the same size, as shown in figure 3.
I wonder whether these inversions are real or artifacts (given that they are all the same size), in which case the variants found only on these reads would not be real either.
Do you mean reads mapped on the reverse strand? It's a known problem: https://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-13-666
Hi, thanks for the paper, I will read it. I'm not sure these blue LL and RR reads are simply reads mapped on the reverse strand. The IGV documentation says that reads tagged "LL" or "RR" imply inversions in the sequenced DNA with respect to the reference: https://software.broadinstitute.org/software/igv/interpreting_pair_orientations It looks like these tags describe the orientation of the read pair rather than the strand of a single read?
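To make the distinction concrete, here is a rough sketch of how I understand the pair orientation could be derived from a read's SAM fields. This is not IGV's actual code: the function name `pair_orientation` is made up for illustration, and I am assuming a standard Illumina paired-end (forward/reverse) library with both mates mapped to the same chromosome. It uses only the SAM FLAG bits 0x10 (read on reverse strand) and 0x20 (mate on reverse strand) plus the POS and PNEXT columns.

```python
# SAM FLAG bits (from the SAM specification)
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
REVERSE = 0x10
MATE_REVERSE = 0x20


def pair_orientation(flag, pos, pnext):
    """Hypothetical helper: classify a read pair as LR, LL, RR or RL.

    Assumes both mates map to the same chromosome (RNEXT is "=").
    Returns None when the read is unpaired or either mate is unmapped.
    """
    if not flag & PAIRED or flag & (UNMAPPED | MATE_UNMAPPED):
        return None

    read_rev = bool(flag & REVERSE)
    mate_rev = bool(flag & MATE_REVERSE)

    # Both mates on the same strand: the inversion-like orientations.
    if read_rev == mate_rev:
        return "RR" if read_rev else "LL"

    # Mates on opposite strands: check which strand the leftmost mate is on.
    left_rev = read_rev if pos <= pnext else mate_rev
    # Leftmost mate forward is the normal "LR" pair; leftmost reverse is "RL".
    return "RL" if left_rev else "LR"


if __name__ == "__main__":
    print(pair_orientation(99, 100, 300))   # normal pair, seen from the left mate
    print(pair_orientation(113, 100, 300))  # both mates on the reverse strand
```

So a reverse-strand read in a normal pair (for example FLAG 147) still comes out as LR, while LL and RR only show up when both mates sit on the same strand. That is why I think the blue reads are about pair orientation and not just strand.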