Good morning everyone,
I hope someone can help me with this issue. In my lab, we performed targeted sequencing on an Illumina MiSeq. We sequenced a single gene (the whole gene, both introns and exons) in 96 samples belonging to two different experimental groups. After alignment and duplicate removal, I visualized my BAM files using the Golden Helix Genome Browser. I noticed reduced coverage over a 200bp region in the only coding exon of the sequenced gene. This reduction occurs in all my samples, but it is more pronounced in one group than in the other (statistically determined using read depth values).
How should I interpret these data?
Structural variation between alleles of the same gene?
If you have a reference set available then you can try the ExomeDepth (LINK) package to see if you have a CNV. There are other packages available as well.
Thanks GenoMax for your answer. To my knowledge, CNV calling tools usually work on whole genome/whole exome sequencing data. I have sequenced only one gene. Therefore, I believe that CNV calling would not work properly.
How did you normalize the data and assess significance? Could this be a general issue due to GC bias and "sequencability" of that stretch of DNA? Did you check public sequencing data, e.g. GTEx or any other human samples, to see whether this reduction is a general phenomenon for that region?
Edit: If this is not human then any other public (e.g. WGS or exome) samples from the same species.
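One quick way to look at the GC-bias question is to compare the GC content of the 200bp stretch with the rest of the gene. The following is only a minimal sketch: the function names, window size and step are illustrative assumptions, and the toy sequence stands in for the real reference sequence of the gene, which you would need to load yourself.

```python
def gc_fraction(seq):
    """Fraction of G/C among unambiguous A/C/G/T bases (N and others are ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else float("nan")


def windowed_gc(seq, window=200, step=50):
    """Yield (0-based start, GC fraction) for sliding windows along seq."""
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        yield start, gc_fraction(seq[start:start + window])


# Toy sequence (assumption, not real data): AT-rich flanks around a GC-rich core.
toy = "AT" * 150 + "GC" * 100 + "AT" * 150
profile = list(windowed_gc(toy, window=200, step=100))
for start, gc in profile:
    print(start, round(gc, 2))
```

If the window covering the low-coverage stretch stands out from its neighbours in GC content, that would point towards a sequence-dependent effect rather than a real deletion, although it would not prove it.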
Thanks ATpoint for your response. I extracted read depth values with "samtools depth" (see the samtools depth manual). For each sample, I calculated the mean read depth of the background coverage and of the 200bp region. First, I checked whether the background coverage differed between the samples of the two experimental groups (t test), and they were not significantly different. Then, for each sample, I subtracted the average read depth of the 200bp region from the average background read depth (normalization), to check how the depth of the loss changed between our conditions. Finally, I performed a t test between the two experimental groups using these differences, and I got a p-value of 0.03.
I thought this was the best way to evaluate the loss of coverage that I noticed during the visual inspection in Genome Browse.
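For anyone who wants to reproduce the steps described above, here is a rough sketch of the per-sample calculation. It assumes the three-column, tab-separated output of `samtools depth` (chromosome, 1-based position, depth) for a single gene on one chromosome; the function names and coordinates are hypothetical. Positions missing from the input are counted as depth 0, because `samtools depth` skips zero-coverage positions unless it is run with `-a`. The sketch also reports a region/background ratio, which is an alternative normalization and not what was done in the post, and it uses Welch's t statistic, which may differ from the t test variant used above.

```python
import io
import math
import statistics


def mean_depths(depth_lines, region_start, region_end, gene_start, gene_end):
    """Return (background_mean, region_mean) from `samtools depth` lines.

    Coordinates are 1-based and inclusive; the region must lie inside the gene.
    Sums are divided by interval lengths, so omitted positions count as depth 0.
    """
    region_sum = background_sum = 0
    for line in depth_lines:
        _chrom, pos, depth = line.rstrip("\n").split("\t")[:3]
        pos, depth = int(pos), int(depth)
        if not gene_start <= pos <= gene_end:
            continue  # outside the sequenced gene
        if region_start <= pos <= region_end:
            region_sum += depth
        else:
            background_sum += depth
    region_len = region_end - region_start + 1
    background_len = (gene_end - gene_start + 1) - region_len
    return background_sum / background_len, region_sum / region_len


def welch_t(a, b):
    """Welch's t statistic for two independent groups of per-sample values."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


# Toy input (made up): depth 100 everywhere except positions 4-6, which have depth 40.
toy = "".join(f"chr1\t{p}\t{40 if 4 <= p <= 6 else 100}\n" for p in range(1, 11))
background, region = mean_depths(io.StringIO(toy), 4, 6, 1, 10)
drop = background - region   # difference, as described in the post
ratio = region / background  # alternative: fraction of background depth retained
print(background, region, drop, ratio)
```

With one `drop` (or `ratio`) value per sample, the two groups can then be compared with `welch_t` or with the t test of your statistics package of choice to obtain a p-value.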
You could use PCR to validate the 200bp deletion, if the quality of these reads is not very low.