Hi,
Here is my question:
I have several BAM files, each produced by aligning the reads of a different individual against a reference genome. I know that a particular region contains a large deletion, which can be either homozygous or heterozygous (some individuals have one chromosome with the deletion and one with the region intact).
Individuals homozygous for the deletion are easy to detect, since they have 0 reads in this region.
I would like to detect heterozygous individuals using the mean coverage in this region. The idea is that an individual who is heterozygous for the deletion will show a significant decrease in read coverage in this region compared to the rest of the chromosome.
To do so, I used the samtools depth command. I first computed the mean coverage over the entire chromosome:
samtools depth -r Chromosome1 Bam_files/Ind6_vs_Genome_sorted.bam | awk '{sum+=$3} END { print "Average = ",sum/NR}'
Average = 12.1426
Then I computed the mean coverage over the region of interest:
samtools depth -r Chromosome1:5566-60000 Bam_files/Ind6_vs_Genome_sorted.bam | awk '{sum+=$3} END { print "Average = ",sum/NR}'
Average = 5.39289
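One caveat about the two averages above: by default, samtools depth does not print positions with zero coverage, so sum/NR averages only over covered positions and can overestimate the mean. Running samtools depth with -a reports every position; dividing the summed depth by the region length is equivalent. Below is a minimal Python sketch of the second option, assuming the samtools depth output has been read into a string. The function name mean_depth and the toy input are made up for illustration.

```python
def mean_depth(depth_text, start, end):
    """Mean depth over start..end (1-based, inclusive).

    depth_text is tab-separated `samtools depth` output (chrom, pos, depth).
    Positions absent from the output are counted as zero depth.
    """
    total = 0
    for line in depth_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        _chrom, pos, depth = line.split("\t")[:3]
        if start <= int(pos) <= end:
            total += int(depth)
    return total / (end - start + 1)


# Made-up toy input: positions 2 and 4 have no reads, so they are absent.
example = "Chromosome1\t1\t4\nChromosome1\t3\t8\nChromosome1\t5\t3\n"
print(mean_depth(example, 1, 5))  # 15 / 5 = 3.0, whereas sum/NR would give 5.0
```

For the region in this post, the call would be mean_depth(text, 5566, 60000), with text holding the output of the second command.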
There does seem to be a decrease in coverage in this region, but how can I conclude that it is significant?
I hope this is clear,
Thanks,
Maxime
Here are two solutions I propose, but I have no idea whether they are correct:
1 - Compare the two coverages (whole chromosome and region) between an individual known not to carry the deletion (homozygous non-deleted) and the individual under investigation, using a chi-squared test. If the difference is significant, conclude that the individual is heterozygous for the deletion.
2 - If the coverage of the region falls between 40% and 60% of the whole-chromosome coverage, conclude that the individual is heterozygous.
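Both proposals can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a validated method: the function names are hypothetical, the 40-60% window is the one suggested above, and the chi-squared test is the standard 2x2 test on read counts (for example as counted with samtools view -c), not on mean depths. The read counts in the example are made up.

```python
import math


def coverage_ratio(region_cov, chrom_cov, ctrl_region_cov=None, ctrl_chrom_cov=None):
    """Region/chromosome coverage ratio, optionally divided by the same
    ratio in a control individual known not to carry the deletion."""
    ratio = region_cov / chrom_cov
    if ctrl_region_cov is not None and ctrl_chrom_cov is not None:
        ratio /= ctrl_region_cov / ctrl_chrom_cov
    return ratio


def in_het_window(ratio, low=0.40, high=0.60):
    """Proposal 2: is the ratio inside the assumed heterozygous window?"""
    return low <= ratio <= high


def chi2_2x2(a, b, c, d):
    """Proposal 1: chi-squared test (1 degree of freedom, no continuity
    correction) on the table [[a, b], [c, d]], where rows are individuals
    and columns are reads inside / outside the region."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # survival function for 1 df
    return chi2, p_value


# Mean coverages reported in this post for Ind6.
ratio = coverage_ratio(5.39289, 12.1426)
print(round(ratio, 3), in_het_window(ratio))  # 0.444 True

# Made-up read counts: (in region, elsewhere) for a test and a control individual.
chi2, p = chi2_2x2(20, 80, 40, 60)
print(chi2, p)
```

With real BAM files the read counts are large, so even small coverage differences tend to come out as significant in a chi-squared test, and read depth usually varies more than a simple count model assumes. The control-normalised ratio may therefore be the more informative of the two numbers.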
Hi maxime.policarpo,
I've moved this post to a comment, as it does not actually answer your original question. This way we can keep the forum structured and organised.