How to conclude that a decrease in coverage depth is significant
5.9 years ago

Hi,

Here is my question:

I have several .bam files resulting from the alignment of reads from different individuals against a reference genome. I know that a particular region contains a large deletion, which can be either homozygous or heterozygous (some individuals have one chromosome with the deletion and one with the intact region).

Homozygous deleted individuals are easily detected, as there are 0 reads in this region.

I would like to detect heterozygous individuals using the mean coverage in this region. The idea is that, for an individual who is heterozygous for the deletion, read coverage in this region will be significantly lower than on the rest of the chromosome.

To do so, I used the samtools depth command. I first computed the coverage over the entire chromosome:

samtools depth -r Chromosome1 Bam_files/Ind6_vs_Genome_sorted.bam | awk '{sum+=$3} END { print "Average = ",sum/NR}'

Average = 12.1426

Then I computed the coverage over the region of interest:

samtools depth -r Chromosome1:5566-60000 Bam_files/Ind6_vs_Genome_sorted.bam | awk '{sum+=$3} END { print "Average = ",sum/NR}'

Average = 5.39289
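One caveat on the two averages above: by default, samtools depth omits positions with zero coverage unless the -a option is given, so dividing by NR averages only over covered positions. A minimal Python sketch of the alternative, dividing by the region length instead, is shown here. The function name `mean_depth` and the toy input are hypothetical and only stand in for real `samtools depth` output (three columns: reference name, position, depth).

```python
def mean_depth(depth_lines, region_length):
    """Mean depth over region_length positions.

    depth_lines: iterable of 'chrom<TAB>pos<TAB>depth' lines, as printed by
    `samtools depth`. Positions missing from the input count as zero depth.
    """
    total = 0
    for line in depth_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        total += int(fields[2])
    return total / region_length

# Toy input (hypothetical): position 3 is absent, i.e. has zero coverage.
toy = ["Chromosome1\t1\t10", "Chromosome1\t2\t12", "Chromosome1\t4\t8"]

print(mean_depth(toy, 4))         # 7.5  (sum 30 over 4 positions)
print(mean_depth(toy, len(toy)))  # 10.0 (what sum/NR would report)
```

The gap between the two numbers grows with the fraction of uncovered positions, which matters most in a partially deleted region.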

There does seem to be a decrease in coverage in this region, but how can I conclude that it is significant?

I hope this is clear,

Thanks,

Maxime

alignment genome samtools next-gen coverage • 1.4k views

Here are two solutions I propose, but I have no idea whether they are correct:

1 - Compare the two coverages (whole chromosome and region) between an individual known not to carry the deletion (homozygous non-deleted) and the individual under investigation, using a chi-squared test. If they differ significantly, conclude that this individual is heterozygous for the deletion.

2 - If the coverage of the region is between 40% and 60% of the whole-chromosome coverage, conclude that the individual is heterozygous.
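Proposal 2 can be sketched in a few lines of Python. The 40-60% band comes from the proposal itself; the function name `in_heterozygous_band` is hypothetical, and the band is a rule of thumb rather than a statistical test, so treat the result as a candidate call only.

```python
def in_heterozygous_band(region_mean, chrom_mean, low=0.4, high=0.6):
    """Return (ratio, flag), where flag is True when the region/chromosome
    depth ratio falls inside the [low, high] band from proposal 2."""
    ratio = region_mean / chrom_mean
    return ratio, low <= ratio <= high

# Averages reported in the question for Ind6.
ratio, is_het_candidate = in_heterozygous_band(5.39289, 12.1426)
print(f"ratio = {ratio:.3f}, heterozygous candidate: {is_het_candidate}")
# ratio = 0.444, heterozygous candidate: True
```

With the numbers from the question, the ratio is about 0.44, which falls inside the band, but this says nothing yet about how often such a ratio would occur by chance elsewhere on the chromosome.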


Hi maxime.policarpo ,

I've moved this post to a comment, as it does not actually answer your original question. This way we can keep the forum structured and organised.

5.9 years ago
Ahill ★ 2.0k

Comparing read depths across individuals might be affected by inter-sample differences in sequencing quality/yield. Tools for identifying structural variations such as deletions from read depth are available: take a look at the list of software in Table 1 in this paper or this paper. Using one of those tools to identify copy-number variations may be the easiest and most accurate way to verify the presence of deletions in your region. If that doesn't suit, you could formulate your own statistical model to identify regions with significantly reduced read depth, for example a randomization test using average read depths from randomly sampled segments of the same size as your target region (for more ideas, see e.g. this thread and references therein).
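As a rough illustration of the randomization test mentioned above, here is a minimal Python sketch. It assumes the per-base depths of the chromosome are already loaded into a list that includes zero-coverage positions; the function name `randomization_pvalue`, the toy depth profile, and the region coordinates are all hypothetical. It draws random windows of the same size as the target, skips windows that overlap the target, and reports the fraction whose mean depth is at most the observed one.

```python
import random
from itertools import accumulate

def randomization_pvalue(depths, start, end, n_samples=1000, seed=0):
    """Empirical one-sided p-value for low depth in depths[start:end].

    depths: per-base depth of one chromosome, zero-coverage positions included.
    start, end: 0-based, half-open coordinates of the target region.
    """
    size = end - start
    prefix = [0] + list(accumulate(depths))  # prefix sums for fast window means

    def window_mean(s):
        return (prefix[s + size] - prefix[s]) / size

    observed = window_mean(start)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # Redraw until the window does not overlap the target region.
        while True:
            s = rng.randrange(0, len(depths) - size + 1)
            if s + size <= start or s >= end:
                break
        if window_mean(s) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_samples + 1)

# Toy chromosome (hypothetical): depth 8-16 everywhere, halved in 5000-6000.
toy_rng = random.Random(1)
depths = [toy_rng.randint(8, 16) for _ in range(20000)]
for i in range(5000, 6000):
    depths[i] //= 2

observed, p = randomization_pvalue(depths, 5000, 6000)
print(f"observed mean = {observed:.2f}, empirical p = {p:.4f}")
```

Neighbouring windows on a real chromosome are not independent, and depth also varies with GC content and mappability, so a p-value from this kind of sketch is best read as a rough guide rather than a formal test.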


Thanks a lot for your suggestions!

Max
