8.5 years ago
MAPK
I was reading this paper by Powell et al. ( http://www.nature.com/nrg/journal/v11/n11/full/nrg2865.html ). They describe cases where IBD values are calculated to be negative because: "The use of the current generation as the base causes some of the relationships to be negative and so they cannot be interpreted as probabilities but they can be interpreted as the correlation of homologous alleles in different gametes." Can someone please explain to me, in simple terms, what they mean by that statement?
I can't read beyond the abstract of that paper, and I'm not an expert in IBD, but I'm happy to take a shot at your question until a better answer arrives.
To calculate the probability that some identical segment in two individuals' genomes is IBD, you need to know the pedigree (how the two individuals are related) and the DNA sequence of the shared segment. The DNA of the shared segment in both individuals obviously has to be identical (or at least very similar - there is an allowance for de novo mutations depending on how far back the common ancestor is). We also need the sequence to know how big this block of identical DNA is - where does it start/stop being IBS.
The reason is, a "proper" IBD calculation would go something like: "given the size of this (more or less) genotypically identical block shared between these two individuals, and given the distance since their last common ancestor, the probability of this block being IBD and not just IBS is X". This probability would go from 0 to 1. The size of the block is important because the further away the two individuals' common ancestor is, the smaller an IBS block can be and still potentially be IBD. However, if the individuals have a very recent common ancestor, due to the limited number of crossing-over events that could have taken place, only larger blocks are likely to be IBD. Smaller blocks that are IBS are likely to be de novo variant coincidences (or IBD from very far back common ancestors, which isn't interesting).
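To put rough numbers on the block-size argument above, here is a minimal sketch. It assumes a textbook simplification that is not from the paper: crossovers fall like a Poisson process at one per 100 cM per meiosis, so the length of a randomly chosen IBD segment separated by a given number of meioses is roughly exponential with mean 100/meioses cM. The function names are made up for illustration, and the model ignores crossover interference, sex-specific maps and chromosome ends.

```python
import math


def expected_ibd_length_cm(meioses: int) -> float:
    """Mean length (in cM) of a randomly chosen IBD segment.

    Assumes crossovers occur as a Poisson process at a rate of one per
    100 cM per meiosis, so segment length is exponential with mean
    100 / meioses.
    """
    if meioses < 1:
        raise ValueError("meioses must be at least 1")
    return 100.0 / meioses


def prob_segment_longer_than(length_cm: float, meioses: int) -> float:
    """P(segment length > length_cm) under the same exponential model."""
    if meioses < 1:
        raise ValueError("meioses must be at least 1")
    if length_cm < 0:
        raise ValueError("length_cm must be non-negative")
    return math.exp(-meioses * length_cm / 100.0)


if __name__ == "__main__":
    # 4 meioses: e.g. first cousins, traced through one shared grandparent.
    # 20 meioses: a common ancestor 10 generations back on both sides.
    for m in (4, 20):
        print(m, expected_ibd_length_cm(m), prob_segment_longer_than(10.0, m))
```

Under these assumptions, a segment separated by 4 meioses averages 25 cM, while one separated by 20 meioses averages 5 cM, which is the sense in which more distant ancestors leave smaller IBD blocks.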
So that's the theory, but in practice that isn't very useful since we often don't know the pedigree, and we might not even know the exact genotype of both individuals for all bases of their DNA - we just have a few SNPs called from a genotyping chip.
However, if you rearrange the IBD formula you can use it to fill in one or both of these gaps given some assumptions, with some statistical certainty/probability. For example "I see a block of 100 SNPs shared between these two individuals. If this region is not just IBS for these 100 SNPs but IBD, then all the DNA in between the SNPs that I didn't sequence is going to be the same too (which is interesting and has practical utility) - thus, what is the probability of this region being IBD, given the size of the block and the assumption that the two individuals are separated by X generations?"
The X generations bit here is important though. Changing X here will change your probabilities, as mentioned earlier due to the number of crossing-over events permitted. It's at this point I don't really know how IBD is calculated, since my University only taught me the theory and nothing that had any practical utility ;)
But based on your quote, I would imagine they used some calculation where instead of assuming X generations separate the individuals, they work backwards from the current generation until they find a value of X that could lead to IBD, and then decide if that is plausible. Perhaps calculating IBD in this way leads to negative values which "cannot be interpreted as probabilities but they can be interpreted as the correlation of homologous alleles in different gametes". This is just speculation on my part, since I don't really know for sure, and the last time anyone mentioned IBD to me it was in relation to Inflammatory Bowel Disease.
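For what it's worth, one concrete way negative values can arise is easy to show with a toy example. The sketch below uses a commonly used SNP-based relationship estimator (the average product of standardised genotypes), which may or may not be exactly what the paper has in mind. The allele frequencies are estimated from the sample itself, i.e. the current generation is the base. Everything here (function names, simulated unrelated individuals, parameter values) is hypothetical and only for illustration.

```python
import math
import random


def simulate_genotypes(n_individuals, n_snps, seed=0):
    """Simulate unrelated individuals: genotype = count (0/1/2) of one allele."""
    rng = random.Random(seed)
    freqs = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]
    return [
        [(rng.random() < p) + (rng.random() < p) for p in freqs]
        for _ in range(n_individuals)
    ]


def relationship_matrix(genos):
    """Average product of standardised genotypes for every pair.

    Allele frequencies come from the sample itself, so the sample
    (the "current generation") acts as the base population.
    """
    n = len(genos)
    m = len(genos[0])
    p = [sum(row[i] for row in genos) / (2 * n) for i in range(m)]
    # Drop SNPs that are monomorphic in the sample (variance would be zero).
    keep = [i for i in range(m) if 0.0 < p[i] < 1.0]
    z = [
        [(row[i] - 2 * p[i]) / math.sqrt(2 * p[i] * (1 - p[i])) for i in keep]
        for row in genos
    ]
    return [
        [sum(a * b for a, b in zip(z[j], z[k])) / len(keep) for k in range(n)]
        for j in range(n)
    ]


if __name__ == "__main__":
    A = relationship_matrix(simulate_genotypes(20, 200, seed=1))
    off_diag = [A[j][k] for j in range(20) for k in range(20) if j != k]
    print("negative off-diagonal entries:", sum(v < 0 for v in off_diag))
    print("row sums (should be ~0):", [round(sum(row), 10) for row in A][:3])
```

Because each genotype is centred on the sample mean, every row of the matrix sums to zero. The diagonal entries are positive, so the off-diagonal entries must be negative on average. In other words, the values measure how much more or less alike two individuals are than an average pair in the sample, which reads naturally as a correlation (it can be below zero) but not as a probability.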
Whispers Access to the paper is possible through sci-hub.
This has changed my life.
Almost every paper is available, and often easier than using the VPN provided by your university... Forget about paywalls. Probably not entirely legal.
I don't know - I'm ethically conflicted. In many countries using sci-hub would be legal. Let's save this conversation for another day...
Agreed, although it's already a considerable public discussion. It's maybe not the time and place to discuss this, but I'm open to elaborating on my opinion. (But I'm on holiday for the following weeks, with no desire to access the internet.)