My apologies if this has been asked before or if there is already a section covering such issues; if so, I would appreciate being directed there.
Briefly, I am trying to come up with a strategy for selecting samples for an exome sequencing experiment on a complex infectious disease. I currently have family samples that were previously used in a linkage study of the disease. The sample set consists of families with 5, 4, 3, or 2 affected members.
I would like to use samples from the families with 5 and 4 affected members for an exome sequencing run as my first screening step to look for variants. One thing I've read, and have also been advised to do, is to select the proband and one other distantly related affected family member for sequencing. I've tried to find out why, but I am finding it difficult to understand why that would be a good method. If there are any materials I could read that might help me understand this, I would greatly appreciate being pointed to them.
Another issue is that the population my samples come from has no variant frequency information in any public database (e.g. 1000 Genomes, HapMap). Would it be possible to screen and filter out common variants using frequencies from a genetically closely related population instead?
Thanks a lot for any advice or help!
Building on this, depending on your pedigrees, sequencing more family members is also better. Unaffected people are also helpful, particularly where you may have an issue with population background for screening purposes. This all assumes, of course, that you are looking for variants in the patients' own genomes that somehow tie into the infection.
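To put rough numbers on the pedigree point, here is a minimal Python sketch of why a distantly related affected pair narrows the candidate list more than a close pair does. It assumes no inbreeding and uses the standard approximation that a rare heterozygous variant in one relative is also carried by the other, through identity by descent, with probability of about twice their kinship coefficient. The count of 400 rare variants is a made-up illustration, not a figure from this thread.

```python
from fractions import Fraction

# Kinship coefficients for a few common relationships (no inbreeding assumed).
KINSHIP = {
    "parent-child": Fraction(1, 4),
    "full siblings": Fraction(1, 4),
    "uncle/aunt-niece/nephew": Fraction(1, 8),
    "first cousins": Fraction(1, 16),
    "second cousins": Fraction(1, 64),
}


def chance_sharing(relationship):
    """Approximate probability that a rare heterozygous variant carried by
    one relative is also carried by the other (about 2 * kinship)."""
    return 2 * KINSHIP[relationship]


def expected_shared(n_rare_variants, relationship):
    """Expected number of a person's rare variants shared with a relative
    simply because of relatedness, not because of the disease."""
    return float(n_rare_variants * chance_sharing(relationship))


if __name__ == "__main__":
    n = 400  # hypothetical number of rare candidate variants in the proband
    for rel in KINSHIP:
        print(f"{rel:26s} {expected_shared(n, rel):6.1f}")
```

Under these assumptions, two affected siblings would still share about half of the proband's rare variants by relatedness alone, while two affected second cousins would share only about 1 in 32, so a causal variant shared by the distant pair stands out against far less background. For a complex disease with incomplete penetrance or phenocopies this is only a guide, since not every affected relative has to carry the same variant.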
What population are your samples from, such that you have issues determining background variant frequencies? If there isn't a perfect match, using a reasonable alternative population is generally considered OK, but depending on what you find you may also need to screen specific variants in a good control group.
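A rough sketch of what filtering against an alternative population could look like, in plain Python. Everything here is hypothetical: the variant records, the `proxy_af` lookup of allele frequencies from the stand-in population, and the 1% cutoff are assumptions for illustration, not values from any real dataset or tool.

```python
def filter_rare(variants, proxy_af, max_af=0.01):
    """Keep variants that are absent from, or rare in, the proxy population.

    variants : list of dicts with chrom, pos, ref, alt keys
    proxy_af : dict mapping (chrom, pos, ref, alt) to allele frequency
    max_af   : frequency above which a variant is treated as common
    """
    kept = []
    for v in variants:
        key = (v["chrom"], v["pos"], v["ref"], v["alt"])
        af = proxy_af.get(key)  # None means not observed in the proxy panel
        if af is None or af <= max_af:
            kept.append(v)
    return kept


if __name__ == "__main__":
    # Made-up example records, for illustration only.
    variants = [
        {"chrom": "1", "pos": 1000, "ref": "A", "alt": "G"},
        {"chrom": "2", "pos": 2000, "ref": "C", "alt": "T"},
        {"chrom": "3", "pos": 3000, "ref": "G", "alt": "A"},
    ]
    proxy_af = {
        ("1", 1000, "A", "G"): 0.25,   # common in the proxy, dropped
        ("2", 2000, "C", "T"): 0.002,  # rare in the proxy, kept
        # the chr3 variant is not in the proxy panel at all, so it is kept
    }
    for v in filter_rare(variants, proxy_af):
        print(v["chrom"], v["pos"], v["ref"], v["alt"])
```

The weak spot is the second and third cases: a variant that looks rare or absent in the stand-in population may still be common in the population actually being studied, which is why anything that survives this filter would still need checking in a matched control group.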
Yeah, I got tripped up on where the infection comes into this study. At first I thought they were sequencing the virus/bacteria that causes the infection, but when they brought up HapMap and 1000 Genomes, I assumed the study is on human subjects, maybe on immunity or susceptibility to the infection.
Thanks so much for the explanations and advice, Katie and Dan. I really appreciate your help!
To answer Dan's question about which population we are looking at: it's a Thai population, and we are thinking of using the CHS (Southern Han Chinese) frequency data from 1000 Genomes for screening.