Hi Everyone,
I am trying to solve a problem in my analysis, and I want to make sure I understand the sequencing procedure used for NGS.
Imagine we want to do whole exome sequencing. The steps would be:
- Take the DNA
- Fragment the DNA
- Add the adapters to the end of fragments
- Use a cDNA-based filter to keep only the coding-region fragments
- Sequence ...
So here is my question: how do we know the cDNA sequence before sequencing? Is it based on GRCh37 (hg19) or GRCh38?
For highly polymorphic regions, such as the HLA or KIR regions, is the cDNA sequence based on the reference genome (and is it therefore possible that some reads are not captured)?
Thanks,
Thanks to ATpoint's response, I went ahead and read the reference for a sample exome capture kit. It was mentioned there, and I quote:
So I guess this type of information should be available in the reference for each capture kit people use in their experiments, in case someone in the future is looking for the genome assembly used for their capture kit.
Yes, the coordinates of all the exome baits are available on their website.
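If it helps anyone later: once you have the bait coordinates, a quick way to see how much of a region of interest is actually targeted is to count the bases overlapped by at least one bait. The sketch below is only an illustration and assumes the coordinates come as a tab-separated BED file (0-based, half-open intervals) on the same assembly as your region. The function names `read_bed` and `covered_bases` are made up for this example, and the toy coordinates are not real bait positions.

```python
import io

def read_bed(handle):
    """Parse BED lines into (chrom, start, end) tuples (0-based, half-open)."""
    intervals = []
    for line in handle:
        line = line.strip()
        # Skip blank lines and the optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals

def covered_bases(baits, chrom, start, end):
    """Count bases of [start, end) on chrom covered by at least one bait."""
    # Keep only overlapping baits, clipped to the region, sorted by start.
    clipped = sorted(
        (max(s, start), min(e, end))
        for c, s, e in baits
        if c == chrom and s < end and e > start
    )
    total = 0
    cursor = start  # everything left of cursor has already been counted
    for s, e in clipped:
        if e > cursor:
            total += e - max(s, cursor)
            cursor = e
    return total

# Toy bait file; these coordinates are invented for illustration only.
toy_bed = io.StringIO(
    "chr6\t100\t220\n"
    "chr6\t200\t320\n"
    "chr6\t500\t620\n"
    "chr19\t100\t220\n"
)
baits = read_bed(toy_bed)
region = ("chr6", 150, 550)
n = covered_bases(baits, *region)
print(f"{n} of {region[2] - region[1]} bases covered by baits")
```

With a real kit you would replace `toy_bed` with `open(...)` on the downloaded file. Note that this only tells you where baits were designed, not how well divergent haplotypes (for example in HLA or KIR) hybridize to them.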