Hi,
I was wondering what exactly is the difference between germline and somatic variant calling?
Why is a tumor-normal study considered to be somatic?
Can indels be picked up on germline or somatic samples?
Thanks
I was wondering what exactly is the difference between germline and somatic variant calling?
Germline variants are diploid/biallelic, so the expected alternative allele frequency is about 50% at a heterozygous position. Somatic variants are not present in all cells tested, and their frequency depends on the tumor purity. As such, variant allele frequencies can be much lower.
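To put numbers on the allele-frequency point, here is a minimal sketch, not what any particular caller does. The function name `expected_vaf` and its parameters are made up for illustration, and it assumes a clean mixture of tumor and normal cells with known copy numbers at the site.

```python
def expected_vaf(purity, mutated_copies=1, tumor_copies=2,
                 normal_copies=2, clonal_fraction=1.0):
    """Expected fraction of reads carrying the variant allele at one site.

    Hypothetical helper: purity is the fraction of tumor cells in the sample,
    clonal_fraction is the fraction of tumor cells that carry the variant.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    # Variant-carrying chromosome copies contributed by tumor cells.
    mutant = purity * clonal_fraction * mutated_copies
    # All chromosome copies at this site, from tumor and normal cells together.
    total = purity * tumor_copies + (1.0 - purity) * normal_copies
    return mutant / total


# Germline heterozygous site: every cell carries it, so treat purity as 1.
germline_het = expected_vaf(purity=1.0)
# Clonal heterozygous somatic variant in a sample that is 40% tumor cells.
somatic_low_purity = expected_vaf(purity=0.4)
print(f"{germline_het:.2f} {somatic_low_purity:.2f}")
```

Under these assumptions the germline heterozygous site comes out at 0.50 and the somatic one at 0.20, and a subclonal variant would be lower still. That is why a caller that expects frequencies near 50% or 100% is a poor fit for tumor data.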
Why is a tumor-normal study considered to be somatic?
Because you are looking for differences between the tumor and the normal sample, and therefore variants which are not part of the germline but appeared somatically.
Can indels be picked up on germline or somatic samples?
Yes.
EDIT: My answer below is really naive and may contain inaccuracies/broad generalizations. Here is a better, more robust post from GATK: https://gatk.broadinstitute.org/hc/en-us/articles/360035890491?id=11127
I'm going to take a shot at answering this:
Somatic variants = variants seen in a somatic cell that are not seen in other somatic cells. Somatic cells are not passed on to offspring. Germline variants = variants seen in germline cells, which are passed on to offspring.
Somatic and germline, as you can see, are based on which cell the genetic material is extracted from. Indels are a type of mutation that can occur anywhere.
Tumor normal is considered somatic as cancer is usually an abnormality in the somatic DNA, corrupting one particular cluster of cells in your body, rendering them with a different genotype than the rest of your body.
In simpler terms, germline variants are variants that are inherited from the parents via the germ cells (sperm and oocytes), which means the variant was already present in the genome of at least one of the parents. Somatic variants arise de novo in the genome of the respective individual. Example: a variant that occurs in a stem cell will be found in all descendant cells that derive from that stem cell, but not in the other cells of the organism. In order to distinguish germline from somatic variants, one sequences the tumor sample and a matched normal. E.g., in the case of lung cancer, one takes the tumor biopsy from the lung and a matched normal from the blood. Even though germline variants (risk-factor variants) can contribute to pathogenesis, somatic variants are typically more involved in the disease, which is why they are of special interest. Without a matched-normal control, one could not distinguish between somatic and germline variants, because every genome contains tens of thousands of differences from the reference genome, so a matched normal from the same donor is necessary.
Indels are simply additional or missing nucleotides in comparison to a reference; therefore they can be found in both germline and somatic samples.
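As a small illustration that "indel" only describes how the variant differs from the reference, and says nothing about germline vs somatic, here is a sketch that labels a variant from VCF-style REF and ALT strings. The function name `variant_type` and the labels are my own, and the sketch assumes a single, simple ALT allele (no symbolic or multi-allelic records).

```python
def variant_type(ref, alt):
    """Label a variant from its reference and alternative allele strings.

    Hypothetical helper: assumes plain nucleotide strings and one ALT allele.
    """
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"        # one base swapped for another
    if len(alt) > len(ref):
        return "insertion"  # additional nucleotides relative to the reference
    if len(alt) < len(ref):
        return "deletion"   # missing nucleotides relative to the reference
    return "MNV"            # same length, several bases substituted


for ref, alt in [("A", "G"), ("A", "ATG"), ("ATG", "A")]:
    print(ref, alt, variant_type(ref, alt))
```

The same labelling applies whether the record came from a germline or a somatic call set.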
You should change "inherited by the parents" to "inherited from the parents".
As a non-native speaker, where is the difference?
Inherited by parents would indicate inheritance from grand-parents, (which is probably true if these mutations are being transmitted through multiple generations).
Aside from the conceptual answers on what is somatic and germline, I would very much like to know how a software can tell apart germline and somatic. I know that tools are specific for either somatic or germline variant calling, but other than that what are the basic assumptions that these tools rely on?
Germline: compare child sequence to parents' sequences, infer which allele the child inherited from each parent. Possible alternatives: de novo mutation, mendelian abnormality
Somatic: compare tumor sequence (or sequence from any one cell) from an individual to sequence from normal tissue (or any other cell type) from the same individual. Exclude all variants seen in the germline. What remains are somatic variants specific to the first (tumor) cell.
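The somatic comparison above can be sketched as a plain set subtraction. This is only the idea, with made-up names (`Variant`, `split_tumor_calls`) and toy coordinates. Real somatic callers such as Mutect2 model the read evidence in both samples statistically rather than subtracting two finished call sets.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def split_tumor_calls(tumor_calls, normal_calls):
    """Split tumor calls into (somatic, germline) using the matched normal."""
    normal = set(normal_calls)
    # Seen in the tumor but not in the matched normal: candidate somatic.
    somatic = [v for v in tumor_calls if v not in normal]
    # Seen in both samples: already present in the germline.
    germline = [v for v in tumor_calls if v in normal]
    return somatic, germline


# Toy data, invented for illustration only.
shared = Variant("chr1", 1000, "A", "G")
tumor_only = Variant("chr2", 2000, "C", "T")
somatic, germline = split_tumor_calls([shared, tumor_only], [shared])
print("somatic:", somatic)
print("germline:", germline)
```

Exact matching on position and alleles is the simplest possible rule. It would miss low-level tumor contamination in the normal and differences in indel representation, which is part of why dedicated tools exist.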
Tools can't tell things apart, tools are just software and software is dumb. Experiments need to be designed so the results put in context make sense.
Thanks, RamRS. I do understand that 'software is dumb', but there must be some underlying assumption for calling either somatic or germline variants other than experiment design. Otherwise, we wouldn't need distinct software (e.g., HaplotypeCaller for germline and Mutect2 for somatic, both from GATK). Plus, variant calling will not always be your end goal, so your study design might not be 'gold-standard designed' for variant calling. I would appreciate other inputs, with reasons other than the experimental design, on how to tell apart somatic and germline variant calling.
Wouter's answer pretty much addresses your question with its take on ploidy and tumor purity. See Mutect2's page for a similar description.
Honestly, most cancer genomic studies involve matched normal samples, so I don't see why a study would need to be "gold standard designed for variant calling" to contain matched normals.
I agree with you. I was referring to germline variant calling; I should've made that clear. My understanding is that you'd need a child-parent comparison if you want to find new variants or in a clinical setting. I just have some RNA-seq samples and I want to compare their SNPs. Bottom line, the basic assumption would be the frequency of the variant, right?
You'd need parents (and ideally a normal non-phenotypic sibling) for germline experiments, yes. That would enable discovery of transmitted and de novo variants, along with refined attribution of phenotypes (although I don't know tools that take normal siblings into account).
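For the trio case, a minimal sketch of the per-site check, assuming genotypes are already called and given as pairs of alleles. The function name `classify_trio_site` and the labels are hypothetical, and a real analysis would also weigh genotype quality and read depth.

```python
def classify_trio_site(child, mother, father):
    """Check whether a child's genotype can be explained by the parents.

    Each genotype is a pair of alleles, e.g. ("A", "G").
    """
    a, b = child
    # One child allele must come from the mother and the other from the father.
    consistent = (a in mother and b in father) or (b in mother and a in father)
    if consistent:
        return "transmitted"
    return "candidate de novo or Mendelian inconsistency"


# Child allele G is present in the father: explained by inheritance.
print(classify_trio_site(("A", "G"), ("A", "A"), ("A", "G")))
# Neither parent carries G: flag the site for a closer look.
print(classify_trio_site(("A", "G"), ("A", "A"), ("A", "A")))
```

A flagged site is only a candidate, since a genotyping error in any of the three samples produces the same pattern.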
I'm not conversant with working on variants found in RNA-seq, so someone else will have to give you feedback on your statement on variant frequency.