I have a question about matched normal samples versus virtual normal samples. By definition, a matched normal (MN) is a sample of healthy tissue from the same individual, used to distinguish germline mutations from somatic mutations. In contrast, samples from healthy, unrelated individuals serve as a virtual normal (VN) when no matched normal sample is available.
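To make the difference concrete, here is a minimal sketch, assuming variants have already been called in each sample and are represented as hypothetical (chrom, pos, ref, alt) tuples. It uses naive set subtraction, which is not how callers such as GATK Mutect work (they weigh read evidence from tumor and normal jointly). It shows why a matched normal removes the patient's own rare germline variants while an unrelated virtual normal cannot.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def subtract_normal(tumor_calls: set[Variant], normal_calls: set[Variant]) -> set[Variant]:
    """Keep tumor calls that were not seen in the normal (naive set subtraction)."""
    return tumor_calls - normal_calls


# Hypothetical toy calls, for illustration only.
germline_private = Variant("chr1", 1000, "A", "G")  # inherited, rare, specific to this patient
germline_common = Variant("chr2", 5000, "C", "T")   # inherited, common in the population
somatic = Variant("chr3", 9000, "G", "A")           # acquired in the tumor only

tumor = {germline_private, germline_common, somatic}
matched_normal = {germline_private, germline_common}  # same patient: shares all germline variants
virtual_normal = {germline_common}                    # unrelated person: shares only the common one

print(sorted(subtract_normal(tumor, matched_normal)))  # only the somatic call remains
print(sorted(subtract_normal(tumor, virtual_normal)))  # the private germline call leaks through
```

With the virtual normal, the patient's private germline variant survives the subtraction and would be misreported as somatic. That is the core difficulty of tumor-only calling.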
We are planning to perform whole genome sequencing (WGS) of multiple tumor samples and of virtual normal samples (one third as many as the tumor samples), with the goal of identifying somatic mutations. However, most analysis pipelines (e.g. GATK Mutect) are designed for tumor/normal pairs, and only a few recent studies (Hiltemann et al., Teer et al.) describe somatic mutation calling without a matched normal (i.e. with a virtual normal).
From a bioinformatics point of view, could you please provide recommendations on the following:
- Is it always recommended to have a matched normal for each tumor, i.e. to sequence the same number of tumor and normal samples?
- In the absence of matched normals, is it best to build a panel of normals (PoN) from the virtual normals to determine the somatic mutations? How many normal samples are necessary for a PoN?
- In the absence of matched normals, which other bioinformatics workflows do you recommend for accurately calling somatic mutations?
- Please suggest any other important considerations when matched normal samples are not available.
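For reference, the basic idea behind a panel of normals can be sketched as: count in how many normal samples each variant appears, then drop tumor calls seen in at least some threshold number of normals. This is a minimal sketch, assuming per-sample call sets as hypothetical (chrom, pos, ref, alt) tuples; the function names and the `min_normals` threshold are illustrative, and this is not GATK's own PoN algorithm.

```python
from collections import Counter
from typing import Iterable

Variant = tuple[str, int, str, str]  # (chrom, pos, ref, alt); hypothetical representation


def build_pon(normal_call_sets: Iterable[set[Variant]]) -> Counter:
    """Count in how many normal samples each variant was called."""
    counts: Counter = Counter()
    for calls in normal_call_sets:
        counts.update(calls)  # a set contributes at most +1 per variant
    return counts


def filter_with_pon(tumor_calls: set[Variant], pon: Counter, min_normals: int = 2) -> set[Variant]:
    """Drop tumor calls observed in at least `min_normals` panel samples."""
    return {v for v in tumor_calls if pon[v] < min_normals}


# Hypothetical toy data, for illustration only.
normals = [
    {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")},
    {("chr1", 100, "A", "G")},
    {("chr5", 500, "T", "C")},
]
tumor = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr7", 700, "G", "T")}

pon = build_pon(normals)
print(sorted(filter_with_pon(tumor, pon, min_normals=2)))
```

The chr1 call is removed because two normals carry it, while chr2 (seen once) and chr7 (never seen) are kept. A PoN mainly removes recurrent technical artifacts and common germline variants; it cannot remove a patient's private germline variants. For that reason, tumor-only workflows usually also filter against a population allele-frequency resource such as gnomAD.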
Thank you, Chris Miller, for the suggestions. I have a related question about sample collection strategy.
For WGS, is it sufficient to collect the normal from any somatic tissue? For example, lymphoma tumor samples paired with normals from the skin or blood of the same patient, rather than from tissue adjacent to the tumor. Would this be an appropriate "matched normal" sample?
For RNA-seq, does the normal sample need to be from exactly the same tissue type? That is, should RNA-seq be performed on a different tissue type (skin/blood) from the same patient, or on the same tissue type from a different healthy individual?
Please share your thoughts.
1) The main concern is that the normal should be as free of tumor contamination as possible. Blood is a fine control for most solid tumors, but leukemias are trickier: the blood itself carries tumor cells, and you often find tumor contamination in the skin as well. I seem to remember that skin samples from lymphoma patients tend to be free of tumor content, but do a quick literature search to confirm.
2) For information on normal RNA-seq controls, consult previous questions such as "Why is normal blood used for matched tumor (instead of adjacent normal tissue)?". In short, normal RNA-seq controls are rare because a) for many tissues there is no way to obtain good normals (you can't scoop out healthy brain), and b) matching the tissue type well is surprisingly hard.