Hi,
We are currently working on whole-exome analysis of breast cancer samples using the GDC bioinformatics pipeline: https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/DNA_Seq_Variant_Calling_Pipeline/
The tumor BAM file was downloaded from the GDC Legacy Archive: https://portal.gdc.cancer.gov/legacy-archive/files/9efa8d39-37e0-4236-9737-e14ddcfd93ff
The reference genome (GRCh38.d1.vd1.fa.tar.gz) was downloaded from here: https://gdc.cancer.gov/about-data/data-harmonization-and-generation/gdc-reference-files
We have completed the Genome Alignment and Alignment Co-Cleaning steps, and next we want to perform variant calling and copy number variation (CNV) analysis.
To perform variant calling and CNV analysis, do we need normal exome data? If so, is there any publicly available normal exome data we could use?
Is it acceptable to use normal exome data that was not obtained from the same study?
Or is it mandatory to use normal exome data from the same study, and what should we do if none is available from that study?
Thanks, Dr. Prabhakar
Is the paired normal sample necessary for the reason I gave above (excluding the individual's variants that are not tumor-related)? Or is there another reason I haven't thought of?
Your answer was right on! For the variants, yes, excluding variation in the germline is the rationale. For CNVs, having a "matching" exome is useful both to exclude germline copy number variation and also to provide a relatively close match in protocol and capture regions.
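To make the two uses of a matched normal concrete, here is a toy sketch on made-up data; it is not how the GDC pipeline or any real caller works (callers such as MuTect2 model this statistically, with read-level evidence). It assumes variants are represented as hypothetical `(chrom, pos, ref, alt)` tuples and that per-region read depths are already available as dictionaries. `somatic_candidates` drops anything also seen in the normal (germline), and `log2_copy_ratio` compares depth-normalized coverage per capture region, which only makes sense when both samples share the same protocol and target regions.

```python
# Toy illustration only (hypothetical data structures, not a real pipeline).
from math import log2
from typing import Dict, Set, Tuple

# Hypothetical variant key: (chrom, pos, ref, alt).
Variant = Tuple[str, int, str, str]


def somatic_candidates(tumor: Set[Variant], normal: Set[Variant]) -> Set[Variant]:
    """Keep tumor variants absent from the matched normal.

    Variants present in both samples are treated as germline and removed.
    """
    return tumor - normal


def log2_copy_ratio(
    tumor_depth: Dict[str, float], normal_depth: Dict[str, float]
) -> Dict[str, float]:
    """Per-region log2(tumor/normal) after scaling each sample by its total depth.

    Using a normal captured with the same kit and regions cancels much of the
    per-region capture bias, so values near 0 suggest no copy-number change.
    Simplified: total-depth scaling lets a large gain shift other regions down.
    """
    t_total = sum(tumor_depth.values())
    n_total = sum(normal_depth.values())
    ratios: Dict[str, float] = {}
    for region, n in normal_depth.items():
        t = tumor_depth.get(region, 0.0)
        if n > 0 and t > 0:  # skip regions with no coverage in either sample
            ratios[region] = log2((t / t_total) / (n / n_total))
    return ratios


if __name__ == "__main__":
    tumor = {("chr1", 1000, "A", "G"), ("chr2", 2000, "G", "A")}
    normal = {("chr1", 1000, "A", "G")}  # shared variant -> germline
    print(somatic_candidates(tumor, normal))

    t_depth = {"r1": 100.0, "r2": 100.0, "r3": 200.0}
    n_depth = {"r1": 100.0, "r2": 100.0, "r3": 100.0}
    print(log2_copy_ratio(t_depth, n_depth))  # r3 positive (gain), r1/r2 slightly negative
```

With an unmatched normal, both steps degrade: the individual's private germline variants are no longer subtracted, and differences in capture kit or protocol show up as spurious copy-ratio changes.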
Thank you. I came across a paper in Genome Medicine:
ISOWN: accurate somatic mutation identification in the absence of normal tissue controls
https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-017-0446-9
Could you comment on the accuracy of such methods in general? Are they reliable?
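For context, a very simple heuristic used in tumor-only calling is to discard variants that are common in a population database. The sketch below is that heuristic, not the ISOWN method. The `Call` record, the `pop_af` field (assumed to come from a population database lookup), and the `max_pop_af=0.001` cutoff are all hypothetical choices for illustration.

```python
# Minimal tumor-only germline filter (hypothetical fields and threshold; not ISOWN).
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Call:
    chrom: str
    pos: int
    ref: str
    alt: str
    vaf: float              # variant allele fraction in the tumor
    pop_af: Optional[float]  # population allele frequency; None if not in the database


def tumor_only_filter(calls: List[Call], max_pop_af: float = 0.001) -> List[Call]:
    """Drop calls whose population allele frequency exceeds max_pop_af.

    Calls absent from the database (pop_af is None) are kept, which is exactly
    where rare, private germline variants leak through without a matched normal.
    """
    return [c for c in calls if c.pop_af is None or c.pop_af <= max_pop_af]


if __name__ == "__main__":
    calls = [
        Call("chr1", 100, "A", "G", 0.30, 0.20),    # common polymorphism -> removed
        Call("chr2", 200, "C", "T", 0.10, None),    # not in database -> kept
        Call("chr3", 300, "G", "A", 0.40, 0.0005),  # rare -> kept
    ]
    for c in tumor_only_filter(calls):
        print(c.chrom, c.pos, c.ref, c.alt)
```

Because each person carries rare germline variants that no population database lists, a filter like this leaves germline false positives that a matched normal would have removed.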