Question

Whole exome data analysis (HiSeq)

6.8 years ago

prabhakar8279 ▴ 10

Hi,

We are currently working on whole exome analysis of breast cancer samples using the GDC Bioinformatics Pipeline: https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/DNA_Seq_Variant_Calling_Pipeline/

The tumor .bam file was downloaded from the GDC Legacy Archive: https://portal.gdc.cancer.gov/legacy-archive/files/9efa8d39-37e0-4236-9737-e14ddcfd93ff

The reference genome (GRCh38.d1.vd1.fa.tar.gz) was downloaded from here: https://gdc.cancer.gov/about-data/data-harmonization-and-generation/gdc-reference-files

We have completed the Genome Alignment and Alignment Co-Cleaning steps, and next want to do variant calling and copy number variation (CNV) analysis.

Do we need normal exome data to perform variant calling and copy number variation analysis? If so, is there any publicly available normal exome data that we could use for this?

Is it OK to use normal exome data that was not obtained from the same study?

Or is it mandatory to use normal exome data from the same study? What if no normal exome data is available from that study?

Thanks,
Dr. Prabhakar

RNA-Seq • Copy number variation • variant calling


score 2 · Answer 1 · 2018-02-07

Concerning exome data and CNV: have a look at the threads "Best Copy Number Variation Tools" and "Human NGS Cancer Data for tool development, algorithm benchmarking, teaching, pipeline evaluation, etc."

Concerning variant calling: I am not a cancer bioinformatician, but I assume the process is the same as in plant bioinformatics. You can do variant calling without control data; however, any normal, non-cancer-related polymorphisms of the individual will then end up in your results. To get rid of those, you would need a control set.
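The idea of using a control set to remove an individual's own polymorphisms can be illustrated with a plain set difference. This is a minimal Python sketch, assuming variants have already been called separately in the tumor and the normal and reduced to (chromosome, position, ref, alt) keys; the `Variant` and `subtract_germline` names and the coordinates are hypothetical. Dedicated somatic callers (such as MuTect2, one of the callers in the GDC pipeline) instead weigh the read evidence from both samples together, so treat this only as an illustration of the principle.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Simplified variant key: chromosome, 1-based position, ref and alt alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def subtract_germline(tumor_calls: set[Variant], normal_calls: set[Variant]) -> set[Variant]:
    """Keep variants called in the tumor but absent from the matched control.

    Naive set difference for illustration only: a germline variant that the
    normal sample missed (e.g. through low coverage) would still leak through.
    """
    return tumor_calls - normal_calls


# Hypothetical calls for one individual (made-up coordinates).
tumor = {
    Variant("chr2", 123456, "A", "G"),  # seen only in the tumor: candidate somatic
    Variant("chr1", 654321, "C", "T"),  # also in the normal: inherited polymorphism
}
normal = {Variant("chr1", 654321, "C", "T")}

print(subtract_germline(tumor, normal))
```

Without the `normal` set, both variants would be reported, which is exactly the contamination by germline polymorphisms described above.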

score 2 · Answer 2 · 2018-02-07

6.8 years ago

Sean Davis 27k

In your case, it appears that you are working with cell line data.

Generally, somatic variant calling (finding variants present in the tumor but not in the normal) requires a paired normal sample for comparison. Since you are working with cell lines, you'll find that such paired normals usually do not exist. That leaves you with a challenge for analysis, since you are most likely interested in these somatic variants and are missing one half of the "equation" for finding them. I am not going to suggest how to proceed here, since there is no accepted best practice for cancer cell line data.

That said, in general, the short answers to your questions, specifically for somatic variation detection, are:

Do we need normal exome data to perform variant calling and copy number variation analysis? Is there any publicly available normal exome data that we could use for this?

Yes.

Is it OK to use normal exome data that was not obtained from the same study?

No.

What if no normal exome data is available from the same study?

You are kinda stuck and need to fall back to more heuristic approaches, driven by the questions you want to answer.

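What such heuristics look like depends on the question, but two common ingredients of tumor-only filtering are discarding variants that are frequent in a population database and treating variant allele fractions (VAF) near 0.5 or 1.0 as possibly germline. The sketch below is a hypothetical Python illustration of those two rules; the `Call` fields, the default thresholds, and the assumption that population frequencies have already been annotated (e.g. from gnomAD) are illustrative choices, not a validated method. Tumor purity and copy number changes shift VAFs, so the second rule in particular will misclassify some variants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """A tumor-only variant call with the fields the heuristics use."""
    key: str       # hypothetical "chrom:pos:ref>alt" identifier
    vaf: float     # variant allele fraction observed in the tumor reads
    pop_af: float  # allele frequency in a population database (0.0 if absent)


def likely_somatic(call: Call, max_pop_af: float = 0.001, margin: float = 0.05) -> bool:
    """Heuristic tumor-only filter; the default thresholds are assumptions."""
    # Rule 1: variants common in the population are treated as germline.
    if call.pop_af > max_pop_af:
        return False
    # Rule 2: a VAF near 0.5 (heterozygous) or 1.0 (homozygous) is what an
    # inherited variant would show in a pure, copy-neutral sample, so treat it
    # as possibly germline.
    near_het = abs(call.vaf - 0.5) <= margin
    near_hom = call.vaf >= 1.0 - margin
    return not (near_het or near_hom)


calls = [
    Call("chr2:123456:A>G", vaf=0.21, pop_af=0.0),   # low VAF, unseen in population: kept
    Call("chr3:234567:C>T", vaf=0.49, pop_af=0.0),   # VAF near 0.5: dropped
    Call("chr4:345678:G>A", vaf=0.18, pop_af=0.12),  # common polymorphism: dropped
    Call("chr5:456789:T>C", vaf=0.98, pop_af=0.0),   # VAF near 1.0: dropped
]
kept = [c.key for c in calls if likely_somatic(c)]
print(kept)  # ['chr2:123456:A>G']
```

Rare germline variants that are absent from the population database, and somatic variants that happen to sit at a VAF near 0.5, both slip past rules like these, which is why they remain heuristics rather than a substitute for a matched normal.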

Is the paired normal sample necessary for the reason I wrote above (excluding variants of the individual that are not tumor-related)? Or is there another reason that I am missing?

6.8 years ago by cschu181 ★ 2.8k

Your answer was right on! For the variants, yes, excluding variation in the germline is the rationale. For CNVs, having a "matching" exome is useful both to exclude germline copy number variation and also to provide a relatively close match in protocol and capture regions.

6.8 years ago by Sean Davis 27k
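For CNVs, the core quantity is usually a per-target ratio of tumor to normal read depth. A minimal Python sketch, assuming read counts per capture target have already been computed for both exomes; the target IDs, the pseudocount, and the median-centering step are illustrative choices, not taken from any particular tool. Only targets present in both samples can be compared, which is part of why a normal with a closely matching protocol and capture regions helps.

```python
import math
import statistics


def log2_copy_ratios(
    tumor_counts: dict[str, int],
    normal_counts: dict[str, int],
    pseudocount: float = 0.5,
) -> dict[str, float]:
    """Median-centred log2(tumor/normal) depth ratio for each shared capture target."""
    # Only targets captured in both samples can be compared.
    shared = sorted(tumor_counts.keys() & normal_counts.keys())
    if not shared:
        raise ValueError("tumor and normal share no capture targets")

    # Normalise for library size (sequencing depth) over the shared targets.
    tumor_total = sum(tumor_counts[t] for t in shared)
    normal_total = sum(normal_counts[t] for t in shared)

    raw = {}
    for target in shared:
        # The pseudocount avoids log2(0) for targets with no reads.
        t = (tumor_counts[target] + pseudocount) / tumor_total
        n = (normal_counts[target] + pseudocount) / normal_total
        raw[target] = math.log2(t / n)

    # Centre on the median so that most targets (assumed copy-neutral) sit near 0.
    centre = statistics.median(raw.values())
    return {target: value - centre for target, value in raw.items()}


# Hypothetical read counts: target_1 has roughly twice the relative depth in the tumor;
# target_4 exists only in the normal's capture design and is therefore ignored.
tumor = {"target_1": 200, "target_2": 100, "target_3": 100}
normal = {"target_1": 100, "target_2": 100, "target_3": 100, "target_4": 80}
ratios = log2_copy_ratios(tumor, normal)
print({k: round(v, 2) for k, v in ratios.items()})
# → {'target_1': 1.0, 'target_2': 0.0, 'target_3': 0.0}
```

A log2 ratio near 1 suggests a gain relative to the normal. If the normal carried its own germline copy number variant at a target, that signal would be cancelled or inverted here, which is the other reason a matched normal matters for CNV calling.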

Thank you. I came across this paper in Genome Medicine:

ISOWN: accurate somatic mutation identification in the absence of normal tissue controls

https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-017-0446-9

Could you comment on the accuracy of this kind of method in general? Are such methods reliable?

6.8 years ago by prabhakar8279 ▴ 10
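How reliable tumor-only calling is depends on the data, so one practical check is to measure it directly. The Python sketch below assumes you have some samples that do have a matched normal, run a tumor-only method on the tumor alone, and treat the matched tumor-normal somatic calls as the reference; the call strings are hypothetical. Precision then tells you what fraction of tumor-only calls are confirmed, and recall what fraction of the reference calls are recovered.

```python
def precision_recall(predicted: set[str], reference: set[str]) -> tuple[float, float]:
    """Precision and recall of a call set against a reference call set."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical calls for one tumor sample that does have a matched normal.
tumor_only_calls = {"chr2:123456:A>G", "chr3:234567:C>T", "chr6:567890:G>T", "chr7:678901:A>C"}
matched_pair_calls = {"chr2:123456:A>G", "chr6:567890:G>T", "chr8:789012:C>A"}

p, r = precision_recall(tumor_only_calls, matched_pair_calls)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Because the matched-pair calls are themselves imperfect, these numbers estimate agreement with a standard pipeline rather than absolute accuracy.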