Question

Somatic Mutation Identification for Tumor without Normal

1

7.5 years ago

haiying.kong ▴ 360

I have WES data for 6 tumors that do not have matched normals. I called all variants, germline and somatic together, with GATK. I then excluded every variant that ExAC lists as germline, as well as the germline variants identified in our roughly 90 normal samples. The number of putative somatic mutations left for the 6 tumors is 251,495. I expected this number to be high, but not this high.

For tumors with matched normals, the number of somatic mutations GATK identifies per tumor has 3 or 4 digits.

Even by the most conservative estimate, the mutation count in the tumors without matched normals is 10 times higher.

Is there any way I can still use the 6 tumors?
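
For readers following along, the subtraction step described above can be sketched roughly as follows. This is a minimal illustration only, not the poster's actual pipeline: the function name `subtract_known_germline`, the tuple layout, and the toy variants are all hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# A variant is keyed by (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def subtract_known_germline(
    tumor_calls: Iterable[Variant],
    population_db: Set[Variant],
    normal_pool: Set[Variant],
) -> List[Variant]:
    """Keep only tumor calls absent from both germline sources."""
    known_germline = population_db | normal_pool
    return [v for v in tumor_calls if v not in known_germline]


# Hypothetical toy data, not real calls.
tumor_calls = [
    ("1", 1000, "A", "G"),
    ("2", 2000, "C", "T"),
    ("7", 3000, "G", "A"),
]
population_db = {("1", 1000, "A", "G")}  # stands in for a database such as ExAC
normal_pool = {("2", 2000, "C", "T")}    # stands in for the in-house normals

candidates = subtract_known_germline(tumor_calls, population_db, normal_pool)
print(candidates)
```

Any germline variant that is absent from both sources passes straight through this filter, so the result is a list of candidates rather than confirmed somatic mutations.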

somatic mutation • 3.8k views

ADD COMMENT • link updated 6.5 years ago by Biostar 20 • written 7.5 years ago by haiying.kong ▴ 360

0

What histology are your tumors? Melanoma, RCC, etc.? The magnitude of mutational load, whether wrong or not at this juncture, will be somewhat more predictable knowing this.

ADD REPLY • link 7.5 years ago by CMosychuk ▴ 20

1

Primary melanoma, whole-exome sequencing.

ADD REPLY • link 7.5 years ago by haiying.kong ▴ 360

score 0 · Answer 1 · 2017-05-24

0

7.5 years ago

Manuel Landesfeind ★ 1.4k

To me, the number of mutations in your tumors seems suspiciously high. We take a very similar approach to removing potential germline mutations from tumors lacking matched normal samples (using different annotation databases and no internal normal pool). In our models we usually retain between 500 and 2000 mutations (with the exception of highly mutated tumors, of course).

ADD COMMENT • link 7.5 years ago by Manuel Landesfeind ★ 1.4k

0

Could you please tell me which database you use as your germline mutation list?

ADD REPLY • link 7.5 years ago by haiying.kong ▴ 360

1

You can have a look at gnomAD, which is bigger and includes ExAC: http://gnomad.broadinstitute.org/

ADD REPLY • link 7.5 years ago by Titus ▴ 910

2

What can be discussed is the threshold you use: when do you consider a variant as putative germline? 1%, 10%, ... allelic frequency? In any population or overall?

PS: If you figure out a good threshold or some literature, please share here because it will save me some time to evaluate this by myself in the near future ;-)
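
To make the two choices in that question concrete, here is a small sketch of a frequency-based germline flag. It is only an illustration under assumptions: the function name `is_putative_germline`, the `"overall"` key, the population labels, and the 1% default are hypothetical and are not a recommended threshold.

```python
from typing import Dict


def is_putative_germline(
    pop_afs: Dict[str, float],
    threshold: float = 0.01,
    mode: str = "any",
) -> bool:
    """Flag a variant as putative germline from population allele frequencies.

    pop_afs maps a population label to its allele frequency; the special
    key "overall" holds the frequency across all populations combined.
    """
    if mode == "overall":
        # Compare only the combined frequency against the threshold.
        return pop_afs.get("overall", 0.0) >= threshold
    if mode == "any":
        # Flag the variant if any single population reaches the threshold.
        return any(
            af >= threshold for pop, af in pop_afs.items() if pop != "overall"
        )
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical variant: rare overall, but common in one population.
variant_afs = {"overall": 0.004, "POP_A": 0.03, "POP_B": 0.0005}

flag_any = is_putative_germline(variant_afs, threshold=0.01, mode="any")
flag_overall = is_putative_germline(variant_afs, threshold=0.01, mode="overall")
print(flag_any, flag_overall)
```

With these toy numbers the variant is flagged under the "any population" rule but kept under the "overall" rule, which is exactly the difference the question is about.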

ADD REPLY • link 7.5 years ago by Manuel Landesfeind ★ 1.4k

1

I absolutely agree with Titus that gnomAD and ExAC are key databases. We use 1000G, HapMap and others because our established pipeline uses the hg19 genome assembly.

Moving to hg38 and corresponding databases is in progress and I plan to use gnomAD and ExAC too.

ADD REPLY • link 7.5 years ago by Manuel Landesfeind ★ 1.4k

1

Dear Manuel,

The website says: "What genome build is the gnomAD data based on? All data are based on GRCh37/hg19"

http://gnomad.broadinstitute.org/faq

ADD REPLY • link 7.5 years ago by haiying.kong ▴ 360

0

That's interesting... for some reason I thought it would be hg38. I don't know why.

ADD REPLY • link 7.5 years ago by Manuel Landesfeind ★ 1.4k

0

Could you please give me the complete list of databases you use as references for germline mutations for hg19/GRCh37? It would be wonderful if you could give me download links as well :)

ADD REPLY • link 7.5 years ago by haiying.kong ▴ 360