Question

Identifying germline and tumor samples

1

10.2 years ago

Kasthuri ▴ 300

Given two genome .bam files, one of which we know is from a tumor sample and the other from normal/germline from the same person, is there an efficient way to correctly identify them bioinformatically?

Thanks. -K.

Germline Tumor Normal • 3.1k views

updated 3.8 years ago by Ram 44k • written 10.2 years ago by Kasthuri ▴ 300

Ram · Answer 1 · 2014-10-19

1

10.2 years ago

Sean Davis 27k

A simple approach is to run a copy-number or allelic imbalance analysis. Such analyses will almost always show significant abnormalities in the tumor sample. A "normal" genome will also show some copy-number variation and apparent blocks of loss of heterozygosity, but tumors typically show both to a much greater extent. There are many tools for copy-number analysis; the particular choice will probably not make much difference for such a broad question.
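To make the comparison concrete, here is a minimal sketch of how the "which sample is more abnormal" decision could be scored once per-bin read depths are available. It assumes the depths have already been extracted from each .bam over identical genomic bins by whatever tool you prefer; the function names, the toy numbers, and the 0.3 log2 threshold are all illustrative assumptions, not values taken from any particular copy-number caller.

```python
from math import log2
from statistics import median


def altered_fraction(bin_depths, threshold=0.3):
    """Fraction of bins whose log2(depth / median depth) exceeds the threshold in magnitude.

    Zero-depth bins are skipped to avoid log2(0); this is a simplification.
    """
    usable = [d for d in bin_depths if d > 0]
    centre = median(usable)
    ratios = [log2(d / centre) for d in usable]
    return sum(abs(r) > threshold for r in ratios) / len(ratios)


def guess_tumor(depths_a, depths_b):
    """Label the sample with the larger altered fraction as the candidate tumor."""
    frac_a = altered_fraction(depths_a)
    frac_b = altered_fraction(depths_b)
    return ("A" if frac_a > frac_b else "B"), frac_a, frac_b


# Hypothetical per-bin mean depths for the two unlabeled samples.
sample_a = [30, 31, 29, 30, 32, 30, 28, 31, 30, 29]
sample_b = [30, 31, 15, 14, 16, 30, 45, 46, 30, 29]

label, frac_a, frac_b = guess_tumor(sample_a, sample_b)
print(f"candidate tumor: {label} (altered fraction A={frac_a:.2f}, B={frac_b:.2f})")
```

In practice you would use many more bins and correct for GC content and mappability, which a dedicated copy-number tool already does; the sketch only shows the final comparison step.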

updated 3.8 years ago by Ram 44k • written 10.2 years ago by Sean Davis 27k

0

Good idea. In fact, I ran this analysis and saw that one of the samples had large copy-number changes (losses in particular). I inferred that it must be the tumor, since the other one was clean. I used Control-FREEC. I was also thinking of a complementary analysis to confirm this. For instance, if we call somatic mutations with the roles swapped (treating the actual normal, which we don't know, as the tumor, and the actual tumor as the normal), we should get fewer mutations than with the correct assignment, since in theory a real tumor should contain all the SNPs found in the germline plus de novo, purely somatic mutations. But given the noise in NGS data, this seems tricky.
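A rough sketch of that swap idea, assuming variants have already been called separately in each sample and loaded as sets of (chromosome, position, ref, alt) tuples. The function name and the toy calls are hypothetical; the point is only that the sample with more private variants is the tumor candidate, and that sequencing noise can blur the difference.

```python
def private_variant_counts(variants_a, variants_b):
    """Count variants seen in only one of the two call sets."""
    return len(variants_a - variants_b), len(variants_b - variants_a)


# Hypothetical calls: shared germline SNPs plus a few sample-specific ones.
shared = {("chr1", 1000, "A", "G"), ("chr2", 5000, "C", "T"), ("chr3", 700, "G", "A")}
calls_a = shared | {("chr5", 123, "T", "G")}  # one private call, e.g. noise
calls_b = shared | {
    ("chr1", 2500, "T", "C"),
    ("chr7", 90, "G", "T"),
    ("chr12", 42, "C", "A"),
}

only_a, only_b = private_variant_counts(calls_a, calls_b)
candidate_tumor = "A" if only_a > only_b else "B"
print(f"private to A: {only_a}, private to B: {only_b}, candidate tumor: {candidate_tumor}")
```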

updated 3.8 years ago by Ram 44k • written 10.2 years ago by Kasthuri ▴ 300

0

I would suggest pairing your copy number analysis with an analysis of regions of allelic imbalance. Your suggestion of doing a comparison of somatic variants should work, in theory, but somatic variant calling is, in my experience, not as quantitative as one might hope. However, allelic imbalance is fairly robust and should be present in the vast majority of tumor samples. Note that Control-FREEC should have this information readily available.
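As an illustration of what an allelic imbalance comparison boils down to, here is a small sketch. It assumes you already have reference and alternate read counts at known SNP positions for both samples (for example from a pileup); the data layout, function names, depth cutoff, and the 0.3 to 0.7 heterozygosity window are assumptions made for the example, not Control-FREEC output.

```python
def baf(ref_count, alt_count):
    """B-allele fraction: share of reads supporting the alternate allele."""
    return alt_count / (ref_count + alt_count)


def mean_imbalance(counts_a, counts_b, min_depth=10, het_window=(0.3, 0.7)):
    """Mean |BAF - 0.5| per sample over sites that look heterozygous in at least one sample."""
    lo, hi = het_window
    dev_a, dev_b = [], []
    for site in counts_a.keys() & counts_b.keys():
        ref_a, alt_a = counts_a[site]
        ref_b, alt_b = counts_b[site]
        if ref_a + alt_a < min_depth or ref_b + alt_b < min_depth:
            continue  # too few reads to estimate the allele fraction
        baf_a, baf_b = baf(ref_a, alt_a), baf(ref_b, alt_b)
        if lo <= baf_a <= hi or lo <= baf_b <= hi:
            dev_a.append(abs(baf_a - 0.5))
            dev_b.append(abs(baf_b - 0.5))
    if not dev_a:
        raise ValueError("no informative heterozygous sites")
    return sum(dev_a) / len(dev_a), sum(dev_b) / len(dev_b)


# Hypothetical (ref, alt) read counts keyed by (chromosome, position).
counts_a = {("chr1", 100): (15, 15), ("chr1", 200): (14, 16),
            ("chr2", 300): (16, 14), ("chr2", 400): (30, 0)}
counts_b = {("chr1", 100): (28, 2), ("chr1", 200): (3, 27),
            ("chr2", 300): (15, 15), ("chr2", 400): (30, 0)}

imb_a, imb_b = mean_imbalance(counts_a, counts_b)
candidate_tumor = "A" if imb_a > imb_b else "B"
print(f"mean imbalance A={imb_a:.3f}, B={imb_b:.3f}, candidate tumor: {candidate_tumor}")
```

Sites are kept when either sample looks heterozygous, so that positions that are heterozygous in the germline but have lost an allele in the tumor still count toward the tumor's imbalance.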

updated 3.8 years ago by Ram 44k • written 10.2 years ago by Sean Davis 27k

Ram · Answer 2 · 2014-10-19

0

10.2 years ago

Manvendra Singh ★ 2.2k

Yes.

You can analyze their expression values alongside other available human lines that you want to compare them with.

Then cluster the transcriptomes by Spearman's correlation; each sample belongs to whichever group it clusters with.

updated 3.8 years ago by Ram 44k • written 10.2 years ago by Manvendra Singh ★ 2.2k

0

I think the data are from genomic sequencing, not transcriptomic? Perhaps @Kasthuri could comment.

updated 3.8 years ago by Ram 44k • written 10.2 years ago by Sean Davis 27k

0

Yes, these are genomic data, not transcriptomic. You are right, Sean.

updated 3.8 years ago by Ram 44k • written 10.2 years ago by Kasthuri ▴ 300

0

Yes, I realize that now. I withdraw my answer.

updated 3.8 years ago by Ram 44k • written 10.2 years ago by Manvendra Singh ★ 2.2k

Ram · Answer 3 · 2014-10-19

0

10.2 years ago

Renesh ★ 2.2k

Use a gene expression analysis approach: count the reads in these two samples, then compare the fold change in expression between them.

updated 3.8 years ago by Ram 44k • written 10.2 years ago by Renesh ★ 2.2k

0

Sorry, I should have been more specific. This is WGS and not RNA-seq. Thanks.

updated 3.8 years ago by Ram 44k • written 10.2 years ago by Kasthuri ▴ 300