Given two genome .bam files, one of which we know is from a tumor sample and the other from normal/germline from the same person, is there an efficient way to correctly identify them bioinformatically?
Thanks. -K.
A simple approach is to use a copy-number or allelic imbalance analysis. Such analyses will almost always show significant abnormalities in the tumor sample. While a "normal" genome will also show some copy number variation and apparent blocks of loss of heterozygosity, tumors typically have both to a much larger extent. There are many tools for copy number analysis; the particular choice will probably not make much difference for such a broad question.
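As a rough sketch of how that comparison could look once a copy-number caller has produced segments for each BAM: the function below computes the fraction of segmented bases whose copy number differs from the expected ploidy. The tuple layout, the name `fraction_altered`, and the toy segments are assumptions for illustration, not the output format of any particular tool.

```python
def fraction_altered(segments, ploidy=2):
    """Return the fraction of segmented bases whose copy number != ploidy.

    segments: iterable of (chrom, start, end, copy_number) tuples with
    0-based, half-open coordinates (an assumed layout).
    """
    total = 0
    altered = 0
    for chrom, start, end, copy_number in segments:
        length = end - start
        total += length
        if copy_number != ploidy:
            altered += length
    return altered / total if total else 0.0


# Toy data (made up): sample A is almost entirely diploid,
# sample B carries a large loss and a large gain.
sample_a = [("chr1", 0, 950_000, 2), ("chr1", 950_000, 1_000_000, 1)]
sample_b = [
    ("chr1", 0, 400_000, 1),
    ("chr1", 400_000, 700_000, 2),
    ("chr1", 700_000, 1_000_000, 4),
]

frac_a = fraction_altered(sample_a)
frac_b = fraction_altered(sample_b)
likely_tumor = "A" if frac_a > frac_b else "B"
```

The sample with the clearly larger altered fraction is the tumor candidate. Sex chromosomes would need their own expected ploidy, so it is simplest to restrict this to autosomes.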
Yes.
You can compare their expression values against other human cell lines or samples that are publicly available.
Then cluster the transcriptomes by Spearman's correlation; each sample should belong to the same type (tumor or normal) as the references it clusters with.
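A minimal sketch of the correlation step, assuming expression values over the same ordered gene list are available for both unknown samples and for some labelled references. Spearman's rho is computed here as the Pearson correlation of average ranks; the reference names and all numbers are invented for illustration.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def closest_reference(sample, references):
    """Return the name of the reference profile with the highest rho."""
    return max(references, key=lambda name: spearman(sample, references[name]))


# Hypothetical expression values for five genes, same gene order everywhere.
references = {
    "normal_tissue": [10, 50, 200, 5, 80],
    "tumor_line": [300, 20, 15, 90, 60],
}
unknown = [12, 45, 180, 8, 70]
best_match = closest_reference(unknown, references)
```

With real data you would use thousands of genes and a proper hierarchical clustering, but the nearest-reference idea is the same.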
Use a gene expression analysis approach: count the reads per gene in the two samples and compare the fold change in expression between them.
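For what it is worth, the fold-change computation could be sketched as below, assuming per-gene read counts have already been obtained for both samples. The counts-per-million scaling, the pseudocount, and the gene names are my assumptions, not something the answer specifies.

```python
import math


def log2_fold_changes(counts_a, counts_b, pseudocount=1.0):
    """Per-gene log2(B / A) after scaling each sample to counts per million."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    result = {}
    # Only genes counted in both samples can be compared.
    for gene in counts_a.keys() & counts_b.keys():
        cpm_a = counts_a[gene] / total_a * 1e6
        cpm_b = counts_b[gene] / total_b * 1e6
        result[gene] = math.log2((cpm_b + pseudocount) / (cpm_a + pseudocount))
    return result


# Hypothetical read counts per gene.
counts_a = {"GENE1": 100, "GENE2": 100, "GENE3": 800}
counts_b = {"GENE1": 400, "GENE2": 100, "GENE3": 500}
lfc = log2_fold_changes(counts_a, counts_b)
```

The fold changes only tell you which sample is the tumor if you then look at genes with a known direction of change in that tumor type.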
Good idea. In fact, I did this analysis and saw that one of them had huge variations (in particular losses). I inferred that it should be the tumor, since the other one was clean. I used Control-FREEC. I was thinking of some analysis to go along with this as a second confirmation. For instance, we could call somatic mutations twice, swapping which sample is treated as the tumor and which as the normal. The run that treats the actual normal as the tumor should yield fewer mutations since, in theory, a real tumor should contain all SNPs found in the germline plus de novo, purely somatic mutations. But given the noise found in NGS, this seems tricky.
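A simplified stand-in for that swap test, assuming variants have already been called in each sample separately: count the variants private to each call set and see which sample carries more. The tuple layout and the toy variants are made up. Note that copy-number loss in the tumor can remove germline alleles, so the normal will also show some private variants and the counts are only suggestive.

```python
def private_variant_counts(variants_a, variants_b):
    """Count variants seen in only one of two call sets.

    Each call set is an iterable of (chrom, pos, ref, alt) tuples.
    Returns (only_in_a, only_in_b).
    """
    set_a, set_b = set(variants_a), set(variants_b)
    return len(set_a - set_b), len(set_b - set_a)


# Toy call sets: three shared germline variants, one stray call in A,
# four private calls in B.
germline = [
    ("chr1", 100, "A", "G"),
    ("chr2", 200, "C", "T"),
    ("chr3", 300, "G", "A"),
]
sample_a = germline + [("chr5", 500, "T", "C")]
sample_b = germline + [
    ("chr1", 1500, "G", "T"),
    ("chr4", 2500, "C", "A"),
    ("chr7", 3500, "A", "T"),
    ("chr9", 4500, "T", "G"),
]

only_a, only_b = private_variant_counts(sample_a, sample_b)
```

Here B would be the tumor candidate, but with real data I would only trust a large and consistent difference.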
I would suggest pairing your copy number analysis with an analysis of regions of allelic imbalance. Your suggestion of doing a comparison of somatic variants should work, in theory, but somatic variant calling is, in my experience, not as quantitative as one might hope. However, allelic imbalance is fairly robust and should be present in the vast majority of tumor samples. Note that Control-FREEC should have this information readily available.
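To make the allelic imbalance idea concrete, here is a small sketch, assuming you already have reference and alternate read counts for each sample at the same set of SNP sites that look heterozygous in at least one of the two samples. The function name, the depth cutoff, and the 0.15 tolerance are arbitrary illustrative choices.

```python
def imbalance_fraction(site_counts, min_depth=20, tolerance=0.15):
    """Fraction of covered sites whose alt-allele fraction is far from 0.5.

    site_counts: list of (ref_reads, alt_reads) pairs at sites assumed to
    be heterozygous in at least one of the two samples.
    """
    covered = 0
    imbalanced = 0
    for ref_reads, alt_reads in site_counts:
        depth = ref_reads + alt_reads
        if depth < min_depth:
            continue  # too shallow to estimate an allele fraction
        covered += 1
        alt_fraction = alt_reads / depth
        if abs(alt_fraction - 0.5) > tolerance:
            imbalanced += 1
    return imbalanced / covered if covered else 0.0


# Toy counts at the same five sites in both samples (last site is low depth).
sample_a = [(15, 15), (18, 14), (12, 16), (20, 22), (3, 2)]
sample_b = [(28, 4), (30, 2), (14, 15), (5, 30), (2, 3)]

imbalance_a = imbalance_fraction(sample_a)
imbalance_b = imbalance_fraction(sample_b)
```

In the normal, heterozygous sites should sit close to 0.5 genome-wide, while the tumor should show long runs of skewed sites, so the sample with the much higher fraction is the tumor candidate.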