7.2 years ago
haiying.kong
I have a pair of samples (tumor and normal blood) that were whole-exome sequenced. I checked the contamination level of the samples, and the tumor is contaminated at a level of 30%. If I run validation experiments for any findings from this pair of samples, can I get my work published?
Tumor is contaminated with what?
cross-individual contamination.
How did you determine that?
I ran ContEst from the GATK tools.
It might be an idea to add this to your original post, along with the commands you ran and your experimental context.
No, that is not the point!
My question is not how to estimate the contamination level. Assume the computation is correct!
If a tumor sample has such a high level of contamination, can I still use any findings from it? The sample is WES, and if I validate any findings from this highly contaminated tumor sample with additional experiments on other samples, can I still publish?
What do you think? If you were reviewing such a paper, would you consider this acceptable? How did the contamination occur in the first place?
I am asking on behalf of someone else, because I saw this published in a journal with an impact factor above 5. They did not mention the contamination level.
I don't think we can answer this question without full context as @andrew said above. In general, if something was contaminated then your conclusions are always going to have a cloud hanging over them (unless there is a clear experimental case/explanation for presence of that contamination).
The point is:
Whatever they find in the contaminated sample is only a suggestion of a possible finding. All suggested findings are then validated on other samples.
Does this make the work qualified for publication?
If there is independent experimental validation across many samples, then it may be acceptable, but you would still have to explain why contamination exists at such a high level.
That said, consider this quote from the ContEst paper
If you had 30% contamination then ...
If I were a reviewer, I would consider data with 30% cross-contamination to be completely useless and evidence of a lack of concern for accuracy, so I'd reject it.
That said, just because some tool reported 30% cross-contamination does not mean you actually have 30% cross-contamination.
In fact, I used both ContEst and verifyBamID to estimate contamination, and both gave about 30%. I ran the same software on other samples, and none of them were contaminated at anything like this level.
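To make concrete what a 30% estimate would mean for variant calls, here is a minimal sketch of the expected allele fractions in a two-individual mixture. It is not what ContEst or verifyBamID compute internally. The function name `expected_vaf` and the scenarios are hypothetical, and the sketch assumes diploid sites, no copy-number changes, a pure tumor apart from the contaminant, and reads sampled in proportion to the mixture.

```python
def expected_vaf(sample_af: float, contaminant_af: float, contamination: float) -> float:
    """Expected variant allele fraction when a fraction `contamination`
    of the reads comes from a second individual (illustrative model only)."""
    if not 0.0 <= contamination <= 1.0:
        raise ValueError("contamination must be between 0 and 1")
    return (1.0 - contamination) * sample_af + contamination * contaminant_af


contamination = 0.30  # the level reported in this thread

# (allele fraction in the patient's sample, allele fraction in the contaminant)
scenarios = {
    "clonal het somatic variant, absent in contaminant": (0.5, 0.0),
    "contaminant het germline SNP, absent in patient": (0.0, 0.5),
    "contaminant hom germline SNP, absent in patient": (0.0, 1.0),
}

for label, (sample_af, contaminant_af) in scenarios.items():
    vaf = expected_vaf(sample_af, contaminant_af, contamination)
    print(f"{label}: expected VAF = {vaf:.2f}")
```

Under these assumptions, a true clonal heterozygous somatic variant drops from 50% to about 35%, while the contaminant's germline SNPs show up in the tumor at roughly 15% or 30% and are absent from the normal, so they can look like subclonal somatic mutations. This is why candidate variants from such a sample would need independent validation.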
If you have a good number of samples, you can then discard the problematic sample and proceed with downstream analyses.
Whether a sample with 30% contamination is still publishable is not a bioinformatics question, but if there is one thing we have learned over the last few years, it is that anything is publishable; you just have to find the "appropriate" venue.
The thing is that only one pair of samples, normal and tumor, was whole-exome sequenced for this study. After interesting mutations were identified, they were validated on a larger number of samples with Sanger sequencing, which is much cheaper than WES. You are right about the "appropriate" venue. This is exactly what I saw. Some people do research just to make a living, I think.