Different NGS runs for paired samples
4.7 years ago
gab ▴ 20

Hello everyone

I'm asking for help because I'm in a situation I'm struggling with. I am working on a project that aims to find somatic mutations in a certain tumor. To achieve this, whole-exome sequencing (WES) was performed on paired tumor-blood samples. I joined the project after the sequencing had already been done, and my task was to analyse the data. After a while I was still working on it, because despite the apparently good quality of the experiment, something seemed wrong with the mutations I called.

When I asked the head of the lab, I found out that the WES was not run with tumor and blood samples paired. Instead, they did 2 separate runs: one with the blood samples from all the patients and one with all the tumor DNA samples.

I suspect the intrinsic bias of the machine is relevant in this case, and I really do not know how to handle it. I suspect this is why many of the somatic mutations I called did not pass validation (qPCR and ddPCR).

Does anyone have some tips on how to "clean" the data? Thank you so much.

WES tumor-blood exome NGS

Sounds like poor experimental design. Do you know if they at least processed all the samples identically, i.e. the same DNA extraction kit and the same library prep kit? Were the sequencing platforms and sequencing depths comparable between the groups? How did you call the mutations?


The libraries were prepared on the same day, even though they were run at different times. All the reagents were the same. The sequencing depth is sometimes similar and sometimes different between the two groups, depending on the genomic region. I used VarScan to call tumor somatic mutations, after generating mpileups with samtools from BWA-generated BAM files.


Ah OK, so the batch effect is probably smaller than I thought. Differences between runs are typically small; the most important thing is that they used the same kits for everything and did it on the same day. Did you use fpfilter from VarScan? It applies a number of recommended heuristic filters. You can also try different tools: VarScan is OK, but it is old and no longer maintained.
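To illustrate what "heuristic filters" means here, below is a minimal Python sketch of the kind of per-site checks such filters apply: minimum depth, minimum variant reads and allele fraction, strand bias, and variant evidence in the matched normal. This is not VarScan's fpfilter implementation; the `SomaticCall` fields, the filter names and every threshold are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SomaticCall:
    """Read counts for one candidate somatic site (hypothetical fields)."""
    chrom: str
    pos: int
    tumor_ref: int   # reference-supporting reads in the tumor
    tumor_alt: int   # variant-supporting reads in the tumor
    alt_fwd: int     # variant reads on the forward strand
    alt_rev: int     # variant reads on the reverse strand
    normal_ref: int  # reference-supporting reads in the matched blood
    normal_alt: int  # variant-supporting reads in the matched blood


# Illustrative thresholds only; these are NOT VarScan's defaults.
MIN_TUMOR_DEPTH = 20
MIN_ALT_READS = 4
MIN_TUMOR_VAF = 0.05
MIN_STRAND_FRACTION = 0.1
MIN_NORMAL_DEPTH = 10
MAX_NORMAL_VAF = 0.02


def failed_filters(call: SomaticCall) -> list[str]:
    """Return the names of the filters a call fails (empty list means PASS)."""
    reasons = []
    tumor_depth = call.tumor_ref + call.tumor_alt
    normal_depth = call.normal_ref + call.normal_alt
    if tumor_depth < MIN_TUMOR_DEPTH:
        reasons.append("low_tumor_depth")
    if call.tumor_alt < MIN_ALT_READS:
        reasons.append("few_alt_reads")
    if tumor_depth and call.tumor_alt / tumor_depth < MIN_TUMOR_VAF:
        reasons.append("low_vaf")
    # Strand bias: the minority strand must carry a minimum share of alt reads.
    alt_total = call.alt_fwd + call.alt_rev
    if alt_total == 0 or min(call.alt_fwd, call.alt_rev) / alt_total < MIN_STRAND_FRACTION:
        reasons.append("strand_bias")
    # Matched normal: too shallow to trust, or it also shows the variant.
    if normal_depth < MIN_NORMAL_DEPTH:
        reasons.append("low_normal_depth")
    elif call.normal_alt / normal_depth > MAX_NORMAL_VAF:
        reasons.append("alt_in_normal")
    return reasons


if __name__ == "__main__":
    calls = [
        SomaticCall("chr1", 1000, 40, 12, 6, 6, 50, 0),   # balanced strands
        SomaticCall("chr2", 2000, 40, 12, 12, 0, 50, 0),  # forward strand only
    ]
    for c in calls:
        print(c.chrom, c.pos, failed_filters(c) or "PASS")
```

Here the first site passes and the second is flagged only for `strand_bias`. Returning the list of failed filters rather than a single boolean makes it easier to see which filter removes most of the calls that later fail qPCR/ddPCR validation.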

4.7 years ago

First of all, I'm not sure that batch effects are important for DNA-seq applications. For RNA-seq, they absolutely are.

But even for RNA-seq, the day a sample is run on the instrument is irrelevant. The day of RNA extraction and the day of library prep matter. For variant assessment I'd worry if different kinds of instrument were used, but the day samples were run on the same instrument? That is not a source of technical variation.

4.7 years ago
bruce.moran ▴ 970

You can't know the technical bias unless you run technical replicates (i.e. the same sample, but made from a second DNA extraction), which is a waste of resources unless it is part of a specific experimental design (e.g. multi-region sequencing, or testing a new library prep method).
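If technical replicates are available, one simple way to quantify technical variability is the overlap between the variant sets called from each replicate. The Python sketch below assumes variants have already been reduced to `(chrom, pos, ref, alt)` tuples; the function name and the choice of the Jaccard index as the concordance measure are illustrative assumptions, not a standard from any particular tool.

```python
# A variant key: (chrom, pos, ref, alt). Assumes calls are already normalised
# (same reference build, left-aligned indels) so identical variants compare equal.
Variant = tuple[str, int, str, str]


def replicate_concordance(rep_a: set[Variant], rep_b: set[Variant]) -> dict:
    """Summarise agreement between variant calls from two technical replicates."""
    shared = rep_a & rep_b
    union = rep_a | rep_b
    return {
        "shared": len(shared),
        "only_a": len(rep_a - rep_b),
        "only_b": len(rep_b - rep_a),
        # Jaccard index: shared calls over all distinct calls (1.0 if both empty).
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    rep_a = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"), ("chr2", 50, "T", "A")}
    rep_b = {("chr1", 100, "A", "T"), ("chr2", 50, "T", "A"), ("chr3", 10, "C", "G")}
    print(replicate_concordance(rep_a, rep_b))
```

With these toy sets, 2 calls are shared, each replicate has 1 private call, and the Jaccard index is 0.5.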

DNA is a robust molecule, unlike RNA, as stated above. Library prep methods typically either work or fail: e.g. for exomes, your hybridisation/capture either works and you have a library, or it doesn't and you don't. This is immediately obvious, because your FASTQ files are small and nothing really aligns.
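A failed library usually shows up as a very low mapping rate. The Python sketch below reads the mapped fraction from the text output of `samtools flagstat`, assuming its usual `N + M in total (...)` and `N + M mapped (...)` line layout and using only the QC-passed counts. The 0.9 cut-off and the helper names are hypothetical, not a recommended standard.

```python
import re

# Hypothetical cut-off for flagging a library as suspicious.
MIN_MAPPING_RATE = 0.9


def mapping_rate(flagstat_text: str) -> float:
    """Fraction of QC-passed reads reported as mapped in `samtools flagstat` output."""
    total = re.search(r"^(\d+) \+ \d+ in total", flagstat_text, re.MULTILINE)
    # Anchoring on "N + M mapped (" skips lines such as "primary mapped".
    mapped = re.search(r"^(\d+) \+ \d+ mapped \(", flagstat_text, re.MULTILINE)
    if total is None or mapped is None:
        raise ValueError("unrecognised flagstat output")
    n_total = int(total.group(1))
    return int(mapped.group(1)) / n_total if n_total else 0.0


def looks_failed(flagstat_text: str) -> bool:
    """True if the mapping rate falls below the hypothetical cut-off."""
    return mapping_rate(flagstat_text) < MIN_MAPPING_RATE


if __name__ == "__main__":
    example = (
        "1000000 + 0 in total (QC-passed reads + QC-failed reads)\n"
        "0 + 0 secondary\n"
        "0 + 0 supplementary\n"
        "0 + 0 duplicates\n"
        "985000 + 0 mapped (98.50% : N/A)\n"
    )
    print(mapping_rate(example), looks_failed(example))
```

For the example text this reports a mapping rate of 0.985, so the library would not be flagged.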

I would be much more concerned that you are working on clinical data from humans. I would very strongly suggest using an available reproducible pipeline such as https://nf-co.re/sarek instead of trying to build your own. Best practice is typically to use multiple variant callers to reduce false positives.
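The multi-caller idea can be sketched as a simple consensus vote: keep only variants reported by at least `k` callers. The Python below assumes each caller's output has already been reduced to a set of `(chrom, pos, ref, alt)` tuples. The caller names are placeholders, and the default of 2 callers is an illustrative choice, not the rule used by any specific pipeline.

```python
from collections import Counter

# A variant key: (chrom, pos, ref, alt), assumed normalised across callers.
Variant = tuple[str, int, str, str]


def consensus_calls(calls_by_caller: dict[str, set[Variant]],
                    min_callers: int = 2) -> set[Variant]:
    """Keep variants reported by at least `min_callers` different callers."""
    # Each caller contributes a set, so a variant is counted at most once per caller.
    counts = Counter(v for calls in calls_by_caller.values() for v in calls)
    return {v for v, n in counts.items() if n >= min_callers}


if __name__ == "__main__":
    v1 = ("chr1", 100, "A", "T")
    v2 = ("chr1", 200, "G", "C")
    v3 = ("chr2", 50, "T", "A")
    v4 = ("chr3", 10, "C", "G")
    # Placeholder caller names; substitute the outputs of the tools you actually run.
    calls = {"caller_a": {v1, v2, v3}, "caller_b": {v1, v3}, "caller_c": {v3, v4}}
    print(sorted(consensus_calls(calls)))
```

In this toy example `v1` (2 callers) and `v3` (3 callers) are kept, while `v2` and `v4`, each seen by a single caller, are dropped. Raising `min_callers` trades sensitivity for fewer false positives.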

I would also suggest finding someone locally who knows the field and can advise you. This isn't really a good area to be trying to get to grips with on your own. Such a person can be hard to find, but I would try to reach out; it will make a huge difference, although it is sometimes not possible.
