As part of a study on cancer relapse after treatment, we performed single-cell RNA-seq and single-cell DNA-seq on primary tumor tissue before treatment and after relapse. My collaborators would now like to know whether certain variants found in a set of cancer driver genes in the relapsed tumor were already present as subclones before treatment, or whether they emerged as new variants afterwards.
The single-cell sequencing libraries were made with one of two protocols: Smart-seq2 or G&T-seq. Coverage in the Smart-seq2 samples is low compared to the samples produced with the G&T-seq protocol. So I now have a few hundred BAM files, each from a sample made with one of these two protocols, but also quite a few question marks about how to tackle this request, since I'm new to variant calling…
Naïvely, my first approach would be to pool the BAM files per timepoint (merge them, then realign or recalibrate) and run a variant calling pipeline on each pooled dataset separately. Can this work, or am I missing something?
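To make the pooling step concrete, here is a rough Python sketch of what I have in mind. It only groups the per-cell BAM files by timepoint and builds the corresponding `samtools merge` command lines, without running them. The file naming scheme (`<timepoint>_<cell>.bam`), the `merged/` output directory and the function names are assumptions I made up for illustration, not what my data actually looks like.

```python
from collections import defaultdict
from pathlib import Path


def group_bams_by_timepoint(bam_paths):
    """Group BAM paths by the timepoint prefix of their file name."""
    groups = defaultdict(list)
    for path in bam_paths:
        # Assumed (hypothetical) naming scheme: <timepoint>_<cell id>.bam
        timepoint = Path(path).name.split("_", 1)[0]
        groups[timepoint].append(path)
    return {tp: sorted(paths) for tp, paths in groups.items()}


def merge_commands(groups, out_dir="merged"):
    """Build one `samtools merge` argument list per timepoint (not executed)."""
    commands = {}
    for timepoint, paths in sorted(groups.items()):
        out_bam = f"{out_dir}/{timepoint}.bam"
        # -r should attach an RG tag inferred from each input file name,
        # so the cell of origin is not lost after pooling.
        commands[timepoint] = ["samtools", "merge", "-r", out_bam, *paths]
    return commands


if __name__ == "__main__":
    # Toy file names, purely illustrative.
    bams = ["pre_cell001.bam", "pre_cell002.bam", "relapse_cell001.bam"]
    for tp, cmd in merge_commands(group_bams_by_timepoint(bams)).items():
        print(tp, " ".join(cmd))
```

Each list could then be passed to `subprocess.run(cmd, check=True)`. I kept the read-group tagging because I suspect I will want to trace a variant back to individual cells later, but I'm not sure that is the right design.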
More generally, I haven't found many studies that try this particular approach. Is it even possible to do variant calling with single-cell data, or will the number of artifacts from the amplification process make it impossible to detect any true variants? I would think that any variant found in multiple cells has a high probability of being a true variant. Also, since G&T-seq returns data on both the genome and the transcriptome of the same cell, could that be used as a sort of extra validation to help detect true positives?
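The two filters I am thinking of could be sketched like this in Python, assuming I already had per-cell variant calls as sets of `(chrom, pos, ref, alt)` tuples. The function names, the `min_cells` threshold and the toy coordinates are all hypothetical, and this says nothing about whether the thresholds are statistically sensible.

```python
from collections import Counter


def variants_by_cell_support(calls_per_cell, min_cells=2):
    """Return {variant: number of cells} for variants seen in >= min_cells cells."""
    counts = Counter()
    for variants in calls_per_cell.values():
        # set() so a variant is counted at most once per cell
        counts.update(set(variants))
    return {v: n for v, n in counts.items() if n >= min_cells}


def dna_rna_concordant(dna_calls, rna_calls):
    """For cells with both DNA and RNA calls (as in G&T-seq), keep the
    variants called in both modalities of the same cell."""
    shared_cells = dna_calls.keys() & rna_calls.keys()
    return {
        cell: set(dna_calls[cell]) & set(rna_calls[cell])
        for cell in shared_cells
    }


if __name__ == "__main__":
    # Toy calls with made-up coordinates, not real data.
    v1 = ("chr1", 1000, "A", "G")
    v2 = ("chr2", 2000, "C", "T")
    dna = {"cell1": {v1, v2}, "cell2": {v1}}
    rna = {"cell1": {v1}}
    print(variants_by_cell_support(dna))
    print(dna_rna_concordant(dna, rna))
```

One thing I am unsure about is that the RNA check can only confirm variants in genes that are expressed in that cell, so a missing RNA call would not necessarily mean the DNA call is false.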
Thanks for your input!