I'm wondering what the best procedure for calling serial plasma samples from the same patients with a single normal sample would be.
For instance, running the samtools-mpileup-varscan2 pipeline with the normal sample first and the serial samples after gives genotype calls of 0/0 when I'd expect a variant to be called, such as here:
chr1 1471992 . T C . PASS ADP=14;WT=2;HET=2;HOM=0;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 0/1:23:23:23:16:7:30.43%:4.5803E-3:50:38:11:5:2:5 0/1:21:13:13:7:6:46.15%:7.4534E-3:34:35:2:5:1:5 0/0:3:10:10:6:4:40%:4.3344E-2:35:30:2:4:0:4 0/0:6:13:13:9:4:30.77%:4.7826E-2:30:30:5:4:1:3
in the fourth sample column (3rd serial plasma sample), where the read statistics are very similar to those in the second (1st serial sample), for which the genotype has been called as 0/1.
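To make the comparison between the sample columns easier to see, here is a minimal Python sketch that splits the record above into per-sample fields. The names `RECORD` and `parse_samples` are made up for illustration; the only input is the line quoted in the post.

```python
# The record quoted above, as whitespace-separated VCF fields.
RECORD = (
    "chr1 1471992 . T C . PASS ADP=14;WT=2;HET=2;HOM=0;NC=0 "
    "GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR "
    "0/1:23:23:23:16:7:30.43%:4.5803E-3:50:38:11:5:2:5 "
    "0/1:21:13:13:7:6:46.15%:7.4534E-3:34:35:2:5:1:5 "
    "0/0:3:10:10:6:4:40%:4.3344E-2:35:30:2:4:0:4 "
    "0/0:6:13:13:9:4:30.77%:4.7826E-2:30:30:5:4:1:3"
)


def parse_samples(record):
    """Return one dict per sample column, keyed by the FORMAT field names."""
    fields = record.split()
    keys = fields[8].split(":")  # FORMAT column
    return [dict(zip(keys, sample.split(":"))) for sample in fields[9:]]


samples = parse_samples(RECORD)
for column, sample in enumerate(samples, start=1):
    # Print genotype, depth, ref/alt read counts, variant frequency and p-value.
    print(column, sample["GT"], sample["DP"], sample["RD"],
          sample["AD"], sample["FREQ"], sample["PVAL"])
```

In this record the two 0/0 columns are the ones with the largest PVAL values (4.3344E-2 and 4.7826E-2, against 4.5803E-3 and 7.4534E-3 for the 0/1 columns). A p-value cutoff lying between those values would be consistent with the calls, but the VarScan options actually used are not given here, so that is only a possibility to check.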
Is this the best way of calling variants on multiple samples, or is it better to do normal/p1, normal/p2, normal/p3 etc, and then merge the variant sets at the end?
Not sure about "best practice", but I generally run all the variant calling per-sample or per-pair in parallel, then convert to .tsv with GATK VariantsToTable, add sample labels to each .tsv, and concatenate the .tsvs into a single table for review. If you have tumor-normal pairs, be sure to use variant callers that support them; I use MuTect2 and LoFreq Somatic for that right now, but there are plenty of others. If you are asking about the technical aspects of how to run them in parallel, you would want either something basic like GNU parallel or a workflow manager like Snakemake or Nextflow.
It was more whether to run through varscan (or equivalent) all at once, leading to calls I think are incorrect like the one above, or whether to run the (single) normal v each serial plasma sample in pairs, so N v P1, N v P2, N v P3, N v P4 etc... then join the variants together as you suggest.
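The "add sample labels, then concatenate" step mentioned in the answer could look roughly like the Python sketch below. It assumes each per-sample table is a tab-separated file with a header row and that all tables share the same columns; the function name `concat_labelled_tsvs` and the `SAMPLE` column name are made up for illustration.

```python
import csv


def concat_labelled_tsvs(tsv_by_sample, out_path):
    """Write one combined TSV with a leading SAMPLE column.

    tsv_by_sample maps a sample label (e.g. "P1") to the path of its table.
    Returns the number of data rows written.
    """
    expected_header = None
    rows_written = 0
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        for sample, path in tsv_by_sample.items():
            with open(path, newline="") as handle:
                reader = csv.reader(handle, delimiter="\t")
                header = next(reader, None)
                if header is None:
                    continue  # empty file: nothing to add
                if expected_header is None:
                    # First table seen: emit the header once.
                    expected_header = header
                    writer.writerow(["SAMPLE"] + header)
                elif header != expected_header:
                    raise ValueError(f"{path} has different columns")
                for row in reader:
                    writer.writerow([sample] + row)
                    rows_written += 1
    return rows_written
```

A call such as `concat_labelled_tsvs({"P1": "N_vs_P1.tsv", "P2": "N_vs_P2.tsv"}, "all_samples.tsv")` would then give a single table for review; those file names are placeholders, not outputs of any particular tool.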
if your plasma samples were collected independently then I think you would definitely want to run the variant calling independently for each tumor-normal pair. They would be considered separate biological samples.
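For the pairwise route (N v P1, N v P2, N v P3, ...), joining the call sets afterwards can be sketched in Python as below. This is only an illustration: `merge_pairwise_calls` is a made-up helper, each call set is assumed to be already reduced to (chrom, pos, ref, alt) tuples, and the example call sets are invented to show the shape of the output, not real results.

```python
from collections import defaultdict


def merge_pairwise_calls(calls_by_plasma):
    """Map each variant to the plasma samples in which it was called.

    calls_by_plasma maps a plasma label to an iterable of
    (chrom, pos, ref, alt) tuples from that normal-vs-plasma run.
    """
    merged = defaultdict(list)
    for plasma, variants in calls_by_plasma.items():
        for variant in set(variants):  # ignore duplicates within one run
            merged[variant].append(plasma)
    return dict(merged)


# Invented call sets, for illustration only.
example_calls = {
    "P1": [("chr1", 1471992, "T", "C")],
    "P2": [],
    "P3": [("chr1", 1471992, "T", "C")],
}
merged = merge_pairwise_calls(example_calls)
for variant, plasmas in merged.items():
    print(variant, plasmas)
```

Keeping the per-sample labels next to each variant makes it easy to see at which time points a call appears or disappears.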