In Leung et al. (2017), Fig. 1 describes the data processing for the CRC patients: single cells were sequenced in parallel for SNV analysis (with MDA WGA) and for CNA analysis (with DOP-PCR). However, the SRA accession, in particular for patient CRC2 (CO8), has 240 FASTQ files with two types of Library Selection:
- PCR Library Selection having 198 FASTQ files
- Random PCR Library Selection having 42 FASTQ files
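To make the split above concrete, here is a minimal sketch of how the runs could be grouped by Library Selection from the run metadata table. It assumes the table is exported as CSV from the SRA Run Selector with `Run` and `LibrarySelection` columns; the run names in the example are made-up placeholders, not real accessions. It only separates the two groups and does not tell me which group corresponds to MDA and which to DOP-PCR.

```python
import csv
import io
from collections import defaultdict


def group_runs_by_selection(run_table_text, key="LibrarySelection"):
    """Map each value of the `key` column to the list of Run accessions."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(run_table_text)):
        groups[row[key]].append(row["Run"])
    return dict(groups)


# Placeholder rows for illustration only (hypothetical run names).
example_table = """Run,LibrarySelection,LibraryStrategy
RUN_A,PCR,WGS
RUN_B,PCR,WGS
RUN_C,RANDOM PCR,WGS
"""

groups = group_runs_by_selection(example_table)
for selection, runs in groups.items():
    print(selection, len(runs), runs)
```

With the real table, the two groups should contain 198 and 42 runs respectively, which could then be downloaded and processed separately.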
I don't understand which single cells were sequenced for SNV analysis and which for CNA analysis. For the SNV analysis, I want to reproduce the following from the scDNA-seq FASTQ files in the SRA accession:
- variant read count matrix,
- total read count matrix and
- binary/ternary mutation matrix
Once the matrices are generated, the goal is to reconstruct the tumor phylogeny.
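To clarify what I mean by the three matrices, here is a rough sketch of how the binary/ternary matrix could be derived from the two count matrices. Everything in it is an assumption for illustration: rows are cells and columns are candidate SNV sites, the thresholds `min_depth`, `het_vaf` and `hom_vaf` are arbitrary values, and missing entries are coded as 3 as a placeholder. It is not the method used in the paper.

```python
MISSING = 3  # placeholder code for "not enough coverage" (an assumption)


def call_genotype(var, total, min_depth=5, het_vaf=0.1, hom_vaf=0.9):
    """Return 0 (reference), 1 (heterozygous), 2 (homozygous) or MISSING."""
    if total < min_depth:
        return MISSING
    vaf = var / total  # variant allele frequency at this cell and site
    if vaf >= hom_vaf:
        return 2
    if vaf >= het_vaf:
        return 1
    return 0


def ternary_matrix(variant_counts, total_counts, **thresholds):
    """Apply call_genotype to every (cell, site) entry."""
    return [
        [call_genotype(v, t, **thresholds) for v, t in zip(var_row, tot_row)]
        for var_row, tot_row in zip(variant_counts, total_counts)
    ]


def binary_matrix(ternary):
    """Collapse 1 and 2 to 'mutated' (1); keep 0 and MISSING as they are."""
    return [[g if g == MISSING else int(g > 0) for g in row] for row in ternary]


# Toy counts: 2 cells x 3 sites (invented numbers).
variant_counts = [[0, 6, 10], [2, 0, 0]]
total_counts = [[10, 12, 10], [3, 20, 8]]

ternary = ternary_matrix(variant_counts, total_counts)
binary = binary_matrix(ternary)
print(ternary)  # [[0, 1, 2], [3, 0, 0]]
print(binary)   # [[0, 1, 1], [3, 0, 0]]
```

In practice the variant and total read counts would come from the per-cell BAM files at a fixed list of candidate sites, and the thresholds would need tuning for the allelic dropout and amplification errors typical of single-cell data.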
NB, follow-up question: I tried downloading the FASTQ files, aligning each one to the human reference genome with BWA/Bowtie2, and then converting each alignment to a BAM file with samtools.
For variant calling, should I now use a single-cell-specific variant caller, or one of the variant callers generally used for bulk DNA sequencing data, such as Mutect2, VarScan, or GATK?