I have a scDNAseq dataset having multiple FASTQ files for multiple single cells.
samtools was used after aligning FASTQ files with BWA to hg19 reference to produce bam files.
I have already identified 36 SNV mutation sites and
I want to use mpileup
to extract read counts (Total read count and variant read count) at those 36 positions.
How do I do that? Or Which mpileup
command should I use to achieve that?
PS: what I already did so far:
For TP53-chr17-7577548
variant position, I searched one of the single cell bam files using grep
:
bcftools view SRR3472634_CO8-MA-91.bcf | grep "chr17" | grep "7577548"
And that resulted in this variant is present:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SRR3472634.sorted.bam
chr17 7577548 . C T 225 . DP=73;VDB=0.154722;SGB=-0.693147;MQSB=1;MQ0F=0;AC=2;AN=2;DP4=0,0,28,30;MQ=60 GT:PL 1/1:255,175,0
I used the following command for TP53-chr17-7577548
variant position with mpileup
command:
samtools mpileup -f reference.fa -r chr17:7577548-7577548 alignments.bam
And that resulted:
[mpileup] 1 samples in 1 input files chr17 7577548 C 58 TtttTTTtTTTTtttTtttTTtttTTTtTtttTTTtTtTTtTTTtttTttTtTttTtt uJ<JJuFKKuJsFKKuJJJA<JKAJFKAKJJKFKKFJJKKJuuKJFFpJKAJJKJJKF
Then I used the following for more precise:
samtools mpileup -f reference.fa -r chr17:7577548-7577548 alignments.bam | cut -f 5 | tr '[a-z]' '[A-Z]' | fold -w 1 | sort | uniq -c
And that resulted:
[mpileup] 1 samples in 1 input files
58 T
I think that showed me 58 alternate read counts of T
see also : How to generate alignment report from bam files , Finding SNPs in population data: help with pipeline , How to retrieve the SNP data from Bam file