4.1 years ago
wenbinm
Hi there. Does bcftools use information from adjacent reads when calling SNPs?
I want to call SNPs from TCGA whole-exome sequencing data using bcftools. Here is what I did:
bcftools mpileup -Ou -f $REF raw_data/*.bam | bcftools call -mv -o processed/BRCA_variant_calls.vcf
I have a lot of SNPs to call. To save storage space, I subset the BAM files to keep only reads covering my SNP sites of interest (i.e. reads overlapping a window of 3 bp upstream and downstream of each SNP site). I then noticed that the window size affected the genotypes I got: when I instead kept reads overlapping a 2 kb window around each SNP, 10% of the genotype calls were different. It looks like reads adjacent to the site influence the genotype that is called.
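For reference, the subsetting step described above can be sketched roughly as follows. This is a minimal illustration, not the exact commands used: the `make_windows` helper, the two example sites, and the file names `windows_3bp.bed` and `subset_*.bam` are all hypothetical. It assumes the SNP sites are given as tab-separated `chrom` and 1-based `pos`, and that `samtools` is on the `PATH`.

```shell
# Hypothetical helper: turn "chrom<TAB>pos" (1-based) SNP sites read from stdin
# into BED windows of +/- FLANK bp around each site (BED is 0-based, half-open).
make_windows() {
    flank="$1"
    awk -v f="$flank" 'BEGIN { OFS = "\t" }
        { s = $2 - 1 - f; if (s < 0) s = 0; print $1, s, $2 + f }'
}

# Two made-up sites with a 3 bp flank; use 2000 instead of 3 for the 2 kb case.
printf 'chr1\t1000\nchr2\t2\n' | make_windows 3 > windows_3bp.bed

# Keep only reads overlapping the windows (skipped if samtools is not installed).
if command -v samtools >/dev/null 2>&1; then
    for bam in raw_data/*.bam; do
        [ -e "$bam" ] || continue   # no BAM files matched the glob
        samtools view -b -L windows_3bp.bed -o "subset_$(basename "$bam")" "$bam"
    done
fi
```

Running the same sketch with the two flank sizes gives the two sets of subset BAM files whose genotype calls were compared.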
Thank you!