For a set of RNASeq bam files aligned to hg19, I would like to calculate the number of reads mapping to all possible (high-confidence) alleles for each position. What is the best way to do this? I have tried using samtools mpileup and bcftools as was recommended in a few solutions here, but the output either gives me Error: Could not parse --min-ac g
, or gives a vcf file which is deprecated and shows me a vcf file whose output I am not sure is correct. Here is a sample:
chr2 100002198 . T . 32.9952 . DP=1;MQ0F=0;AF1=0;AC1=0;DP4=1,0,0,0;MQ=60;FQ=-29.991 GT:PL:DV 0/0:0:0
chr2 100002199 . A . 32.9952 . DP=1;MQ0F=0;AF1=0;AC1=0;DP4=1,0,0,0;MQ=60;FQ=-29.991 GT:PL:DV 0/0:0:0
chr2 100002200 . T . 32.9955 . DP=1;MQ0F=0;AF1=0;AC1=0;DP4=1,0,0,0;MQ=60;FQ=-29.9906 GT:PL:DV 0/0:0:0
chr2 100002201 . C . 32.9955 . DP=1;MQ0F=0;AF1=0;AC1=0;DP4=1,0,0,0;MQ=60;FQ=-29.9906 GT:PL:DV 0/0:0:0
What I need is a file that shows me 1) each base position, 2) the reference nucleotide at that position, 3) the range of alleles shown by my sample at that position, and 4) the frequency of each allele.
Thanks Chris, this what I was looking but I got confused from the documentation about the format of regions file. Can you please explain the right format it accepts?
Tab-delimited, one based coordinates
Column 4+ will be ignored