Get multiple snps coverage (allelic depth) from bam files
3
2
Entering edit mode
9.1 years ago
MAPK ★ 2.1k

I have more than 500 samples (or thier bam files). I also have more than 1000 snps in snp table as shown below. I need to get the number of reads (allelic depth) for reference allele and alternate allele from all the bam files. Is there any program to get that directly from bam files given the snp information below? I know pile up package in R samtools does that, but for some reason I often miss either positive strand read or - strand counts.

snp table:

chr    position    REF    ALT    TYPE
3      23322333    T      C      snp
2      22322223    A      T      snp
bam sequencing genotype • 4.5k views
ADD COMMENT
1
Entering edit mode
9.1 years ago

This tool provides a script ComputePileupFreqs.pl which can be used for counting ref and alternate allele from pileup output.

Not tested.

ADD COMMENT
1
Entering edit mode
9.1 years ago
mkulecka ▴ 360

Bam-readcount is ideal for this. Apart from raw read depth of every base present it gives you many useful information, inluding average quality mapping, average base quality etc. Its output is also used for variant filtering in fpfilter script - https://github.com/ckandoth/variant-filter.

In order to obtain information for every alleles present, you'll have to provide script with reference file in fasta format. The command looks like this:

bam-readcount -q1 -l sites_list -f reference.fasta your_file.bam > your_file.readcount

-q1 option means minimum mapping quality of 1.

Sites list is supposed to look like this (tab delimited):

chr1 56777 56777

Creators of the fpfilter script mentioned above provided a nice one-liner to convert your vcf file into sites_list file:

perl -ane 'print join("\t",@F[0,1,1])."\n" unless(m/^#/)' your.vcf > sites_list

I tested it on human and mice data and it worked fine in both cases. Sample output from mice data looks like this:

chr1    8816113    T    146    =:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00    A:3:4.00:26.00:4.00:0:3:0.71:0.01:26.00:0:0.00:121.33:0.65    C:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00    G:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00    T:143:63.80:24.38:63.80:79:64:0.57:0.01:21.76:79:0.62:120.83:0.50    N:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00
chr1    8816178    G    121    =:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00    A:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00    C:120:71.14:26.27:71.14:68:52:0.44:0.02:28.25:68:0.24:122.28:0.48    G:1:73.00:22.00:73.00:0:1:0.31:0.03:8.00:0:0.00:116.00:0.84    T:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00    N:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 .
ADD COMMENT
1
Entering edit mode

Does it also gives the number of reads for each reference and alternate allele over a SNP ?

ADD REPLY
0
Entering edit mode

Yes, exactly.

ADD REPLY
0
Entering edit mode

what would be the command if i have to calculate the reference and alternate allele reads counts for a bam file in.bam and a vcf file in.vcf ?

ADD REPLY
0
Entering edit mode

Yes what would be the command for that and If there is anythig to pipeline or loop through multiple bam files for multiple snp? I also could not get ALT base count using mock data. I am using their mock data and with the following command(which doesn't seem to work):

./bam-readcount -q 0 -l /Downloads/site_list.txt /Downloads/test.bam
ADD REPLY
0
Entering edit mode

Well, I forgot to mention you have to have a reference for that. I'll put that in my response.

ADD REPLY
0
Entering edit mode

Thanks!

ADD REPLY

Login before adding your answer.

Traffic: 2274 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6