Question

Bam to nucleotide frequencies

3

Entering edit mode

10.3 years ago

auser_27 ▴ 40

I have whole genome sequences of several clinical isolates and I want to get the nucleotide frequencies (% A,C,T,G) at a given position. Among novel SNPs we could discover, there are some very specific positions that I want to look at. There are enough clinical isolates that I don't wish to this manually using IGV.

I can do this, but it's extremely inelegant and I think that there is some more elegant way that I am missing. I have summarize that there are probably two approaches.

Approach 1

1 . Run the basic samtools SNP pipeline, outputting all positions without a VCF ( remove -V) and outputting all possible reference alleles.

Write a script that parses the info column in the VCF file, get the depth info for each nucleotide, calculate the frequencies (for my variants of interest only); luckily I already have this script.

Approach 2

Use bedtools and convert that bam to fastq
Use seqtk and convert fastq to fasta
Use bedtools nuc command, specifying position of interest

I feel like this is too common a task to require the use of so many tools and file format conversions and even custom scripts, so I ask : what's a better way to do this?

sequence SNP next-gen • 5.4k views

ADD COMMENT • link updated 2.9 years ago by Ram 44k • written 10.3 years ago by auser_27 ▴ 40

2

Entering edit mode

10.3 years ago

Adrian Pelin ★ 2.6k

Install popoolation, and run snp-diff script. You will get a table, a column for chromosome, a column for nucleotide position in that chromosome, and then columns for each of your samples that contain allele frequencies.

ADD COMMENT • link 10.3 years ago by Adrian Pelin ★ 2.6k

1

Entering edit mode

10.3 years ago

Sean Davis 27k

In R, the bam2R function in the deepSNV package is pretty useful for this.

ADD COMMENT • link updated 2.9 years ago by Ram 44k • written 10.3 years ago by Sean Davis 27k

0

Entering edit mode

10.3 years ago

poisonAlien ★ 3.2k

bam-readcount is one option.

ADD COMMENT • link 10.3 years ago by poisonAlien ★ 3.2k

Ram · Accepted Answer · 2014-08-19

4

Entering edit mode

10.3 years ago

swbarnes2 14k

What about parsing the relevant line from samtools mpileup?

ADD COMMENT • link updated 3.0 years ago by Ram 44k • written 10.3 years ago by swbarnes2 14k

0

Entering edit mode

I think using mpileup with a given genomic coordinate is the simplest solution. It supports filters by MAPQ and you can specify genomic coordinates (just be careful about the off-by-one error if youre zero-counting vs one-counting bases).

ADD REPLY • link updated 3.0 years ago by Ram 44k • written 10.3 years ago by karl.stamm 4.1k

0

Entering edit mode

This was the fastest way to do it.

ADD REPLY • link 10.3 years ago by auser_27 ▴ 40