Bam to nucleotide frequencies
4
3
Entering edit mode
10.3 years ago
auser_27 ▴ 40

I have whole genome sequences of several clinical isolates and I want to get the nucleotide frequencies (% A,C,T,G) at a given position. Among novel SNPs we could discover, there are some very specific positions that I want to look at. There are enough clinical isolates that I don't wish to this manually using IGV.

I can do this, but it's extremely inelegant and I think that there is some more elegant way that I am missing. I have summarize that there are probably two approaches.

Approach 1

1 . Run the basic samtools SNP pipeline, outputting all positions without a VCF ( remove -V) and outputting all possible reference alleles.

  1. Write a script that parses the info column in the VCF file, get the depth info for each nucleotide, calculate the frequencies (for my variants of interest only); luckily I already have this script.

Approach 2

  1. Use bedtools and convert that bam to fastq
  2. Use seqtk and convert fastq to fasta
  3. Use bedtools nuc command, specifying position of interest

I feel like this is too common a task to require the use of so many tools and file format conversions and even custom scripts, so I ask : what's a better way to do this?

sequence SNP next-gen • 5.4k views
ADD COMMENT
4
Entering edit mode
10.3 years ago

What about parsing the relevant line from samtools mpileup?

ADD COMMENT
0
Entering edit mode

I think using mpileup with a given genomic coordinate is the simplest solution. It supports filters by MAPQ and you can specify genomic coordinates (just be careful about the off-by-one error if youre zero-counting vs one-counting bases).

ADD REPLY
0
Entering edit mode

This was the fastest way to do it.

ADD REPLY
2
Entering edit mode
10.3 years ago
Adrian Pelin ★ 2.6k

Install popoolation, and run snp-diff script. You will get a table, a column for chromosome, a column for nucleotide position in that chromosome, and then columns for each of your samples that contain allele frequencies.

ADD COMMENT
1
Entering edit mode
10.3 years ago

In R, the bam2R function in the deepSNV package is pretty useful for this.

ADD COMMENT
0
Entering edit mode
10.3 years ago
poisonAlien ★ 3.2k

bam-readcount is one option.

ADD COMMENT

Login before adding your answer.

Traffic: 1634 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6