2.7 years ago
Joel Wallenius
Hello, I'm wondering if there is a program out there that transforms a bam file (and a reference genome file) into a Sanger-like plot, e.g. this one:
https://brcf.medicine.umich.edu/wp-content/uploads/2018/02/dna_no_noise_2018.gif
I've been googling but find nothing. :(
Thanks in advance!
It's like creating a cow from a steak. You would need the Phred score for each base A/T/G/C of the fastq, but you only have the qualities of the called base.
Pierre, on the other hand, a Sanger profile is the readout of several DNA molecules pooled together, whereas in a bam you have the readout of the individual molecules. So you could emulate a Sanger profile from the pileup format. E.g., if at position X you have
AACC
with qualities aabb
you could create a Sanger-like peak from that. Whether that is useful I don't know...
Ah yes, you're right. Then it could be easy, but I'm missing the (Java) code to write the AB1 file format.
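For illustration, the pileup idea above could be sketched in Python roughly as follows. This is only a sketch under assumptions: the function name `column_weights` is made up, the qualities are assumed to be Phred+33 encoded, and the bases are assumed to be already resolved to A/C/G/T (a real samtools pileup column also contains `.`, `,` and indel markers, which are not handled here). Each read contributes its probability of being a correct call, and the per-base sums are normalised so they can serve as relative peak heights.

```python
from collections import defaultdict

def column_weights(bases, quals, offset=33):
    """Turn one pileup column into relative peak heights per base.

    bases  -- string of called bases at one position, e.g. "AACC"
    quals  -- string of quality characters, same length as bases
    offset -- ASCII offset of the quality encoding (Phred+33 is assumed)
    """
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have the same length")
    weights = defaultdict(float)
    for base, q in zip(bases.upper(), quals):
        phred = ord(q) - offset
        # Weight each read by the probability that its call is correct.
        weights[base] += 1.0 - 10 ** (-phred / 10)
    total = sum(weights.values())
    # Normalise so the four heights sum to 1 (all zero for an empty column).
    return {b: (weights[b] / total if total else 0.0) for b in "ACGT"}

# The example column from the thread: two A reads and two C reads.
weights = column_weights("AACC", "aabb")
print(weights)
```

For the AACC column, A and C each end up with close to half of the total weight, which is what a heterozygous position would be expected to look like.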
You would need to assume some kind of "distribution" ja? The Sanger peaks look a bit like thin bell curves but they're probably not...
Perhaps you get better feedback if you explain why you want to create Sanger-like chromatograms from bams... It's a very unusual task and maybe there are better solutions to your problem.
The goal is to showcase a heterozygous mutation in a WGS bam file... for a poster presentation. I could've sworn I wrote it in the OP but seems I didn't. My bad. The poster presenter is oldschool and would like a Sanger-like diagram, rather than a pile of reads.
I would try to persuade the poster presenter that bam-to-chromatogram is not really a meaningful conversion. Failing that, you could call genotypes at each position using the standard samtools/bcftools pipeline. Then draw little bell curves with height proportional to the allele frequencies or Phred-scaled genotype likelihoods. It shouldn't be too difficult if you are OK with R. But it's kind of fake, of course. By the way, Sanger profiles are really a continuous signal that in good-quality sequences looks like bell-shaped peaks (perhaps the central limit theorem kicks in somehow...) but in poor sequences is more erratic.
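The "little bell curves" suggestion could look something like the sketch below, written in Python rather than R. Everything here is an assumption for illustration: the function name `sanger_like_trace`, the Gaussian peak shape, and the `spacing` and `sigma` values are arbitrary choices, and the allele frequencies are typed in by hand instead of being read from a VCF. The result is a fake chromatogram, not a reconstruction of a real Sanger signal.

```python
import math

BASES = "ACGT"

def sanger_like_trace(allele_freqs, spacing=12, sigma=2.5):
    """Build one synthetic trace per base from per-position allele frequencies.

    allele_freqs -- list of dicts, one per position, mapping base -> frequency
                    (keys must be A, C, G or T)
    spacing      -- number of trace points between neighbouring peaks
    sigma        -- width of each Gaussian peak, in trace points
    """
    n_points = spacing * (len(allele_freqs) + 1)
    trace = {b: [0.0] * n_points for b in BASES}
    for i, freqs in enumerate(allele_freqs):
        center = spacing * (i + 1)
        for base, freq in freqs.items():
            for x in range(n_points):
                # Gaussian peak whose height equals the allele frequency.
                trace[base][x] += freq * math.exp(-0.5 * ((x - center) / sigma) ** 2)
    return trace

# Hypothetical three-position window with a heterozygous A/C site in the middle.
example = [{"G": 1.0}, {"A": 0.5, "C": 0.5}, {"T": 1.0}]
trace = sanger_like_trace(example)
```

The four lists in `trace` can then be drawn as coloured lines with any plotting library; the heterozygous site shows up as two overlapping half-height peaks.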
Yeah it was something like this that I had in mind, but Pierre's solution is much simpler and most likely good enough. Big thanks regardless for your thoughts!
Hmm, alright. I understand. Bummer. Thank you Pierre!