Question

Convert Bam File To Fasta File

4

Entering edit mode

14.1 years ago

Sazzad ▴ 40

How can I convert bam file to fasta file using Biopython command? please share if some one have any idea.

Thanks in advance

sazzad

conversion bam fasta biopython • 34k views

ADD COMMENT • link updated 7.2 years ago by WouterDeCoster 47k • written 14.1 years ago by Sazzad ▴ 40

0

Entering edit mode

Hi Sazzad. Can you tell us a little more about your problem and why you are choosing to solve it using biopython?

ADD REPLY • link 14.1 years ago by Casey Bergman 18k

0

Entering edit mode

It will be easy if you clear what you did so far!

ADD REPLY • link 14.1 years ago by Thaman ★ 3.3k

Ram · Answer 1 · 2011-03-30

14

Entering edit mode

14.1 years ago

Brad Chapman 9.7k

You can do this with a combination of Biopython for writing the Fasta files and pysam for reading the BAM files:

import os
import sys

import pysam
from Bio import SeqIO, Seq, SeqRecord

def main(in_file):
    out_file = "%s.fa" % os.path.splitext(in_file)[0]
    with open(out_file, "w") as out_handle:
        # Write records from the BAM file one at a time to the output file.
        # Works lazily as BAM sequences are read so will handle large files.
        SeqIO.write(bam_to_rec(in_file), out_handle, "fasta")

def bam_to_rec(in_file):
    """Generator to convert BAM files into Biopython SeqRecords.
    """
    bam_file = pysam.Samfile(in_file, "rb")
    for read in bam_file:
        seq = Seq.Seq(read.seq)
        if read.is_reverse:
            seq = seq.reverse_complement()
        rec = SeqRecord.SeqRecord(seq, read.qname, "", "")
        yield rec

if __name__ == "__main__":
    main(*sys.argv[1:])

Run it with:

% python bam_to_fasta.py your_file.bam

and the output will be in your_file.fa

ADD COMMENT • link 14.1 years ago by Brad Chapman 9.7k

0

Entering edit mode

This works well, thanks Brad. However I found a couple of reasons to consider Picard SamToFastq suggested by Steve Lianoglou instead.

SamToFastq will properly split paired-end reads into two fastq files.

SamToFastq seems to be much faster (about 1/5th the time in my informal testing).

ADD REPLY • link 14.0 years ago by Lance ▴ 30

0

Entering edit mode

One more thing to note: SamToFastq actually outputs FastQ with quality scores, while this solution outputs Fasta files (no qualities). I actually modified Brad's solution to output FastQ for my testing, since that is what I needed.

ADD REPLY • link 14.0 years ago by Lance ▴ 30

0

Entering edit mode

Lance, that makes good sense. Picard's SamToFastq is great; I even wrote a Galaxy wrapper for it in the community tool shed: http://community.g2.bx.psu.edu/tool/browse_tools?sort=name&f-state=approved&id=91f644c6d57806f8&webapp=community&operation=view_tool&f-Category.name=SAM

ADD REPLY • link updated 5.6 years ago by Ram 45k • written 14.0 years ago by Brad Chapman 9.7k

0

Entering edit mode

How to add sequencing quality and mapped or unmapped reads as parameters?

Thanks

ADD REPLY • link 7.7 years ago by Dave123 • 0

score 11 · Answer 2 · 2011-03-30

11

Entering edit mode

14.1 years ago

Michael Schubert ★ 7.1k

Alternative approach using SAMtools (source: SeqAnswers)

samtools view filename.bam | \
awk '{OFS="\t"; print ">"$1"\n"$10}' - > filename.fasta

ADD COMMENT • link 14.1 years ago by Michael Schubert ★ 7.1k

3

Entering edit mode

The problem with this is that the original strand of the sequence is lost, as SAM always reports sequences on the REF strand. To recreate the original sequence that was aligned, one needs to interrogate the FLAG field and ask if the read was actually aligned to the "-" strand. If so, it needs to be RC'ed. See line 6 of Brad's "bam_to_rec" function above.

ADD REPLY • link 14.1 years ago by Aaronquinlan 12k

0

Entering edit mode

As always, the usefulness of the approach depends on your intended use. But thanks for pointing that out, I didn't think of it.

ADD REPLY • link 14.1 years ago by Michael Schubert ★ 7.1k

0

Entering edit mode

I created an account to up vote this!

thanks for the help.

ADD REPLY • link 13.9 years ago by User 3325 • 0

0

Entering edit mode

In certain cases, re-generating fastq is possible with this small modification: samtools view filename.bam | awk '{OFS="\t"; print "@"$1"\n"$10"\n+\n"$11}' >| filename.fastq I have used this successfully with bam files generated by Tophat (on unstranded data).

ADD REPLY • link 12.1 years ago by stu.andrews • 0

score 4 · Answer 3 · 2011-03-31

4

Entering edit mode

14.1 years ago

Steve Lianoglou 5.2k

If you're willing to consider using something that's not Python-maden, Picard comes with the SamToFastq utility you can use from the command line that works on SAM and BAM files.

ADD COMMENT • link 14.1 years ago by Steve Lianoglou 5.2k

Ram · Answer 4 · 2011-04-02

0

Entering edit mode

14.1 years ago

Ben Lange ▴ 210

BAM is not in the list but OpenBabel seems like a pretty good converter application. Maybe someone can come up with a plugin if BAM is common enough.

ADD COMMENT • link 14.1 years ago by Ben Lange ▴ 210

0

Entering edit mode

but it is mostly designed to handle the chemical file formats.

ADD REPLY • link 14.1 years ago by Pierre Lindenbaum 166k

0

Entering edit mode

I'm learning so let's see if I understand:

*BAM is an animated graphics file format. http://iesdp.gibberlings3.net/file_formats/ie_formats/bam_v1.htm

*Fasta is a bioinformatics specific format. http://en.wikipedia.org/wiki/FASTA_format

Doesn't it make more sense to go from Fasta to BAM than the other way around? If you had a GIF file it would be pretty tough to reverse that file back to a language that described what was going on in the image. Unless there's quite a bit of metadata built into the BAM file.

ADD REPLY • link 14.1 years ago by Ben Lange ▴ 210

0

Entering edit mode

I'm learning about file formats

BAM - is compressed SAM.
SAM – Sequence Alignment/Map format, in which the results of the 1000 Genomes Project will be released.
FASTA – The FASTA file format, for sequence data. Sometimes also given as FNA or FAA (Fasta Nucleic Acid or Fasta Amino Acid).

Both seem to be oriented towards sequencing data. Why would FASTA get lumped in with chemical data but not SAM?

ADD REPLY • link updated 5.6 years ago by Ram 45k • written 14.1 years ago by Ben Lange ▴ 210

score 0 · Answer 5 · 2018-02-18

0

Entering edit mode

7.2 years ago

WouterDeCoster 47k

And in more recent versions of samtools, you could just use samtools fasta

ADD COMMENT • link 7.2 years ago by WouterDeCoster 47k