Help With Maf'S In Bx-Python
1
4
Entering edit mode
14.0 years ago
Noah ▴ 40

I'm looking to extract some regions from some maf files, and it seems the way to do it in python would be using bx-python. I haven't been able to find a whole lot of tutorial-style documentation on using bx-python. I've got the indexed maf stuff working as explained on the bx-python wiki.

Say I've got a genomic coordinate in hg18 -- how would I get a python dictionary mapping each mammalian species to the corresponding sequence out of the indexed maf?

(I'm also willing to accept non-bx-python based answers if they don't require much extra coding on my part. Also, if there's any documentation on bx-python I'm missing, please let me know! All I'm really seeing is the docstrings and some scripts distributed with bx-python.)

python maf alignment • 5.3k views
ADD COMMENT
0
Entering edit mode

sorry I don't get it. Can you provide an example ?

ADD REPLY
5
Entering edit mode
14.0 years ago

Reading the source and example scripts is the best way to get familiar with bx-python. It has a lot of useful functionality so it's well worth the time. Here's a script that:

  • Indexes the MAF file
  • Retrieves all alignments in a region (organism, chromosome, start, end)
  • Slices the alignments by the region of interest
  • Builds a dictionary mapping organism+chromosome to sequence

Hope this helps with getting started:

import os
import sys
import subprocess

from bx.align import maf

def main(in_file, org, chrom, start, end):
    index_file = in_file + ".index"
    if not os.path.exists(index_file):
        build_index(in_file)

    region_name = "%s.%s" % (org, chrom)
    start = int(start)
    end = int(end)
    index = maf.Indexed(in_file, index_file)
    for align in index.get(region_name, start, end):
        region_align = align.slice_by_component(region_name, start, end)
        seqs_by_org = dict()
        for component in region_align.components:
            seqs_by_org[component.src] = component.text
        print seqs_by_org

def build_index(in_file):
    cl = ["maf_build_index.py", in_file]
    subprocess.check_call(cl)

if __name__ == "__main__":
    main(*sys.argv[1:])

Usage is:

% python maf_retrieve_region.py chr22.maf hg18 chr22 14430574 14430584
{'otoGar1.scaffold_92626.1-103747': 'tctttagaag',
 'tupBel1.scaffold_3803.1-85889': 'TTTTTAGGAG',
 'hg18.chr22': 'tcttaaaaag',
 'panTro2.chrUn': 'ttttaaaaag'}
ADD COMMENT
0
Entering edit mode

thanks, that's a good place to start from

ADD REPLY
0
Entering edit mode

In this short example code, the index.get() function will return a list if the queried region covers more than one alignment blocks, which will throw an error in the for cycle when executing .slice_by_component().

ADD REPLY

Login before adding your answer.

Traffic: 2640 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6