I'm looking to extract some regions from some maf files, and it seems the way to do it in python would be using bx-python. I haven't been able to find a whole lot of tutorial-style documentation on using bx-python. I've got the indexed maf stuff working as explained on the bx-python wiki.
Say I've got a genomic coordinate in hg18 -- how would I get a python dictionary mapping each mammalian species to the corresponding sequence out of the indexed maf?
(I'm also willing to accept non-bx-python based answers if they don't require much extra coding on my part. Also, if there's any documentation on bx-python I'm missing, please let me know! All I'm really seeing is the docstrings and some scripts distributed with bx-python.)
Reading the source and example scripts is the best way to get familiar with bx-python. It has a lot of useful functionality so it's well worth the time. Here's a script that:
Indexes the MAF file
Retrieves all alignments in a region (organism, chromosome, start, end)
Slices the alignments by the region of interest
Builds a dictionary mapping organism+chromosome to sequence
Hope this helps with getting started:
import os
import sys
import subprocess
from bx.align import maf
def main(in_file, org, chrom, start, end):
index_file = in_file + ".index"
if not os.path.exists(index_file):
build_index(in_file)
region_name = "%s.%s" % (org, chrom)
start = int(start)
end = int(end)
index = maf.Indexed(in_file, index_file)
for align in index.get(region_name, start, end):
region_align = align.slice_by_component(region_name, start, end)
seqs_by_org = dict()
for component in region_align.components:
seqs_by_org[component.src] = component.text
print seqs_by_org
def build_index(in_file):
cl = ["maf_build_index.py", in_file]
subprocess.check_call(cl)
if __name__ == "__main__":
main(*sys.argv[1:])
In this short example code, the index.get() function will return a list if the queried region covers more than one alignment blocks, which will throw an error in the for cycle when executing .slice_by_component().
ADD REPLY
• link
updated 5.2 years ago by
Ram
44k
•
written 9.1 years ago by
mt1022
▴
310
sorry I don't get it. Can you provide an example ?