Inquiry Problem Of Bx-Python When I Extract Block From Different Strand In Maf Files.
1
1
Entering edit mode
13.2 years ago
Ning-Yi Shao ▴ 390

Here I met a problem when I just followed this post to extract blocks from MAF files. Saying, the content of MAF named chr1_fake.maf is:

##maf version=1
a score=0.0
s SpeciesA.chr1      0 50 +    547496 acACTTGTGGCAGCATTGAAGGCTTCACTCTTCCCCAAGGGATATTATTA
s SpeciesB.chr1      0 50 -    547496 acACTTGTGGCAGCATTGAAGGCTTCACTCTTCCCCAAGGGATATTATTA

The strands of the block are different. And I indexed it as:

maf_build_index.py chr1_fake.maf

Then I followed the abovementioned post to do the inquiry:

from bx.align import maf

in_file = "chr1_fake.maf"
index_file = "chr1_fake.maf.index"

start = 0
end = 30
region_name = "SpeciesB.chr1"
index = maf.MAFIndexedAccess(in_file, index_file)
for align in index.get(region_name, start, end):
    region_align = align.slice_by_component(region_name, start, end)
    for component in region_align.components:
        print component.src, component.start, component.end,\
        component.text

Then bx-python report no hit at all. Then I modified the strand of second alignment to "+", and indexed again, and it worked. New content of MAF:

##maf version=1
a score=0.0
s SpeciesA.chr1      0 50 +    547496 acACTTGTGGCAGCATTGAAGGCTTCACTCTTCCCCAAGGGATATTATTA
s SpeciesB.chr1      0 50 +    547496 acACTTGTGGCAGCATTGAAGGCTTCACTCTTCCCCAAGGGATATTATTA

And the result is:

-bash-3.00$ maf_build_index.py chr1_fake.maf 
-bash-3.00$ python test.py
SpeciesA.chr1 0 30 acACTTGTGGCAGCATTGAAGGCTTCACTC
SpeciesB.chr1 0 30 acACTTGTGGCAGCATTGAAGGCTTCACTC

So my question is that how can I do the inquiry of the alignment by bx-python even it is on the reversed strand? I know this question is naiive, and any reply and hint are welcome, thank you.

python alignment multiple • 2.6k views
ADD COMMENT
1
Entering edit mode
13.2 years ago

When the alignment is on the reverse strand, then the start coordinate is relative to the end of the sequence; from the UCSC MAF format description:

start -- The start of the non-aligning region in the source sequence. This is a zero-based number. If the strand field is '-' then this is the start relative to the reverse-complemented source sequence (see Coordinate Transforms).

Changing the code to:

start = 547466
end = 547496

extracts the region from your test file.

ADD COMMENT
0
Entering edit mode

Thank you! Yes, it is done!

ADD REPLY

Login before adding your answer.

Traffic: 2234 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6