I am trying to extract alignment blocks from Ensembl or UCSC whole-genome alignment files in MAF format, given an organism, a chromosome and a start-end position. For example, I want to extract the blocks from a MAF file that encompass rat chromosome 1 from sequence position 236456 to 236723. The output should be the block alignment of this rat region with all the species that are aligned to it. Please note that my organism is not the reference organism in the MAF, and that what I have are sequence positions.
Basically, I have a BED file of genomic regions in rat (sequence coordinates), and I want to extract from the MAF files the alignments that cover these regions in the other species.
I found a few utilities but couldn't get them working for my problem:
maf_parse in PHAST http://compgen.cshl.edu/phast/help-pages/maf_parse.txt
WGA_Bed https://github.com/henryjuho/maf_parse
Any help will be much appreciated. Thanks.
If you aren't wedded to MAF format, you can get FASTA alignments of large sequences from Kalign, and you may well find more tools to give you back 'slices' of sequence with a FASTA than with MAF. I know I've struggled with MAF in the past.
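If you do need to stay with MAF, the overlap test itself is short enough to write by hand. What follows is only a minimal Python sketch, not how maf_parse or any other tool mentioned here works. The names `MafRow`, `read_blocks` and `blocks_overlapping` are made up for illustration, the tiny MAF text is invented test data, and the source name `rn6.chr1` assumes UCSC-style `assembly.chrom` naming (check the `s` lines of your own file, since Ensembl names sources differently). It also assumes your region is given as zero-based, half-open coordinates, as in BED. The one MAF detail that matters for a non-reference species is that rows on the `-` strand store their start relative to the reverse-complemented sequence, so they have to be converted before comparing.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO, Tuple


class MafRow(NamedTuple):
    """One 's' line of a MAF block (hypothetical helper type)."""
    src: str       # source name, e.g. "rn6.chr1" (naming is an assumption)
    start: int     # zero-based start, relative to the given strand
    size: int      # number of non-gap bases in this row
    strand: str    # "+" or "-"
    src_size: int  # total length of the source sequence
    text: str      # aligned sequence, including gaps

    def forward_interval(self) -> Tuple[int, int]:
        """Zero-based, half-open interval on the forward strand."""
        if self.strand == "+":
            return self.start, self.start + self.size
        # '-' rows are stored relative to the reverse-complemented source.
        return (self.src_size - self.start - self.size,
                self.src_size - self.start)


def read_blocks(handle: TextIO) -> Iterator[List[MafRow]]:
    """Yield each alignment block as a list of its 's' rows."""
    block: List[MafRow] = []
    for line in handle:
        fields = line.split()
        if not fields or fields[0] == "a":
            # A blank line or a new 'a' line closes the current block.
            if block:
                yield block
            block = []
        elif fields[0] == "s" and len(fields) >= 7:
            block.append(MafRow(fields[1], int(fields[2]), int(fields[3]),
                                fields[4], int(fields[5]), fields[6]))
        # Header, comment and 'i'/'e'/'q' lines are ignored in this sketch.
    if block:
        yield block


def blocks_overlapping(handle: TextIO, src: str,
                       start: int, end: int) -> Iterator[List[MafRow]]:
    """Yield whole blocks in which `src` overlaps [start, end)."""
    for block in read_blocks(handle):
        for row in block:
            if row.src != src:
                continue
            row_start, row_end = row.forward_interval()
            if row_start < end and start < row_end:
                yield block
                break


# Invented example data: two blocks, rat is not the first (reference) row.
EXAMPLE_MAF = """##maf version=1
a score=100
s mm10.chr2 500 10 + 2000000 ACGTACGTAA
s rn6.chr1 236400 10 + 1000000 ACGTACGTAC

a score=50
s hg38.chr3 100 5 + 3000000 ACGTA------
s rn6.chr1 763270 10 - 1000000 ACGT-ACGTAC
"""

hits = list(blocks_overlapping(io.StringIO(EXAMPLE_MAF),
                               "rn6.chr1", 236456, 236723))
for block in hits:
    for row in block:
        print(row.src, row.strand, row.forward_interval(), row.text)
    print()
```

To use it on real data you would open the MAF file instead of the `io.StringIO` object and call `blocks_overlapping` once per line of the BED file. Note that it returns whole blocks rather than trimming them to the requested region, and that it rescans the file for every region, so for a genome-sized MAF you would want to read the file once and test each block against all regions.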