I am interested in looking at the conservation of the Xenopus genome at certain positions. After that, I want to extract the entire alignment of this region (preferably as a FASTA file). Suppose I am looking at the tyr gene in the Genome Browser. If I click on one of the exons, I get to a page that gives me the option for 'CDS Fasta Alignment'. Clicking on it shows me an alignment, but this alignment is much shorter than the tyr gene itself.
Am I doing something wrong? Is there another way of getting the multiple alignment for regions of interest in a genome?
I want alignments only for the regions I am interested in, just as I can view positions of interest in the Genome Browser and then click DNA to get the sequence. So the complete files available for download don't exactly help me.
You can do this on the command line. Convert the alignments to BED with an `awk`/Perl/whatever statement, sort them with `sort-bed`, and run them through `bedops --element-of -1` to get elements within a particular genomic range. See the BEDOPS docs for binaries and examples: http://code.google.com/p/bedops/
As a for instance:
All the needed data are there to turn this into a UCSC BED file with a Perl script (or a more convoluted `awk` script). You can then filter the BED file with `grep` and run set operations on it with `bedops` to get elements of the desired genome and genomic range.
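As a rough sketch of the conversion step, here is what such a script might look like in Python rather than Perl. It assumes the alignments are in UCSC MAF format, where each `s` line carries the source (conventionally `assembly.chrom`), a zero-based start, the aligned size, the strand, the source length and the aligned text. The function name `maf_s_line_to_bed` and the choice to put the assembly name in the BED name column are my own, not part of any library.

```python
import sys


def maf_s_line_to_bed(line):
    """Convert one MAF 's' line to a BED-style line; return None for other lines."""
    fields = line.split()
    if len(fields) != 7 or fields[0] != "s":
        return None
    _, src, start, size, strand, src_size, text = fields
    start, size, src_size = int(start), int(size), int(src_size)

    # Assumption: src looks like "assembly.chrom"; split on the first dot.
    assembly, _, chrom = src.partition(".")
    if not chrom:
        chrom, assembly = assembly, "."

    if strand == "-":
        # On the minus strand, MAF counts the start from the end of the
        # reverse-complemented source, so flip it to forward coordinates.
        bed_start = src_size - start - size
    else:
        bed_start = start
    bed_end = bed_start + size

    # chrom, start, end, name (assembly), score, strand, aligned text
    return "\t".join(
        [chrom, str(bed_start), str(bed_end), assembly, "0", strand, text]
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python maf_to_bed.py alignments.maf > alignments.bed
    with open(sys.argv[1]) as maf:
        for maf_line in maf:
            bed_line = maf_s_line_to_bed(maf_line)
            if bed_line is not None:
                print(bed_line)
```

Because the assembly name ends up in the fourth column, `grep` can pick out the genome of interest, and the result can then go through `sort-bed` and `bedops` as described above. The file names `maf_to_bed.py` and `alignments.maf` are placeholders.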