Hi list,
I am trying to replicate something similar to what Region miner (Genomatix) does when looking for orthologous sequences:
"In a first step, homologous loci in the target organisms are searched in the ElDorado database (see Comparative Genomics). If no such loci are found, the flanking genes (up to 20 loci in both directions) are considered to find a syntenic region in the target organism. For the definition of a syntenic region, the two homologous genes in the target organism need to be on the same contig and must show the same relative strand orientation as the genes in the source organism.
In a second step, the input sequence is aligned to the syntenic region using a Smith-Waterman alignment. If the alignment fulfills the following criteria, the target region is listed in the output:
- the alignment contains a highly conserved 50 bp stretch
- the alignment must not be longer than 1.5x the length of the input region
- a sufficient overall alignment quality is reached"
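A minimal sketch of how these three criteria might be checked on a pairwise alignment, given as two gapped rows of equal length. The function names and all thresholds (90% identity in the 50 bp window, 60% overall identity) are assumptions for illustration; the Genomatix text does not state what "highly conserved" or "sufficient quality" mean numerically. The length criterion is read here as the number of alignment columns.

```python
def identity_flags(query_aln, target_aln):
    """Return one boolean per alignment column: True where both rows
    carry the same base (gap columns never count as a match)."""
    if len(query_aln) != len(target_aln):
        raise ValueError("alignment rows must have equal length")
    return [q == t and q != "-"
            for q, t in zip(query_aln.upper(), target_aln.upper())]


def best_window_identity(flags, window=50):
    """Highest fraction of matching columns in any stretch of `window`
    consecutive columns (0.0 if the alignment is shorter than the window)."""
    if len(flags) < window:
        return 0.0
    matches = sum(flags[:window])
    best = matches
    # Slide the window one column at a time, updating the match count.
    for i in range(window, len(flags)):
        matches += flags[i] - flags[i - window]
        best = max(best, matches)
    return best / window


def passes_criteria(query_aln, target_aln, input_len,
                    window=50, min_window_identity=0.9,    # assumed threshold
                    max_length_ratio=1.5,                  # from the quoted text
                    min_overall_identity=0.6):             # assumed threshold
    """Apply the three quoted criteria to one pairwise alignment."""
    flags = identity_flags(query_aln, target_aln)
    if not flags:
        return False
    has_core = best_window_identity(flags, window) >= min_window_identity
    length_ok = len(flags) <= max_length_ratio * input_len
    quality_ok = sum(flags) / len(flags) >= min_overall_identity
    return has_core and length_ok and quality_ok
```

For example, `passes_criteria(human_row, mouse_row, input_len=len(human_seq))` would be called once per target species, where `human_row` and `mouse_row` are hypothetical gapped rows taken from the same alignment block.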
I have a FASTA file with different human regions (not all of them lie in coding regions). My approach would be:
- Map the FASTA sequences to the human genome with BLAST to get genomic coordinates, and create a BED file
- Use the Ensembl REST API endpoint alignment/region/:species/:region to get the alignments of the orthologous regions in different mammalian species
- Filter the alignments based on the Genomatix criteria
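The second step could look roughly like the sketch below, which turns a BED line into an Ensembl region string and queries the alignment/region endpoint. The helper names are hypothetical, and the choice of `method=EPO` with `species_set_group=mammals` is an assumption; other alignment methods or species sets may suit the regions better. Note that BED starts are 0-based while Ensembl regions are 1-based and inclusive, so the start is shifted by one.

```python
import json
import urllib.parse
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"


def bed_to_region(bed_line):
    """Convert a BED line (0-based, half-open) into an Ensembl region
    string (1-based, inclusive), dropping a UCSC-style 'chr' prefix."""
    chrom, start, end = bed_line.split()[:3]
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    return f"{chrom}:{int(start) + 1}-{int(end)}"


def alignment_url(region, species="homo_sapiens",
                  method="EPO", species_set_group="mammals"):
    """Build the URL for GET alignment/region/:species/:region."""
    query = urllib.parse.urlencode(
        {"method": method, "species_set_group": species_set_group}
    )
    return f"{ENSEMBL_REST}/alignment/region/{species}/{region}?{query}"


def fetch_alignment(region, **kwargs):
    """Fetch the alignment blocks for one region as parsed JSON."""
    request = urllib.request.Request(
        alignment_url(region, **kwargs),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `fetch_alignment(bed_to_region(line))` for each line of the BED file would then give the alignment blocks to filter in the third step.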
Do you think that this is a valid approach?
Thanks