Question

how to find a particular orthologous gene in a set of eukaryotic genomes

10.9 years ago

natasha.sernova ★ 4.0k

Dear all,

I would like to understand how to find a particular gene (the orthologous genes) in a set of eukaryotic genomes. The simplest way I see is to divide each chromosome into LOCUS blocks, then read each block line by line and, if a particular name is encountered, save that block.

But I see several problems. Is it correct that all these orthologs will have the same name?

I am not completely sure.

Another issue: when I am reading a LOCUS block, I cannot stop reading after the first appearance of the gene name. I have to keep reading line by line to the end of the block, because a particular name can occur several times per block. If I see it a second time, the block should get a higher weight.

The gene name is usually in quotes. That shouldn't matter for a Linux text search, should it?

# IN and OUT are assumed to be filehandles opened earlier in the script
my $block = "";
my $blockisgood = 0;   # zero for false, 1 for true
# reading from the input file
while (my $line = <IN>) {
    next if $line =~ /^\s*$/;        # skip empty lines
    if ($line =~ /^LOCUS/) {         # starting a new block
        # but print the old block if it was good
        if ($blockisgood) {
            print OUT $block;
        }
        # and reset
        $blockisgood = 0;
        $block = "";

        # now check for blocks we are looking for
        # (this test runs only on the LOCUS header line itself)
        if ($line =~ /MYGENE/) { $blockisgood = 1; }
    }
    $block .= $line;
}

# print the last block to the output
print OUT $block if $blockisgood;

This doesn't want to work for my gene. Please, help me!

Thank you very much!

Natasha
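One possible reason the script misses the gene: its MYGENE test runs only on the LOCUS header line, while in GenBank-style records gene names usually sit on /gene="..." qualifier lines further down. The following is a minimal Perl sketch of the scan described in the question, testing every line of a block and counting matches as the block's weight. The helper name blocks_with_gene, the demo records, and the in-memory input are assumptions made for illustration, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: split GenBank-style text into LOCUS blocks and
# count how many lines of each block mention the gene name.
sub blocks_with_gene {
    my ($fh, $gene) = @_;
    my $pattern = qr/\Q$gene\E/;           # \Q..\E keeps quotes and slashes literal
    my (@hits, $block, $count);
    my $flush = sub {
        push @hits, { block => $block, weight => $count }
            if defined $block && $count > 0;
    };
    while (my $line = <$fh>) {
        next if $line =~ /^\s*$/;          # skip blank lines
        if ($line =~ /^LOCUS/) {           # a new record starts here
            $flush->();                    # keep the previous block if it matched
            ($block, $count) = ('', 0);
        }
        next unless defined $block;        # ignore text before the first LOCUS
        $block .= $line;
        $count++ if $line =~ $pattern;     # test every line, not only the LOCUS line
    }
    $flush->();                            # do not forget the last block
    return @hits;
}

# Made-up two-record input, read from an in-memory string for the demo.
my $demo = <<'END';
LOCUS       CHR1_PART1
     gene            1..100
                     /gene="MYGENE"
     CDS             1..100
                     /gene="MYGENE"
LOCUS       CHR1_PART2
     gene            5..50
                     /gene="OTHER"
END

open my $in, '<', \$demo or die "cannot open in-memory input: $!";
my @hits = blocks_with_gene($in, '/gene="MYGENE"');
close $in;
printf "%d matching block(s), first weight %d\n",
    scalar @hits, $hits[0]{weight};
```

With the demo input this reports one matching block with weight 2. Searching for the quoted form /gene="MYGENE" avoids accidental hits on longer names such as MYGENE2, and the \Q...\E quoting means the quotes need no special handling.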

genome gene • 3.0k views

ADD COMMENT • link updated 3.6 years ago by Ram 45k • written 10.9 years ago by natasha.sernova ★ 4.0k

Hi,

Orthologous genes do not necessarily have the same name (even though annotators try to keep names consistent as much as possible). It depends on the genomes you use.

If I understand correctly, you are trying to analyse syntenic regions. If information such as the gene name is not shared between your genomes, you have to verify that the genes are orthologs. The most accurate (but most difficult) way to do that is a phylogenetic approach. Most people prefer to use sequence similarity to define (assume) the relationship between the sequences. It is much easier to set up but a bit less accurate.

If your genomes are known (for example, present in the Ensembl database), you can also use the relationships annotated there between the sequences of different species.

ADD REPLY • link updated 3.6 years ago by Ram 45k • written 10.9 years ago by Juke34 9.2k

Yes, you are absolutely right!

I am trying to analyse syntenic regions. Could you please give me some details on how to use a phylogenetic approach, as the most reliable one? What tools exist for doing that?

Will Ensembl help with different ortholog names? I don't need just the closest left and right neighbours; I would like to see at least a few genes to the left and to the right. How do I do that correctly?

Many thanks!

ADD REPLY • link updated 3.6 years ago by Ram 45k • written 10.9 years ago by natasha.sernova ★ 4.0k

I can advise you to read this: http://www.ncbi.nlm.nih.gov/pubmed/19740451

It seems to me that I have already seen automated tools for syntenic region analysis/detection at several conferences. I think you should spend more time on your literature search.

If your genomes are in the Ensembl database and you program in Perl, it should not be too difficult to write a pipeline that does what you want. For each gene it is possible to get its location and the list of orthologous/paralogous genes.
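Once gene locations are available from whatever source, picking a few genes on each side of the gene of interest comes down to sorting by start coordinate and slicing. Here is a minimal Perl sketch of that step only. It does not use the Ensembl API; the helper name flanking_genes, the hash keys (name, chr, start), and the coordinates are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return up to $k gene names on each side of $target,
# considering only genes on the same chromosome as the target.
sub flanking_genes {
    my ($genes, $target, $k) = @_;
    my ($focus) = grep { $_->{name} eq $target } @$genes;
    return unless $focus;                          # target not in the list
    my @same_chr = sort { $a->{start} <=> $b->{start} }
                   grep { $_->{chr} eq $focus->{chr} } @$genes;
    my ($idx) = grep { $same_chr[$_]{name} eq $target } 0 .. $#same_chr;
    my $lo = $idx - $k < 0          ? 0          : $idx - $k;
    my $hi = $idx + $k > $#same_chr ? $#same_chr : $idx + $k;
    return (
        [ map { $_->{name} } @same_chr[$lo .. $idx - 1] ],   # left neighbours
        [ map { $_->{name} } @same_chr[$idx + 1 .. $hi] ],   # right neighbours
    );
}

# Made-up gene positions, deliberately given out of order.
my @genes = (
    { name => 'g3',     chr => 'chr1', start => 1500 },
    { name => 'g1',     chr => 'chr1', start => 100  },
    { name => 'MYGENE', chr => 'chr1', start => 900  },
    { name => 'g5',     chr => 'chr2', start => 50   },
    { name => 'g4',     chr => 'chr1', start => 2000 },
    { name => 'g2',     chr => 'chr1', start => 500  },
);

my ($left, $right) = flanking_genes(\@genes, 'MYGENE', 2);
print "left: @$left\nright: @$right\n";
```

Running the same helper on each genome, with the ortholog of the target gene as the focus, gives neighbour lists that can then be compared between species.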

ADD REPLY • link updated 3.6 years ago by Ram 45k • written 10.9 years ago by Juke34 9.2k

Thank you, it's a very nice paper! I will ask for their code.

ADD REPLY • link updated 3.6 years ago by Ram 45k • written 10.9 years ago by natasha.sernova ★ 4.0k

Ram · Answer 1 · 2014-07-22

10.9 years ago

Josh Herr 5.8k

This is a common question here; it looks like you didn't search much before posting your question.

This question is a great place to start. You'll want to adapt the answers there to accommodate different genomes, but a one-to-one BLAST would be the way to go.

These questions may help you also:

You should also check out OrthoMCL, among other tools.
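The one-to-one BLAST suggestion is commonly implemented as a reciprocal best hit (RBH) test: a gene in genome A and a gene in genome B are kept as a candidate ortholog pair only if each is the other's top-scoring hit. Below is a minimal Perl sketch of that filter, assuming both searches were saved in BLAST tabular format (-outfmt 6 with the default columns, where the bit score is column 12). The sub names and gene identifiers are made up, and ties are resolved by keeping the first hit seen, which is a simplification.

```perl
use strict;
use warnings;

# Best-scoring subject per query from BLAST tabular lines
# (-outfmt 6 defaults: column 1 query, column 2 subject, column 12 bit score).
sub best_hits {
    my @lines = @_;
    my (%best, %score);
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^\s*(?:#.*)?$/;          # skip blank and comment lines
        my ($query, $subject, $bits) = (split /\t/, $line)[0, 1, 11];
        if (!exists $score{$query} || $bits > $score{$query}) {
            ($score{$query}, $best{$query}) = ($bits, $subject);
        }
    }
    return \%best;
}

# Keep only pairs where each gene is the other's best hit.
sub reciprocal_best_hits {
    my ($fwd, $rev) = @_;
    my %pairs;
    for my $gene_a (keys %$fwd) {
        my $gene_b = $fwd->{$gene_a};
        $pairs{$gene_a} = $gene_b
            if defined $rev->{$gene_b} && $rev->{$gene_b} eq $gene_a;
    }
    return \%pairs;
}

# Made-up hits; only the ids and the bit score matter here, so the other
# nine columns are filled with zeros as placeholders.
sub row { my ($q, $s, $bits) = @_; return join "\t", $q, $s, (0) x 9, $bits }

my $a_vs_b = best_hits(row('geneA1', 'geneB1', 400),
                       row('geneA1', 'geneB2', 150),
                       row('geneA2', 'geneB2', 200));
my $b_vs_a = best_hits(row('geneB1', 'geneA1', 400),
                       row('geneB2', 'geneA1', 150));
my $rbh = reciprocal_best_hits($a_vs_b, $b_vs_a);
print "$_ <-> $rbh->{$_}\n" for sort keys %$rbh;
```

In this toy data only geneA1 and geneB1 are reported: geneA2 hits geneB2 best, but geneB2's best hit is geneA1, so that pair is not reciprocal. Clustering tools such as OrthoMCL go further than this pairwise filter, for example by grouping paralogs.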

ADD COMMENT • link updated 3.6 years ago by Ram 45k • written 10.9 years ago by Josh Herr 5.8k

Yes, you are right, I didn't search a lot, sorry! You gave me a perfect link as an example, I will try to use their approaches.

Many thanks!

ADD REPLY • link updated 3.6 years ago by Ram 45k • written 10.9 years ago by natasha.sernova ★ 4.0k