8.8 years ago
Mehmet
Dear all:
I have completed the OrthoMCL step. Now I need to identify the single-copy genes and retrieve their sequences to build a phylogenetic tree.
Is there a bioinformatics tool or script for this?
Why? Please specify what exactly you mean by single-copy genes and why you think you need to restrict your analysis to those. Do you mean genes without paralogs in any species, or in a subset of species? Do you want to remove genes with paralogs completely, or only per taxon? If so, can't you simply filter your output by taxon? Genes with only one homolog per taxon are then defined as single-copy.
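To illustrate the "one homolog per taxon" filter, here is a minimal Python sketch. It assumes groups.txt uses the usual OrthoMCL layout, one group per line in the form `groupID: taxon|seqID taxon|seqID ...`. The function names and the example identifiers are hypothetical, and the set of taxa is inferred from the file unless you pass it explicitly.

```python
from collections import Counter


def parse_groups(lines):
    """Yield (group_id, [members]) from lines shaped like 'ID: tax|seq tax|seq'."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        group_id, _, members = line.partition(":")
        yield group_id.strip(), members.split()


def single_copy_groups(lines, taxa=None):
    """Return groups that have exactly one member from every taxon.

    If taxa is None, the taxon set is inferred from all members in the input
    (an assumption: every species of interest appears at least once).
    """
    groups = list(parse_groups(lines))
    if taxa is None:
        taxa = {m.split("|", 1)[0] for _, members in groups for m in members}
    taxa = set(taxa)

    result = {}
    for group_id, members in groups:
        # Count how many sequences each taxon contributes to this group.
        counts = Counter(m.split("|", 1)[0] for m in members)
        if set(counts) == taxa and all(n == 1 for n in counts.values()):
            result[group_id] = members
    return result


# Hypothetical example data, not real OrthoMCL output.
example = [
    "OG1: aaa|g1 bbb|g2 ccc|g3",         # one copy in every taxon: kept
    "OG2: aaa|g4 aaa|g5 bbb|g6 ccc|g7",  # paralogs in aaa: dropped
    "OG3: aaa|g8 bbb|g9",                # missing ccc: dropped
]
kept = single_copy_groups(example)
```

With a real file you would pass the open file handle instead of the example list, for instance `single_copy_groups(open("groups.txt"))`. Requiring every taxon to be present is the strictest definition; relaxing the `set(counts) == taxa` check would also keep groups that are single-copy in only a subset of species.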
I need to use single-copy genes for phylogenetic tree building. I mean genes with no paralogs at all. I couldn't find a good explanation of how to do this. Some people have used custom scripts to get the single-copy genes and their protein sequences to build a phylogenetic tree. I tried those scripts but got many errors. Do you have a solution?
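For the sequence retrieval part, a rough Python sketch of one possible approach follows. It assumes the protein FASTA headers start with the same `taxon|seqID` identifiers that appear in groups.txt (this should be checked against your own files), and the helper names are hypothetical. It relabels each sequence with its taxon so that per-gene alignments can later be matched across genes.

```python
def read_fasta(lines):
    """Parse FASTA lines into {identifier: sequence}.

    The identifier is the first whitespace-delimited token after '>'.
    """
    chunks = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


def group_fasta(members, seqs):
    """Build FASTA text for one group, using the taxon as the header.

    Raises KeyError if a member identifier is missing from seqs.
    """
    records = []
    for member in members:
        taxon = member.split("|", 1)[0]
        records.append(f">{taxon}\n{seqs[member]}")
    return "\n".join(records) + "\n"


# Hypothetical example data, not real sequences.
fasta_lines = [">aaa|g1 some description", "MKT", "LLV", ">bbb|g2", "MKS"]
seqs = read_fasta(fasta_lines)
text = group_fasta(["aaa|g1", "bbb|g2"], seqs)
```

Writing `text` to one file per group gives a set of unaligned FASTA files, one per single-copy gene, which can then be aligned and used for tree building with whatever tools you prefer.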
Can you give me an example of the OrthoMCL output? I am still asking why you want to use only single-copy genes; a few paralogues won't do any harm to the phylogeny.
This is the head of the groups.txt output file.
This is the tail of the groups.txt file.
This is the head of the orthologus.txt file:
I have seen papers in which people used only single-copy genes to build a phylogenetic tree. By the way, thank you so much for your help.
Sorry, I do not recognize all of these identifiers (TCAS is Tribolium, SMED is maybe the SmedGD database, but that is too much guessing). Maybe someone who understands the OrthoMCL output better can help you more. Doesn't the software also predict paralogs? If not, you will have to try to get the paralogs from, e.g., Ensembl BioMart using these identifiers.