6.4 years ago
Zhenpeng Yu
orthofinder -fg RESULTS_DIR -M msa -oa
The command above does not yield the single-copy homologous genes.
I also have a question: we use protein sequences for alignment and clustering analysis, so if I finally want to get the corresponding nucleotide sequences, what should I do?
Thanks in advance!
As you already used protein sequences of 8 marine species and found 8000 single-copy genes (proteins) in the result directory with orthogroups assigned by OrthoFinder, you can pick the gene IDs of each species for the orthogroups listed in the SingleCopyOrthogroups.txt file with grep, then retrieve the nucleotide sequences for the IDs in OrthologsIDS.txt from each species's nucleotide files.
You're amazing. The grep command is so cool that the problem was solved instantly. I sincerely thank you.
The grep command is not working; I tried it on my data. Help is needed, @Mehmet.
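If the grep one-liner does not work on your data, here is a minimal Python sketch of the same step. It assumes the orthogroup table is the tab-separated file OrthoFinder writes (Orthogroups.tsv, or Orthogroups.csv in older releases) with one column per species and comma-separated gene IDs in each cell, and that SingleCopyOrthogroups.txt lists one orthogroup ID per line. The function names and the layout of the output file are my own assumptions, since the layout of OrthologsIDS.txt is not shown in this thread.

```python
import csv


def single_copy_ids(orthogroups_path, single_copy_path):
    """Return {orthogroup: {species: gene_id}} for the listed single-copy orthogroups."""
    # One orthogroup ID per line is assumed here.
    with open(single_copy_path) as handle:
        wanted = {line.strip() for line in handle if line.strip()}

    table = {}
    with open(orthogroups_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        species = header[1:]  # first column holds the orthogroup ID
        for row in reader:
            if not row or row[0] not in wanted:
                continue
            # Single-copy orthogroups have exactly one gene ID per species cell.
            table[row[0]] = {sp: cell.strip() for sp, cell in zip(species, row[1:])}
    return table


def write_ids(table, out_path):
    """Write one line per orthogroup: orthogroup ID, then one gene ID per species."""
    with open(out_path, "w") as out:
        for orthogroup, genes in table.items():
            out.write("\t".join([orthogroup, *genes.values()]) + "\n")
```

Calling `write_ids(single_copy_ids("Orthogroups.tsv", "SingleCopyOrthogroups.txt"), "OrthologsIDS.txt")` from inside the OrthoFinder results directory would give one tab-separated line per orthogroup. Adjust the file names to whatever your OrthoFinder version produced.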
We could do with a little more info here. E.g., what species are you working with? What data do you have at hand? What is the ultimate goal?
what do you mean with
thx
I'm interested in eight marine mammals. I've now found more than 8000 single-copy homologous genes with OrthoFinder and want to extract the results with a program or script, but I'm not very good at programming myself, so I'm asking whether there is existing software or a script for this.
Thanks!
OK, so do you want to obtain all the single-copy genes that are present in each of the 8 mammals, or simply all single-copy genes (which may be present in only a few species)?
Without going into detail: all this info can be parsed from the CSV output files you should get. Do you get those CSV files?
Can you also explain why you use that specific OrthoFinder command line? It's not wrong, but it's not the 'default' one either.
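To make the distinction between the two cases concrete, here is a rough Python sketch that parses the per-species gene counts. It assumes a tab-separated count table like the one OrthoFinder writes (Orthogroups.GeneCount.tsv, or Orthogroups.GeneCount.csv in older releases) with an `Orthogroup` column, one column per species, and a `Total` column. The function name and the `min_species` parameter are hypothetical.

```python
import csv


def classify_single_copy(gene_count_path, min_species=2):
    """Split orthogroups into single-copy in all species vs. in a subset only."""
    in_all, in_subset = [], []
    with open(gene_count_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # Every column except the ID and the row total is treated as a species.
        species = [c for c in reader.fieldnames if c not in ("Orthogroup", "Total")]
        for row in reader:
            counts = [int(row[sp]) for sp in species]
            if any(c > 1 for c in counts):
                continue  # at least one species has duplicates
            present = sum(counts)  # counts are 0 or 1 at this point
            if present == len(species):
                in_all.append(row["Orthogroup"])
            elif present >= min_species:
                in_subset.append(row["Orthogroup"])
    return in_all, in_subset
```

The first list holds orthogroups with exactly one gene in every species. The second holds orthogroups with at most one gene per species that are missing from some species but present in at least `min_species` of them.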
The command line I described above is from the OrthoFinder documentation.
This issue has also been discussed here: https://github.com/davidemms/OrthoFinder/issues/154
Hello, how do you sort out the sequence IDs contained in each homology group after you get the OrthologsIDS.txt file? Thank you very much.
Can you describe your question in more detail? What do you want to do? The OrthologsIDS.txt file contains ortholog families, with one gene from each species in each family.
I have obtained the single-copy orthogroups and the corresponding amino acid sequence files for those orthogroups. How can I obtain the corresponding nucleotide sequences based on the IDs in these amino acid sequence files (such as XPxxxx of species A)? Thanks in advance.
You need something like this?
extract sequences based on ids file
Thank you very much. It solved my problem.
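For anyone who prefers a script, here is a small Python sketch of extracting sequences by an IDs file. It assumes a plain text file with one ID per line and a nucleotide FASTA file. For NCBI-style CDS files whose headers carry a `[protein_id=XP_...]` tag, the protein ID is read from that tag; otherwise the first word of the header is used as the ID. All function names are hypothetical, and your headers may need a different rule.

```python
import re

# NCBI CDS FASTA headers usually carry the protein accession in this tag.
PROTEIN_ID = re.compile(r"\[protein_id=([^\]]+)\]")


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def record_id(header):
    """Use the protein_id tag if present, else the first word of the header."""
    match = PROTEIN_ID.search(header)
    if match:
        return match.group(1)
    parts = header.split()
    return parts[0] if parts else ""


def extract_by_ids(fasta_path, ids_path, out_path):
    """Write records whose ID is listed in ids_path; return the IDs not found."""
    with open(ids_path) as handle:
        wanted = {line.strip() for line in handle if line.strip()}
    found = set()
    with open(out_path, "w") as out:
        for header, sequence in read_fasta(fasta_path):
            rid = record_id(header)
            # Keep only the first record seen for each requested ID.
            if rid in wanted and rid not in found:
                found.add(rid)
                out.write(f">{rid}\n{sequence}\n")
    return wanted - found
```

The returned set of missing IDs is worth checking: if it is not empty, the IDs in the protein files and the nucleotide headers probably do not follow the same naming scheme.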