Get orthologous sequences between two FASTA files, each containing a set of sequences
6.6 years ago
Chvatil ▴ 130

Hi everyone! Let me explain what I need to do.

I have two files, each containing a set of gene sequences from a different species. I need to find out which sequences are orthologous between the two files, so that I can compare each orthologous pair (dN and dS).

Here is a hypothetical example of my files:

File 1:

>seqB  (real name is seq 1)
AAAACCCCGGGGTTTTT
>seqE  (real name is seq 2)
ACCGGTTGACGGATGGAG
>seqC  (real name is seq 3)
AGGATTAGGATTAGGAAT

File 2:

>seqC  (real name is seq 1)
AGGACTAGGATTAGGAAA
>seqE (real name is seq 2)
ACGGGTTGACGGACGGAG
>seqB  (real name is seq 3)
AAAACCGCGGGGTTTAT

Of course, the orthologous sequences do not share the same name in the two files.

What I would like is to know which of them are orthologous, for example a file giving:

Orthologous genes between sp1 : sp2 
seq1 : seq3
seq2 : seq2
seq3 : seq1

Thank you very much for your help.

orthologous clustering gene • 1.6k views
6.6 years ago
Sishuo Wang ▴ 230

For protein-coding genes, you can try InParanoid, OrthoMCL, get_homologues, OrthoFinder, and many other tools. Since you mention that you are going to calculate dN and dS, I think you can translate the sequences into amino acids first and then run family clustering with one of the above tools.
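
To make the pairing step concrete, here is a minimal sketch of reciprocal best hits (RBH), a common heuristic for picking one-to-one ortholog candidates between two species. It is only an illustration and not how the tools above work internally; they use real alignments and more elaborate clustering. The function names (`identity`, `best_hits`, `reciprocal_best_hits`) are made up for this example, and the ungapped identity score is an assumption standing in for a proper alignment score such as a BLAST bit score. The sequences are the toy ones from the question.

```python
# Toy sequences copied from the question; keys follow the "real name" labels.
SP1 = {
    "seq1": "AAAACCCCGGGGTTTTT",
    "seq2": "ACCGGTTGACGGATGGAG",
    "seq3": "AGGATTAGGATTAGGAAT",
}
SP2 = {
    "seq1": "AGGACTAGGATTAGGAAA",
    "seq2": "ACGGGTTGACGGACGGAG",
    "seq3": "AAAACCGCGGGGTTTAT",
}


def identity(a, b):
    """Ungapped identity: matching positions divided by the longer length.

    Assumption: a crude stand-in for a real alignment score.
    """
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def best_hits(queries, targets):
    """Map each query name to the name of its highest-scoring target."""
    return {
        q_name: max(targets, key=lambda t_name: identity(q_seq, targets[t_name]))
        for q_name, q_seq in queries.items()
    }


def reciprocal_best_hits(sp1, sp2):
    """Return (sp1_name, sp2_name) pairs that are each other's best hit."""
    forward = best_hits(sp1, sp2)
    backward = best_hits(sp2, sp1)
    return [(a, b) for a, b in forward.items() if backward[b] == a]


if __name__ == "__main__":
    print("Orthologous genes between sp1 : sp2")
    for a, b in reciprocal_best_hits(SP1, SP2):
        print(f"{a} : {b}")
```

On the toy data this pairs seq1 with seq3, seq2 with seq2, and seq3 with seq1. For real genes you would replace `identity` with scores from an actual aligner, ideally on the translated protein sequences.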

6.6 years ago
Buffo ★ 2.4k

You need to use CD-HIT, specifically cd-hit-2d (cd-hit-est-2d is the version for nucleotide sequences).
