Identify homolog sequences that appear in multiple RefSeq files
9.6 years ago
sam • 0

Hello,

I have 4 RefSeq FASTA files, one for each of 4 different species. I'm trying to identify homologs that appear in all 4 files. What's the best and most efficient way to do that? Please note that I only want sequences that have a homolog in every one of the 4 files.

genome • 1.9k views

OrthoMCL might be helpful.

Be aware that the whole analysis can take a long time to run across all of your species.
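Once an ortholog grouping tool has run, keeping only the groups that span all 4 species is a small filtering step. Here is a minimal sketch, assuming a groups file with one group per line in the form `group_id: taxon|protein taxon|protein ...`; the taxon codes `sp1` to `sp4`, the group IDs, and the function name are made up for illustration.

```python
from typing import Dict, Iterable, List, Set


def groups_in_all_species(lines: Iterable[str], species: Set[str]) -> Dict[str, List[str]]:
    """Return the groups whose members cover every taxon code in `species`."""
    shared: Dict[str, List[str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split "group_id: member member ..." at the first colon.
        group_id, _, members_field = line.partition(":")
        members = members_field.split()
        # Assumption: the taxon code is the part of each ID before the first '|'.
        taxa = {member.split("|", 1)[0] for member in members}
        if species <= taxa:
            shared[group_id.strip()] = members
    return shared


# Hypothetical example data, not real RefSeq accessions.
example = [
    "OG_1: sp1|A1 sp2|B1 sp3|C1 sp4|D1",
    "OG_2: sp1|A2 sp2|B2",
    "OG_3: sp1|A3 sp1|A4 sp2|B3 sp3|C3 sp4|D3",
]
shared = groups_in_all_species(example, {"sp1", "sp2", "sp3", "sp4"})
print(sorted(shared))  # ['OG_1', 'OG_3']
```

Groups like `OG_3` also contain in-paralogs (two `sp1` members); whether to keep those or restrict to single-copy groups depends on what you need downstream.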

9.2 years ago
Siva ★ 1.9k

Depending on the species in your dataset, you can use any of the precomputed resources from NCBI.

These resources provide FTP files that contain either protein GI numbers or accessions.

Or you could use one of the sequence clustering tools (BLASTClust, UCLUST or CD-HIT), though they can be less accurate.
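If you go the clustering route, one rough way to pull out clusters with members from all 4 files is to prefix every sequence ID with a species tag before concatenating the FASTA files, then filter the cluster report. The sketch below assumes a CD-HIT style `.clstr` file and made-up tags `sp1` to `sp4` in the form `sp1|accession`; the accessions and the function name are placeholders. Identity-based clusters are only a proxy for homology, so treat the result as a starting point.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Member lines look like: "0<TAB>120aa, >sp1|NP_000001.1... *"
MEMBER_RE = re.compile(r">(?P<tag>[^|]+)\|(?P<acc>.+?)\.\.\.")


def clusters_covering_all(clstr_lines: Iterable[str], species: Set[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Return clusters that contain at least one sequence from every species tag."""
    clusters: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    current = None
    for line in clstr_lines:
        line = line.rstrip("\n")
        if line.startswith(">Cluster"):
            current = line[1:]  # e.g. "Cluster 0"
            continue
        match = MEMBER_RE.search(line)
        if match and current is not None:
            clusters[current].append((match["tag"], match["acc"]))
    # Keep a cluster only if its set of species tags includes all requested tags.
    return {
        name: members
        for name, members in clusters.items()
        if species <= {tag for tag, _ in members}
    }


# Hypothetical cluster report with placeholder accessions.
example_clstr = [
    ">Cluster 0",
    "0\t120aa, >sp1|NP_000001.1... *",
    "1\t118aa, >sp2|NP_000002.1... at 91.50%",
    "2\t119aa, >sp3|NP_000003.1... at 90.10%",
    "3\t120aa, >sp4|NP_000004.1... at 92.00%",
    ">Cluster 1",
    "0\t200aa, >sp1|NP_000005.1... *",
    "1\t198aa, >sp2|NP_000006.1... at 95.00%",
]
hits = clusters_covering_all(example_clstr, {"sp1", "sp2", "sp3", "sp4"})
print(list(hits))  # ['Cluster 0']
```

Putting the species tag at the start of the ID keeps it intact even if the clustering tool shortens long sequence names in its report.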
