Merge "NCBI species" data within Ensembl alignment
8.3 years ago
tlorin ▴ 370

Apologies if this is a really naive question, but I cannot figure out how to do this easily. Here is a related post regarding the best method to find orthologous genes of a species.


Let's say I have a protein alignment downloaded from Ensembl (coming, for instance, from this Ensembl tree).

This gene is also present in some "NCBI species" that I would like to include in my tree (for instance, Stegastes partitus, which has an available genome and is present in the NCBI database but NOT in the Ensembl database). If I manually run blastp with the asip protein sequence of D. rerio (extracted from my Ensembl multifasta protein alignment) against the nr database restricted to S. partitus, the sequence I am looking for comes up as the first blast hit. Perfect! I can then manually append it to my initial protein tree.

The problem is that I don't have just one gene and one NCBI species but many of them (let's say p genes and n NCBI species). I already have an Ensembl protein multifasta file for each of my p genes.

My question is: is there an easy way to append to each of my p multifasta files the corresponding homologous protein sequence(s) of the n "NCBI species"?

Thanks for any insight!

ensembl ncbi blast phylogeny • 1.6k views

Since no one has said anything I will take a stab.

I don't think it would be possible to easily script what you are asking for. There are decisions that need to be made about what to select (from a different site/database) and then add that information to a second site.


Like genomax said, this isn't trivial.

What you may want to do is consider working the other way: take your known asip example (e.g. zebrafish) and blast it against a list of species you are interested in. I'm picturing something where you would have a list of taxonomy IDs for those species, then blast your reference against the NR database filtered for each species in turn. If I recall correctly, you can set up the blast output format to include the sequence of the hit.
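A rough sketch of that idea in Python, in case it helps. It assumes a local BLAST+ install (2.8 or later) with a version 5 nr database that carries taxonomy information, since that is what the `-taxids` option needs. The function names, file names and the taxonomy ID in the comments are made up for illustration. Note that the `sseq` column is only the aligned part of the hit, not the full-length protein, so you may still want to fetch the full record by accession afterwards.

```python
from typing import Dict, List, Tuple


def build_blastp_command(query_fasta: str, taxid: int, db: str = "nr") -> List[str]:
    """Build a blastp argument list restricted to one NCBI taxonomy ID.

    Assumes BLAST+ >= 2.8 and a v5 database with taxonomy data.
    """
    return [
        "blastp",
        "-query", query_fasta,
        "-db", db,
        "-taxids", str(taxid),
        "-max_target_seqs", "1",
        # Tabular output: query ID, subject ID, e-value, aligned subject sequence
        "-outfmt", "6 qseqid sseqid evalue sseq",
    ]


def best_hits_from_tabular(tabular_text: str) -> Dict[str, Tuple[str, str]]:
    """Keep the first reported hit per query: {qseqid: (sseqid, sequence)}."""
    best: Dict[str, Tuple[str, str]] = {}
    for line in tabular_text.splitlines():
        if not line.strip():
            continue
        qseqid, sseqid, _evalue, sseq = line.split("\t")
        if qseqid not in best:
            # sseq is the aligned region and may contain gap characters
            best[qseqid] = (sseqid, sseq.replace("-", ""))
    return best


def fasta_record(header: str, sequence: str, width: int = 60) -> str:
    """Format one FASTA record with wrapped sequence lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


# Hypothetical usage (not run here): one blastp call per gene and per species.
#   import subprocess
#   cmd = build_blastp_command("asip_drerio.fa", taxid=12345)  # placeholder taxid
#   out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
#   for qseqid, (sseqid, seq) in best_hits_from_tabular(out).items():
#       with open("asip_ensembl.fa", "a") as handle:
#           handle.write(fasta_record(sseqid, seq))
```

Taking the top hit blindly is the risky part: paralogs, partial hits and isoforms all need a decision, so it is worth checking e-values and coverage (or doing a reciprocal blast) before appending anything.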

After that, you'd have to reconstruct the tree, which is a whole different issue. I'm not sure how you are manually adding things to your tree.
