Dear friends, hi (I'm not a native English speaker, so please be ready for some language flaws).
I have used blastx to find the proteins matching my de novo fish RNA-seq transcripts, searching against the well-annotated fish protein FASTA files from Ensembl.
There are about 13 fish species there, and I have run blastx against all of them.
Now I want to check how many unique hits there are in total. For example, I have about 25000 hits in Zebrafish and 23900 in Medaka. I imagine most of them must be the same proteins, BUT there may be a protein hit in Zebrafish that is absent in Medaka, and vice versa.
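One thing worth checking before any cross-species comparison: several transcripts often hit the same protein, so "25000 hits" is not necessarily 25000 distinct proteins. A minimal sketch, assuming the searches were run with BLAST+ tabular output (`-outfmt 6`, default 12 columns) and that only the top-bitscore hit per transcript below an e-value cutoff of 1e-5 (an arbitrary choice) should count. The IDs in the toy input are placeholders:

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch",
                  "gapopen", "qstart", "qend", "sstart", "send",
                  "evalue", "bitscore"]


def best_hits(handle, max_evalue=1e-5):
    """Return {transcript_id: protein_id}, keeping the top-bitscore hit per query."""
    best = {}  # qseqid -> (bitscore, sseqid)
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(OUTFMT6_FIELDS, row))
        if float(rec["evalue"]) > max_evalue:
            continue  # assumed significance cutoff
        score = float(rec["bitscore"])
        query = rec["qseqid"]
        if query not in best or score > best[query][0]:
            best[query] = (score, rec["sseqid"])
    return {q: s for q, (_, s) in best.items()}


def unique_proteins(handle, max_evalue=1e-5):
    """Set of distinct subject proteins that are the best hit of at least one transcript."""
    return set(best_hits(handle, max_evalue).values())


if __name__ == "__main__":
    # Hypothetical toy blastx output against a zebrafish protein database.
    toy = (
        "tr1\tzfish_p1\t90.0\t200\t20\t0\t1\t600\t1\t200\t1e-50\t300\n"
        "tr1\tzfish_p2\t60.0\t150\t60\t2\t1\t450\t5\t155\t1e-20\t120\n"
        "tr2\tzfish_p1\t85.0\t180\t27\t1\t1\t540\t1\t180\t1e-40\t250\n"
        "tr3\tzfish_p3\t40.0\t50\t30\t1\t1\t150\t1\t50\t0.5\t30\n"
    )
    print(best_hits(io.StringIO(toy)))
    # {'tr1': 'zfish_p1', 'tr2': 'zfish_p1'}
    print(len(unique_proteins(io.StringIO(toy))))  # 1
```

Running `unique_proteins` on each species' result file gives counts of distinct proteins rather than raw hit counts.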
I intend to use a Venn diagram for this, but because Ensembl protein (and gene) IDs are formed uniquely for each species, this approach does not work on the raw IDs. For example:
Medaka                 Zebrafish
ENSORLP00000000001     ENSDARP00000002953
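Why the Venn diagram fails on raw IDs, and how an ortholog table would fix it, can be sketched in a few lines. This is an illustration only: the IDs are placeholders and the Medaka-to-Zebrafish table is hypothetical (in practice it would come from an orthology tool). It also simplifies orthology to one reference ID per protein, whereas real ortholog groups can be many-to-many:

```python
def to_reference(hits, ortholog_map):
    """Translate a species' hit IDs into reference-species IDs.

    IDs without an ortholog keep their own ID, so they still count as
    species-specific hits instead of silently disappearing.
    """
    return {ortholog_map.get(pid, pid) for pid in hits}


def compare(a, b):
    """Return (shared, only_in_a, only_in_b), i.e. the three Venn regions."""
    return a & b, a - b, b - a


if __name__ == "__main__":
    # Placeholder IDs; a real run would use Ensembl protein IDs.
    medaka_hits = {"medaka_p1", "medaka_p2", "medaka_p3"}
    zebrafish_hits = {"zfish_p1", "zfish_p2"}

    # Raw IDs never overlap, because each species has its own ID namespace.
    print(compare(medaka_hits, zebrafish_hits)[0])  # set()

    # Hypothetical Medaka -> Zebrafish ortholog table.
    medaka_to_zfish = {"medaka_p1": "zfish_p1", "medaka_p2": "zfish_p9"}
    shared, only_medaka, only_zfish = compare(
        to_reference(medaka_hits, medaka_to_zfish), zebrafish_hits)
    print(sorted(shared))       # ['zfish_p1']
    print(sorted(only_medaka))  # ['medaka_p3', 'zfish_p9']
    print(sorted(only_zfish))   # ['zfish_p2']
```

Proteins with no ortholog keep their own ID, so they appear as species-specific rather than being dropped.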
Please help me with this, and thank you in advance.
By the way, I need all of these unique protein hits as a "background" list for pathway analysis, for example in GOrilla or DAVID.
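Once every species' hits are expressed in the same reference IDs, the background list is simply their union. A minimal sketch, assuming the per-species sets have already been translated (species names and IDs below are placeholders). Enrichment tools usually expect gene IDs or gene symbols rather than Ensembl protein IDs, so a protein-to-gene conversion (for example with Ensembl BioMart) would probably still be needed:

```python
from pathlib import Path


def background_list(mapped_hit_sets):
    """Sorted union of per-species hit sets that already use reference-species IDs."""
    background = set()
    for hits in mapped_hit_sets.values():
        background |= hits
    return sorted(background)


def write_list(ids, path):
    """Write one ID per line (plain-text layout; check what your enrichment tool accepts)."""
    text = "\n".join(ids)
    Path(path).write_text(text + "\n" if ids else "")


if __name__ == "__main__":
    # Placeholder per-species hits, already translated to zebrafish-style IDs.
    mapped = {
        "medaka": {"zfish_p1", "zfish_p9"},
        "zebrafish": {"zfish_p1", "zfish_p2"},
    }
    print(background_list(mapped))  # ['zfish_p1', 'zfish_p2', 'zfish_p9']
```

`write_list(background_list(mapped), "background.txt")` would then write the list one ID per line; check the accepted ID types and input format of GOrilla or DAVID before uploading.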
One suggestion would be to build clusters of orthologs and paralogs. I have used InParanoid (the standalone version) for this: http://inparanoid.sbc.su.se/cgi-bin/faq.cgi#how
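To turn pairwise InParanoid results into an ID-translation table, one approach is to read its cluster table and map each query-species protein to the best-scoring reference-species protein in the same cluster. This sketch rests on an assumption: a tab-separated "sqltable"-style file with the columns cluster ID, bitscore, species file name, inparalog score, protein ID (any extra columns ignored). Please verify this layout against your own output files before relying on it; the toy rows are placeholders:

```python
import csv
import io
from collections import defaultdict


def load_clusters(handle):
    """Group rows by cluster ID, then by species.

    ASSUMED column order (verify against your InParanoid output):
    cluster_id, bitscore, species_file, inparalog_score, protein_id[, ...]
    """
    clusters = defaultdict(lambda: defaultdict(list))
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 5:
            continue  # skip blank or malformed lines
        cluster_id, _bits, species, score, protein = row[:5]
        clusters[cluster_id][species].append((float(score), protein))
    return clusters


def map_to_reference(clusters, query_species, ref_species):
    """Map every query-species protein to the top-scoring reference protein in its cluster."""
    mapping = {}
    for members in clusters.values():
        if query_species not in members or ref_species not in members:
            continue  # cluster lacks one of the two species
        _, ref_protein = max(members[ref_species])
        for _, protein in members[query_species]:
            mapping[protein] = ref_protein
    return mapping


if __name__ == "__main__":
    toy = (
        "1\t1200\tmedaka.fa\t1.000\tmedaka_p1\n"
        "1\t1200\tmedaka.fa\t0.850\tmedaka_p4\n"
        "1\t1200\tzebrafish.fa\t1.000\tzfish_p1\n"
        "2\t640\tmedaka.fa\t1.000\tmedaka_p2\n"
        "2\t640\tzebrafish.fa\t1.000\tzfish_p9\n"
    )
    clusters = load_clusters(io.StringIO(toy))
    print(map_to_reference(clusters, "medaka.fa", "zebrafish.fa"))
    # {'medaka_p1': 'zfish_p1', 'medaka_p4': 'zfish_p1', 'medaka_p2': 'zfish_p9'}
```

Keeping only the top-scoring reference protein collapses inparalogs onto one ID, which suits counting unique hits but loses paralog detail.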
Dear microfuge, hi, and thank you.
I have checked the website, but I could not work out how it would convert the long lists of protein IDs from 13 Ensembl species into the IDs of a single species (e.g. zebrafish), so that I can find out which hits are unique and which are repeats.
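Since pairwise InParanoid compares two species at a time, one way to reach a single ID space is to run it once for each other species against zebrafish and then translate every hit list with its own table. The sketch below assumes those per-species tables already exist as Python dictionaries; species names and IDs are placeholders. It counts in how many species each reference protein was hit, which separates unique hits from repeats:

```python
from collections import Counter


def collapse_species(hits_by_species, maps_by_species):
    """Translate each species' hits into reference IDs and count supporting species.

    hits_by_species: {species: set of that species' protein IDs}
    maps_by_species: {species: {species_protein_id: reference_protein_id}}
    The reference species itself needs no map (IDs are used unchanged).
    """
    support = Counter()
    for species, hits in hits_by_species.items():
        mapping = maps_by_species.get(species, {})
        # A set, so one species counts at most once per reference protein.
        support.update({mapping.get(pid, pid) for pid in hits})
    return support


if __name__ == "__main__":
    hits = {
        "zebrafish": {"zfish_p1", "zfish_p2"},
        "medaka": {"medaka_p1", "medaka_p3"},
        "stickleback": {"stick_p1"},
    }
    maps = {
        "medaka": {"medaka_p1": "zfish_p1"},
        "stickleback": {"stick_p1": "zfish_p1"},
    }
    support = collapse_species(hits, maps)
    print(len(support))  # 3 distinct proteins in total
    print(sorted(p for p, n in support.items() if n == 1))
    # ['medaka_p3', 'zfish_p2']
    print(support["zfish_p1"])  # 3
```

Proteins with a count of 1 were hit through only one species; proteins without an ortholog keep their own species' ID, so the total is an approximation that depends on how complete the ortholog tables are.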
Hi, there is a multispecies version of InParanoid called MultiParanoid: http://multiparanoid.sbc.su.se/help.html According to that page, it can take multiple species. To be frank, I have not used it; I have only used pairwise InParanoid.