However, I am unsure how to go about it. Should I use my genome as a BLAST database and then BLAST the two known genomes against it?
If so, do I then take the genes present and absent in both known genomes and BLAST them against each other?
One option would be to query the proteins of the three genomes against e.g. Pfam (with HMMER) and then extract the number of shared features between the proteomes from that. Also, the main text or the Materials and Methods section may say what they actually did there.
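A minimal sketch of the counting step, assuming each proteome was already searched against Pfam with `hmmscan --tblout` and that the second column of each hit line holds the Pfam accession. The function names `pfam_families` and `venn_counts` are made up for illustration, and any E-value filtering is left out.

```python
def pfam_families(tblout_lines):
    """Collect the Pfam accessions hit in one proteome (hmmscan --tblout lines)."""
    families = set()
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        # Column 2 is the target accession, e.g. PF00069.28; drop the version.
        families.add(fields[1].split(".")[0])
    return families


def venn_counts(sets_by_genome):
    """Count features per exact combination of genomes that contain them."""
    counts = {}
    all_features = set().union(*sets_by_genome.values())
    for feature in all_features:
        owners = frozenset(
            genome for genome, features in sets_by_genome.items()
            if feature in features
        )
        counts[owners] = counts.get(owners, 0) + 1
    return counts
```

With three genomes, the key `frozenset({"A", "B", "C"})` holds the number of Pfam families found in all three, and the single-genome keys hold the genome-specific ones. Those are the numbers a three-way Venn diagram needs.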
written 9.7 years ago by HG
Hi HG,
Thanks for your reply.
The genome is annotated; however, all three genomes have different gene names.
Can you explain step two in more detail?
written 9.7 years ago by kxd419
Extract all the genes from each file > BLAST all vs. all (with your desired cutoff value, using CD-HIT) > you will get a list of unique sequences and shared sequences > count them and plot.
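The counting step could be sketched as below, assuming the genes from all three genomes were concatenated into one FASTA file, clustered with CD-HIT, and that every sequence ID starts with a genome label followed by `|` (for example `g1|gene0001`). That naming scheme and the function names are assumptions for illustration only. CD-HIT may shorten long sequence names in the `.clstr` file, so the prefix should be kept short.

```python
import re
from collections import Counter

# Member lines in a .clstr file look like: "0\t150aa, >g1|gene0001... *"
MEMBER = re.compile(r">(.+?)\.\.\.")


def cluster_genomes(clstr_lines, sep="|"):
    """Yield, for each cluster, the set of genome labels among its members."""
    current = None
    for line in clstr_lines:
        if line.startswith(">Cluster"):
            if current:
                yield frozenset(current)
            current = set()
        else:
            match = MEMBER.search(line)
            if match and current is not None:
                # The genome label is whatever precedes the first separator.
                current.add(match.group(1).split(sep, 1)[0])
    if current:
        yield frozenset(current)


def count_regions(clstr_lines):
    """Count clusters per combination of genomes (the Venn diagram regions)."""
    return Counter(cluster_genomes(clstr_lines))
```

Because this counts clusters rather than individual genes, a gene family that is expanded in one genome still contributes a single shared cluster.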
written 9.7 years ago by HG
http://weizhongli-lab.org/cd-hit/
I don't see how this works for a set of three transcriptomes. It only presents an option for comparing two nucleotide databases. Can you explain how you do this if you have three databases? Thank you.
We are talking about bacterial genomes here, not transcriptomes!
And what is the difference for you? It still consists of FASTA files with sequences, right? The question is how you do it for three sets of 'genes' (if you want) instead of two. The problem is that what you propose doesn't work when the gene names are not the same, or when a gene family is expanded in one genotype versus another. Also, you need to do a reciprocal best BLAST, not just all vs. all.
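For illustration, a reciprocal best hit filter could be sketched as follows, assuming each pairwise search was saved in BLAST tabular format (`-outfmt 6`, default columns, bit score in column 12). The function names are hypothetical, ties are resolved by whichever hit is listed first, and no E-value or coverage cutoff is applied.

```python
def best_hits(blast_lines):
    """Map each query to its highest-bitscore subject (BLAST -outfmt 6 lines)."""
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


def shared_by_all(rbh_ab, rbh_bc, rbh_ac):
    """Return (a, b, c) triplets that are reciprocal best hits in all three pairs."""
    b_to_c = dict(rbh_bc)
    a_to_c = dict(rbh_ac)
    return {
        (a, b, b_to_c[b])
        for a, b in rbh_ab
        if b in b_to_c and a_to_c.get(a) == b_to_c[b]
    }
```

Running `reciprocal_best_hits` on the three genome pairs and passing the results to `shared_by_all` gives one way to get the genes shared by all three genomes without relying on gene names. Genes appearing in only one or two of the pairwise sets fill the remaining Venn regions.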