Question

Constructing Phylogenetic Tree Based On Pathogenicity Genes Of A Fungus

11.8 years ago

Natasya Azwin ▴ 10

Hi biostars,

I need to know if what I'm doing makes sense. I am working on a fungus for my undergraduate thesis. I am doing a literature review as well as bioinformatics analysis. I am not that familiar with bioinformatics, so I am really in need of help here.

For the bioinformatics analysis, I'm interested in building a phylogenetic tree of some pathogenicity genes of this one fungus species to see if they are similar or related to genes in other organisms. Here is what I have done so far, or the way that I can think of right now:

I selected 5 pathogenicity genes which are myosin-related (I got the sequences from UniProt). Then I ran BLAST with them against the databases of other organisms (i.e. human, mouse, other fungi, bacteria, etc.). For each database searched, I picked the sequence that has the lowest E-value, or the first one to appear on the list. Then I aligned the sequences of the 5 pathogenicity genes with the collected sequences from the other organisms using Clustal Omega, followed by construction of the phylogenetic tree using ClustalW2. I would really appreciate it if anyone could tell me whether I'm on the right track or not.

Also, it seems that some of the 'first ranked' sequences from the databases have a high E-value (i.e. close to 1 instead of 0). Are they reliable if I were to use them in constructing the phylogenetic tree?

On the other hand, I did try to 'play' with BLAST. It appears that the 5 myosin-related pathogenicity genes have no significant similarity with human sequences. Does anyone know what might have caused this? Is it explainable? Or is it because they really have no connection to each other? I have another 5 sets of pathogenicity genes, but they are from various domains/types (all mixed up). If I use them to construct the phylogenetic tree, does it make sense, considering they are not from the same domains/types?

I would really appreciate it if anyone could comment on these questions and help me out here.

Thank you in advance!

phylogeny domain genbank genome phylogenetics • 5.7k views

ADD COMMENT • link updated 6.8 years ago by Biostar 20 • written 11.8 years ago by Natasya Azwin ▴ 10

score 3 · Answer 1 · 2013-03-01

Congratulations on your research start! I also got my start in research as an undergraduate studying fungi. It sounds like you are off to a good start. I hope you are working with someone to help guide you along the way.

You didn't give us any reason why you chose to study myosin-related pathogenicity genes, but I am sure there is a reason this will fit into a "broader research scheme" here. I wouldn't be surprised that you are not seeing any strong homology with human genes; the fungi and animals diverged from each other a long, long time ago and have vastly different life strategies. If your phylogenetic trees are robust, you will be able to eliminate sequences which should not be in your analysis. This is not necessarily easy to do, so I would work with your research advisor to figure this out. It's a subtle type of analysis which is hard to describe. You will need to have a well-supported phylogenetic tree that you are able to discuss and inspect.

I'm not certain you should be using ClustalW to make a phylogenetic tree; you may be able to "make a tree" but it will not be the best tree you can make. I would choose a phylogenetic program (see below) which uses (at least) a maximum-likelihood approach, but more than one approach (such as adding a Bayesian approach) will help.

This is a highly simplified version of what I do when I am constructing phylogenetic trees: (1) first choose sequences representing your gene/proteins (you did this through BLAST), (2) align the sequences using a program and inspect the alignment by eye, (3) make sure the sequences are then in a text format that phylogenetic programs can work with (such as NEXUS or XML), (4) let the algorithms run on one or more (see here and here) phylogenetic programs, and (5) visualize your data, and repeat until you are comfortable with your phylogenetic tree (your evolutionary hypothesis).
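As a rough illustration of step (3), here is a minimal Python sketch that takes an alignment in FASTA format and writes a bare-bones NEXUS data block. It only uses the standard library. The sequence names and residues in the example are made up, and the exact `format` line a given phylogenetic program expects may differ, so treat this as a starting point rather than a finished converter.

```python
from typing import Dict


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {name: sequence}, keeping input order."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # first word of the header is the label
            if name in records:
                raise ValueError(f"duplicate sequence name: {name}")
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data before first header")
        else:
            records[name] += line
    return records


def to_nexus(aligned: Dict[str, str], datatype: str = "protein") -> str:
    """Write aligned sequences as a minimal NEXUS data block."""
    if not aligned:
        raise ValueError("no sequences")
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length; is this an alignment?")
    nchar = lengths.pop()
    width = max(len(n) for n in aligned)
    lines = [
        "#NEXUS",
        "begin data;",
        f"  dimensions ntax={len(aligned)} nchar={nchar};",
        f"  format datatype={datatype} missing=? gap=-;",
        "  matrix",
    ]
    for name, seq in aligned.items():
        lines.append(f"  {name.ljust(width)}  {seq}")
    lines += ["  ;", "end;"]
    return "\n".join(lines) + "\n"


# Hypothetical two-sequence alignment, for illustration only.
example = """>Mo_chs1
MKT-AYLL
>An_chsA
MKTQAY-L
"""
print(to_nexus(read_fasta(example)))
```

The length check is the useful part: if the sequences are not all the same length, the file is not an alignment and no tree program should be given it.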

Since you're new to phylogenetics I would recommend two books: Tree Thinking, more on the theory side of things, and Phylogenetic Trees Made Easy, more on the practical data analysis side of things.

Best of luck with your research and let me know how it's going! I'll keep my eyes open for your published paper!

score 1 · Answer 2 · 2013-03-02

11.8 years ago

aidan-budd 1.9k

From your description of the question you're trying to address, I'm not so sure that building phylogenetic trees is the best way (at least initially) for you to be going.

It sounds to me as though the key question is "Which biological functions are likely involved in the pathogenicity of this fungus?", and/or, more specifically, "What might be the function of these pathogenicity genes?"

In this case, the way I'd begin addressing this, similar to what you've done, is to run BLAST (and then, if that doesn't get me significant matches, PSI-BLAST/HMMER/InterProScan) to identify proteins, or regions of other proteins, that are likely to have structures/functions similar to those of your pathogenicity proteins. These functions (of the proteins which have been studied in more detail elsewhere) then become hypotheses for the function of your pathogenicity proteins.

Thus, I'd be focusing on carrying out sequence-similarity focused database searches.

When you draw a tree using a set of sequences, you are assuming/asserting already that the sequences have a similar structure/are "evolutionary related", and are then asking "what is the pattern of transfer of genetic information between the sequences I observe" - and it doesn't sound, to me, as though that kind of question/answer is going to be so useful in addressing the main questions I've suggested above.

Another issue - if the sequences you are including in the multiple sequence alignment (MSA) from which you then estimate a tree are so divergent that you get such "high" E-values for them in BLAST, then any phylogeny-like structure you estimate from them is likely to be full of errors. You can still use this thing which looks like a phylogeny to make inferences about the degree of similarity between different sequences, but I would strongly recommend not making inferences about degree of relatedness etc. between the sequences.
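One quick way to see how divergent the sequences in an MSA are is to compute pairwise percent identity directly from the alignment. The sketch below is only an illustration: the sequences are invented, identity is counted over columns where neither sequence has a gap (one of several possible definitions), and the 25% cut-off used for flagging is an arbitrary value chosen for the example, not a recommended threshold.

```python
from itertools import combinations
from typing import Dict, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def pairwise_identities(aligned: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Percent identity for every pair of sequences in the alignment."""
    return {
        (n1, n2): percent_identity(aligned[n1], aligned[n2])
        for n1, n2 in combinations(aligned, 2)
    }


# Hypothetical aligned sequences, for illustration only.
aligned = {
    "Mo_chs1": "MKT-AYLL",
    "An_chsA": "MKTQAY-L",
    "Hs_hit1": "MRTQGYWL",
}
for pair, pid in pairwise_identities(aligned).items():
    flag = "  <- very divergent" if pid < 25.0 else ""  # arbitrary cut-off
    print(pair, f"{pid:.1f}%{flag}")
```

Pairs with very low identity are the ones whose placement in any resulting tree deserves the most suspicion.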

And a final point - an E-value close to 1 (e.g. 0.5) from BLAST suggests that the sequences may well not have similar structure/function. When similarity is so low, one thing to do is to turn to more sensitive methods (PSI-BLAST, HMMER, as I suggest above).
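To make the E-value point concrete, here is a small Python sketch that reads BLAST+ tabular output (as produced with `-outfmt 6`, default columns) and keeps, for each query, only the best hit that passes an E-value cut-off. The cut-off of 1e-5 is an assumption for the example (a commonly used but arbitrary value), and the query and subject names in the sample rows are invented.

```python
import csv
import io
from typing import Dict

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular: str, max_evalue: float = 1e-5) -> Dict[str, dict]:
    """Return the lowest-E-value hit per query, dropping hits above max_evalue."""
    best: Dict[str, dict] = {}
    reader = csv.reader(io.StringIO(tabular), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] > max_evalue:
            continue  # too weak to treat as evidence of homology
        current = best.get(hit["qseqid"])
        if current is None or hit["evalue"] < current["evalue"]:
            best[hit["qseqid"]] = hit
    return best


# Hypothetical BLAST rows, for illustration only.
rows = [
    ["chs1", "subjA", "62.5", "400", "150", "3", "1", "400", "5", "404", "1e-120", "350"],
    ["chs1", "subjB", "28.0", "90", "60", "4", "10", "99", "20", "109", "0.5", "30.1"],
    ["myoA", "subjC", "25.0", "60", "45", "2", "5", "64", "8", "67", "0.8", "28.5"],
]
example = "\n".join("\t".join(r) for r in rows)

for query, hit in best_hits(example).items():
    print(query, hit["sseqid"], hit["evalue"])
```

With this kind of filter, a query whose top hit has an E-value near 1 simply ends up with no hit at all, which is a more honest result than carrying the "first ranked" sequence into the alignment.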

Hope this helps.

ADD COMMENT • link 11.8 years ago by aidan-budd 1.9k

Just to add to my answer above - before doing anything else, I'd spend some time specifying as clearly as possible the aim of the analysis, as what constitutes a good (or good enough) way to do the analysis depends crucially on the specific question of interest.

Feel free to also discuss here what the main aim of the analysis is - in my experience, clarifying this is often not easy, particularly for people just getting started with this kind of work, and we're happy to help with this issue too.

ADD REPLY • link 11.8 years ago by aidan-budd 1.9k

Hi Aidan,

Thank you so much for your responses!

After spending some time working through the uncertainty in my hypothesis, I think my question would be "Are chitin synthase pathogenicity genes of M. oryzae carried over/conserved in other organisms?", analyzed by building a phylogenetic tree and looking at the relationship/distance between them. Is it reliable, and does it make sense, to evaluate a phylogenetic tree in order to determine the conservation of pathogenicity genes in a particular species? From my reading so far, it seems legit.

Looking forward to your comments and suggestions. Thank you in advance!

ADD REPLY • link 11.8 years ago by Natasya Azwin ▴ 10

Thanks, that helps.

However, I'm still unclear on:

1. the overall aim of the analysis - is it to identify potential pathogenicity genes in M. oryzae? It sounds like you already know that particular chitin synthase genes in that organism are important for pathogenicity, in which case I guess you're not interested in identifying pathogenicity genes, as you/someone else has done that already. Do I understand that right?

2. what biological insight you expect to get from addressing the question "Are chitin synthase pathogenicity genes of M. oryzae carried over/conserved in other organisms?" Is the idea that, if you find very similar sequences to these in other fungi, then they could also be pathogenicity genes in those organisms?

ADD REPLY • link 11.8 years ago by aidan-budd 1.9k

Yes. I already have the list of chitin synthase pathogenicity genes along with their sequences and putative functions (I found them through literature research and UniProt).
I am sorry, but I have difficulty answering your question about the biological insight from the proposed question. Can you enlighten me, please? And yes, that is what I was thinking of: the sequence with the highest score from a BLAST search against another organism's database (say Homo sapiens) could be carrying the pathogenicity genes (like the ones in M. oryzae) as well.

ADD REPLY • link 11.8 years ago by Natasya Azwin ▴ 10

Do you mean that you might expect to find pathogenicity genes in Homo sapiens? I assume not. I'm sorry, I'm just having trouble understanding the aim/focus of the analysis, and that makes it hard to give advice. Have you been given a title for this project? If you could post that, or something similar, it might help.

ADD REPLY • link 11.8 years ago by aidan-budd 1.9k

score 0 · Answer 3 · 2013-03-02

11.8 years ago

Rahul Sharma ▴ 660

Hi,

I would download the protein sequences of all the fungal species from JGI. Then find the orthologs using OrthoMCL and grep out the secreted effectors or pathogenic genes, then use RAxML and/or MrBayes for the phylogenetic analysis on the predicted OrthoMCL orthologs of the pathogenic genes. I would do the whole analysis locally on a Unix machine. I hope this helps.
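As a sketch of the "grep out" step, the Python below pulls out the ortholog groups that contain known pathogenicity genes. It assumes the groups file has one group per line in the form `group_id: taxon|protein taxon|protein ...`; please check that against your own OrthoMCL output before relying on it. The group IDs, taxon codes, and gene names are all invented for the example.

```python
from typing import Dict, List, Set


def parse_groups(text: str) -> Dict[str, List[str]]:
    """Parse lines of the assumed form 'group_id: taxon|protein taxon|protein ...'."""
    groups: Dict[str, List[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        group_id, _, members = line.partition(":")
        groups[group_id.strip()] = members.split()
    return groups


def groups_with_genes(groups: Dict[str, List[str]],
                      genes_of_interest: Set[str]) -> Dict[str, List[str]]:
    """Keep only the groups that contain at least one gene of interest."""
    return {
        gid: members
        for gid, members in groups.items()
        if genes_of_interest.intersection(members)
    }


# Hypothetical groups file content, for illustration only.
example = """OG1000: Mo|chs1 An|chsA Nc|chs-1
OG1001: Mo|hypo7 An|hypoX
"""
selected = groups_with_genes(parse_groups(example), {"Mo|chs1"})
for gid, members in selected.items():
    print(gid, len(members), "members:", " ".join(members))
```

The members of each selected group are then the sequences to align and pass on to RAxML and/or MrBayes.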

Best, Rahul

ADD COMMENT • link 11.8 years ago by Rahul Sharma ▴ 660

Thank you Rahul. I will check on your suggestions!

ADD REPLY • link 11.8 years ago by Natasya Azwin ▴ 10