Hi, I'm a beginner with the MEGA software. I have 89 protein sequences for which I need to construct a phylogenetic tree using the bootstrap method with 1000 replicates, with the data subset setting on complete deletion. However, I cannot construct a tree because of 3 sequences whose proteins are much shorter than the other 86. I even tried deleting the non-conserved regions in all the protein sequences, but I still cannot get a tree, because the smaller proteins keep getting shorter. Kindly help me solve this problem.
The 3 short protein sequences are upregulated under abiotic stress. Is it OK to omit these sequences, given that they have a role in abiotic stress?
Moreover, I have selected these 3 proteins for my experiments, which are ongoing with RT-PCR and real-time PCR. So is there any possibility of including these 3 sequences?
Regardless of the approach or program you are using, the input for any phylogenetic estimation approach is an alignment, i.e., an inference of homology. Therefore, by necessity, your sequences must share ancestry before you can even begin to infer a phylogeny. If the sequences are shorter but homologous, a multiple sequence alignment (of nucleotides, amino acids, or both via a translation alignment for protein-coding sequences) ought to resolve the sequences by introducing gaps (insertions or deletions). It sounds like you're not doing this. When you say
The 3 short protein sequences are upregulated in abiotic stresses. Is it OK if I omit the sequences because they have a role in abiotic stresses?
it suggests that your dataset may consist of multiple different proteins rather than the same protein across samples, which is a completely inappropriate input for phylogenetic techniques.
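To illustrate why the complete-deletion setting from the question makes short sequences so damaging, here is a minimal sketch on a hypothetical toy alignment (the sequences, names, and the 0.7 cutoff are invented for illustration). Complete deletion discards every column in which any sequence has a gap, so one short sequence shrinks the whole dataset to its own length. The `partial_deletion` function is only a rough analogue of a site-coverage cutoff, which MEGA offers alongside complete and pairwise deletion; it is not MEGA's implementation.

```python
# Hypothetical toy alignment: three full-length sequences and one short one.
# '-' marks a gap introduced by the aligner.
alignment = {
    "seqA": "MKTLLVAGG",
    "seqB": "MKSLLVAGG",
    "seqC": "MKTLIVAGA",
    "short": "---LLV---",
}


def complete_deletion(aln):
    """Keep only columns in which no sequence has a gap ('-') or unknown residue ('X')."""
    seqs = list(aln.values())
    length = len(seqs[0])
    keep = [i for i in range(length) if all(s[i] not in "-X" for s in seqs)]
    return {name: "".join(s[i] for i in keep) for name, s in aln.items()}


def partial_deletion(aln, coverage_cutoff):
    """Keep columns where at least `coverage_cutoff` (a fraction) of sequences have a residue."""
    seqs = list(aln.values())
    length = len(seqs[0])
    n = len(seqs)
    keep = [
        i for i in range(length)
        if sum(s[i] not in "-X" for s in seqs) / n >= coverage_cutoff
    ]
    return {name: "".join(s[i] for i in keep) for name, s in aln.items()}


cd_aln = complete_deletion(alignment)
pd_aln = partial_deletion(alignment, 0.7)
print(len(cd_aln["seqA"]))  # 3: only the columns covered by "short" survive
print(len(pd_aln["seqA"]))  # 9: every column has at least 75% coverage
```

With 89 sequences, the same effect means that three short proteins alone can cut the usable alignment down to a few positions.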
In other words, your workflow would be:
1. Construct a dataset of the same locus across all samples.
2. Align the amino acids or nucleotides.
3. Select a model for ML analysis, or choose a distance correction for NJ (or use uncorrected NJ, UPGMA, etc.).
4. [If you decide to use a model: with an appropriate model, use any likelihood approach (maximum likelihood or Bayesian).]
5. Bootstrap (or a similar method) for support.
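Steps 3 and 5 can be sketched roughly as follows, assuming a simple Poisson-corrected amino-acid distance (d = -ln(1 - p)) as one possible distance correction and a toy alignment with invented names. The bootstrap resamples alignment columns with replacement; in a real analysis (e.g. in MEGA), a tree is built from each pseudo-alignment and clade frequencies across replicates give the support values. Tree building is omitted here, so treat this as an illustration of the idea, not a replacement for the program.

```python
import math
import random


def p_distance(a, b):
    """Proportion of differing sites, ignoring positions where either sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no overlapping sites to compare")
    diffs = sum(x != y for x, y in pairs)
    return diffs / len(pairs)


def poisson_distance(a, b):
    """Poisson-corrected amino-acid distance: d = -ln(1 - p)."""
    p = p_distance(a, b)
    if p >= 1.0:
        raise ValueError("p-distance too large for Poisson correction")
    return -math.log(1.0 - p)


def bootstrap_replicate(aln, rng):
    """Resample alignment columns with replacement; the same columns are used for every sequence."""
    names = list(aln)
    length = len(aln[names[0]])
    cols = [rng.randrange(length) for _ in range(length)]
    return {n: "".join(aln[n][i] for i in cols) for n in names}


# Hypothetical toy alignment.
alignment = {
    "seqA": "MKTLLVAGGK",
    "seqB": "MKSLLVAGGR",
    "seqC": "MKTLIVAGAK",
}
rng = random.Random(42)
replicates = [bootstrap_replicate(alignment, rng) for _ in range(1000)]

print(round(p_distance(alignment["seqA"], alignment["seqB"]), 3))        # 0.2
print(round(poisson_distance(alignment["seqA"], alignment["seqB"]), 4))  # 0.2231
```

Each of the 1000 replicates would then go through the same distance (or likelihood) and tree-building step as the original alignment.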
If you do have sequences with a shared history, I would follow Istvan Albert's recommendation and remove the short sequences if they are truly unalignable.
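One way to make "too short to align" explicit is to measure each aligned sequence's coverage, i.e. the fraction of alignment positions where it has a residue rather than a gap. This is a minimal sketch assuming an already-computed alignment; the function names, toy sequences, and the 0.5 threshold are hypothetical and should be chosen for your own data.

```python
def residue_coverage(seq):
    """Fraction of alignment positions where this sequence has a residue (not a gap)."""
    return sum(c != "-" for c in seq) / len(seq)


def drop_short_sequences(aln, min_coverage):
    """Split an alignment into kept sequences and the names of dropped ones."""
    kept = {n: s for n, s in aln.items() if residue_coverage(s) >= min_coverage}
    dropped = sorted(n for n in aln if n not in kept)
    return kept, dropped


# Hypothetical toy alignment and threshold.
alignment = {
    "seqA": "MKTLLVAGG",
    "seqB": "MKSLLVAGG",
    "short": "---LLV---",
}
kept, dropped = drop_short_sequences(alignment, 0.5)
print(sorted(kept))  # ['seqA', 'seqB']
print(dropped)       # ['short']
```

Listing the dropped names explicitly also makes it easy to report which sequences were excluded from the tree.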
Thank you, Mr. Brice. I will follow your guidelines and try to solve this issue. I will get back to you if I keep facing the same problem. Thanks again for your idea.
Get rid of the short proteins ... if they are that short, they cannot be aligned and do not contribute any information anyway.
When doing science, you can easily end up in an unsolvable situation; in that case, you have to find something else that lets you move forward.