Hi,
I have a file containing aligned dna sequences for a gene with 3 species :
>species1
ATGCTGATTCGATTGCGATTAGCTAGCTG
>species2
ATGCTGTTTCGATTTTTATTAGCTAGCTG
>species3
ATGCTCCTTCGATTGCGATTAGTTAGCTG
I would like to recover every sites that are different between the species 1 and species 2 (and is it a synonymous or non synonymous mutation) and also know if this position have more likely beed mutated on species 1 or species 2 based on the species 3 sequence (so if 1=A, 2=C and 3=A, then the mutation arised in species 2).
I tried to write a script in python but it doesn’t make differences between synonymous and non synonymous mutations .. I would like to know if something more reliable and commonly used exist ? Here is my script :
seq1=Species1
seq2=Species2
seq3=Species3
if seq3!="":
for i in range(0,len(seq1)):
if seq1[i]!=seq2[i]:
if seq3[i]==seq1[i]:
print("Mutation on seq2 : "+seq1[i]+"->"+seq2[i], i)
elif seq3[i]==seq2[i]:
print("Mutation on seq1 : "+seq2[i]+"->"+seq1[i], i)
else :
print("Non orientable mutation, seq1 : "+seq1[i]+" vs seq2 : "+seq2[i], i)
Thanks for the help,
Maxime Policarpo
Hello Maxime,
Whether the mutation is synonymous or non synonymous depends on where the translation starts and where introns are located. Without knowing that it will be impossible to say something about this. Or are the sequences you have cDNA (which might be as they all starts with ATG)?
fin swimmer
Hi,
Some of my genes are truncated so they don't all starts by an ATG but they are all in phase (by that i mean that the first letter of every sequences if the first base of the first codon) so there should not be any problem concerning reading frame !
Thanks
Ok,
there should be a way to do this using BioPython. Could you please provide an example of how your output should like for the input example you gave above?
Is it possible that both, species1 on and species2, have different bases compared to species3 on the same position? Could all species have different bases on the same position?
fin swimmer
Yes it is possible that the three species have a different base. In this case, i consider that this mutation is not orientable.
I think i can deal with every outputs you suggest. In my script i have an output that tells me the mutation and on which species it occured (A->T on species 2) but it dosn't really matter as long as i have the informations i want (synonymous or non synonymous + in which species)
Maxime
One more question. If it's important to know whether it is a synonymous change or not, wouldn't it be better to compare the whole triplet rather the single bases?
Yeah you are totally right, i aim to also study the impact of those mutations on the protein but for the moment, i am only interested in the number of those mutation rather than their impact on amino acids (in fact, my first aim was to calculate ka/ks on a set of genes, but i study two species that are so close that the task is impossible)
Hm,
I really think you must think more about if what information you need and how your output should look like.
With BioPython it is easy to check whether a base change will also lead to aminoacid change. But one have to look at all bases of a triplet at once, as you can have more than one change in that triplet. E.g. the Ref is AAA and your species have TTT. If I only check each position without looking at the other I came to a false result (TAA != ATA != AAT != TTT).
fin swimmer
Yeah i understand ! You are totally right, i think you are pointing the fact that telling if a mutation is synonymous or non-synonymous is very hard in the case of multiples mutations in the same codon ? Actually, i think that it would not be a problem in my datas, as two mutations in the same codon is almost impossible due to the divergence time between my species. (I believe that when calculating ka/ks ratios, a statistic model is used to deal with those problems right ? )
Here is my ideal output :
If bases of species 1 = species 2 --> print nothing If base of species 1 != species 2 --> print the base of species 1 and species 2 and species 3 and tell if it bring a amino acid change or not and if in the codon, multiple mutations arised, then don't print the non-synonymous/synonymous information
Thanks for you dedication to my problem !
Ok,
I will write something in python later. I'm short in time now.
Are all the sequences in one fasta file and is the id always "species1", "species2" and "species3" as you show in your first post? Or how does your input look like?
BTW: I removed your identical answer post to keep this discussion in the right order.
fin swimmer
Yeah thanks i didn't know how to delete it ^^. Yes my sequences are all in fasta with the same names !
(And the sequences are already aligned)
Thanks,
Maxime
You can delete your own post, by clicking on "moderate" below your post.
fin swimmer