I agree with chrisamiller and PhiS. I'll just add that it also greatly depends on what you will do with your sequence. I understand from your question that:
- You have picked only 2 [bacterial] colonies for sequencing
- These colonies result from the cloning of a PCR product (?)
- They were sequenced using Sanger sequencing
[NOTE: when describing your problem it is very important to give these kinds of details, so please correct me if my assumptions are wrong.]
I am guessing that:
- You might want to check that the sequence is correct (maybe verifying that your qPCR product is correct)?
- You might be cloning a gene (or fragment thereof) in order to express a protein?
[NOTE: here again, these kinds of details are crucial in determining whether you can accept an ambiguous base or not. Please add a comment or edit your post if you have yet another purpose in mind.]
Finally, as Istvan has asked, you need to be clear as to what the difference is: are you looking at a different base call between the two sequenced colonies or between the forward and reverse sequencing events?
If it is the first (i.e. a difference between the two colonies), then you need to check the quality of the call at that base (quality scores if you have them, or look at the chromatogram to see if there's a miscall, a double peak, etc.). If both calls are good quality, then you probably have at least these two different variants of the sequence you're targeting.
If it is the second (i.e. a difference between the forward and reverse reads), then you should also look at the quality in each read. If they are bad quality, sequence again. If they are good quality, then I'm scratching my head making a funny face. Start over from scratch.
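For the quality check in both cases above: if your base caller reports Phred scores, they convert to an error probability via Q = -10 * log10(P). Here is a minimal Python sketch. The function names are mine, and the Q20 cutoff is a common rule of thumb that I am assuming, not something derived from your data:

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def call_is_trustworthy(q, min_q=20):
    """Return True if the score meets the (assumed) minimum quality cutoff."""
    return q >= min_q
```

A call with Q20 has a 1 in 100 chance of being wrong, and Q30 a 1 in 1000 chance, so a disagreement between two high-quality calls is unlikely to be a simple sequencing error.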
Now to your question about leaving it ambiguous or not:
- If you just wanted to check that the sequence is "fairly" OK, then fine, leave it as a Y.
- If you're checking the amplicon of a qPCR reaction, then it is crucial to know whether you have only one sequence or two different ones (even if they differ by a single SNP). This will change your interpretation.
- If you want to express a protein from this sequence, then you need to check if the difference (T or C) changes the resulting protein sequence: if yes, you need to choose the correct clone. If not, you can go with either.
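To check the last point quickly, you can translate the affected codon with each candidate base and compare the amino acids. Below is a small self-contained Python sketch. It assumes the standard genetic code, a coding sequence that starts in frame, and a 0-based position for the ambiguous base; the function names and the example sequences are made up for illustration:

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def residues_at(cds, pos, alternatives="CT"):
    """Map each alternative base at 0-based `pos` of an in-frame CDS
    to the amino acid encoded by the affected codon."""
    start = pos - pos % 3  # first base of the codon containing pos
    result = {}
    for base in alternatives:
        seq = cds[:pos] + base + cds[pos + 1:]
        result[base] = CODON_TABLE[seq[start:start + 3].upper()]
    return result


def is_synonymous(cds, pos, alternatives="CT"):
    """True if every alternative base gives the same amino acid."""
    return len(set(residues_at(cds, pos, alternatives).values())) == 1


# Hypothetical examples: Y marks the ambiguous base.
print(residues_at("ATGGCYTAA", 5))  # third codon position: GCC vs GCT
print(residues_at("ATGYCTTAA", 3))  # first codon position: CCT vs TCT
```

If `is_synonymous` returns True, either clone gives the same protein; if not, you need to work out which base is the correct one before going further.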
Thanks all, yes I had two good reads on each strand and the single bp on one of the colonies was different from that of the other colony (I'd picked 3 originally, but one was just an insert). I went with picking another 2 colonies to be sure. I'm sequencing ~100 markers though, so I was trying to weigh up the extra $$ / time in sequencing another colony with the extra information a C or T gives me over a Y. This is only the 15th sequence or so and the first time this has happened, so I'll see how the others turn out before deciding on a general policy.
Actually, what I'd really like to know is when you go to publish a sequence like this, how much coverage should you have? Is it acceptable to put a sequence with a Y into GenBank, because you didn't go to the effort / cost of re-sequencing to resolve it? Or does the Y represent natural variation... and how many would you need to sequence to answer that question... :)
In this case I sequenced more colonies and found a consensus sequence. However, I'm cloning PCR products, so I don't think it is possible to rule out natural variation in the PCR amplicon pool. One good example would be a bacterium with 2 different 16S rRNA gene copies inside a single cell: this could produce 2 populations of PCR products. If the ratio were 1:3, how many cloned colonies would you have to sample in order to observe this natural variation?
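As a rough back-of-the-envelope answer to that sampling question: assume each picked colony is an independent draw from the amplicon pool, and that a 1:3 ratio means the minor variant makes up 25% of the clones. Both are assumptions, and PCR or cloning bias would change the numbers. Then the chance of seeing both variants among n colonies is 1 - p^n - (1 - p)^n. A small Python sketch (function names are mine):

```python
def prob_both_seen(p_minor, n):
    """Probability that n independently picked colonies include
    at least one clone of each of two variants."""
    p_major = 1.0 - p_minor
    # Subtract the two ways to miss a variant: all minor or all major.
    return 1.0 - p_minor ** n - p_major ** n


def colonies_needed(p_minor, confidence):
    """Smallest n for which both variants are seen with at least `confidence`."""
    if not (0.0 < p_minor < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("p_minor and confidence must be strictly between 0 and 1")
    n = 2  # need at least two colonies to see two variants
    while prob_both_seen(p_minor, n) < confidence:
        n += 1
    return n


print(colonies_needed(0.25, 0.95))
```

Under these assumptions you would need 11 colonies to have a 95% chance of catching both variants, so two or three colonies say very little about whether a minor variant exists.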
Just to clarify the situation: in each of your two sequencing runs you found a single base difference between the two strands, and that difference was in the same position in both cases?
You should not put unreliable data into GenBank! Either you prove there is natural variation and you submit all the variants, or you make sure you have reliable data and you don't have the problem.
In this particular case, you cannot talk about natural variation within a single clone: you are sequencing fragments you've cloned into a plasmid! Unless you've contaminated your prep with two colonies from the plate, all the plasmids in one prep should be identical. If you have ambiguous bases, then it's because your sequencing is of bad quality. You should never submit bad quality data to GenBank.