7.6 years ago
felipe-bita
Hi guys, I am trying to identify single-copy genes in rice. So far I have BLASTed the CDS set against itself and parsed the results with BioPerl. My problem now is that I don't know what criterion to use to consider a hit a gene copy. I am calculating two proportions, alignment length / query length and alignment length / hit length, but what cutoff should I use to call a hit a copy? I have not found anything in the literature. If someone has information to share or a better strategy, I would be very grateful.
You can use BLAST E-values to infer homology. If there is homology, there was a gene duplication (or domain shuffling) at some point. However, I am not sure you want to remove ALL duplicated genes; maybe only those that duplicated after some point in history (e.g. the divergence of rice from another plant species)?
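As a rough illustration of combining the E-value with the two coverage ratios from the question, here is a minimal Python sketch. It assumes the self-BLAST was run with the default tabular output (`-outfmt 6`, where column 4 is the alignment length and column 11 is the E-value) and that sequence lengths are available in a separate dictionary. The function name `single_copy_genes`, the gene IDs, and the cutoffs (`1e-10` and `0.5`) are hypothetical placeholders, not recommended values; they would need tuning for the actual data. Each hit line (HSP) is evaluated on its own, so a copy whose alignment is split across several HSPs could be missed.

```python
def single_copy_genes(blast_lines, lengths, max_evalue=1e-10, min_cov=0.5):
    """Return IDs of genes with no qualifying non-self hit.

    blast_lines: iterable of tab-separated BLAST outfmt 6 lines.
    lengths:     dict mapping sequence ID -> sequence length.
    max_evalue, min_cov: illustrative cutoffs (assumptions, not standards).
    """
    has_copy = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, hit = fields[0], fields[1]
        if query == hit:
            continue  # ignore the trivial self-hit
        aln_len = int(fields[3])
        evalue = float(fields[10])
        # The two proportions described in the question.
        query_cov = aln_len / lengths[query]
        hit_cov = aln_len / lengths[hit]
        if evalue <= max_evalue and query_cov >= min_cov and hit_cov >= min_cov:
            has_copy.add(query)
            has_copy.add(hit)
    return set(lengths) - has_copy


# Toy data (made up for illustration only).
example_lengths = {"geneA": 900, "geneB": 900, "geneC": 600}
example_lines = [
    "geneA\tgeneA\t100.0\t900\t0\t0\t1\t900\t1\t900\t0.0\t1600",
    "geneA\tgeneB\t92.0\t850\t68\t0\t1\t850\t1\t850\t1e-150\t1100",
    "geneB\tgeneA\t92.0\t850\t68\t0\t1\t850\t1\t850\t1e-150\t1100",
    "geneC\tgeneC\t100.0\t600\t0\t0\t1\t600\t1\t600\t0.0\t1100",
    "geneC\tgeneA\t80.0\t60\t12\t0\t1\t60\t1\t60\t1e-5\t50",
]

print(single_copy_genes(example_lines, example_lengths))
```

With these toy inputs only geneC is reported, because its single non-self hit is both short and weak. Requiring coverage on both the query and the hit is one way to avoid counting a shared domain as a full gene copy, which relates to the domain-shuffling caveat above.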