Hello everyone. I am new here, so forgive me if I am not doing things in the most correct way. After (I hope) successfully assembling the genome of a particular species, I need to estimate how many cytochrome P450 (CYP) genes are present in my assembly. I am really struggling to come up with a strategy for this task. Would you use BLAST to find regions of similarity with other known sequences? And how do I do that for a particular gene family (CYP in this case)? Any help would be very much appreciated. Thank you very much.
ps - this is related to some MSc coursework
Gonçalo
Is there a related genome available that is annotated? You could start with that and compare. It would also give you some idea of how good your assembly is.
BLAST would be a good way to start looking at individual genes. Have you done gene predictions? Preferably do your comparisons at the protein level, so that you can have more confidence in the results.
If the gene is expected to be multi-copy, your assembly may have collapsed those copies if you did not have long-read data, so keep that in mind when interpreting the count.
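To make the BLAST idea a bit more concrete, here is a minimal sketch in Python. It assumes you have searched known CYP protein sequences against your assembly with tblastn and saved the standard 12-column tabular output (`-outfmt 6`). It keeps hits below an e-value cutoff and merges nearby hits on the same contig into candidate loci. The function names, the e-value cutoff, and the `max_gap` distance are illustrative assumptions, not established values. CYP genes often sit in tandem clusters, so merging may undercount, and long introns may cause overcounting; treat the result as a rough first estimate only.

```python
from collections import defaultdict


def parse_hits(lines, max_evalue=1e-10):
    """Yield (contig, start, end) for tabular BLAST hits passing the cutoff.

    Expects the standard 12 columns of -outfmt 6:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    The max_evalue default is an arbitrary illustrative threshold.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        contig = fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        evalue = float(fields[10])
        if evalue <= max_evalue:
            # Minus-strand hits have sstart > send, so normalise the order.
            yield contig, min(sstart, send), max(sstart, send)


def count_loci(hits, max_gap=5000):
    """Count candidate loci by merging hits closer than max_gap on a contig.

    max_gap is a hypothetical value; tune it to the expected gene size.
    """
    by_contig = defaultdict(list)
    for contig, start, end in hits:
        by_contig[contig].append((start, end))

    loci = 0
    for intervals in by_contig.values():
        intervals.sort()
        current_end = None
        for start, end in intervals:
            if current_end is None or start > current_end + max_gap:
                loci += 1  # far from the previous hit: new candidate locus
                current_end = end
            else:
                current_end = max(current_end, end)  # extend the current locus
    return loci


# Made-up example rows, for illustration only (not real results).
example = [
    "CYP4\tcontig1\t80.0\t100\t20\t0\t1\t100\t1000\t1300\t1e-50\t200",
    "CYP6\tcontig1\t75.0\t100\t25\t0\t1\t100\t1200\t1500\t1e-40\t180",
    "CYP9\tcontig1\t70.0\t100\t30\t0\t1\t100\t50000\t49700\t1e-30\t150",
    "CYP4\tcontig2\t60.0\t50\t20\t0\t1\t50\t10\t160\t1e-3\t40",
]
n_loci = count_loci(parse_hits(example))
print(n_loci)
```

With your own data you would pass an open file handle instead of the example list, e.g. `count_loci(parse_hits(open("hits.tsv")))`, where `hits.tsv` is a placeholder file name.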
Thank you very much for your reply. I haven't done gene predictions yet. Would you suggest using something like MAKER for gene prediction and then using BLAST to find regions of similarity with the reference genome? My task is basically to perform a de novo assembly for the fire ant Solenopsis invicta as, apparently, the official genome assembly is quite fragmented. Then, as part of the same assessment, I am being asked to estimate the number of CYP genes in my assembly.
Running MAKER on your assembly would be fine. Since it is an annotation pipeline, it will produce value-added results for the whole genome. You may also want to run BUSCO to see if your assembly is reasonably complete.
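Once you have predicted proteins, one possible way to turn them into a count is sketched below in Python. It assumes you ran blastp with the predicted proteins as queries against a set of known CYP proteins and saved the tabular output (`-outfmt 6`). It also assumes that transcript IDs end in a suffix like `-mRNA-1`, so that isoforms of the same gene can be collapsed; check this against your own IDs. The function name and both thresholds are illustrative assumptions, not recommended values.

```python
import re

# Assumed transcript suffix (e.g. "-mRNA-1"); adjust to match your own IDs.
ISOFORM_SUFFIX = re.compile(r"-mRNA-\d+$")


def count_cyp_genes(lines, max_evalue=1e-10, min_identity=30.0):
    """Count distinct predicted genes with at least one acceptable CYP hit.

    Column 1 (qseqid) is the predicted protein, column 3 is percent identity,
    column 11 is the e-value. Both thresholds are arbitrary illustrative values.
    """
    genes = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        identity = float(fields[2])
        evalue = float(fields[10])
        if evalue <= max_evalue and identity >= min_identity:
            # Strip the isoform suffix so each gene is counted once.
            genes.add(ISOFORM_SUFFIX.sub("", fields[0]))
    return len(genes)


# Made-up example rows, for illustration only (not real results).
blastp_rows = [
    "geneA-mRNA-1\tCYP6A1\t55.0\t400\t180\t0\t1\t400\t1\t400\t1e-80\t300",
    "geneA-mRNA-2\tCYP6A1\t54.0\t380\t175\t0\t1\t380\t1\t380\t1e-75\t290",
    "geneB-mRNA-1\tCYP4G1\t48.0\t420\t218\t0\t1\t420\t1\t420\t1e-60\t250",
    "geneC-mRNA-1\tCYP9E2\t25.0\t60\t45\t0\t1\t60\t1\t60\t1e-2\t35",
]
n_genes = count_cyp_genes(blastp_rows)
print(n_genes)
```

A similarity hit alone does not prove a sequence is a functional CYP, so it is worth inspecting borderline hits by hand before reporting a final number.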