3 months ago
雨
As the NCBI screenshot shows, there is a lot of genomic data available. We want to extract the CDS (coding sequence) of a gene we are interested in across different species. For annotated species, I can find it in the genome annotation file or on the NCBI website, but unannotated genomes are proving difficult. For the unannotated genomes in the screenshot, for example, does anyone have suggestions? (We will also need a script to automate this, since we can't search and compare manually.)
There does not seem to be any option other than downloading the genomes locally and running BLAST against them. This could be a huge task. You may be able to use web BLAST against the wgs database, but there will be limits on the number of results you can retrieve.

So it seems the only way to do this is BLAST? Would running BLAST locally be easier? What do you think?
With unannotated genomes, yes. Do you have eukaryotes selected in the screenshot above? Is that what you are interested in? You could pare the list down to one representative per genus/family to reduce the search space. Running BLAST "locally", meaning on your own infrastructure (truly local or cloud), may be the only viable option.
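To make the suggestion concrete, here is a minimal sketch of how the local search could be scripted, assuming BLAST+ is installed and the genome FASTA files are already downloaded. It assumes a protein query searched with `tblastn`, which is one common choice and not something specified in this thread. The function names, file names, and the E-value cutoff are hypothetical placeholders. The hits only give approximate locations of coding regions on each genome; turning them into a full CDS would need further work.

```python
import subprocess
from pathlib import Path

# Column names of BLAST's default tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def one_per_genus(species_to_fasta):
    """Keep the first genome listed for each genus.

    The genus is taken as the first word of the species name (an assumption
    about how the input is labeled).
    """
    chosen = {}
    for species, fasta in species_to_fasta.items():
        genus = species.split()[0]
        chosen.setdefault(genus, (species, fasta))
    return dict(chosen.values())


def blast_commands(genome_fasta, query_protein, out_tsv, evalue="1e-5"):
    """Build the makeblastdb and tblastn command lines for one genome."""
    db = str(Path(genome_fasta).with_suffix(""))  # database prefix next to the FASTA
    make_db = ["makeblastdb", "-in", str(genome_fasta),
               "-dbtype", "nucl", "-out", db]
    search = ["tblastn", "-query", str(query_protein), "-db", db,
              "-evalue", evalue, "-outfmt", "6", "-out", str(out_tsv)]
    return make_db, search


def parse_outfmt6(text):
    """Parse tabular BLAST output into a list of dicts, one per hit."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(OUTFMT6_COLUMNS, line.split("\t")))
        row["sstart"], row["send"] = int(row["sstart"]), int(row["send"])
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        # Hits on the reverse strand are reported with sstart > send.
        row["strand"] = "+" if row["sstart"] <= row["send"] else "-"
        rows.append(row)
    return rows


def run_search(genome_fasta, query_protein, out_tsv):
    """Run both commands (requires BLAST+ on PATH) and return the parsed hits."""
    for cmd in blast_commands(genome_fasta, query_protein, out_tsv):
        subprocess.run(cmd, check=True)
    return parse_outfmt6(Path(out_tsv).read_text())
```

A typical loop would call `one_per_genus` on the full genome list first, then `run_search` once per remaining genome, and keep the best-scoring hits per species for inspection.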