Try this search term:
esearch -db nuccore -query 'txid7157[ORGN] AND ("cytochrome c oxidase 1"[Title] OR "cytochrome oxidase subunit I"[Title] OR COI[Title] OR COXI[Title] OR COX1[Title] OR "COX 1"[Title] OR "COX I"[Title] OR CO1[Title] OR C01[Title] OR "cytochrome oxidase I"[Title] OR "cytochrome oxidase subunit 1"[Title] OR "cytochrome oxidase 1"[Title] OR "cytochrome c oxidase subunit I"[Title] OR "cytochrome c oxidase subunit 1"[Title])' | efetch -format fasta
You might run into an issue with Entrez and bash evaluation that I've hit before and described on my blog.
Edit: (
The easiest way to get the Entrez tools (Entrez Direct) is to use conda:
conda install -c bioconda entrez-direct
Explaining the search term:
I search for taxonomy ID 7157, Culicidae https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi
Then I AND that taxonomy ID with a bunch of different spellings I've seen for COI; sometimes it's COX1, sometimes it's just the long name, sometimes it's COX 1, etc.
Then whatever esearch returns is piped to efetch, which does the actual downloading. In this case I ask for the format fasta, so it gives me FASTA records. You can also ask for other formats such as XML.
)
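If the list of spellings keeps growing, it may be easier to assemble the query from a list than to edit one long string. Below is a rough sketch of that idea, not a tested pipeline: build_query, RUN_FETCH and the output file name culicidae_coi.fasta are names I made up for illustration, and only a few of the spellings are listed. It quotes every term, including single words, which should be equivalent for Entrez.

```shell
#!/bin/sh
# Sketch: build 'txid<ID>[ORGN] AND ("name1"[Title] OR "name2"[Title] ...)'
# build_query is a hypothetical helper, not part of Entrez Direct.
build_query() {
  taxid=$1
  shift
  clause=""
  for name in "$@"; do
    # Prepend " OR " only when the clause already has an entry.
    clause="${clause:+$clause OR }\"$name\"[Title]"
  done
  printf 'txid%s[ORGN] AND (%s)\n' "$taxid" "$clause"
}

# 7157 = Culicidae; add further spellings as extra arguments.
query=$(build_query 7157 \
  "cytochrome c oxidase 1" \
  "cytochrome oxidase subunit I" \
  COI COX1 "COX 1" CO1)
printf '%s\n' "$query"

# Only contact NCBI when asked to (RUN_FETCH is a made-up switch).
if [ "${RUN_FETCH:-0}" = "1" ]; then
  esearch -db nuccore -query "$query" | efetch -format fasta > culicidae_coi.fasta
fi
```

Run it as-is to inspect the generated query first, then with RUN_FETCH=1 set in the environment to actually download.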
One alternative is to use BOLD, as that is COI only.
https://boldsystems.org/index.php/databases
Culicidae is even the example term :) Type it into the search field, then on the Results page click the blue FASTA button at the top right to download. The BOLD sequences get mirrored to NCBI, so you'll get a subset of the COI genes that you'd get via Entrez, but you'll also have less noise, fewer incomplete COIs, etc.
One option to try: https://www.ncbi.nlm.nih.gov/gene?term=cox1%5BAll%20Fields%5D%20AND%20Anopheles%20%5Borgn%5D%20AND%20alive%5Bprop%5D&cmd=DetailsSearch
Do you need the actual sequence?
Yes, I need the actual sequence, as I am trying to build a customized kraken2 database with them.
Thanks for your response!