Question

cox1 genes Mosquitoes

0


19 months ago

doppelganger1030 • 0

Hello!

I am having difficulty downloading specific genes from NCBI. Context: I need to download the cox1 genes from several mosquito species, for example the Anopheles species: https://www.ncbi.nlm.nih.gov/gene/?term=anopheles+cox1

Downloading them manually is tedious. Is there a simpler way to go about this? I tried with Entrez, but some of the downloaded genes were duplicated and some were not relevant.

I would appreciate help from anyone. Thank you.

python metagenomics • 1.5k views


1


One option to try: https://www.ncbi.nlm.nih.gov/gene?term=cox1%5BAll%20Fields%5D%20AND%20Anopheles%20%5Borgn%5D%20AND%20alive%5Bprop%5D&cmd=DetailsSearch

Do you need the actual sequence?

ADD REPLY • link 19 months ago by GenoMax 151k

0


Yes, I need the actual sequences, as I am trying to build a custom Kraken2 database with them.

Thanks for your response!

ADD REPLY • link 19 months ago by doppelganger1030 • 0

score 2 · Answer 1 · 2023-10-29

Try this search term:

esearch -db nuccore -query 'txid7157[ORGN] AND ("cytochrome c oxidase 1"[Title] OR "cytochrome oxidase subunit I"[Title] OR COI[Title] OR COXI[Title] OR COX1[Title] OR "COX 1"[Title] OR "COX I"[Title] OR CO1[Title] OR C01[Title] OR "cytochrome oxidase I"[Title] OR "cytochrome oxidase subunit 1"[Title] OR "cytochrome oxidase 1"[Title] OR "cytochrome c oxidase subunit I"[Title] OR "cytochrome c oxidase subunit 1"[Title])' | efetch -format fasta

You might run into an issue with Entrez and bash evaluation that I've had before and described on my blog.

Edit: (The easiest way to get the Entrez Direct tools is to use conda:

conda install -c bioconda entrez-direct

Explaining the search term:

I search for taxonomy ID 7157, which is Culicidae (see https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi). Then I AND that taxonomy ID with a bunch of different spellings I've seen for COI: sometimes it's COX1, sometimes it's just the long name, sometimes it's COX 1, etc. Whatever esearch returns is then piped to efetch, which does the actual downloading. In this case I ask for the fasta format, so I get the sequences back as FASTA. You can also ask for XML, etc.)
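If you would rather maintain the list of spellings in a script than in one long shell string, the search term can be assembled programmatically. This is only a sketch: `build_coi_query` and `COI_SYNONYMS` are hypothetical names made up for illustration, the synonym list is a shortened example rather than a complete one, and the output is just the query string, not a call to NCBI.

```python
# Hypothetical helper (not part of Entrez Direct): builds the same kind of
# query string as the esearch command above from a list of gene spellings.
def build_coi_query(taxid, synonyms, field="Title"):
    """Return 'txid<taxid>[ORGN] AND (name1[Title] OR name2[Title] ...)'."""
    terms = []
    # dict.fromkeys drops exact duplicates while keeping the original order.
    for name in dict.fromkeys(synonyms):
        # Quote multi-word names so they are searched as a phrase.
        quoted = f'"{name}"' if " " in name else name
        terms.append(f"{quoted}[{field}]")
    return f"txid{taxid}[ORGN] AND ({' OR '.join(terms)})"


# Shortened, illustrative list; extend it with the spellings you need.
COI_SYNONYMS = [
    "cytochrome c oxidase subunit I",
    "cytochrome c oxidase subunit 1",
    "cytochrome oxidase subunit I",
    "COI",
    "COX1",
    "COX 1",
    "CO1",
]

query = build_coi_query(7157, COI_SYNONYMS)  # 7157 = Culicidae
print(query)
```

The printed string can then be passed to `esearch -db nuccore -query` in place of the hand-written term, with the result still piped to `efetch -format fasta`.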

One alternative is to use BOLD, as that is COI only. https://boldsystems.org/index.php/databases

Culicidae is even the example term :) Type it into the search field, then on the Results page click the blue FASTA button at the top right to download. The BOLD sequences get mirrored to NCBI, so you'll get a subset of the COI genes that you'd get via Entrez, but you'll also have less noise, fewer incomplete COIs, etc.
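If repeated or fragmentary records still slip through, as in the original attempt with Entrez, the downloaded FASTA can be cleaned up afterwards. Below is a minimal sketch using only the Python standard library. It assumes the accession is the first whitespace-separated token of each header, and the `min_len` cutoff is an arbitrary placeholder you would have to choose yourself. The function names and the example records are made up for illustration.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_records(text, min_len=500):
    """Keep the first record per accession and drop sequences shorter than min_len.

    min_len=500 is an arbitrary placeholder, not a recommended value.
    """
    seen, kept = set(), []
    for header, seq in parse_fasta(text):
        parts = header.split()
        accession = parts[0] if parts else ""
        if len(seq) < min_len or accession in seen:
            continue
        seen.add(accession)
        kept.append((header, seq))
    return kept


# Made-up records, only to show the behaviour.
example = """>AB000001.1 Anopheles sp. COI
ACGTACGTAC
>AB000001.1 Anopheles sp. COI
ACGTACGTAC
>AB000002.1 Aedes sp. COI partial
ACG
"""

for header, seq in filter_records(example, min_len=5):
    print(f">{header}\n{seq}")
```

Records are deduplicated by accession only, so identical sequences from different species are kept, which is probably what you want when the taxon labels matter for a classification database.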