cox1 genes Mosquitoes
1
0
Entering edit mode
12 months ago

Hello!

I am having difficulty downloading specific genes from NCBI. Context: I need to download the cox1 genes from several mosquito species, for example the Anopheles species: https://www.ncbi.nlm.nih.gov/gene/?term=anopheles+cox1

Downloading them manually is tedious. Is there a simpler way to go about this? I tried with Entrez, but some of the downloaded genes were repeated and some were not relevant.

I will appreciate help from anyone. Thank you

python metagenomics • 1.1k views

Yes, I need the actual sequences, as I am trying to build a customized kraken2 database with them.

Thanks for your response!

12 months ago

Try this search term:

esearch -db nuccore -query 'txid7157[ORGN] AND ("cytochrome c oxidase 1"[Title] OR "cytochrome oxidase subunit I"[Title] OR COI[Title] OR COXI[Title] OR COX1[Title] OR "COX 1"[Title] OR "COX I"[Title] OR CO1[Title] OR C01[Title] OR "cytochrome oxidase I"[Title] OR "cytochrome oxidase subunit 1"[Title] OR "cytochrome oxidase 1"[Title] OR "cytochrome c oxidase subunit I"[Title] OR "cytochrome c oxidase subunit 1"[Title])' | efetch -format fasta

You might run into an issue with Entrez and bash evaluation that I've had before and described on my blog.

Edit: (The easiest way to get the Entrez Direct tools is to use conda:

conda install -c bioconda entrez-direct

Explaining the search term:

I search for taxonomy ID 7157, Culicidae: https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi Then I AND that taxonomy ID with a bunch of different spellings I've seen for COI: sometimes it's COX1, sometimes it's just the long name, sometimes it's COX 1, etc. Whatever esearch returns is then piped to efetch, which does the actual downloading. In this case I ask for the format fasta, so I get FASTA records back. You can also ask for XML and other formats.)
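Since the thread is tagged python, here is a minimal sketch of how the same search term could be assembled from a list of spellings, which makes it easier to add or drop a variant. The function name `build_query` and the variable names are made up for illustration; the spellings, the taxonomy ID and the esearch/efetch options are the ones from the command above.

```python
import shlex

# Spellings of COI copied from the search term above.
COI_SYNONYMS = [
    "cytochrome c oxidase 1",
    "cytochrome oxidase subunit I",
    "COI", "COXI", "COX1", "COX 1", "COX I", "CO1", "C01",
    "cytochrome oxidase I",
    "cytochrome oxidase subunit 1",
    "cytochrome oxidase 1",
    "cytochrome c oxidase subunit I",
    "cytochrome c oxidase subunit 1",
]


def build_query(taxid, synonyms, field="Title"):
    """Return 'txid<ID>[ORGN] AND (a[Title] OR b[Title] ...)'.

    Hypothetical helper, not part of Entrez Direct.
    """
    terms = []
    # dict.fromkeys drops repeated spellings while keeping their order.
    for name in dict.fromkeys(synonyms):
        # Multi-word names need double quotes to be searched as a phrase.
        quoted = f'"{name}"' if " " in name else name
        terms.append(f"{quoted}[{field}]")
    return f"txid{taxid}[ORGN] AND ({' OR '.join(terms)})"


query = build_query(7157, COI_SYNONYMS)  # 7157 = Culicidae
# shlex.quote wraps the query in single quotes for the shell.
command = f"esearch -db nuccore -query {shlex.quote(query)} | efetch -format fasta"
print(command)
```

The printed line can be pasted into a shell where Entrez Direct is installed. To restrict the search to one genus, swap 7157 for that genus's taxonomy ID.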

One alternative is to use BOLD, as that is COI only. https://boldsystems.org/index.php/databases

Culicidae is even the example term :) Type it into the search field, then on the Results page click on the blue FASTA button on the top right to download. The BOLD sequences get mirrored to NCBI, so you'll get only a subset of the COI sequences that you'd get via Entrez, but you'll also have less noise, fewer incomplete COIs, etc.
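If the downloaded FASTA still contains repeated records, as in the original question, something like the following could remove them before building the database. This is only a sketch: it assumes that a repeat means the same accession (the first word of the header) appearing more than once, the function names are made up, and the example records are invented placeholders rather than real accessions.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def deduplicate(records):
    """Keep only the first record seen for each accession."""
    seen = set()
    for header, sequence in records:
        accession = header.split()[0]  # first word of the header line
        if accession in seen:
            continue
        seen.add(accession)
        yield header, sequence


# Invented placeholder records, not real NCBI or BOLD entries.
EXAMPLE = """>EX000001.1 made-up record A
ATGCA
TGCAT
>EX000002.1 made-up record B
GGGTTTAAAC
>EX000001.1 made-up record A
ATGCA
TGCAT
"""

unique = list(deduplicate(read_fasta(EXAMPLE.splitlines())))
for header, sequence in unique:
    print(f">{header}\n{sequence}")
```

For a real file, pass an open file handle to `read_fasta` instead of `EXAMPLE.splitlines()`. Identical sequences under different accessions are kept on purpose, since they may come from different species.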


I think it may help to assume that the OP doesn't know about e-utilities and doesn't have them installed, so providing a link or explanation about them would be helpful. In my experience a minority of users on this forum use those programs, and they are not in a casual-user category.


Thanks, I added a paragraph


Link for information --> EntrezDirect utilities.


Great! I will try it and give feedback. Thanks so much for taking the time to explain it!
