I am using the esearch query $query = "SS1G_03709+AND+gene[filter]"; but it returns everything (gene, mRNA, and genome sequences). What filter do I need so that my search returns only gene sequences? I tried a few filters from here, but couldn't find one that limits the search to genes.
@genomax Thank you, but what I want is the FASTA for these, the actual gene records: https://www.ncbi.nlm.nih.gov/gene/?term=SS1G_01676
OR https://www.ncbi.nlm.nih.gov/nuccore/NW_001820834.1?report=fasta&from=1555069&to=1556099&strand=true
Is it possible?
You can use the -format gene_fasta option of efetch to get the FASTA sequence of the genes annotated on a genomic RefSeq as shown below:
One issue you may notice is that this downloads the sequence of every annotated gene in FASTA format. That is not a big deal for a handful of gene queries, but it can become a performance issue with many queries. In that situation, you can use the -format ft option of efetch to first get the feature table, extract the coordinates of the gene of interest, and then use bash scripting with efetch -seq_start ### -seq_stop ### -format fasta to get the sequence of just that gene.
I wonder if what @MAPK wants is this kind of header
instead of
since the sequence should be identical.
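For readers who would rather script the feature-table route from the answer above in Python than in bash, here is a minimal sketch under stated assumptions. It assumes the text returned by -format ft is NCBI's tab-delimited feature table (a ">Feature" header, then start/stop/feature-key lines followed by indented qualifier lines), and it builds a plain E-utilities efetch URL with seq_start, seq_stop and strand rather than calling the efetch command itself. The sample table, its coordinates, the EXAMPLE_ locus tags, and the function names find_gene and gene_fasta_url are all made up for illustration; they are not real NCBI output or part of any library.

```python
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Hypothetical feature table in the 5-column tab-delimited layout.
# Coordinates and locus tags are invented for this example.
SAMPLE_TABLE = (
    ">Feature ref|NW_001820834.1|\n"
    "<100\t>900\tgene\n"
    "\t\t\tlocus_tag\tEXAMPLE_0001\n"
    "<100\t>900\tmRNA\n"
    "\t\t\tproduct\thypothetical protein\n"
    "1556099\t1555069\tgene\n"
    "\t\t\tlocus_tag\tEXAMPLE_0002\n"
)


def find_gene(feature_table, locus_tag):
    """Return (start, stop) of the gene feature with this locus_tag, or None.

    Simplification: only the first interval of a gene feature is used.
    """
    current = None  # coordinates of the gene feature currently being read
    for line in feature_table.splitlines():
        if line.startswith(">"):  # ">Feature ..." header starts a new record
            current = None
            continue
        cols = line.split("\t")
        if cols[0]:
            # A non-empty third column means a new feature begins here;
            # otherwise this is a continuation interval and is ignored.
            if len(cols) >= 3 and cols[2]:
                if cols[2] == "gene":
                    # "<" and ">" mark partial features; strip them.
                    current = (int(cols[0].lstrip("<>")),
                               int(cols[1].lstrip("<>")))
                else:
                    current = None
        elif (current and len(cols) >= 5
              and cols[3] == "locus_tag" and cols[4] == locus_tag):
            return current
    return None


def gene_fasta_url(accession, start, stop):
    """Build an efetch URL for one region; start > stop means minus strand."""
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
        "seq_start": min(start, stop),
        "seq_stop": max(start, stop),
        "strand": 1 if start <= stop else 2,  # 2 = minus strand
    }
    return EFETCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    coords = find_gene(SAMPLE_TABLE, "EXAMPLE_0002")
    if coords is not None:
        print(gene_fasta_url("NW_001820834.1", *coords))
```

The resulting URL can be fetched with urllib.request.urlopen or any HTTP client. In a real run the table would come from an efetch request with rettype=ft for the same accession, so only one feature table and one small FASTA region are downloaded per gene.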
Slightly off-topic: You clearly have deep knowledge about eUtils! Do you work at/for NCBI?
@vkkodali Thank you, this is what I wanted.
This should get you the gene record multi-fasta.