Search GenBank for records that have a specific field
2.2 years ago
tbayer ▴ 50

Hi!

Can I use the advanced search on NCBI (or some other method) to find GenBank records that have a specific field, independently of what that field contains?

Specifically, I would like to filter for records that have the "specimen_voucher" field under Features/Source. Or, even more specifically, I have a list of GenBank IDs and would like to filter out those that do not have this field.

genbank NCBI • 794 views

Can you provide an example GenBank ID? Without that information, here is an example query. The Count field in the output shows how many entries matched.

Using Entrez Direct:

$ esearch -db biosample -query "specimen_voucher"
<ENTREZ_DIRECT>
  <Db>biosample</Db>
  <WebEnv>MCID_631896eb1fc13d1b281a6e9a</WebEnv>
  <QueryKey>1</QueryKey>
  <Count>184220</Count>
  <Step>1</Step>

</ENTREZ_DIRECT>

For the nucleotide database:

$ esearch -db nuccore -query "specimen_voucher"
<ENTREZ_DIRECT>
  <Db>nuccore</Db>
  <WebEnv>MCID_6318970a4cbf63655f3cbfe7</WebEnv>
  <QueryKey>1</QueryKey>
  <Count>65890</Count>
  <Step>1</Step>
</ENTREZ_DIRECT>
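
If only the number of hits is needed, the Count element can be pulled out of that XML. This is a minimal sketch, not part of Entrez Direct: esearch_count is a hypothetical helper name, and it assumes the Count element sits on a single line, as in the output above.

```shell
# Hypothetical helper: read esearch XML on stdin and print only the
# number found between <Count> and </Count>.
esearch_count() {
  sed -n 's:.*<Count>\([0-9][0-9]*\)</Count>.*:\1:p'
}
```

It would be used as: esearch -db nuccore -query "specimen_voucher" | esearch_count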

Hi, thanks for the reply. An example record would be AF301461.1 with the specimen_voucher field.

Your example esearch -db nuccore -query "specimen_voucher" would also find records that do not have the field, but where the text "specimen_voucher" appears anywhere in the title, no?


While that field appears in the GenBank-format record, it is not a directly queryable attribute as far as I can see. You can check whether a specific record has a value in that field with:

$ esearch -db nuccore -query "AF301461" | efetch -format gb | grep "/specimen_voucher"
                     /specimen_voucher="LSUMNS B-18658"

That field may also be empty

$ esearch -db nuccore -query "specimen_voucher" | efetch -format gb | grep -e "/specimen_voucher" -e ACCESSION
    ACCESSION   LC723704
                         /specimen_voucher="TNS:S. Chantanaorrapint & O. Suwanmala
    ACCESSION   LC723703
                         /specimen_voucher="TNS:S. Chantanaorrapint & O. Suwanmala
    ACCESSION   LC723702
                         /specimen_voucher="TNS:S. Chantanaorrapint & O. Suwanmala
    ACCESSION   LC723701
                         /specimen_voucher="TNS:S. Chantanaorrapint & O. Suwanmala
    ACCESSION   LC723700
                         /specimen_voucher="TNS:S. Chantanaorrapint & O. Suwanmala
    ACCESSION   LC723699
                         /specimen_voucher="TNS:S. Chantanaorrapint & O. Suwanmala
    ACCESSION   LC278102
                         /specimen_voucher="FRLM:34902"
    ACCESSION   LC036918
                         /specimen_voucher="HUMZ:198113"
    ACCESSION   LC021154
                         /specimen_voucher="FAKU:73708"
    ACCESSION   AB969934
                         /specimen_voucher="BSKU:62255"
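
To keep only the accessions whose records really carry the qualifier (rather than eyeballing the grep output), the GenBank flat file can be scanned record by record. This is a rough sketch under a few assumptions: filter_voucher_records is a hypothetical function name, each record has an ACCESSION line and ends with a line containing only "//", and the qualifier appears indented as /specimen_voucher= in the feature table.

```shell
# Hypothetical helper: read GenBank flat-file records on stdin and print
# the accession of each record that has a /specimen_voucher qualifier.
filter_voucher_records() {
  awk '
    # Remember the primary accession of the current record.
    /^ACCESSION/ { acc = $2 }
    # Qualifier lines in the feature table are indented with spaces.
    /^ +\/specimen_voucher=/ { found = 1 }
    # A line holding only "//" closes the record: report it and reset.
    /^\/\/$/ {
      if (found && acc != "") print acc
      acc = ""; found = 0
    }
  '
}
```

For a short list of IDs it could be run as: efetch -db nuccore -id "AF301461,LC278102" -format gb | filter_voucher_records. Records without the qualifier simply produce no output line.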

Thanks! As I want to retrieve all GenBank records that have this field, downloading the full GenBank format for grepping may take a while... I think I'll just search for that keyword via the online search for now and download all IDs to a file.
