How To Retrieve All Sequences, From Ncbi, That Belong To A Specific Txid And Its Sub Txids?
12.0 years ago
Nicojo ★ 1.1k

I am trying to retrieve all the nucleotide sequences from Genbank that belong to a txid and its "children" txids, e.g. txid5833. As you can see, there are many txids under this one...

I am using eutils for this, but I can't figure out how to make it search for all the sequences, including those under the children txids. Any suggestions on how to do this using eutils?

Edit: detailed the question.

Edit 2: replaced "retrieve" with "search".


Sorry to all: my question was initially badly worded.


@Andrzej (sorry for the name mix up), your link did help me find the answer, so thanks again for contributing in a constructive manner!


I'm glad I could help, I'm Andrzej by the way:)


I do apologize; I misread the university as your name :-)


Thanks for the answer, unfortunately I wasn't precise enough in my question (I edited it now). I would like to know how to do so with eutils...

12.0 years ago
Neilfws 49k

Do you really need to use eutils? If your starting point is a file of sequences, Adam's answer is perfectly good.

If you do want to use eutils, as part of a larger program, you need to read the documentation and decide on a programming language. In particular, you need to understand efetch. The documentation includes an example in Perl for fetching many sequences from a given organism.


@Neilfws, I'm sorry but your reply is not an answer to my question; I think it would have been better as a comment. I am realizing now that my question was very badly written. Sorry for that. To answer your comments, first of all, yes, I have read the documentation. Second, the example code you point to has nothing to do with my question, since it deals with one txid, and not the expanded txids. Finally, if I had just wanted to know how to retrieve all those sequences, I would have asked that and marked Adam's answer as the winner. But that was not my question.

12.0 years ago
Nicojo ★ 1.1k

Thanks to Adam's suggestion, I have found a way to do this. It seems that I can use 4 different "term" expressions in my esearch query:

  • term=txid5833 will result in just retrieving sequences with this txid.
  • term=txid5833[Organism] will return all the sequences with this txid, and all its sub-txids.

Interestingly, it is possible to add modifiers to the tag:

  • term=txid5833[Organism:exp] has the same result as the tag without the modifier.
  • term=txid5833[Organism:noexp] has the same result as not having the tag at all.

So adding the field tag (with or without the modifier) to the txid is what I needed: term=txid5833[Organism:exp]. I'm guessing the "exp" modifier stands for "expanded". These tag modifiers are probably documented somewhere, but I can't find where. If someone does, please add a link to it in the comments.
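To compare the four expressions above, one option is to ask esearch for the hit count only and look at the numbers side by side. The following is a minimal Python sketch that only builds the terms and URLs. The helper names `taxon_term` and `count_url` are made up for illustration, the base URL is written in its https form, and `rettype=count` is the esearch option that returns just the number of hits.

```python
from urllib.parse import urlencode

# Base URL of the E-utilities (https form of the address used in this thread).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def taxon_term(taxid, field="Organism", modifier=None):
    """Build an Entrez term such as 'txid5833[Organism:exp]'.

    field=None gives the bare 'txid5833' form; modifier may be
    'exp' or 'noexp' (the two modifiers discussed in this answer).
    """
    term = f"txid{taxid}"
    if field is None:
        return term
    tag = field if modifier is None else f"{field}:{modifier}"
    return f"{term}[{tag}]"


def count_url(term, db="nuccore"):
    """Return an esearch URL that asks only for the number of hits."""
    params = {"db": db, "term": term, "rettype": "count"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# The four variants from the lists above, in the same order.
VARIANTS = [
    taxon_term(5833, field=None),
    taxon_term(5833),
    taxon_term(5833, modifier="exp"),
    taxon_term(5833, modifier="noexp"),
]
```

Opening `count_url(term)` for each entry of `VARIANTS` in a browser (or with any HTTP client) should show whether the counts pair up as described. `urlencode` takes care of escaping the square brackets and the colon.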

The resultant query is this: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nuccore&term=txid5833[Organism:exp]&usehistory=y

This retrieves the list of all the sequences I am looking for. I can then use efetch on that list to retrieve all the records I want.
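For anyone who wants to script the two steps, here is a rough Python sketch of the esearch-then-efetch flow, using only the standard library. It is an illustration under assumptions, not a tested pipeline: the function names, the batch size of 500, the FASTA output format and the pause between requests are my own choices, and `fetch` is a hypothetical hook that lets you swap in another HTTP client. It relies on `usehistory=y` making esearch return `WebEnv` and `QueryKey` values, which efetch then accepts in place of an explicit ID list.

```python
import time
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, db="nuccore"):
    """esearch URL that stores the result set on the history server."""
    params = {"db": db, "term": term, "usehistory": "y"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def parse_history(xml_text):
    """Pull the hit count, WebEnv and QueryKey out of an esearch XML reply."""
    root = ET.fromstring(xml_text)
    return {
        "count": int(root.findtext("Count")),
        "webenv": root.findtext("WebEnv"),
        "query_key": root.findtext("QueryKey"),
    }


def efetch_urls(history, db="nuccore", batch_size=500, rettype="fasta"):
    """Yield one efetch URL per batch of records in the stored result set."""
    for start in range(0, history["count"], batch_size):
        params = {
            "db": db,
            "query_key": history["query_key"],
            "WebEnv": history["webenv"],
            "retstart": start,
            "retmax": batch_size,
            "rettype": rettype,
            "retmode": "text",
        }
        yield f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


def http_get(url):
    """Default fetcher: plain GET, decoded as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def download(term, out_path, fetch=http_get, pause=0.4):
    """Search for term, then write every matching record to out_path."""
    history = parse_history(fetch(esearch_url(term)))
    with open(out_path, "w") as out:
        for url in efetch_urls(history):
            out.write(fetch(url))
            time.sleep(pause)  # stay under NCBI's request-rate guideline
    return history["count"]
```

Calling `download("txid5833[Organism:exp]", "txid5833.fasta")` would then run the search from the URL above and append each batch to one file. For large taxa this can be a lot of data, so it is probably worth checking the count first.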
