Hi Biostars community,
I want to use `epost` and `esummary` (NCBI's E-utilities) to obtain lineage information for a list of protein accession numbers.
However, I run into problems with accession numbers that do not start with `WP_`.
The following pipeline does produce a TSV file, but some cells are not filled with the requested information:

```shell
cat "$ListWithAccessionNumbers" | epost -db protein | \
  esummary -db taxonomy -format xml | \
  xtract -pattern Seq-entry -element Org-ref_taxname OrgName_lineage NCBIeaa Textseq-id_accession \
  > SummaryTable.tsv
```
For accession numbers that do not start with `WP_`, the accession number and sequence columns stay empty; they are only filled for accession numbers starting with `WP_`.
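Since only the non-`WP_` accessions seem to be affected, one way to narrow the problem down is to split the input list by prefix and run the pipeline above on each part separately. The helper below is a minimal sketch: the function name `split_by_prefix` and the output file names are hypothetical, and it assumes the input file contains one accession per line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split_by_prefix INPUT WP_OUT OTHER_OUT
# Splits an accession list (assumed: one accession per line) into
# RefSeq WP_ accessions and all other non-empty lines.
split_by_prefix() {
  local input="$1" wp_out="$2" other_out="$3"
  # Lines beginning with "WP_" go to the first output file.
  # "|| true" keeps a "no match" exit status from aborting under set -e.
  grep '^WP_' "$input" > "$wp_out" || true
  # Every other line, except empty ones, goes to the second output file.
  grep -v -e '^WP_' -e '^$' "$input" > "$other_out" || true
}
```

For example, `split_by_prefix accessions.txt wp_accessions.txt other_accessions.txt`, then point `$ListWithAccessionNumbers` at each file in turn and compare the two resulting tables.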
So my current question is: how can I use `epost` and `esummary` to obtain lineage information for the accession numbers that do not start with `WP_`, while still getting the accession number and sequence printed? Does anyone have experience with this?
If you have suggestions for using `epost` and `esummary` differently, or know of another way to solve this problem with NCBI's E-utilities, I would be grateful for your ideas and help. Thank you!
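One possible alternative, sketched here under untested assumptions: keep each accession together with its taxonomy ID by taking both from the protein document summaries (`esummary` on a protein `epost`, whose `DocumentSummary` records include `AccessionVersion` and `TaxId`), fetch the lineage once per taxonomy ID from the taxonomy database, and join the two tables locally. The EDirect commands appear only as comments and are assumptions to adapt; the join itself is plain `awk`. The function name `join_lineage` and the file names are hypothetical. Sequences could be retrieved in a separate step, for example with `efetch -db protein -format fasta`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join_lineage ACC_TAXID_TSV TAXID_LINEAGE_TSV
# Assumed upstream steps (untested sketch, not run by this function):
#   cat "$ListWithAccessionNumbers" | epost -db protein | esummary | \
#     xtract -pattern DocumentSummary -element AccessionVersion TaxId > acc_taxid.tsv
#   cut -f2 acc_taxid.tsv | sort -u | epost -db taxonomy | efetch -format xml | \
#     xtract -pattern Taxon -first TaxId -element Lineage > taxid_lineage.tsv
# Prints accession <TAB> taxid <TAB> lineage for every accession row;
# taxids without a lineage entry get "NA".
join_lineage() {
  local acc_taxid="$1" taxid_lineage="$2"
  awk -F'\t' -v OFS='\t' '
    # While reading the lineage table (first file argument),
    # remember the lineage string for each taxonomy ID.
    FILENAME == ARGV[1] { lineage[$1] = $2; next }
    # For each accession row, append its lineage, or "NA" if unknown.
    { print $1, $2, (($2 in lineage) ? lineage[$2] : "NA") }
  ' "$taxid_lineage" "$acc_taxid"
}
```

Usage: `join_lineage acc_taxid.tsv taxid_lineage.tsv > SummaryTable.tsv`. Joining on the taxonomy ID means every accession row is kept even when its lineage lookup fails, so missing data shows up as `NA` instead of an empty or shifted cell.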
Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`` `text` `` becomes `text`), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.

Thank you! I didn't know that, very helpful :)