5.0 years ago
lucslapping
I was wondering what ways are available to get metadata from accession numbers. I have seen other tools, such as Nextstrain, use a so-called "metadata" file to describe the sequences used. The file looks something like this:
It shows various NCBI data for each accession number, such as virus strain, country, date, URL, etc. For me, the most important ones are strain, country, and date. Are there ways to download such data automatically when you have a list of accession numbers?
Any help is appreciated.
Thank you, this brings up some of the fields I mentioned. However, is there a way I can submit a list of accession numbers and save the output to a CSV, TSV, or TXT file?
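As a swapped-in alternative to the epost route, a list of accessions (one per line in a text file) can also be sent in one request to NCBI's E-utilities efetch endpoint. The sketch below assumes the nuccore database and GenBank text output; the helper names read_accessions, efetch_params, and fetch_genbank are hypothetical, and NCBI's usage rules (rate limits, tool/email parameters, splitting very long lists) are not handled.

```python
import urllib.parse
import urllib.request

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def read_accessions(path):
    """Read one accession per line, skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def efetch_params(accessions):
    """Form data for one efetch call returning GenBank flat files."""
    return {
        "db": "nuccore",
        "id": ",".join(accessions),
        "rettype": "gb",
        "retmode": "text",
    }


def fetch_genbank(accessions):
    """Send the list as a POST (passing data= makes urlopen use POST),
    which avoids URL-length limits for longer lists."""
    data = urllib.parse.urlencode(efetch_params(accessions)).encode()
    with urllib.request.urlopen(EFETCH, data=data) as response:
        return response.read().decode()
```

A typical call would be fetch_genbank(read_accessions("accessions.txt")), where accessions.txt is a placeholder file name; the returned text can then be parsed and written to CSV or TSV.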
Use epost with your accession numbers of interest in a file (one per line).
Thanks again, this worked for me. However, some records appear to be in the wrong order for my case. Could this be due to mistakes in the database?
What do you mean by wrong order? Can you provide an example? We are doing a direct database query, so the information should be what is in the database.
I have an input text file with accession numbers. Here is what the first few lines look like:
MK419834.1
MK230890.1
MK230891.1
MK230892.1
MK230893.1
In the output CSV file, I see that some entries don't have all 6 fields that you specified.
For example, some entries only have 4 of those 6 fields. For certain entries, the country is in the second column and the host is in the third column, which is a different order from most entries in the output file. Basically, I would like each value to end up in the correct column.
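One way to keep columns aligned when some records lack fields is to write each record as a dictionary against a fixed header, so a missing field becomes an empty cell instead of shifting later values left. This is a minimal Python sketch using the standard csv module; the column names and the function name write_aligned_csv are assumptions for illustration, not the fields used in this thread.

```python
import csv
import io

# Column order is an assumption; match it to the fields you extracted.
COLUMNS = ["accession", "strain", "country", "collection_date"]


def write_aligned_csv(records, out):
    """Write one row per record dict under a fixed header.

    restval="" fills any column a record lacks with an empty cell,
    so later values never slide into the wrong column.
    """
    writer = csv.DictWriter(out, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    for record in records:
        writer.writerow(record)
```

For TSV output, pass delimiter="\t" to csv.DictWriter.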
Unfortunately, blank fields in some of those records may be messing up the output. You could leave the output as is, bring the data into Excel (splitting records on |), and then check whether the fields stay aligned.
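The alignment check suggested above can also be done without Excel: split each record on | and flag any record whose field count differs from the expected number. This Python sketch assumes one record per line and that | never occurs inside a field value; the name misaligned_records is hypothetical.

```python
def misaligned_records(lines, expected_fields, sep="|"):
    """Return (line_number, field_count) for each record whose number of
    sep-separated fields differs from expected_fields."""
    problems = []
    for number, line in enumerate(lines, start=1):
        count = len(line.rstrip("\n").split(sep))
        if count != expected_fields:
            problems.append((number, count))
    return problems
```

For example, with 6 expected fields, a record with only 4 would be reported as (its line number, 4).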