5.2 years ago
beginner_problem
Hello!
I am trying to get a set of assemblies from one species, for example Bacillus cereus, for which the corresponding read sets are also available.
Is there a way to get that direct link in NCBI?
So far I have tried searching the "Assembly" database for B. cereus, but when I open an assembly there is no link to the read set it was built from. Does someone know how to do this?
Thanks
This question has been answered multiple times in the past. See the answers and the links I posted in this thread: more elegant way to bulk download genomes from the NCBI
In short, the NCBI genome download tool mentioned in @jrj.healey's answer should do the trick.
You will need to look through the BioSample accessions associated with the assemblies to get the read data. Use the sra-explorer tool from Phil Ewels for that.
Thank you for the answer, but which @jrj.healey answer do you mean? I did not see anyone with that name.
And which accessions do you mean? I tried to use the BioSample IDs and the assembly accessions, but that did not work.
That user changed his screen name to @Joe. So that would be the answer to look for.
The second answer in the thread I linked above can be used as an example. Using that, if you did this search at NCBI you are going to see some assemblies for Lactobacillus. Select "sort by date refseq assembly released" (at the top of the page; newer assemblies are more likely to have NGS data) and you will see this first result. Clicking on the associated BioSample gives you the SRA accession.
You can probably use EntrezDirect to get some of this information. I may look it up later today.
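For what it is worth, here is a rough sketch of the assembly → BioSample → SRA walk described in this thread. It uses Python's standard library to call the NCBI E-utilities web endpoints (`esearch` and `elink`) directly; this is a stand-in for EntrezDirect, not EntrezDirect itself. The helper names (`esearch_url`, `elink_url`, `sra_uids_for_species`) are made up for illustration, and the JSON layout assumed in `linked_ids` should be checked against a real response before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the NCBI E-utilities service.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=20):
    """Build an esearch URL that returns matching UIDs as JSON."""
    query = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?" + urlencode(query)


def elink_url(dbfrom, db, uid):
    """Build an elink URL that follows links from one Entrez database to another."""
    query = {"dbfrom": dbfrom, "db": db, "id": uid, "retmode": "json"}
    return f"{EUTILS}/elink.fcgi?" + urlencode(query)


def fetch_json(url):
    """Download a URL and decode the JSON body (needs network access)."""
    with urlopen(url) as resp:
        return json.load(resp)


def search_ids(payload):
    """Pull the UID list out of an esearch JSON payload."""
    return payload.get("esearchresult", {}).get("idlist", [])


def linked_ids(payload):
    """Collect linked UIDs from an elink JSON payload (layout is an assumption)."""
    ids = []
    for linkset in payload.get("linksets", []):
        for linksetdb in linkset.get("linksetdbs", []):
            ids.extend(str(x) for x in linksetdb.get("links", []))
    return ids


def sra_uids_for_species(species, retmax=5):
    """Map each assembly UID for a species to the SRA UIDs reachable via its BioSample."""
    result = {}
    term = f'"{species}"[Organism]'
    for asm in search_ids(fetch_json(esearch_url("assembly", term, retmax))):
        sra = []
        for bs in linked_ids(fetch_json(elink_url("assembly", "biosample", asm))):
            sra.extend(linked_ids(fetch_json(elink_url("biosample", "sra", bs))))
        result[asm] = sra
    return result
```

Calling `sra_uids_for_species("Bacillus cereus")` would give a dict keyed by assembly UID. Assemblies with an empty list have no reads linked through their BioSample. Note that these are Entrez UIDs, not SRR run accessions, so a further lookup (or pasting the BioSample accession into sra-explorer) is still needed to get the actual downloads. NCBI also rate-limits requests made without an API key, so add a pause between calls for anything beyond a handful of assemblies.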