How many IDs exist in BioSample?
7.4 years ago
pedrorvc ▴ 30

Hello everyone!

I would like to know if there is a way to get all the IDs from BioSample. I already tried a link that I saw in another post, which works for BioProject (https://www.ncbi.nlm.nih.gov/bioproject/browse/), but it doesn't work for BioSample.

I also tried to download a BioSample summary, following this example from BioProject (ftp://ftp.ncbi.nlm.nih.gov/bioproject/summary.txt).

Also, can this be solved programmatically, e.g. using Eutils or EDirect?

What I really want to know is simply how many IDs exist and how I can get a list of them.

Thank you very much!

NCBI BioSample Eutils BioProject • 3.9k views
7.4 years ago

Using my tool XsltStream (http://lindenb.github.io/jvarkit/XsltStream.html) and the NCBI BioSample XML dump:

How many BioSamples?

curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=biosample" 

<eInfoResult>
  <DbInfo>
    <DbName>biosample</DbName>
    <Count>7211809</Count>
(...)
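
To pull just the number out of that response in a script, a small filter along these lines may help. This is only a sketch: `einfo_count` is a made-up helper name, and it assumes the `<Count>` element sits on a single line, as in the output above.

```shell
#!/usr/bin/env bash
# einfo_count (hypothetical helper): read an einfo XML response on stdin
# and print the number inside the first <Count>...</Count> element.
# Assumption: the element is not split across lines.
einfo_count() {
  sed -n 's:.*<Count>\([0-9][0-9]*\)</Count>.*:\1:p' | head -n 1
}

# Usage (needs network access):
#   curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=biosample" | einfo_count
```

The count changes over time, so the value printed today will differ from the one shown in this answer.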

Get the BioSample and SRA identifiers:

curl -s "ftp://ftp.ncbi.nlm.nih.gov/biosample/biosample_set.xml.gz" |\
  gunzip -c |\
  java -jar dist/xsltstream.jar -n BioSample -t transform.xsl |\
  nl

Output:

(....)
7211789	SAMN07945461	SRS2643051
7211790	SAMN07945462	SRS2643052
7211791	SAMN07945463	SRS2643049
7211792	SAMN07945464	SRS2643050
7211793	SAMN07945465	
7211794	SAMN07945466	
7211795	SAMN07945467	
7211796	SAMN07945468	
7211797	SAMN07945470	
7211798	SAMN07945471	
7211799	SAMN07945472	
7211800	SAMN07945473	
7211801	SAMN07945474	
7211802	SAMN07945475	
7211803	SAMN07945476	
7211804	SAMN07945477	
7211805	SAMN07945478	
7211806	SAMN07945678	
7211807	SAMN07945679	
7211808	SAMN07945680	
7211809	SAMN07945728	
7211810	SAMN07945729	
7211811	SAMN07945740	
7211812	SAMN07945742	
7211813	SAMN07945748	
7211814	SAMN07945751	
7211815	SAMN07945752	
7211816	SAMN07945753	
7211817	SAMN07945754	
7211818	SAMN07945755	
7211819	SAMN07945756	
7211820	SAMN07945757	
7211821	SAMN07945786	
7211822	SAMN07945787	

transform.xsl:

<?xml version='1.0' encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- For each BioSample record, process only its Ids element. -->
  <xsl:template match="BioSample">
    <xsl:copy>
      <xsl:apply-templates select="Ids"/>
    </xsl:copy>
  </xsl:template>

  <!-- Print the BioSample accession, a tab, the SRA accession (if any), and a newline. -->
  <xsl:template match="Ids">
    <xsl:value-of select="Id[@db='BioSample']/text()"/>
    <xsl:text>&#9;</xsl:text>
    <xsl:value-of select="Id[@db='SRA']/text()"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
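
If you also want to know how many of the listed BioSamples carry an SRA accession, the identifier listing from this answer can be summarised with awk. A minimal sketch, assuming the pipeline is run without the final `nl` step so that each line is `BioSample<TAB>SRA`; `count_sra_links` is a hypothetical name.

```shell
#!/usr/bin/env bash
# count_sra_links (hypothetical helper): read "BioSample<TAB>SRA" lines on
# stdin and print two tab-separated numbers: total rows, rows with an SRA ID.
count_sra_links() {
  awk -F '\t' '
    { total++ }                 # every line is one BioSample record
    $2 != "" { linked++ }       # second column is filled only when an SRA ID exists
    END { printf "%d\t%d\n", total, linked }
  '
}

# Usage, reusing the pipeline from this answer without "nl":
#   curl -s "ftp://ftp.ncbi.nlm.nih.gov/biosample/biosample_set.xml.gz" |
#     gunzip -c |
#     java -jar dist/xsltstream.jar -n BioSample -t transform.xsl |
#     count_sra_links
```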

7.4 years ago
LLTommy ★ 1.2k

May I ask why you need a list of all samples, and just the IDs without any other information? I don't understand what you want to accomplish.

However, if you have a problem with NCBI's BioSample, you can try EBI's BioSamples database (it should be synchronized with the NCBI one, as far as I know). In this post I link to the API and the documentation; you might find that useful!

Of course, you could also access the data via RDF/SPARQL but that is a whole different story.


Thank you very much for your explanation. I am trying to get the BioSample IDs associated with some BioProjects, and I just wanted to know how many records exist and a way to list them.


If you know the specific BioProject ID then use this (replace proj_ID with a real ID):

esearch -db bioproject -query "proj_ID" | elink -target biosample | efetch -format docsum | xtract -pattern DocumentSummary -block Accession -element Accession

7.4 years ago
GenoMax 150k

You could get this file and then grep for the sample accession numbers.

Edit: for specific BioProject IDs:

esearch -db bioproject -query "BioProj_ID" | elink -target biosample | efetch -format docsum | xtract -pattern DocumentSummary -block Accession -element Accession
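
For several BioProjects at once, the same pipeline can be wrapped in a loop. This is a rough sketch rather than a tested recipe: it assumes NCBI EDirect (esearch, elink, efetch, xtract) is installed and on PATH, `biosamples_for_projects` is a hypothetical name, and the accessions in the usage comment are placeholders.

```shell
#!/usr/bin/env bash
# biosamples_for_projects (hypothetical helper): for every BioProject
# accession given as an argument, print "BioProject<TAB>BioSample" lines.
biosamples_for_projects() {
  local proj
  for proj in "$@"; do
    # Same EDirect pipeline as above, one project at a time.
    esearch -db bioproject -query "$proj" |
      elink -target biosample |
      efetch -format docsum |
      xtract -pattern DocumentSummary -block Accession -element Accession |
      awk -v p="$proj" '{ print p "\t" $0 }'   # prefix each sample with its project
  done
}

# Usage (needs network access; replace the placeholders with real accessions):
#   biosamples_for_projects BioProj_ID_1 BioProj_ID_2 > project_samples.tsv
```

Keeping the project accession in the first column makes it easy to sort the result or join it against other tables later.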