If you pass the verbose=TRUE argument to the getBM() function, it prints the XML query that it constructs from your request to the R console:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName="default" uniqueRows="1" count="0" datasetConfigVersion="0.6" header="1" formatter="TSV" requestid="biomaRt">
<Dataset name="hsapiens_snp">
<Attribute name="refsnp_id" />
<Attribute name="chr_name" />
<Attribute name="chrom_start" />
<Attribute name="chrom_end" />
<Filter name="snp_filter" value="rs12081925" />
</Dataset>
</Query>
You can also submit this query directly, e.g. with wget:
wget -O result.txt 'http://www.ensembl.org/biomart/martservice?query=<?xml version='1.0' encoding='UTF-8'?><!DOCTYPE Query><Query virtualSchemaName = 'default' uniqueRows = '1' count='0' datasetConfigVersion='0.6' header='1' formatter='TSV' requestid='biomaRt'> <Dataset name = 'hsapiens_snp'><Attribute name = 'refsnp_id'/><Attribute name = 'chr_name'/><Attribute name = 'chrom_start'/><Attribute name = 'chrom_end'/><Filter name = "snp_filter" value = "rs12081925" /></Dataset></Query>'
If you check the response returned from the API, it reads:
Query ERROR: caught BioMart::Exception: non-BioMart die():
XML declaration not well-formed at line 1, column 14, byte 14 at /nfs/public/ro/ensweb-software/sharedsw/2022_01_17_ct7/linuxbrew/Cellar/perl/5.34.0/lib/perl5/site_perl/5.34.0/x86_64-linux-thread-multi/XML/Parser.pm line 187.
XML::Simple called at /nfs/public/ro/ensweb/live/mart/www_107/biomart-perl/lib/BioMart/Query.pm line 1935.
This response evidently cannot be parsed into a table, hence the error message. It is a bit unfortunate that getBM() doesn't print the raw response to the console when it fails to parse the response into a table.
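Since getBM() does not surface the raw response, one way to check a saved response yourself before parsing it is sketched below. This is only an illustration: is_query_error() is a hypothetical helper (it is not part of biomaRt), and it assumes that error responses always start with the "Query ERROR" text seen above.

```r
# Hypothetical helper (not part of biomaRt): TRUE if the first line of a
# saved martservice response starts with "Query ERROR", as in the failing
# response quoted above, rather than with tab-separated data.
is_query_error <- function(lines) {
  length(lines) > 0 && startsWith(lines[1], "Query ERROR")
}

# First line of the failing response quoted above.
failed <- "Query ERROR: caught BioMart::Exception: non-BioMart die():"

# A tab-separated data line, as returned by a query with formatter "TSV".
ok <- "rs12081925\t1\t214298841\t214298841"

print(is_query_error(failed))
print(is_query_error(ok))
```

For a response saved with wget, the check would be is_query_error(readLines("result.txt")).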
A working XML query looks, for example, like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName = "default" formatter = "TSV" header = "0" uniqueRows = "0" count = "" datasetConfigVersion = "0.6" >
<Dataset name = "hsapiens_snp" interface = "default" >
<Filter name = "snp_filter" value = "rs12081925"/>
<Attribute name = "refsnp_id" />
<Attribute name = "chr_name" />
<Attribute name = "chrom_start" />
<Attribute name = "chrom_end" />
</Dataset>
</Query>
If you send this, you get the proper response from the API:
wget -O result2.txt 'http://www.ensembl.org/biomart/martservice?query=<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query><Query virtualSchemaName = "default" formatter = "TSV" header = "0" uniqueRows = "0" count = "" datasetConfigVersion = "0.6" ><Dataset name = "hsapiens_snp" interface = "default" ><Filter name = "snp_filter" value = "rs12081925"/><Attribute name = "refsnp_id" /><Attribute name = "chr_name" /><Attribute name = "chrom_start" /><Attribute name = "chrom_end" /></Dataset></Query>'
Response in result2.txt:
rs12081925 1 214298841 214298841
So it is a bug within BioMart itself: either the XML is not generated correctly or it is not parsed correctly. Until the bug is fixed, I would suggest retrieving the data with a manual XML query (which can also be generated, e.g., with the BioMart web interface on the Ensembl website) and then reading the .tsv file into R.
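Reading the file into R could look roughly like the sketch below. It assumes the file has no header row (the working query sets header = "0") and takes the column names from the Attribute elements of that query. So that the snippet runs on its own, it first recreates result2.txt in a temporary directory from the single line shown above; in practice you would point path at the file written by wget.

```r
# Column names taken from the <Attribute> elements of the working query;
# the response has no header row because the query sets header = "0".
cols <- c("refsnp_id", "chr_name", "chrom_start", "chrom_end")

# For illustration only: recreate result2.txt with the one line shown above.
# In practice this file comes from the wget call.
path <- file.path(tempdir(), "result2.txt")
writeLines("rs12081925\t1\t214298841\t214298841", path)

# chr_name is read as character because chromosome names are not all numeric
# (e.g. "X" or "MT").
snps <- read.delim(
  path,
  header = FALSE,
  col.names = cols,
  colClasses = c("character", "character", "integer", "integer")
)

print(snps)
```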
Tagging: Mike Smith