I cannot help you with the real juice, but you can access BioMart in several programmatic ways.
A very simple example of how to access BioMart 0.7 can be found here:
http://joachimbaran.wordpress.com/2011/06/17/bioknacks-pubmed2ensembl-query-wrapper/
https://github.com/joejimbo/bioknack/blob/master/bk_pubmed2ensembl.rb
There I use Darren Oakley's Ruby API, which I find very easy and straightforward to use.
For BioMart 0.8, you can either consult the docs (http://www.biomart.org/rc6_documentation.pdf) for the various methods of accessing the new BioMart, or experiment with the basic SPARQL interface that was introduced in BioMart 0.8rc6.
For example, there are SPARQL-endpoints for the new Central Portal marts (http://central.biomart.org), such as the Ensembl Gene mart:
http://central.biomart.org/martwizard/#!/Genome?mart=gene_ensembl_config_4
When you get to the results page of a query, there is a "SPARQL" button that shows you the equivalent SPARQL query, which you can use to obtain the results programmatically via the SPARQL endpoint. I have attached an example query below that I clicked together on the Ensembl Gene mart page linked above (gene_ensembl_config_4). I only added a "LIMIT 5" at the end manually. The endpoint for submitting the query is:
http://central.biomart.org/martsemantics/gene_ensembl_config_4/SPARQLXML/get/?query=*urlencoded SPARQL-query*
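Submitting a query to that endpoint from a script could look roughly like the Python sketch below. The helper names (build_query_url, parse_sparql_xml, run_query) are my own and not part of BioMart, and I am assuming that the "SPARQLXML" endpoint answers in the standard W3C SPARQL Query Results XML Format; if the actual response differs, the parsing part needs adjusting.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Endpoint from the post; the query is passed URL-encoded in the "query" parameter.
ENDPOINT = "http://central.biomart.org/martsemantics/gene_ensembl_config_4/SPARQLXML/get/"

# Assumption: results use the W3C SPARQL Query Results XML namespace.
SPARQL_NS = {"sr": "http://www.w3.org/2005/sparql-results#"}


def build_query_url(sparql, endpoint=ENDPOINT):
    """Return the request URL with the SPARQL query percent-encoded."""
    return endpoint + "?query=" + urllib.parse.quote(sparql, safe="")


def parse_sparql_xml(xml_text):
    """Turn a SPARQL XML result document into a list of {variable: value} dicts."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.findall("sr:results/sr:result", SPARQL_NS):
        row = {}
        for binding in result.findall("sr:binding", SPARQL_NS):
            value = binding.find("*")  # first child: <literal>, <uri> or <bnode>
            row[binding.get("name")] = value.text if value is not None else None
        rows.append(row)
    return rows


def run_query(sparql):
    """Fetch and parse the results (needs network access to the endpoint)."""
    with urllib.request.urlopen(build_query_url(sparql)) as response:
        return parse_sparql_xml(response.read().decode("utf-8"))
```

Calling run_query with the full text of the example query (including the PREFIX lines) should then give one dictionary per result row, keyed by a0, a1 and a2, provided the response format assumption holds.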
A list of queryable attributes for creating manual queries can be obtained from the ontology of the Ensembl mart. If you use a tool such as Protégé (http://protege.stanford.edu/) then you can load the ontology via the URI:
http://central.biomart.org/martsemantics/gene_ensembl_config_4/ontology
Using SPARQL to retrieve internal nodes or other meta-data is not possible (yet) with our implementation. For example, querying "select ?x ?y ?z where {?x ?y ?z}" to get all triples will not work. You need to consult the ontology to figure out such things.
Example Query
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX accesspoint: <http://central.biomart.org/martsemantics/gene_ensembl_config_4/ontology#>
PREFIX class: <biomart://central.biomart.org/martsemantics/gene_ensembl_config_4/ontology/class#>
PREFIX dataset: <biomart://central.biomart.org/martsemantics/gene_ensembl_config_4/ontology/dataset#>
PREFIX attribute: <biomart://central.biomart.org/martsemantics/gene_ensembl_config_4/ontology/attribute#>
SELECT ?a0 ?a1 ?a2
FROM dataset:hsapiens_gene_ensembl
WHERE {
?mart attribute:biotype "protein_coding" .
?mart attribute:atlas_celltype "germ cell" .
?mart attribute:ensembl_gene_id ?a0 .
?mart attribute:ensembl_peptide_id ?a1 .
?mart attribute:ensembl_exon_id ?a2
}
LIMIT 5
magic ? :-)