Different filters for wheat in biomaRt package and website
5.9 years ago

Hello,

I'm trying to download the wheat TILLING & SNP data from Ensembl using the biomaRt package in Bioconductor. However, I've noticed that the BioMart website offers more filters than the Bioconductor package lists. For example:

library(biomaRt)

# Connect to the Ensembl Plants variation mart and list its datasets
plantsDatabase <- useMart(biomart = 'plants_variations', host = 'plants.ensembl.org')
plantsDatasets <- listDatasets(plantsDatabase)

# GetDatasetIndex() is a helper defined elsewhere (it is not part of biomaRt)
mainOrganism <- 'Triticum aestivum'
mainOrgIndex <- GetDatasetIndex(organism = mainOrganism, plantsDatasets$description)
mainOrgDataset <- useDataset(mart = plantsDatabase, dataset = plantsDatasets$dataset[[mainOrgIndex]])

# Inspect the available filters and attributes, then build the query
mainOrgFilters <- listFilters(mainOrgDataset)
mainOrgAttributes <- listAttributes(mainOrgDataset)
attributes <- mainOrgAttributes$name[c(1:2, 4:6, 20:21, 24, 34:35)]
filter <- c('variation_source', 'variation_set_name', 'chr_name')
values <- list('EMS-induced mutation', 'EMS (Cadenza)', '1A')

GetDatasetIndex is a helper function that returns the index of the dataset for the organism being queried, in this case 'Triticum aestivum'.

I want to filter the data by 'Variant consequence'. This filter is available on the BioMart web portal but not in the biomaRt package (listFilters() for this mart doesn't show it). Any pointers?

Best, Sandeep

biomart ensembl getBM • 1.9k views

Tagging: Mike Smith

5.9 years ago
Emily 24k

The filter is "so_mini_parent_name". No, it's not obvious.
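As a minimal sketch of how this filter could be added to the query from the question: the filter names and the first three values come from the question, 'missense_variant' is just one of the allowed consequence terms, and `fetch_variants()` and the attribute names in the usage comment are assumptions made for illustration. The call needs the biomaRt package and network access, so it is wrapped in a function rather than run directly.

```r
# Filters and values are matched by position, so keep them the same length.
query_filters <- c("variation_source", "variation_set_name",
                   "chr_name", "so_mini_parent_name")
query_values <- list("EMS-induced mutation", "EMS (Cadenza)",
                     "1A", "missense_variant")

# Hypothetical wrapper around getBM(); needs biomaRt and a network connection.
fetch_variants <- function(mart, attributes) {
  biomaRt::getBM(attributes = attributes,
                 filters    = query_filters,
                 values     = query_values,
                 mart       = mart)
}

# Usage (not run; the attribute names below are assumptions, check
# listAttributes() for the names your mart actually provides):
# mart <- biomaRt::useMart("plants_variations", dataset = "taestivum_eg_snp",
#                          host = "plants.ensembl.org")
# fetch_variants(mart, c("refsnp_id", "chr_name", "chrom_start"))
```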

A cheat you can use: from the web-based BioMart results page, click on the XML button. The coded versions of the filter names will appear in the XML, e.g.:

<Filter name = "so_mini_parent_name" value = "feature_ablation"/>
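If the query has many filters, the coded names can be pulled out of the copied XML rather than read by eye. This is only a rough sketch in base R: the first line of `query_xml` is the example above, and the second line is added purely for illustration.

```r
# Query XML as copied from the web interface's XML button (shortened;
# the chr_name line is an illustrative addition).
query_xml <- '<Filter name = "so_mini_parent_name" value = "feature_ablation"/>
<Filter name = "chr_name" value = "1A"/>'

# Match each <Filter name = "..."> and capture the quoted name.
pattern <- '<Filter[[:space:]]+name[[:space:]]*=[[:space:]]*"([^"]+)"'
matches <- regmatches(query_xml, gregexpr(pattern, query_xml))[[1]]
filter_names <- sub(pattern, "\\1", matches)

print(filter_names)
```

The resulting names are what goes into the `filters` argument of getBM().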

Perfect example of why you are vital to biostars. No chance anyone would have figured that one out.

Is functionality available via web BioMart completely equivalent to biomaRt?


Yes, it should be the same.


Thanks a lot Emily, I'll try it out!

5.9 years ago
Mike Smith ★ 2.1k

Emily's answer is exactly how I go about diagnosing problems with the biomaRt package. Checking the XML via the web interface is always my first port of call for something like this.

I thought I'd advertise the recently added searchDatasets(), searchFilters() and searchAttributes() functions, which try to make finding these a little easier. Rather than simply listing all the available properties for a mart, you can provide a search term and they will return the relevant results. For example, to find the name of the dataset you want, you could do something like:

> searchDatasets(mart = plantsDatabase, 'aestivum')
            dataset                                                                           description version
12 taestivum_eg_snp Triticum aestivum Short Variants (SNPs and indels excluding flagged variants) (IWGSC)   IWGSC

However, they're useless in this instance: none of the information behind the scenes for this filter mentions 'Variant' or 'consequence', so you wouldn't know what to search for!


It's also worth pointing out that the filter you're using isn't a free text filter, but takes a specific set of values (they're provided in a list when using the web interface). You can see the list of possible search terms in R using the function filterOptions() e.g.

> filterOptions('so_mini_parent_name', mart = mainOrgDataset)
[1] "[3_prime_UTR_variant,5_prime_UTR_variant,coding_sequence_variant,coding_transcript_variant,downstream_gene_variant,exon_variant,feature_ablation,feature_amplification,feature_elongation,feature_truncation,feature_variant,frameshift_variant,gene_variant,incomplete_terminal_codon_variant,inframe_deletion,inframe_indel,inframe_insertion,inframe_variant,intergenic_variant,internal_feature_elongation,intron_variant,mature_miRNA_variant,missense_variant,NMD_transcript_variant,nonsynonymous_variant,non_coding_transcript_exon_variant,non_coding_transcript_variant,protein_altering_variant,sequence_comparison,sequence_variant,splice_acceptor_variant,splice_donor_variant,splice_region_variant,splice_site_variant,splicing_variant,start_lost,stop_gained,stop_lost,stop_retained_variant,structural_variant,synonymous_variant,terminator_codon_variant,transcript_ablation,transcript_amplification,transcript_variant,upstream_gene_variant,UTR_variant]"

I might need to improve the formatting here!
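Until then, the single bracketed string can be split into a plain character vector with base R. This is a small sketch that assumes the output keeps the "[a,b,c]" shape shown above; `raw_options` is a shortened copy of that output and `allowed_values` is a name chosen here for illustration.

```r
# Raw string in the style returned by filterOptions() above (shortened).
raw_options <- "[3_prime_UTR_variant,5_prime_UTR_variant,missense_variant,stop_gained]"

# Strip the enclosing brackets, then split on commas.
allowed_values <- strsplit(gsub("^\\[|\\]$", "", raw_options), ",", fixed = TRUE)[[1]]

# Check a candidate value before passing it to getBM().
"missense_variant" %in% allowed_values
```

Checking values this way catches typos in a consequence term before the query is sent.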
