Obtaining a list of SNPs using chromosome position with biomaRt
8.1 years ago

Hi,

I'm having difficulty using the getBM function. I'm trying to download the names of the SNPs located within a certain region (on chromosome 15, covering a particular transcript). I've tried several versions, but to no avail. I also noticed that the third example in the biomaRt vignette does not work either.

snpmart <- useMart(host="www.ensembl.org", biomart="ENSEMBL_MART_SNP", dataset="hsapiens_snp")

snps <- getBM(attributes=c("refsnp_id","allele","chrom_start","chrom_strand"),
                        filters = c("chr_name","start","end"),
                        values = list(15,67065845,67195195), mart = snpmart)

snps <- getBM(attributes=c("refsnp_id","allele","chrom_start","chrom_strand"),
                        filters = c("chromosomal_region"),
                        values = list(1:67065845:67065900), mart = snpmart)

 getBM(c('refsnp_id','allele','chrom_start','chrom_strand'), 
          filters = c('chr_name','chrom_start','chrom_end'), 
          values = list(8,148350,148612), mart = snpmart)


"Error in getBM(c("refsnp_id", "allele", "chrom_start", "chrom_strand"),  : 
  Invalid filters(s): chrom_start, chrom_end 
Please use the function 'listFilters' to get valid filter names"

I'm talking to our BioMart team and I'll get back to you when I know more.


Many thanks for this information - sorry to hear it is a big job for the variants.

8.1 years ago
Neilfws 49k

The answer is right there in the error message:

filters <- listFilters(snpmart)
filters[grep("start|end|strand", filters$name),]
          name  description
2        start        Start
3          end          End
4   band_start   Band Start
5     band_end     Band End
6 marker_start Marker Start
7   marker_end   Marker End
9       strand       Strand

Looks like you want the filters start, end and strand, without the chrom_ prefix. Furthermore, there is no filter named allele, and as a filter refsnp_id should probably be snp_filter.
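The check behind that error message can be reproduced locally. This is a minimal sketch, assuming the data frame returned by listFilters(snpmart) has a name column as in the listing above. invalid_filters is a hypothetical helper, not part of biomaRt, and the small data frame only mimics a few rows of the real listing so that the example runs without a network connection:

```r
# Hypothetical helper: return the requested filter names that the mart does not offer.
invalid_filters <- function(wanted, available) {
  setdiff(wanted, available$name)
}

# Stand-in for listFilters(snpmart), copied from part of the listing above.
# chr_name is included as an assumption, because the error message did not reject it.
filters <- data.frame(
  name = c("chr_name", "start", "end", "band_start", "band_end", "strand"),
  description = c("Chromosome name", "Start", "End", "Band Start", "Band End", "Strand"),
  stringsAsFactors = FALSE
)

# The filter names used in the failing third query.
bad <- invalid_filters(c("chr_name", "chrom_start", "chrom_end"), filters)
print(bad)
```

With a live connection, passing listFilters(snpmart) as the second argument should perform the same check against the real filter list.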

You can also use the chromosomal_region filter e.g. 1:100:10000:-1 (chrom:start:end:strand).
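Note that the region has to be passed as a single string: written unquoted, as in the second query of the question, R parses 1:67065845:67065900 as a numeric sequence rather than as a region. A minimal sketch of building the string, where region_string and fetch_region_snps are hypothetical helpers and the attribute names are simply the ones used in the question:

```r
# Hypothetical helper: build a chromosomal_region value, chrom:start:end[:strand].
# sprintf("%.0f") avoids scientific notation (as.character(100000) gives "1e+05").
region_string <- function(chrom, start, end, strand = NULL) {
  stopifnot(start <= end)
  region <- sprintf("%s:%.0f:%.0f", chrom, start, end)
  if (!is.null(strand)) {
    region <- paste0(region, ":", strand)
  }
  region
}

# The chromosome 15 region from the first query in the question.
region <- region_string(15, 67065845, 67195195)
print(region)

# The query needs network access, so it is only defined here, not run.
fetch_region_snps <- function(region, mart) {
  biomaRt::getBM(
    attributes = c("refsnp_id", "allele", "chrom_start", "chrom_strand"),
    filters = "chromosomal_region",
    values = region,
    mart = mart
  )
}

# Usage, assuming snpmart was created with useMart() as in the question:
# snps <- fetch_region_snps(region, snpmart)
```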

8.1 years ago
Emily 23k

We have a correction to your third query. It should be:

getBM(c('refsnp_id','allele','chrom_start','chrom_strand'), 
      filters = c('chr_name','start','end'), 
      values = list(8,148350,148612), mart = snpmart)

We generally have a problem with the variation mart, which is down to our variation database being so incredibly massive. This is not going to be a quick fix and is going to take a lot of work from our end.

You could instead consider using the filterVcf function from the Bioconductor VariantAnnotation package along with our VCF files: https://www.bioconductor.org/packages/devel/bioc/vignettes/VariantAnnotation/inst/doc/filterVcf.pdf

Here are the VCF files: ftp://ftp.ensembl.org/pub/current_variation/vcf/homo_sapiens/
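As a rough sketch of the VCF route, variant IDs in a region could be read with readVcf and a ScanVcfParam range from VariantAnnotation. This assumes the VariantAnnotation, Rsamtools, GenomicRanges and IRanges packages are installed and that a bgzipped, tabix-indexed VCF has been downloaded locally. The file name in the usage comment is a placeholder, read_region_ids is a hypothetical helper, and the chromosome is assumed to be named without a chr prefix in the file:

```r
# Hypothetical helper: return the IDs of all variants overlapping one region.
read_region_ids <- function(vcf_path, chrom, start, end, genome = "GRCh38") {
  # Region of interest as a genomic range.
  region <- GenomicRanges::GRanges(
    seqnames = chrom,
    ranges = IRanges::IRanges(start = start, end = end)
  )
  # Restrict the read to that range; this requires a tabix index next to the file.
  param <- VariantAnnotation::ScanVcfParam(which = region)
  vcf <- VariantAnnotation::readVcf(Rsamtools::TabixFile(vcf_path), genome, param = param)
  # Row names come from the VCF ID column where it is filled in (e.g. rs IDs).
  rownames(vcf)
}

# Usage (placeholder file name; not run here because it needs the downloaded file):
# ids <- read_region_ids("homo_sapiens.vcf.gz", "15", 67065845, 67195195)
```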

Another option would be VCF tools: http://vcftools.sourceforge.net/.
