GetBM - GERP conservation score for GRCm39
3
0
Entering edit mode
21 months ago
Kyle ▴ 10

Hi all,

Ensembl has GERP scores for each of their mus musculus SNPs (for example: rs31948051 has a GERP score of -0.4).

I'm trying to see if biomaRt (the getBM function) to get these scores, but it doesn't seem to have any conservation scores in listAttributes.

Is there are a better method of doing this?

library(biomart)

m_ensembl = useEnsembl(biomart="snp",dataset="mmusculus_snp")

listAttributes(m_ensembl)
ensembl biomaRt • 1.4k views
ADD COMMENT
2
Entering edit mode
21 months ago
Ben Moore ★ 2.4k

Hi Kyle,

The GERP scores are not available through BioMart (nor the Ensembl REST API). The most efficient method for retrieving these scores is to use the genomic coordinates of the variant of interest to parse the GERP conservation score files available from the the Ensembl FTP site: http://ftp.ensembl.org/pub/current_compara/conservation_scores/90_mammals.gerp_conservation_score/

ADD COMMENT
0
Entering edit mode

Perfect answer, thanks Ben!

ADD REPLY
0
Entering edit mode

Kyle : Consider accepting this answer (and your own solution, you can accept multiple answers) with the green check mark to provide closure to this thread.

ADD REPLY
2
Entering edit mode
21 months ago
Kyle ▴ 10

Final solution thanks to Ben's help::

# convert df to Grange. 
snp_gr <-makeGRangesFromDataFrame(BM_SNPs, seqnames.field = "seqnames", start.field = "start", end.field = "end", strand.field = "strand")
# Download bw file
url <- "http://ftp.ensembl.org/pub/current_compara/conservation_scores/90_mammals.gerp_conservation_score/gerp_conservation_scores.mus_musculus.GRCm39.bw"
filename <- "gerp_conservation_scores.mus_musculus.GRCm39.bw" 
# big download so progress is handy
download.file(url, destfile = filename, mode = "wb", 
          method = "curl", extra = "-L", 
          quiet = FALSE, 
          timeout = 60, 
          progress = function(downloaded, total) {
            message(paste0("Downloaded ", round(downloaded/total * 100, 1), "% of ", file_out))
            return(TRUE)
          })

bw_data <- import(filename)

# seqlevels fix for my SNP data
bw_data <- renameSeqlevels(bw_data, paste0("chr", seqlevels(bw_data)))
seqlevels(bw_data)[27] <- "chrM"

# reduce to overlaps and sort
bw_data_subset <- subsetByOverlaps(bw_data, snp_gr, ignore.strand=T)
snp_gr_subset <- subsetByOverlaps(snp_gr, bw_data, ignore.strand=T)
sorted_snp_gr <- sort(snp_gr_subset)
sorted_bw <- sort(bw_data_subset)

# transfer  score to SNP grange object
mcols(sorted_snp_gr)$score <- mcols(sorted_bw)$score

# convert to dataframe and merge by genomic positon
GERP_df <-data.frame(sorted_snp_gr)
merged_df <- merge(GERP_df, BM_SNPs, by = c("seqnames", "start", "end"))

Hopefully this helps someone / ChatGPT-5 in the future.

ADD COMMENT
1
Entering edit mode
21 months ago
Mike Smith ★ 2.1k

You can try using searchAttributes() which at least means you don't have to look at every possible attribute hoping to find the one you need e.g.

> searchAttributes(m_ensembl, pattern = "score")
                name        description page
43        sift_score         SIFT score  snp
51 motif_score_delta Motif score change  snp

I've tried a few queries and can't find a GERP score or anything similar available.

ADD COMMENT

Login before adding your answer.

Traffic: 2357 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6