HI everyone.
I have a set of mouse SNPs (~974) from GRCm39 that I'm trying to get either GERP or UCSC Conservation scores on.
To do this, I'm using rtracklayer to try to query the ranges of the SNP and return the multiz35way conservation score.
However, when I do this, the ranges are completed ignored and it returns just max possible entries for chromosome one.
Is it ignoring the ranges due to:
1) the SNPs only being 1 bp in length (while conservation scores are calculated in regions)
or 2) my query (for query) being incorecctly formatted.
Code below:
Kind Regards,
Kyle Drover
library(rtracklayer)
session <- browserSession("UCSC")
#set to mouse genome
genome(session) <- "mm39"
# setting up dataset
clone_for_snp_gr <- snps_filtered_from_FVB
clone_for_snp_gr$chr_name <- unlist(clone_for_snp_gr$chr_name)
clone_for_snp_gr$chrom_start <- unlist(clone_for_snp_gr$chrom_start)
clone_for_snp_gr$chrom_end <- unlist(clone_for_snp_gr$chrom_end)
# make Grange object and convert it into GRangesForUCSCGenome
snp_gr <-makeGRangesFromDataFrame(clone_for_snp_gr, seqnames.field = "chr_name", start.field = "chrom_start", end.field = "chrom_end")
snp_gr <- renameSeqlevels(snp_gr, paste0("chr", seqlevels(snp_gr)))
for_query <- GRangesForUCSCGenome("mm39", chrom = snp_gr@seqnames, ranges = snp_gr@ranges)
for_query
GRanges object with 974 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr9 119800452 *
[2] chr7 44199769 *
[3] chr7 132616737 *
[4] chr4 136275390 *
[5] chr4 136275455 *
... ... ... ...
[970] chr9 120868329 *
[971] chr9 120868349 *
[972] chr2 165916862 *
[973] chr9 71837860 *
[974] chrX 71962690 *
-------
seqinfo: 61 sequences (1 circular) from mm39 genome
test <- ucscTableQuery(session, track="cons35way", range=for_query, table="multiz35way")
con_scores <- getTable(test)
con_scores
bin chrom chromStart chromEnd extFile offset score
1 0 chr1 67108799 67108920 1 3567081335 0.501910
2 0 chr1 134217621 134217740 1 7240893127 0.886047
3 1 chr1 8387705 8389370 1 172116144 0.510487
4 1 chr1 16777183 16777248 1 618514000 0.501355
5 1 chr1 25165823 25165834 1 1118818193 0.727571
6 1 chr1 33554431 33554447 1 1422195359 0.547642
7 1 chr1 41942996 41943218 1 1932068494 0.501428
8 1 chr1 50331618 50332996 1 2328581995 0.500599
9 1 chr1 58719908 58722745 1 2933344225 0.500000
10 2 chr1 75497420 75497476 1 4222662685 0.499302
11 2 chr1 83886033 83886164 1 4835125740 0.502277
12 2 chr1 92274310 92275035 1 5366193570 0.502864
13 2 chr1 100663007 100663507 1 5709210931 0.503867
14 2 chr1 109049041 109054290 1 6051152786 0.500000
15 2 chr1 117440159 117440543 1 6238167617 0.500000
16 2 chr1 125829054 125829155 1 6626874554 0.500635
[ reached 'max' / getOption("max.print") -- omitted 999858 rows ]
Session Info:
R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.6 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] rtracklayer_1.58.0 TxDb.Mmusculus.UCSC.mm10.knownGene_3.10.0 GenomicFeatures_1.50.4
[4] org.Mm.eg.db_3.16.0 AnnotationDbi_1.60.0 VariantAnnotation_1.44.0
[7] Rsamtools_2.14.0 Biostrings_2.66.0 XVector_0.38.0
[10] SummarizedExperiment_1.28.0 Biobase_2.58.0 GenomicRanges_1.50.2
[13] GenomeInfoDb_1.34.7 IRanges_2.32.0 S4Vectors_0.36.1
[16] MatrixGenerics_1.10.0 matrixStats_0.63.0 BiocGenerics_0.44.0
[19] biomaRt_2.54.0 data.table_1.14.6 tidyr_1.3.0
[22] dplyr_1.0.10 stringr_1.5.0 BiocManager_1.30.19
loaded via a namespace (and not attached):
[1] httr_1.4.4 bit64_4.0.5 assertthat_0.2.1 BiocFileCache_2.6.0 blob_1.2.3
[6] BSgenome_1.66.2 GenomeInfoDbData_1.2.9 yaml_2.3.7 progress_1.2.2 pillar_1.8.1
[11] RSQLite_2.2.20 lattice_0.20-45 glue_1.6.2 digest_0.6.31 Matrix_1.5-1
[16] XML_3.99-0.13 pkgconfig_2.0.3 zlibbioc_1.44.0 purrr_1.0.1 BiocParallel_1.32.5
[21] tibble_3.1.8 KEGGREST_1.38.0 generics_0.1.3 ellipsis_0.3.2 withr_2.5.0
[26] cachem_1.0.6 cli_3.6.0 magrittr_2.0.3 crayon_1.5.2 memoise_2.0.1
[31] fansi_1.0.4 xml2_1.3.3 tools_4.2.2 prettyunits_1.1.1 hms_1.1.2
[36] BiocIO_1.8.0 lifecycle_1.0.3 DelayedArray_0.24.0 compiler_4.2.2 rlang_1.0.6
[41] grid_4.2.2 RCurl_1.98-1.10 rjson_0.2.21 rappdirs_0.3.3 bitops_1.0-7
[46] restfulr_0.0.15 codetools_0.2-18 DBI_1.1.3 curl_5.0.0 R6_2.5.1
[51] GenomicAlignments_1.34.0 fastmap_1.1.0 bit_4.0.5 utf8_1.2.2 filelock_1.0.2
[56] stringi_1.7.12 parallel_4.2.2 Rcpp_1.0.10 vctrs_0.5.2 png_0.1-8
[61] dbplyr_2.3.0 tidyselect_1.2.0