getTable ignores query ranges
1
0
Entering edit mode
23 months ago
Kyle ▴ 10

HI everyone.

I have a set of mouse SNPs (~974) from GRCm39 that I'm trying to get either GERP or UCSC Conservation scores on.

To do this, I'm using rtracklayer to try to query the ranges of the SNP and return the multiz35way conservation score.

However, when I do this, the ranges are completed ignored and it returns just max possible entries for chromosome one.

Is it ignoring the ranges due to:

1) the SNPs only being 1 bp in length (while conservation scores are calculated in regions)

or 2) my query (for query) being incorecctly formatted.

Code below:

Kind Regards,

Kyle Drover

library(rtracklayer)

session <- browserSession("UCSC")

#set to mouse genome
genome(session) <- "mm39"

# setting up dataset 
clone_for_snp_gr <- snps_filtered_from_FVB
clone_for_snp_gr$chr_name <- unlist(clone_for_snp_gr$chr_name)
clone_for_snp_gr$chrom_start <- unlist(clone_for_snp_gr$chrom_start)
clone_for_snp_gr$chrom_end <- unlist(clone_for_snp_gr$chrom_end)

# make Grange object and convert it into GRangesForUCSCGenome
snp_gr <-makeGRangesFromDataFrame(clone_for_snp_gr, seqnames.field = "chr_name", start.field = "chrom_start", end.field = "chrom_end")
snp_gr <- renameSeqlevels(snp_gr, paste0("chr", seqlevels(snp_gr)))

for_query <- GRangesForUCSCGenome("mm39", chrom = snp_gr@seqnames, ranges = snp_gr@ranges)

for_query
GRanges object with 974 ranges and 0 metadata columns:
        seqnames    ranges strand
           <Rle> <IRanges>  <Rle>
    [1]     chr9 119800452      *
    [2]     chr7  44199769      *
    [3]     chr7 132616737      *
    [4]     chr4 136275390      *
    [5]     chr4 136275455      *
    ...      ...       ...    ...
  [970]     chr9 120868329      *
  [971]     chr9 120868349      *
  [972]     chr2 165916862      *
  [973]     chr9  71837860      *
  [974]     chrX  71962690      *
  -------
  seqinfo: 61 sequences (1 circular) from mm39 genome

test <- ucscTableQuery(session, track="cons35way", range=for_query, table="multiz35way")

con_scores <- getTable(test)

con_scores
    bin chrom chromStart  chromEnd extFile      offset    score
1     0  chr1   67108799  67108920       1  3567081335 0.501910
2     0  chr1  134217621 134217740       1  7240893127 0.886047
3     1  chr1    8387705   8389370       1   172116144 0.510487
4     1  chr1   16777183  16777248       1   618514000 0.501355
5     1  chr1   25165823  25165834       1  1118818193 0.727571
6     1  chr1   33554431  33554447       1  1422195359 0.547642
7     1  chr1   41942996  41943218       1  1932068494 0.501428
8     1  chr1   50331618  50332996       1  2328581995 0.500599
9     1  chr1   58719908  58722745       1  2933344225 0.500000
10    2  chr1   75497420  75497476       1  4222662685 0.499302
11    2  chr1   83886033  83886164       1  4835125740 0.502277
12    2  chr1   92274310  92275035       1  5366193570 0.502864
13    2  chr1  100663007 100663507       1  5709210931 0.503867
14    2  chr1  109049041 109054290       1  6051152786 0.500000
15    2  chr1  117440159 117440543       1  6238167617 0.500000
16    2  chr1  125829054 125829155       1  6626874554 0.500635
 [ reached 'max' / getOption("max.print") -- omitted 999858 rows ]

Session Info:

R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rtracklayer_1.58.0                        TxDb.Mmusculus.UCSC.mm10.knownGene_3.10.0 GenomicFeatures_1.50.4                   
 [4] org.Mm.eg.db_3.16.0                       AnnotationDbi_1.60.0                      VariantAnnotation_1.44.0                 
 [7] Rsamtools_2.14.0                          Biostrings_2.66.0                         XVector_0.38.0                           
[10] SummarizedExperiment_1.28.0               Biobase_2.58.0                            GenomicRanges_1.50.2                     
[13] GenomeInfoDb_1.34.7                       IRanges_2.32.0                            S4Vectors_0.36.1                         
[16] MatrixGenerics_1.10.0                     matrixStats_0.63.0                        BiocGenerics_0.44.0                      
[19] biomaRt_2.54.0                            data.table_1.14.6                         tidyr_1.3.0                              
[22] dplyr_1.0.10                              stringr_1.5.0                             BiocManager_1.30.19                      

loaded via a namespace (and not attached):
 [1] httr_1.4.4               bit64_4.0.5              assertthat_0.2.1         BiocFileCache_2.6.0      blob_1.2.3              
 [6] BSgenome_1.66.2          GenomeInfoDbData_1.2.9   yaml_2.3.7               progress_1.2.2           pillar_1.8.1            
[11] RSQLite_2.2.20           lattice_0.20-45          glue_1.6.2               digest_0.6.31            Matrix_1.5-1            
[16] XML_3.99-0.13            pkgconfig_2.0.3          zlibbioc_1.44.0          purrr_1.0.1              BiocParallel_1.32.5     
[21] tibble_3.1.8             KEGGREST_1.38.0          generics_0.1.3           ellipsis_0.3.2           withr_2.5.0             
[26] cachem_1.0.6             cli_3.6.0                magrittr_2.0.3           crayon_1.5.2             memoise_2.0.1           
[31] fansi_1.0.4              xml2_1.3.3               tools_4.2.2              prettyunits_1.1.1        hms_1.1.2               
[36] BiocIO_1.8.0             lifecycle_1.0.3          DelayedArray_0.24.0      compiler_4.2.2           rlang_1.0.6             
[41] grid_4.2.2               RCurl_1.98-1.10          rjson_0.2.21             rappdirs_0.3.3           bitops_1.0-7            
[46] restfulr_0.0.15          codetools_0.2-18         DBI_1.1.3                curl_5.0.0               R6_2.5.1                
[51] GenomicAlignments_1.34.0 fastmap_1.1.0            bit_4.0.5                utf8_1.2.2               filelock_1.0.2          
[56] stringi_1.7.12           parallel_4.2.2           Rcpp_1.0.10              vctrs_0.5.2              png_0.1-8               
[61] dbplyr_2.3.0             tidyselect_1.2.0
rtracklayer • 618 views
ADD COMMENT
0
Entering edit mode
21 months ago
Kyle ▴ 10

The short answer is you can theoretically do this you just need to a) make your SNPs are two base-pairs in length (i.e. add 1 to the end) and ii) pass individuals Granges via lapply.

Better solution to this problem (avoiding rtracklayer) is to download the bw file and then filter the bw_file for your genome positions of interest. See: GetBM - GERP conservation score for GRCm39

ADD COMMENT

Login before adding your answer.

Traffic: 1935 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6