Hi I am attempting to create an smlSet for use in GGtools for eQTL analysis using a snpmatrix that was manually created using the following code:
library(snpStats)
snpmatrix<-new("SnpMatrix", snpmatrix)
The original snpmatrix is simply a matrix object.
I call the SnpMatrix object, I get the following output:
snpmatrix
A SnpMatrix with 109 rows and 2443179 columns
Row names: XXXX_001 ... XXXX_103
Col names: GA008510 ...VGXS35706
The code generated an object of the class SnpMatrix with dimensions of 109 x 2443179. There are 109 samples and 2443179 snps in the object.
When I attempt to make a smlSet using the following code, I get an error as follows: "Error in make_smlSet(esetforeqtl, snpmatrix, organism="Homo sapiens";: sml must belist of SnpMatrix instances from 'snpstats' package"
The code that generated the error is listed below:
library(GGtools)
eqtl_smlset<-make_smlSet(esetforeqtl, snpmatrix, organism="Homo sapiens", harmonizeSamples=FALSE)
In addition to the snpmatrix, I also manually created the supporting objects for snpmatrix (snp.support and subject.support) but I am not sure how to combine all tree into one object. If the instructions are in the documentation I must have missed them in it.
The snpmatrix object and supporting snpmatrix objects were created using the code block at the end of the post (minus the lines that intially scanned the file line by line due to import issues) in case anyone wants to browse through the code.
Does anyone have any insight on how I can properly create a snpmatrix object coupled with the support objects for incorporation into an smlSet along with my expression data (ExpressionSet) using make_smlSet()?
#SNP data preprocessing and coversion to appropriate formats fot GGtools
test<-rbind(test_line2, test_line3, test_line4, test_line5, test_line6, test_line7, test_line8, test_line9, test_line10, test_line11, test_line12, test_line13, test_line14, test_line15, test_line16, test_line17, test_line18, test_line19, test_line20, test_line21, test_line22, test_line23, test_line24, test_line25, test_line26, test_line27, test_line28, test_line29, test_line30, test_line31, test_line32, test_line33, test_line34, test_line35, test_line36, test_line37, test_line38, test_line39, test_line40, test_line41, test_line42, test_line43, test_line44, test_line45, test_line46, test_line47, test_line48, test_line49, test_line50, test_line51, test_line52, test_line53, test_line54, test_line55, test_line56, test_line57, test_line58, test_line59, test_line60, test_line61, test_line62, test_line63, test_line64, test_line65, test_line66, test_line67, test_line68, test_line69, test_line70, test_line71, test_line72, test_line73, test_line74, test_line75, test_line76, test_line77, test_line78, test_line79, test_line80, test_line81)
test<-test[,-22]
test<-rbind(test, test_line82, test_line83, test_line84, test_line85, test_line86, test_line87, test_line88, test_line89, test_line90, test_line91, test_line92, test_line93, test_line94, test_line95, test_line96, test_line97, test_line98, test_line99, test_line100, test_line101, test_line102, test_line103, test_line104, test_line105, test_line106, test_line107, test_line108, test_line109, test_line110)
indexNC<-test=="NC"
indexAA<-test=="AA"
indexAB<-test=="AB"
indexBB<-test=="BB"
test[indexNC]<-"00"
test[indexAA]<-"01"
test[indexAB]<-"02"
test[indexBB]<-"03"
#Remove extra files post rbind
rm(test_line2, test_line3, test_line4, test_line5, test_line6, test_line7, test_line8, test_line9, test_line10, test_line11, test_line12, test_line13, test_line14, test_line15, test_line16, test_line17, test_line18, test_line19, test_line20, test_line21, test_line22, test_line23, test_line24, test_line25, test_line26, test_line27, test_line28, test_line29, test_line30, test_line31, test_line32, test_line33, test_line34, test_line35, test_line36, test_line37, test_line38, test_line39, test_line40, test_line41, test_line42, test_line43, test_line44, test_line45, test_line46, test_line47, test_line48, test_line49, test_line50, test_line51, test_line52, test_line53, test_line54, test_line55, test_line56, test_line57, test_line58, test_line59, test_line60, test_line61, test_line62, test_line63, test_line64, test_line65, test_line66, test_line67, test_line68, test_line69, test_line70, test_line71, test_line72, test_line73, test_line74, test_line75, test_line76, test_line77, test_line78, test_line79, test_line80, test_line81)
rm(test_line82, test_line83, test_line84, test_line85, test_line86, test_line87, test_line88, test_line89, test_line90, test_line91, test_line92, test_line93, test_line94, test_line95, test_line96, test_line97, test_line98, test_line99, test_line100, test_line101, test_line102, test_line103, test_line104, test_line105, test_line106, test_line107, test_line108, test_line109, test_line110)
rm(indexAA, indexAB, indexBB, indexNC)
information<-test[,-1:-]
samplename<-test[,1]
snpnames<-test_line1[,89:2443267]
gender<-test[,2]
genderest<-test[,7]
snpmatrix<-test[1:109,33:2443211]
rownames(snpmatrix) <- samplename
colnames(snpmatrix) <- snpnames
extradata<-test[1:109, 1:32]
#Illumino info from annotation file
annotation <- read.table(pipe("cut -f1,3,4,18 /N/u/bscomer/Karst/XXXX_82_thru_110_omni2_5_Custom_human_omni_2_5.annotation_edit.txt"))
library(splitstackshape)
annotation<-cSplit(indt=annotationtest, splitCols="V4", sep="/", stripWhite=FALSE)
annotation$V4_1<-as.character(annotation$V4_1)
annotation$V4_2<-as.character(annotation$V4_2)
annotation$V4_1<-gsub("\\[", "", annotation$V4_1)
annotation$V4_2<-gsub("\\]", "", annotation$V4_2)
annotation<-annotation[-c(1)]
annotation_snpnames<-annotation$V1
annotation<-data.frame(annotation)
rownames(annotation)<-annotation[,1]
annotation[,1]<-NULL
colnames(annotation)<-c("chromosome", "position", "A1", "A2")
annotation$position<-as.integer(annotation$position)
annotation$A1<-as.factor(annotation$A1)
annotation$A2<-as.factor(annotation$A2)
snp.support<-annotation
dfsamplename<-data.frame(samplename)
#export samples for annotation for case or control in spreadsheet software
write.table(dfsamplename, file="dfsamplenames")
#reimport sample list with control or case annotated in table as dfsamplenames
dfsamplenames <- read.csv("/gpfs/home/b/s/bscomer/Karst/Desktop/dfsamplenames.csv")
subject.support<-dfsamplenames
#stopped here
subject.support<-data.frame(subject.support)
subject.support$X.samplename.<-gsub("'", "", subject.support$X.samplename.)
row.names(subject.support)<-subject.support$X.samplename.
subject.support$X.samplename.<-NULL
colnames(subject.support)<-c("cc", "cc_text", "sex", "race")
subject.support$cc_text<-NULL
subject.support$sex<-NULL
subject.support$race<-NULL
#remove extra files to keep r from crashing
rm(test)
library(snpStats)
snpmatrix<-new("snpMatrix", snpmatrix)
Session information is below: (Note working on computing cluster so I cannot update R version without submitting a request to system administrators for next monthly update)
R version 3.1.1 (2014-07-10)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] BiocInstaller_1.16.5 GGtools_5.2.0 data.table_1.9.6 GGBase_3.28.0
[5] sva_3.12.0 mgcv_1.8-12 nlme_3.1-126 EMA_1.4.4
[9] genefilter_1.48.1 BiocParallel_1.0.3 pd.hugene.1.0.st.v1_3.10.0 RSQLite_1.0.0
[13] DBI_0.3.1 oligo_1.30.0 Biostrings_2.34.1 XVector_0.6.0
[17] snpStats_1.16.0 Matrix_1.2-4 survival_2.38-3 limma_3.22.7
[21] annotate_1.44.0 XML_3.98-1.3 AnnotationDbi_1.28.2 Biobase_2.26.0
[25] rtracklayer_1.26.3 GenomicRanges_1.18.4 GenomeInfoDb_1.2.5 IRanges_2.0.1
[29] S4Vectors_0.4.0 BiocGenerics_0.12.1 oligoClasses_1.28.0
loaded via a namespace (and not attached):
[1] acepack_1.3-3.3 affxparser_1.38.0 affy_1.44.0 affyio_1.34.0
[5] assertthat_0.1 base64enc_0.1-3 BatchJobs_1.6 BBmisc_1.9
[9] biglm_0.9-1 biomaRt_2.22.0 biovizBase_1.14.1 bit_1.1-12
[13] bitops_1.0-6 brew_1.0-6 BSgenome_1.34.1 caTools_1.17.1
[17] checkmate_1.7.1 chron_2.3-47 cluster_2.0.3 codetools_0.2-14
[21] colorspace_1.2-6 dichromat_2.0-0 digest_0.6.9 dplyr_0.4.1
[25] FactoMineR_1.32 fail_1.3 ff_2.2-13 flashClust_1.01-2
[29] foreach_1.4.3 foreign_0.8-66 Formula_1.2-1 gcrma_2.38.0
[33] gdata_2.17.0 GenomicAlignments_1.2.2 GenomicFeatures_1.18.7 ggplot2_2.1.0
[37] gplots_3.0.1 grid_3.1.1 gridExtra_2.2.1 GSA_1.03
[41] gtable_0.2.0 gtools_3.5.0 Gviz_1.10.11 heatmap.plus_1.3
[45] hexbin_1.27.1 Hmisc_3.17-3 iterators_1.0.8 KernSmooth_2.23-15
[49] knitr_1.12.3 lattice_0.20-33 latticeExtra_0.6-28 leaps_2.9
[53] magrittr_1.5 MASS_7.3-45 matrixStats_0.50.1 multtest_2.22.0
[57] munsell_0.4.3 nnet_7.3-12 plyr_1.8.3 preprocessCore_1.28.0
[61] RColorBrewer_1.1-2 Rcpp_0.12.4 RCurl_1.95-4.7 reshape2_1.4.1
[65] ROCR_1.0-7 rpart_4.1-10 Rsamtools_1.18.3 scales_0.4.0
[69] scatterplot3d_0.3-36 sendmailR_1.2-1 siggenes_1.40.0 splines_3.1.1
[73] stringi_1.0-1 stringr_1.0.0 tools_3.1.1 VariantAnnotation_1.12.9
[77] xtable_1.8-2 zlibbioc_1.12.0
Thanks: The following code corrected the issue.
It will take a few hours to run the eQTL analysis in R before I will have confirmation that the manually constructed smlset works downstream.
I would also comment that the gQTLstats package in Bioconductor includes new facilities for eQTL analysis for genomic quantifications in SummarizedExperiment containers, and variant data in VCF files. cisAssoc and AllAssoc functions are available. This is intended to reduce complexity in handling variant data in R.