I am working on ripseq analysis and have used ripseeker tool for the same. I have used main function ripSeek to predict RIP for my data which gives output annotated file with the following headers . Can anyone please provide me description for the headers as i am not able to interpret the results.
Using the help function ?ripSeek, you should see the following information:
The results as GRangesList generated from the RIP peak detection. Each list item represents the RIP peaks on a chromosome accompanied with statistical scores including (read) count, logOddScore, pval, pvalAdj, eFDR for the RIP and control (if available). Please refer to seekRIP for more details.
If you follow the instructions to look up ?seekRIP, you will find the following info:
The RIPScore is compupted in computeLogOdd as the log odd ratio of the posterior for the RIP state (zi = 2) over the posterior for the background state (zi = 1) in RIP library. When control is avail- able, the RIPScore is updated by the difference between the RIPScores between RIP and control. The adjacent bins with hidden states predicted by nbh_vit as the enriched state (corresponding to the NB with larger mean) are merged. The RIPSscores are averaged over the merged bins. To assess the statistical significance of the RIPScore for each region, we assume that the RIPScore follows a Gaussian (Normal) distribution with mean and standard deviation estimated using the RIPScores over all of the bins. The rationale is based on the assumption that most of the RIPScores correspond to the background state and together contribute to a stable estimate of the test statistics (TS) and p-value computed using the R built-in function pnorm.
The p-value is adjusted by p.adjust with BH method by default. The same procedure is applied optionally to the control library. Only when the control is available, is an empirical false discovery rate (eFDR) estimated based on the idea of "sample swap" inspired by MACS (a ChIP-seq algorithm from Zhange el al. (2008). At each p-value, RIPSeeker finds the number of significnat RIP-regions over control (CTL) based on pvalRIP and the number of significant control regions over RIP based on pvalCTL. The eFDR is defined as the ratio of the number of "RIP" (false positive) regions iden- tified from CTL-RIP comparison over the number of RIP regions from the RIP-CTL comparison. The maximum value for eFDR is 1 and minimum value for eFDR is max(p-value, 0). The former takes care of the case where the numerator is bigger than the denominator, and the latter for zero numerator.
Thanq so much for your help, but still i am not able to understand what these following columns refer to :
peak feature start_position end_position feature_strand insideFeature distancetoFeature shortestDistance fromOverlappingOrNearest
ripSeek will return the annotated RIP predictions and the enriched GO terms corresponding to the genomic context of the RIP predictions.
For ?annotateRIP, the documentation states:
Given the genomic coordinates of each predicted RIP regions, query the Ensembl database whether each region is nearby or overlaps any known (noncoding) genes.
To access the up-to-date Ensembl database, RIPSeeker employs useMart and getAnnotation from biomaRt and ChIPpeakAnno Bioconductor packages to dynamically establish internet connection to the database and retrieve the up-to-date annotations. Then, annotatePeakInBatch from ChIP- peakAnno is used to efficiently annotate all of the predicted regions based on the Ensembl anno- tation. A predicted region may overlap multiple genes, all of which will be reported as separate records. Moreover, getEnrichedGO from ChIPpeakAnno is applied to the annotated predictions to discover enriched Gene Ontology (GO) terms involving the protein-associated transcriptome.
In order to use old annotation (e.g., mm9 v.s. mm10), user also needs to specify the host and biomart
arguments accepted within useMart. To access to mouse annotation from Ensembl version 65, for
instance, user needs to call annotateRIP(..., dataset="mmusculus_gene_ensembl", biomart="ENSEMBL_MART_ENSEM host="dec2011.archive.ensembl.org", ...), which will run useMart(dataset="mmusculus_gene_ensembl", biomart="ENSEMBL_MART_ENSEMBL", host="dec2011.archive.ensembl.org", ...) to get the
mm9 annotation from Ensembl (v65).
In short, RIPSeeker not only identifies regions of enrichment ("peaks"), it also does you the favor of telling you which genes are either overlapping or nearby these sites of enrichment. These genes ("features") are defined by their start and end positions in the genome (as well as strand, of course).
I'm using the ripseeker to analyze my RIPseq data. I'm studying on fly. My code as below:
------------------------------------------------------------------------------------------------library(RIPSeeker)library('biomaRt')library("curl")library("org.Dm.eg.db")outDir <- file.path(getwd(), "TEST6")# Parameters settingcNAME <- "Non-Arc-R1.sorted.bam"binSize <- NULL # set to NULL to automatically determine bin sizeminBinSize <- 11000 # min bin size in automatic bin size selectionmaxBinSize <- 12000multicore <- TRUEstrandType <- "-" # set strand type to minus strandbiomart <- "ensembl"biomaRt_dataset <- "dmelanogaster_gene_ensembl" goAnno <- "org.Dm.eg.db" # GO annotation database seekOut.PRC2 <- ripSeek(bamPath = "/home/congxiao/captain/RIPSEQ/Arc_Cop_RIP/test_test/test2/test", cNAME = cNAME, reverseComplement = TURE, strandType = strandType, paired = TRUE, uniqueHit = TRUE, assignMultihits = TRUE, rerunWithDisambiguatedMultihits = TRUE, binSize=binSize, minBinSize = minBinSize, maxBinSize = maxBinSize, biomart = biomart, biomaRt_dataset = biomaRt_dataset, outDir = outDir)
My library is a directional strand library. I used Hisat2, I tried both Dm3/BDGP6 as ref, I changed the strandType with “+”, “-”, “*”, and reverseComplement as TRUE or FALSE.
There are always some problems.
The main errors are: There are no more than 2 genes for annotationRIP;
OR I delete the goAnno, the RIPregions_annotated.gff3 has only a few lines, it should not be like that.
I set the genomeBiuld as 'dm', 'dm3', or just delete it, the final result seems no different.I’m wondering what’s wrong with my parameters.Could you give some suggestions?
Thanq so much for your help, but still i am not able to understand what these following columns refer to : peak feature start_position end_position feature_strand insideFeature distancetoFeature shortestDistance fromOverlappingOrNearest
What do you think this could refer to?
In the documentation it states:
For
?annotateRIP
, the documentation states:In short, RIPSeeker not only identifies regions of enrichment ("peaks"), it also does you the favor of telling you which genes are either overlapping or nearby these sites of enrichment. These genes ("features") are defined by their start and end positions in the genome (as well as strand, of course).