Question About Chippeakanno And Iranges
2
0
Entering edit mode
12.5 years ago
e.karasmani ▴ 140

Dear All,

I have the following code....

library(ChIPpeakAnno)
peak.file =read.table ("D:\\Analysis\\peak.target.genes.list.txt", header=T, sep='\t')
pr <- RangedData(IRanges(peak.file$peak.location.new, width=1), space=peak.file$chromosome, idx=1:nrow(peak.file))

My first question is....is the way that I am making the IRANGE correct? the file that I am using has the following format

chromosome  start     end        peak.location.new     chip.value  target.gene.name   distance.to.gene
  chr1       162990333 162990703     162990519      33    RP11-331H2.3.1               136

then i continue with the following

if (interactive())
{
mart<-useMart(biomart="ensembl",dataset="mmusculus_gene_ensembl")
Annotation = getAnnotation(mart, featureType=c("TSS","miRNA", "Exon", "5utr", "3utr", "ExonPlusUtr", "transcript"))
}


data(pr)
data(Annotation)
annotatedPeak = annotatePeakInBatch(pr, AnnotationData=Annotation)
pk= as.data.frame(  annotatedPeak)

write.table (pk, file="D:\\Analysis\\peak.target.genes.list4.txt",quote = FALSE,col.names =TRUE,row.names=FALSE, sep="\t")

what should I do in order to identify if peaks are in intron exon promoter and 3'utr?

What am I doing wrong here?

Because right now in the output data frame it only gives me only information about if the peak is upstream, downstream or inside.

How can I define if a peak is in promoter, exon or intron?

This is the output that I get

start          end      width  start_position     end_position    insideFeature distancetoFeature

120301248    120301248    1     120285635       120506009     inside           15613
243680061    243680061      1    197054472      197054592   downstream     46625589

Could you please help me???

Thank you in advance

best regards Lena

chip-seq exon intron peak-calling • 5.0k views
ADD COMMENT
0
Entering edit mode

Can you paste some lines from your output (pk)?

ADD REPLY
2
Entering edit mode
12.5 years ago
Ying W ★ 4.3k

When you are making the IRanges, what you are doing is making a 1bp long window of the peak (peak.location.new) instead of from start to end, is this what you want to do?

To get the exon/intron/utr seperately, you could make a different annotation file for each. Currently, when you get annotation you get all types, so you could make a different annotation file for each and then call annotatePeakInBatch() multiple times.

You can find out more about annodatePeaksInBatch() and getAnnotation() in the chippeakanno reference manual and biomaRt user guide

Maybe something like this:

Anno_exon = getAnnotation(mart, featureType=c("Exon"))
Anno_TSS = getAnnotation(mart, featureType=c("TSS"))
Anno_miRNA = getAnnotation(mart, featureType=c("miRNA"))

annotatedPeak_exon = annotatePeakInBatch(pr, AnnotationData=Anno_exon)
annotatedPeak_TSS = annotatePeakInBatch(pr, AnnotationData=Anno_TSS)
annotatedPeak_miRNA = annotatePeakInBatch(pr, AnnotationData=Anno_miRNA)
ADD COMMENT
0
Entering edit mode

thank you very much....could you please give me some more details on what should I do? I am a rookie....

ADD REPLY
1
Entering edit mode

since you are starting out, i would highly recommend you read the ChIPpeakanno Vignette since it walks though several examples. ChIPpeakAnno is a library that simplifies some procedures but by hiding its complexity so you might not understand what is going on. The typical process to annotate peaks is to take a list of peak regions of interest and take a list of annotations and do an overlap. For example, overlap all peaks found by chip-seq with all exons to find all binding peaks that fall in exons for the exon slice of a pie chart. The annotatePeakInBatch() function works like this so what it would need then is the peaks of interest (pr) and the annotation to overlap (Annotation). The Annotation file you chose earlier is from mouse annotation and it includes TSS,miRNA,Exon,etc. You are then overlapping this with pr which is a list of top binding. When you type in data(pr) it tells you when its holding and you will see that it just hold start/end locations. The complication that arises is when you have a chip-seq peak that overlaps multiple features such as both exon and intron. You can avoid this by taking the top binding point instead a region (which you do)

ADD REPLY
0
Entering edit mode

How can you define if the peaks are located in intron? Apparently from the ChIPpeakAnno you can define peaks in "TSS","miRNA", "Exon", "5utr", "3utr", "ExonPlusUtr", "transcript".....but there is no option to define if the peak is in introns.....Do you know a way to do that?

ADD REPLY
0
Entering edit mode

ChIPpeakAnno only provides the tools not the data. You get the data from biomart (see user manual linked above). If the data is not available through there you would need to create your own intron object (probably by using UCSC data) and do a findOverlaps() with it but this would require a bit of work.

ADD REPLY
0
Entering edit mode

Could you please explain to me the difference between nearestStart and overlapping? in the overlapping it is going to give me only the peaks which have an overlap? is my assumption correct?

moreover what does the maxgap exactly? In the manual it says...."will output overlapping features with maximum gap specified as maxgap between peak range and feature range"....what number do you normally use?

this is what I did

annotatedPeak_exon = annotatePeakInBatch(pr, AnnotationData=Anno_exon,output="overlapping",maxgap=100, 
PeakLocForDistance="middle",select="all")

so I was checking the difference that with another analysis where output was NearestStart....and for example I had one peak which in the overlapping it occured twice and in the nearestStart only one....so I don't get the difference between these two options...in my mind it should be the opposite....

Could you please give me some advice???

ADD REPLY
0
Entering edit mode

nearestStart: "will output the nearest features" which is almost always one thing

overlapping: will use look +/- 100bp (since that is what your maxgap is set at) from the feature so it is possible that a peak would overlap two features.

btw maxgap: "Intervals with a separation of maxgap or less are considered to be overlapping" so if you have two peaks close to each other (within 100 bp) then it is considered one peak (this is kinda how overlapping works)

ADD REPLY
0
Entering edit mode

how the program counts the distance with the maxgap??? i mean it takes the middle of the peak and the middle of the exon?

I can define that with the PeakLocForDistance and FeatureLocForDistance option??? Is my assumption correct?

So if I say middle to both of them, the program will see if my center of my peak and the centre of the exon are less than 100bp. If so it will give me the overlap (overlap end or start, or inside or include feature).

If the distance is above 100bp then this peak is going to be excluded??

Are my assumptions correct?

thank you in advance

ADD REPLY
1
Entering edit mode

I havn't used ChIPpeakAnno in a while, i believe it does something like extend the beginning and end of the peak by 100 and same for exon.

If you look at the refence manual on ChIPpeakAnno page, you can find the definition for PeakLocForDistance and FeatureLocForDistance

Since the manual is often ambiguous, I would create fake peaks around the known annotations using RangedData and IRanges and see what the results are. I dont recall offhand how these functions are implemented. You could try something like fake_pr <- RangedData(IRanges(c(10023, 10123), width=1), space=c("chr1", "chr1"))

ADD REPLY

Login before adding your answer.

Traffic: 1631 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6