How do I filter interactions that are more than 100kb and less than 3kb
3.8 years ago

I am working on 4C data. I have a .txt file that contains the columns chromosome, start, end, nReads, RPMs, p.value and q.value. I am only interested in significant interactions on chr15, and later I want to filter the interactions that are farther than 100kb and nearer than 3kb.

library(r3Cseq)
library(BSgenome.Hsapiens.UCSC.hg19.masked)
library(GenomicRanges)
library(Homo.sapiens)

kura.int <- read.table("KURA_DpnII.interaction.txt", header = T)
# Keep only chr15 interactions with a q.value below 0.1 (the significant ones)
kura_data <- kura.int[kura.int$chromosome == "chr15" & kura.int$q.value < 0.1, ]
kura.int.gr <- makeGRangesFromDataFrame(kura_data, keep.extra.columns = T)

id <- "91433"
rccdGene <- genes(TxDb.Hsapiens.UCSC.hg19.knownGene,
                  filter=list(gene_id=id))

rccdPromoter <- start(rccdGene)
kura_end <- ((rccdPromoter+kura_data$end)/2)

kura <- cbind(rccdPromoter, kura_end)
kura_2 <- cbind(kura, kura_data$chromosome)
colnames(kura_2) <- c("start", "end", "chr")

kura_3 <- kura_2[distance(kura_2$start, kura_2$end)<=100000]

The "kura_2" matrix has 3 columns, namely "start", "end" and "chr", where the new start is the promoter of the gene and each row has a different end. I wrote the block of code above, but at the filtering step, where I use the function "distance", I get this error:

Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘distance’ for signature ‘"character", "character"’
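
A note on where this error likely comes from, as a minimal sketch with made-up coordinates (the values below are hypothetical and not taken from KURA_DpnII.interaction.txt). `cbind()` on numeric vectors and a character vector returns a character matrix, so the coordinates turn into strings. `distance()` in GenomicRanges is a generic with methods for ranges objects such as GRanges, not for plain character vectors, which is what the message reports. Building a data.frame instead keeps start and end numeric.

```r
# Hypothetical values standing in for rccdPromoter, kura_end and the chromosome column
promoter <- 91000000
ends     <- c(91001500, 91050000.5, 91250000)
chr      <- rep("chr15", 3)

# cbind() forces a single type on the whole matrix: the numbers become strings
m <- cbind(promoter, ends, chr)
colnames(m) <- c("start", "end", "chr")
typeof(m)

# A data.frame stores one type per column, so start and end stay numeric
df <- data.frame(chr = chr, start = promoter, end = ends)
sapply(df, class)
```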

The kura_2 matrix, with its 3 columns "start", "end" and "chr", is attached as an image (interactions).

Now, how do I filter the interactions that are more than 100kb and less than 3kb between the start and end? Is there a better way to filter out the interactions? Thank you in advance

bed rna genome programming genomics • 1.5k views

Can you provide an example of the data in the post, and what you want the end product to look like? dput(head(kura_2, n=10)) will output the first 10 rows of the matrix in a form you can share in your post.
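
For anyone unfamiliar with `dput()`, here is a small sketch of what it does, using a hypothetical three-row stand-in for kura_2 (the object name `example` and all values are made up). It prints R code that rebuilds the object, so pasting that output into a post lets others recreate the data exactly.

```r
# Hypothetical three-row stand-in for kura_2
example <- data.frame(
  chr   = "chr15",
  start = 91000000,
  end   = c(91001500, 91050000, 91250000)
)

# dput() prints R code that rebuilds the object when pasted into another session
dput(head(example, n = 2))

# Round trip: capture the printed code, evaluate it, and compare with the original
rebuilt <- eval(parse(text = capture.output(dput(head(example, n = 2)))))
all.equal(rebuilt, head(example, n = 2))
```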

I have updated the question and attached the kura_2 data. The output should look the same, with chr, start and end, but without the interactions that are greater than 100kb and less than 3kb.

From your post you say that start is the start of the promoter. What does end represent then, the other points in the genome that were interacting with the promoter?

Also, why are there decimal places in the end column? Genomic coordinates are usually represented as integers.

Yes, the new start is the promoter of the gene and the new end is ((start+end)/2). That is why I have float values: this way it is easy to plot interactions from my promoter (bait).
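
If integer coordinates are preferred for the midpoint, one possible approach is integer division. This is only a sketch with made-up positions (`promoter` and `frag_end` are hypothetical names), and it rounds the midpoint down by at most half a base, which should not matter for plotting.

```r
# Hypothetical promoter (bait) position and fragment ends, stored as integers
promoter <- 91000000L
frag_end <- c(91001501L, 91050000L)

# Ordinary division can produce half-base values
mid_float <- (promoter + frag_end) / 2

# Integer division rounds down and keeps the result an integer
mid_int <- (promoter + frag_end) %/% 2L

mid_float
mid_int
```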

Alright, and one more question. Right now you are asking to keep interactions that are more than 100kb AND less than 3kb. Do you mean more than 100kb OR less than 3kb, or instead do you mean between 100kb and 3kb?

I want to remove the interactions that are farther than 100kb from the promoter and also those that are closer than 3kb to the promoter. In 4C we can't tell whether these close interactions are artifacts or real ones; they are most likely self-ligation products, which I am not interested in.
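
Given that clarification, the filter can probably be done with plain arithmetic, without `distance()`. The sketch below assumes kura_2 is held as a data.frame with numeric start and end columns. The coordinates are made up, the `dist` column is a name introduced here, and treating exactly 3kb and exactly 100kb as kept is an assumption.

```r
# Hypothetical stand-in for kura_2, built as a data.frame so coordinates stay numeric
kura_2 <- data.frame(
  chr   = "chr15",
  start = 91000000,  # promoter (bait) position, made up
  end   = c(91001500, 91002999.5, 91003000, 91050000.5,
            91100000, 91250000, 90950000)
)

# Absolute distance from the promoter, so upstream and downstream are treated alike
kura_2$dist <- abs(kura_2$end - kura_2$start)

# Keep interactions at least 3 kb and at most 100 kb away from the promoter
kura_3 <- kura_2[kura_2$dist >= 3000 & kura_2$dist <= 100000, ]
kura_3
```

`distance()` from GenomicRanges does accept two GRanges objects, so comparing kura.int.gr with a GRanges for the promoter is an alternative. It measures the gap between ranges rather than the distance to a midpoint, so the two approaches can give slightly different results.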
