7.5 years ago
Chirag Parsania
Hi,
I have two GRanges objects (in R), a and b. Each feature in b lies within the corresponding feature in a.
> a
GRanges object with 2 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] ChrD_C_glabrata_CBS138 [451956, 454735] +
[2] ChrD_C_glabrata_CBS138 [451956, 454735] +
-------
seqinfo: 14 sequences from an unspecified genome; no seqlengths
> b
GRanges object with 2 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] ChrD_C_glabrata_CBS138 [452667, 454092] +
[2] ChrD_C_glabrata_CBS138 [452667, 454092] +
-------
seqinfo: 14 sequences from an unspecified genome; no seqlengths
findOverlaps from the GenomicRanges package gives the following output:
> findOverlaps(a,b)
Hits object with 4 hits and 0 metadata columns:
queryHits subjectHits
<integer> <integer>
[1] 1 1
[2] 1 2
[3] 2 1
[4] 2 2
-------
queryLength: 2 / subjectLength: 2
I want to keep a subject hit only if it covers more than 90% of the query. I tried the minoverlap argument of findOverlaps, but without success. In other words, for a given query feature, how can I find what percentage of the query overlaps with each subject hit?
The expected output should not contain any hits, since each range in b covers only about 51% of a (1426 of 2780 bp):
> findOverlaps(a,b)
Hits object with 0 hits and 0 metadata columns:
queryHits subjectHits
<integer> <integer>
-------
queryLength: 2 / subjectLength: 2
~Chirag.
You can use the pintersect function from the IRanges package.
http://svitsrv25.epfl.ch/R-doc/library/IRanges/html/IRanges-setops.html
One solution I found is here:
https://support.bioconductor.org/p/72656/
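For illustration, here is a minimal sketch of the pintersect approach applied to the ranges from the question. It assumes the GenomicRanges package is installed (it provides the GRanges method for pintersect and loads IRanges). The variable names hits, overlaps, frac and keep are mine, and the 0.9 cutoff comes from the question. Note that minoverlap counts overlapping bases rather than a percentage, which is probably why it did not help here.

```r
library(GenomicRanges)

# Rebuild the two objects shown in the question
a <- GRanges("ChrD_C_glabrata_CBS138",
             IRanges(start = c(451956, 451956), end = c(454735, 454735)),
             strand = "+")
b <- GRanges("ChrD_C_glabrata_CBS138",
             IRanges(start = c(452667, 452667), end = c(454092, 454092)),
             strand = "+")

# All overlapping (query, subject) pairs
hits <- findOverlaps(a, b)

# Parallel intersection: one overlapping region per hit
overlaps <- pintersect(a[queryHits(hits)], b[subjectHits(hits)])

# Fraction of each query range covered by its subject hit
frac <- width(overlaps) / width(a[queryHits(hits)])

# Keep only hits where more than 90% of the query is covered
keep <- hits[frac > 0.9]
keep
```

With these ranges every hit covers 1426 of the 2780 query bases, so keep should be empty, which matches the expected output in the question. To require that the subject (rather than the query) be covered, divide by width(b[subjectHits(hits)]) instead.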