Hi,
I have a list of peak files stored as GRanges objects, and I need to remove duplicates from them, with a different removal condition for each list element. For the first list element, I want complete duplicate removal; for the second list element, I need to find the rows that appear more than twice (freq > 2) and keep only one copy of each; for the third list element, I need to find the rows that appear more than three times (freq > 3) and keep two copies of each. I am looking for a programmatic, dynamic solution. Is there a way to do this kind of conditional duplicate removal for a list of peak files represented as data.frames?
Mini example:
For a minimal example, I took the skeleton of the peaks (peak coordinates with a score value) and represented it as data.frames:
myList <- list(
  bar = data.frame(start = c(9,19,34,54,70,82,136,9,34,70,136,9,82,136),
                   end   = c(14,21,39,61,73,87,153,14,39,73,153,14,87,153),
                   score = c(48,6,9,8,4,15,38,48,9,4,38,48,15,38)),
  cat = data.frame(start = c(7,21,21,72,142,7,16,21,45,72,100,114,142,16,72,114),
                   end   = c(10,34,34,78,147,10,17,34,51,78,103,124,147,17,78,124),
                   pos   = c(53,14,14,20,4,53,20,14,11,20,7,32,4,20,20,32)),
  foo = data.frame(start = c(12,12,12,58,58,58,118,12,12,44,58,102,118,12,58,118),
                   end   = c(36,36,36,92,92,92,139,36,36,49,92,109,139,36,92,139),
                   pos   = c(48,48,48,12,12,12,5,48,48,12,12,11,5,48,12,5))
)
I am looking for a programmatic way to do this specific duplicate removal. How can I do it when the input is a list of data.frames?
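Since each rule depends on how often a row occurs, a useful first step is to tabulate row frequencies. The following is only a minimal sketch, assuming two rows count as duplicates when every column matches; `row_freq` is a hypothetical helper name and `toy` is made-up data, not the real peaks.

```r
# Hypothetical helper: one line per distinct row, plus how often it occurs.
row_freq <- function(df) {
  key  <- do.call(paste, c(df, sep = "\r"))        # one string key per row
  freq <- ave(seq_along(key), key, FUN = length)   # occurrences of each row
  out  <- cbind(df, freq = freq)[!duplicated(key), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Toy data (not the real peaks): (7, 10) occurs twice, (21, 34) three times.
toy <- data.frame(start = c(7, 21, 21, 7, 21),
                  end   = c(10, 34, 34, 10, 34))
counts <- row_freq(toy)
print(counts)
```

Running `lapply(myList, row_freq)` on the list above should give the same kind of table for each element.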
This is my desired output:
expectedList <- list(
  bar = data.frame(start.pos = c(9,19,34,54,70,82,136),
                   end.pos   = c(14,21,39,61,73,87,153),
                   pos.score = c(48,6,9,8,4,15,38)),
  cat = data.frame(start.pos = c(7,21,72,142,7,16,45,100,114,142,16,114),
                   end.pos   = c(10,34,78,147,10,17,51,103,124,147,17,124),
                   pos.score = c(53,14,20,4,53,20,11,7,32,4,20,32)),
  foo = data.frame(start.pos = c(12,12,44,58,58,118,102,118,118),
                   end.pos   = c(36,36,49,92,92,139,109,139,139),
                   pos.score = c(48,48,12,12,12,5,11,5,5))
)
Is there a way to achieve this desired output? Any ideas are welcome. Thanks a lot :)
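One possible way to express the rule, offered as a sketch and not as a definitive solution: for each data.frame, rows whose frequency exceeds a threshold are cut down to a fixed number of copies, and all other rows are left untouched. The names `dedup_by_freq`, `threshold` and `keep` are hypothetical, `toy` is made-up data, and the sketch assumes that duplicates are rows matching in every column and that the first copies are the ones to keep.

```r
# Hypothetical helper: rows occurring more than `threshold` times are reduced
# to their first `keep` copies; rows at or below the threshold are unchanged.
dedup_by_freq <- function(df, threshold, keep) {
  key  <- do.call(paste, c(df, sep = "\r"))          # one string key per row
  freq <- ave(seq_along(key), key, FUN = length)     # occurrences of each row
  rank <- ave(seq_along(key), key, FUN = seq_along)  # 1st, 2nd, ... copy
  df[freq <= threshold | rank <= keep, , drop = FALSE]
}

# Toy data (not the real peaks): (12, 36) occurs 4 times, (58, 92) 3 times.
toy <- data.frame(start = c(12, 12, 12, 12, 58, 58, 58, 44),
                  end   = c(36, 36, 36, 36, 92, 92, 92, 49))

# "More than three times -> keep two": only (12, 36) is reduced.
res <- dedup_by_freq(toy, threshold = 3, keep = 2)
print(res)

# Applied to the list from the question, one rule per element (assumed mapping):
# result <- Map(dedup_by_freq, myList, threshold = c(1, 2, 3), keep = c(1, 1, 2))
```

With `threshold = 1, keep = 1` the helper behaves like `unique()`, which covers the first list element. The result keeps the original row order and column names, so it may differ from `expectedList` in ordering and naming even where the surviving rows are the same.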
Dear Alex,
Thanks for your detailed answer; I'll apply your solution.
Best regards,
Jurat