I'm using PileOMeth to create bedGraph-style files reporting the number of methylated and unmethylated bases at C positions in the genome. This is for a bisulfite sequencing experiment.
My data are paired-end.
In many cases (but not all) reads from the same fragment overlap.
This means that, for many sequenced fragments, the same C bases were measured twice: once from the forward read and a second time from the reverse read.
Is there a way to ensure that PileOMeth doesn't double count these bases? Or does it already take overlaps into account when reporting methylation metrics?
Also, I'd be interested to know whether you think double counting is a problem or not.
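To make "not double counting" concrete, here is a minimal Python sketch of the idea. It does not describe how PileOMeth works internally. It assumes a hypothetical list of per-base calls, each tagged with the fragment (read) name that both mates share, and it counts each fragment at most once per C position. When both mates cover the same C, the first call seen is kept. That is a simplifying assumption; a real tool might instead prefer the higher-quality base or drop positions where the mates disagree.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical per-base call: (fragment name, chrom, position, is_methylated)
Call = Tuple[str, str, int, bool]


def count_once_per_fragment(calls: Iterable[Call]) -> Dict[Tuple[str, int], Tuple[int, int]]:
    """Return {(chrom, pos): (n_methylated, n_unmethylated)}, counting each
    fragment at most once per position even when both mates cover it."""
    seen = set()  # (fragment, chrom, pos) combinations already counted
    counts = defaultdict(lambda: [0, 0])
    for fragment, chrom, pos, methylated in calls:
        key = (fragment, chrom, pos)
        if key in seen:
            # The other mate of this fragment already reported this C: skip it
            # (assumption: the first call wins).
            continue
        seen.add(key)
        counts[(chrom, pos)][0 if methylated else 1] += 1
    return {site: (m, u) for site, (m, u) in counts.items()}


calls = [
    ("frag1", "chr1", 100, True),   # read 1
    ("frag1", "chr1", 105, True),   # read 1
    ("frag1", "chr1", 105, True),   # read 2 overlaps read 1 at this C
    ("frag2", "chr1", 105, False),  # a different fragment
]
print(count_once_per_fragment(calls))
```

Without the `seen` check, position 105 would be reported as 2 methylated / 1 unmethylated. With it, the overlapping mates of `frag1` contribute a single observation, giving 1 / 1.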
Unrelated to the original post, but the choice of name for that program seems unfortunate. Perhaps you should consider changing it.
No arguments, it was only meant to be a temporary name. I'm certainly open to better ones :)
Almost anything else would be better (PileTheMethyl? Sorry, I can't come up with anything clever), ideally before that name sticks for good.
Maybe "methylScrum"? Anyone have thoughts on that? At least it's not on the main usegalaxy.org site (yet...).
I assume the scrum reference is from rugby? I would go with that instead of you know what :-)
Yeah, I was going for a non-traffic accident version of "pileup".
If you're worried about search indexing, I would suggest going with a memorable (hopefully completely unique) name and then associating that name with the appropriate terms in the GitHub README file or on the project site.
I'd note that "methylup" isn't taken (aside from maybe this) and it's kind of catchy.