PileOMeth double counting when paired-end reads overlap?
8.2 years ago
Ann ★ 2.4k

I'm using PileOMeth to create bedGraph-style files reporting the number of methylated and unmethylated bases at C positions in the genome. This is for a bisulfite sequencing experiment.

My data are paired end.

In many cases (but not all) reads from the same fragment overlap.

This means that the same C bases from many sequenced fragments were measured twice - once from the forward read and a second time from the reverse read.

Is there a way to ensure that PileOMeth doesn't double count these bases? Or does it already take overlaps into account when reporting methylation metrics?

Also I'd be interested to know whether you think double-counting is a problem or not.

PileOMeth Bisulfite methylation • 3.0k views
8.2 years ago

By default, PileOMeth will never double count bases from overlapping alignments of paired-end reads (i.e., it does what you want out of the box) :)

For those curious, you can make it double count by specifying a minimum Phred score of 0. This is because it internally does something very similar to samtools mpileup: it adjusts base Phred qualities in overlapping regions, so one of the mates ends up with bases of quality 0 there.
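To make that concrete, here is a minimal Python sketch of the idea. It is not PileOMeth's actual code: the `Mate` class, `zero_overlap`, `count_calls`, and the threshold of 5 are all hypothetical names and values chosen for illustration, and it simply assumes the second mate is the one whose overlapping bases get quality 0.

```python
from collections import Counter
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Mate:
    start: int    # 0-based reference position of the first base
    calls: str    # per-base call: 'M' methylated, 'U' unmethylated, '.' not a C
    quals: tuple  # per-base Phred quality


def zero_overlap(mate1, mate2):
    """Return a copy of mate2 with quality 0 wherever it overlaps mate1.

    Assumption: always penalising mate2 is a simplification for illustration.
    """
    end1 = mate1.start + len(mate1.calls)
    quals = tuple(
        0 if mate1.start <= mate2.start + i < end1 else q
        for i, q in enumerate(mate2.quals)
    )
    return replace(mate2, quals=quals)


def count_calls(mates, min_phred):
    """Tally (position, call) pairs for bases with quality >= min_phred."""
    counts = Counter()
    for mate in mates:
        for i, (call, qual) in enumerate(zip(mate.calls, mate.quals)):
            if call in "MU" and qual >= min_phred:
                counts[(mate.start + i, call)] += 1
    return counts


# Two mates from one fragment; they overlap at positions 103 and 104.
mate1 = Mate(start=100, calls="M.U.M", quals=(30,) * 5)
mate2 = Mate(start=103, calls=".M.U", quals=(30,) * 4)
masked = [mate1, zero_overlap(mate1, mate2)]

single = count_calls(masked, min_phred=5)  # overlapping C counted once
double = count_calls(masked, min_phred=0)  # quality-0 bases pass, so counted twice
```

With any threshold above 0, the quality-0 bases from the second mate are filtered out, so the C at position 104 contributes one call. Dropping the threshold to 0 lets them through again, which is the double counting described above.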

Edit: I forgot to address whether I think this is a problem. The answer is yes, and I've encouraged other packages (namely Bismark) to change their defaults to handle this properly out of the box.


Unrelated to original post but choice of name for that program seems unfortunate. Perhaps you should consider changing it.


No arguments, it was only meant to be a temporary name. I'm certainly open to better ones :)


Almost anything else would be better (PileTheMethyl? Sorry, I can't come up with anything clever). Best to change it before that name sticks for good.


Maybe "methylScrum"? Anyone have thoughts on that? At least it's not on the main usegalaxy.org site (yet...).


I assume the scrum reference is from rugby? I would go with that instead of you know what :-)


Yeah, I was going for a non-traffic accident version of "pileup".


If you're worried about search indexing, I would suggest going to a memorable name (hopefully completely unique) and then associating that name with the appropriate terms on the GitHub readme file or project site.

I'd note that "methylup" isn't taken (aside from maybe this) and it's kind of catchy.

8.2 years ago
Ann ★ 2.4k

Great news.

I have been using this a lot, together with bwameth, and am getting very satisfying results. Thank you!

As for names... how about PileOBiseq? Or BiseqTools?

Here "biseq" is an abbreviation for "bisulfite".

Seems like when you want to publish some software, reviewers will weigh in on the name. It's a bit like how reviewers used to weigh in on gene names. (Do they still do that?) In the end, what really matters (in my opinion) is ease of use for the software. I am happy with whatever name the author has given, provided it's something I can teach in a class without students complaining to my Dept. Chair.


I've finally gotten around to renaming PileOMeth. It's now "MethylDackel" (Dackel is German for dachshund and rhymes with methyl). Hopefully that name proves less problematic :)
