Extracting data from bigWig/BED using threshold?
8.2 years ago
mmmmcandrew ▴ 200

Hi all- I have generated a bigWig file from MNase-seq data using a normalization that gives a maximum y-axis value of 1 (in other words, coverage has been normalized to 0-100%). I would like to extract all regions where the y-value is less than or greater than a certain value (say 0.7) AND that maintain this value over a continuous stretch (say 1000 bp). Is there a tool available for this analysis? I could also easily convert the file to BED format if that would make things easier.

Alternatively, I have used bwtool to extract data above or below a certain value threshold, but bwtool does not allow size thresholding. Is there a program that would allow size thresholding of data that has already been filtered by value?

TL;DR: I need a program that allows thresholding by value AND by size (in bp). Thanks!

next-gen sequencing

You'll need to write something yourself to do this in a single program. It's easy enough in Python with pyBigWig.
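
For instance, here is a minimal pyBigWig sketch (the file name, cutoff, and run length are placeholders, not anything from the original post) that prints contiguous runs of at least 1000 bp whose values stay at or above 0.7:

import pyBigWig

bw = pyBigWig.open("coverage.bw")   # placeholder file name
threshold = 0.7                     # score cutoff
min_len = 1000                      # minimum run length in bp

def flush(chrom, run_start, run_end):
    # Print a finished run if it is long enough.
    if run_start is not None and run_end - run_start >= min_len:
        print(f"{chrom}\t{run_start}\t{run_end}")

for chrom in bw.chroms():
    run_start, run_end = None, None
    # intervals() yields (start, end, value) tuples across the chromosome
    for start, end, value in bw.intervals(chrom) or []:
        if value >= threshold and run_start is not None and start == run_end:
            run_end = end                       # extend the current run
        elif value >= threshold:
            flush(chrom, run_start, run_end)    # close any previous run
            run_start, run_end = start, end     # start a new run
        else:
            flush(chrom, run_start, run_end)
            run_start = None
    flush(chrom, run_start, run_end)            # last run on the chromosome
bw.close()

Flip the comparisons to <= to pull out runs below the cutoff instead.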

8.2 years ago
mmmmcandrew ▴ 200

Hey guys- here is what I ended up doing, in case anyone encounters a similar issue (a command sketch follows the list):

- Use bwtool remove to eliminate data above or below a certain threshold (I did this with my y-axis values, which range from 0 to 1, but the same could be done with read counts).
- Convert the bigWig to wig using UCSC's bigWigToWig.
- Convert the wig to BED using BEDOPS convert2bed (be sure the coordinate indexing is handled correctly: wig is 1-based, BED is 0-based!).
- Use awk '{ if ($3 - $2 >= threshold) print $0; }' to print lines containing fragments over a certain size.
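
In command form, the pipeline looks something like this (file names and the 0.7/1000 cutoffs are placeholders, and the bwtool remove operator spelling is an assumption, so double-check bwtool's usage message):

$ bwtool remove less 0.7 coverage.bw filtered.bw
$ bigWigToWig filtered.bw filtered.wig
$ convert2bed --input=wig < filtered.wig > filtered.bed
$ awk '{ if ($3 - $2 >= 1000) print $0; }' filtered.bed > runs_1kb.bed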

8.2 years ago

Convert from bigWig to wig using bigWigToWig from UCSC Kent tools:

$ bigWigToWig regions.bw regions.wig

Convert from wig to BED via BEDOPS wig2bed:

$ wig2bed < regions.wig > regions.bed

Use BEDOPS bedmap to retrieve the minimum or maximum score across each region's 1000-base window (centered on the region):

$ bedmap --delim '\t' --echo --min --range 500 regions.bed > min_answer.bed
$ bedmap --delim '\t' --echo --max --range 500 regions.bed > max_answer.bed

Use awk to filter each *_answer.bed file for elements whose minimum or maximum score passes your threshold.

For example, this gives you all elements whose score stays at 0.7 or greater over the 1000-base window centered on each element:

$ awk '$NF >= 0.7' min_answer.bed > min_answer_thresholded.bed

And conversely, all elements whose score stays below 0.7 over that window:

$ awk '$NF < 0.7' max_answer.bed > max_answer_thresholded.bed

Thanks so much for taking the time to reply! I think this will be very helpful. I do have one concern, however: will bedmap arbitrarily break the data into 1000-bp windows? If I'm interpreting this correctly, that could interrupt regions that score 0.7 or greater and are 1000 bp or longer. For instance, if the last 200 bp of one window scores 0.7 or greater and the first 800 bp of the next window scores 0.7 or greater, but bedmap only looks at each window individually, won't it miss this 1000-bp region scoring 0.7 or greater? Sorry if I'm misunderstanding; I am quite new to bioinformatic analysis.


Actually, thinking on this a bit, I think you're right. You can just use bedmap with its own --range operator, which maps an element to all elements within the specified number of bases. I edited my answer with the simplified approach.


The bedmap step will look at the score value of any overlapping window. The windows are created from the regions you start with, so potential overlaps should be restricted to windows around those regions you start with. It might help to take some sample inputs and look at the output at each step, to see how this works.
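
For instance, given a toy five-column BED like the one wig2bed produces (made-up values; output formatting is approximate):

$ cat regions.bed
chr1    100    200    id-1    0.80
chr1    300    400    id-2    0.60
chr1    900    1000   id-3    0.90

$ bedmap --delim '\t' --echo --min --range 500 regions.bed
chr1    100    200    id-1    0.80    0.600000
chr1    300    400    id-2    0.60    0.600000
chr1    900    1000   id-3    0.90    0.900000

Each element is padded by 500 bases on either side, and the reported value is the minimum score over every element falling within that padded window: id-1 and id-2 pick up each other's scores, while id-3's padded window ([400, 1500)) overlaps nothing else.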
