Question

Wig, BigWig And BedGraph: Any In-Depth Description?

5


12.9 years ago

Radhouane Aniba ▴ 790

Hello everyone,

All of us, at least those of us working on NGS data analysis, are familiar with these three file formats, and we refer to the UCSC documentation to understand their content and meaning. That's nice, but some black boxes remain unclear, especially the following questions:

How come files generated for the same experiment by different programs contain different values and different distributions?
A question more related to these programs: how are these files created from read alignments? What algorithm is behind them?
The documentation on the formats is clear enough, but the significance and meaning of the scores inside each file are not explained. What do these values refer to?
Look at this sentence from UCSC: "The BedGraph format allows display of continuous-valued data in track format. This display type is useful for probability scores and transcriptome data". Probability score? Of what? How?

I hope I am not merging too many questions into a single post. They are all related, and I think it is worth raising them together so that we can discuss them at the same time.

Thanks to all.

Rad

wiggle • 13k views

updated 11.1 years ago by Biostar 20 • written 12.9 years ago by Radhouane Aniba ▴ 790

score 6 · Answer 1 · 2012-01-18

http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#CustomTracks

http://genome.ucsc.edu/goldenPath/help/bedgraph.html

http://genome.ucsc.edu/goldenPath/help/wiggle.html

http://genome.ucsc.edu/goldenPath/help/bigWig.html

"How come files generated for the same experiment by different programs contain different values and different distributions?"

Welcome to bioinformatics.

"How are these files created from read alignments? What algorithm is behind them?"

samtools pileup and genomeCoverageBed from bedtools are two ways...
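To make the idea concrete, here is a minimal Python sketch of what a coverage-style bedGraph contains: count how many reads overlap each base, then merge runs of equal depth into intervals. It is not the actual samtools or bedtools code; it assumes the reads have already been extracted from the alignment file as 0-based, half-open (start, end) pairs on a single chromosome, and the function name coverage_bedgraph is hypothetical.

```python
from typing import Iterable, List, Tuple


def coverage_bedgraph(chrom: str, chrom_len: int,
                      reads: Iterable[Tuple[int, int]]) -> List[str]:
    """Per-base read depth on one chromosome, merged into bedGraph lines.

    Assumes each read is a 0-based, half-open (start, end) interval with
    0 <= start < end <= chrom_len.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (chrom_len + 1)
    for start, end in reads:
        diff[start] += 1
        diff[end] -= 1

    lines: List[str] = []
    depth = 0
    run_start, run_depth = 0, 0
    for pos in range(chrom_len):
        depth += diff[pos]
        if depth != run_depth:
            # Depth changed: emit the finished run unless it had zero coverage.
            if run_depth > 0:
                lines.append(f"{chrom}\t{run_start}\t{pos}\t{run_depth}")
            run_start, run_depth = pos, depth
    # Emit a run that reaches the end of the chromosome.
    if run_depth > 0:
        lines.append(f"{chrom}\t{run_start}\t{chrom_len}\t{run_depth}")
    return lines
```

For example, coverage_bedgraph("chr1", 10, [(0, 5), (3, 8)]) yields three intervals with depths 1, 2 and 1. Zero-coverage stretches are simply left out here; whether to report them is one of the many small choices that make outputs differ between tools.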

"The significance and meaning of the scores inside each file are not explained. What do these values refer to?"

That depends on who made them, what data they are based on, and what was done to that data, but most likely it is a read count, log2(IP/input), or something like that.
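A sketch of the log2(IP/input) case shows why two programs can disagree on the same reads: the result depends on choices such as how the two libraries are scaled to each other and what pseudocount is added to avoid dividing by zero. The total-count scaling, the default pseudocount of 1 and the name log2_ip_over_input are assumptions for illustration, not any particular tool's method; the inputs are assumed to be per-bin read counts over identical bins.

```python
import math
from typing import List


def log2_ip_over_input(ip_counts: List[int], input_counts: List[int],
                       pseudocount: float = 1.0) -> List[float]:
    """Per-bin log2(IP/input) after scaling input to the IP library size.

    Total-count scaling and the pseudocount are illustrative choices;
    real tools differ in exactly these details.
    """
    if len(ip_counts) != len(input_counts):
        raise ValueError("IP and input must cover the same bins")
    input_total = sum(input_counts)
    if input_total == 0:
        raise ValueError("input library has no reads")
    # Scale factor so the input has the same total read count as the IP.
    scale = sum(ip_counts) / input_total
    return [
        # The pseudocount keeps empty bins from producing log2(0) or x/0.
        math.log2((ip + pseudocount) / (inp * scale + pseudocount))
        for ip, inp in zip(ip_counts, input_counts)
    ]
```

With IP counts [3, 1] and input counts [1, 3] this returns [1.0, -1.0]; rerunning with pseudocount=0.5 gives larger magnitudes for the very same reads.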

"Probability score? Of what? How?"

Perhaps the probability that each nucleotide belongs to a conserved element? The conservation tracks created by UCSC are probability-based.

score 5 · Answer 2 · 2012-01-14


12.9 years ago

Sean Davis 27k

These are formats that are meant to be general. For example, the BedGraph format is meant to store continuous-valued data. Documentation of what those continuous values represent is not a required part of the format. Sometimes, UCSC tracks contain descriptions within the files themselves, but often the interpretation of the values is left to external documentation.

If you are asking about specific tracks at UCSC, you can either write to the UCSC mailing list for details about the tracks of interest, or read the track description, available by clicking on the bar to the left of the track in the browser.

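Because the bedGraph format itself does not say what the fourth column means, the optional track line is the usual place to record it. A minimal sketch, assuming Python; the function name format_bedgraph and the example name and description values are hypothetical, and the function returns the file contents as a string rather than writing to disk.

```python
from typing import Iterable, Tuple


def format_bedgraph(name: str, description: str,
                    records: Iterable[Tuple[str, int, int, float]]) -> str:
    """Return bedGraph text whose track line states what the values mean."""
    # UCSC custom-track header; name and description are free text.
    lines = [f'track type=bedGraph name="{name}" description="{description}"']
    for chrom, start, end, value in records:
        # chrom, 0-based start, exclusive end, value
        lines.append(f"{chrom}\t{start}\t{end}\t{value}")
    return "\n".join(lines) + "\n"
```

Nothing in the format forces the description to be present or accurate, which is the point made above: interpretation is left to whoever wrote the file.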

6


The values contained in the files will vary. For example, sometimes a wig file will contain p-values, sometimes log10(p-values), sometimes base coverage, sometimes a value between 0 and 1. All of these, and any other values, are allowed and possible. The algorithm for making the files is also not defined. Sometimes it will be a statistical model, sometimes read counts, sometimes a hidden Markov model, sometimes something else. So you MUST specify the particular track or file to get your answer, as there is NO answer to your more general question. Hope that helps.

12.9 years ago by Sean Davis 27k
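Since one file may hold raw p-values and another a log-transformed version, the transform has to be known before the numbers can be read. A minimal sketch, assuming the common -log10(p) convention (the comment above just says log10(p-values), so the sign here is an assumption); both function names are hypothetical.

```python
import math


def neg_log10_p(p: float) -> float:
    """Encode a p-value as -log10(p), so smaller p gives a larger score."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log10(p)


def p_from_neg_log10(score: float) -> float:
    """Decode a -log10(p) score back into a p-value."""
    return 10.0 ** (-score)
```

Under this convention a score of 2 means p = 0.01, while the same 2 in a coverage track means two reads and in a log2 ratio track a fourfold enrichment, so the file alone cannot tell you which it is.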

0


Thanks, Sean. Actually, my question is not about the format itself, but about what these files contain and how they are created. What do the values in wig files, for example, represent with respect to the reads aligned to a genome? We talk about density, but density of what? How are these densities calculated, and why are they not normalized, like probabilities, so that one can tell, for example, whether a value of 3e4 is meaningful or not, and compared to what background?

12.9 years ago by Radhouane Aniba ▴ 790

0


In which tracks are you interested? Perhaps you could edit your original question to include a link to your UCSC session. You could also consider writing to UCSC (or the original source of the files) for details. There is no standard way of creating wig, bigWig, or bedGraph files from NGS data.

12.9 years ago by Sean Davis 27k
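Even the container is a choice: the same values can be written as bedGraph or as wiggle, and either can then be converted into the indexed binary bigWig with UCSC's bedGraphToBigWig or wigToBigWig utilities. The sketch below, assuming Python, a single chromosome and sorted, non-overlapping intervals, re-encodes bedGraph records as a per-base variableStep wiggle section; note the shift from bedGraph's 0-based starts to wiggle's 1-based positions. The function name is hypothetical, and expanding every base is only sensible for small examples.

```python
from typing import List, Tuple


def bedgraph_to_variablestep(chrom: str,
                             intervals: List[Tuple[int, int, float]]) -> List[str]:
    """Re-encode one chromosome's bedGraph intervals as variableStep wiggle.

    bedGraph starts are 0-based (end exclusive); wiggle positions are
    1-based, so the base at 0-based offset i is written at position i + 1.
    """
    lines = [f"variableStep chrom={chrom} span=1"]
    for start, end, value in intervals:
        # One data line per base, since span=1.
        for i in range(start, end):
            lines.append(f"{i + 1} {value}")
    return lines
```

The numbers are identical in both encodings; only the coordinate convention and layout change.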

0


Thanks, Sean. I have no specific track in mind; my question concerns these files in general. To put it simply: what information do these files contain? What do the values they contain refer to? How are they created?

12.9 years ago by Radhouane Aniba ▴ 790

0


It helps, in a sense, to know that there is no standardization for these data, which could lead to problems when comparing different experiments. Thanks for your answers.

12.9 years ago by Radhouane Aniba ▴ 790

0


The tradeoff for lack of a standard is total flexibility. I agree that this can lead to confusion.

12.9 years ago by Sean Davis 27k