Hello everyone,
All of us, at least those working on NGS data analysis, are familiar with these three data types (wig, bigWig, and bedGraph), and we refer to the UCSC documentation to understand their content and meaning. That's nice, but there are still some black boxes that are not necessarily clear, especially these questions:
- How come these files, when generated for the same experiment by different programs, contain different values and different distributions?
- A question more related to the programs themselves: how are these files created from read alignments? What algorithm is behind them?
- The documentation on the formats is clear enough, but the significance and meaning of the scores inside each file are not explained. What do these values refer to?
- Look at this sentence from UCSC: "The BedGraph format allows display of continuous-valued data in track format. This display type is useful for probability scores and transcriptome data." Probability scores? Of what? Computed how?
I hope I am not merging too many questions into a single post; they are all related, and I think it is worth raising them as a block so that we can discuss them at the same time.
Thanks to all.
Rad
The values contained in these files will vary. For example, sometimes a wig file will contain p-values, sometimes log10(p-values), sometimes base coverage, sometimes a value between 0 and 1. All of these, and any other values, are allowed and possible. The algorithm for making the files is also not defined: sometimes it is a statistical model, sometimes read counts, sometimes a hidden Markov model, sometimes something else. So you MUST specify the specific track or file to get your answer, as there is NO answer to your more general question. Hope that helps.
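As a concrete illustration of the simplest case (plain base coverage), here is a rough sketch of how one might compute per-base depth from a BAM file with pysam and write it out as bedGraph intervals. This is just an example I put together, not any particular tool's algorithm; the file names and contig are placeholders, and real tools may add smoothing, normalization, or statistical modeling on top of something like this.

```python
# Minimal sketch: per-base read depth from a BAM file, written as bedGraph.
# "example.bam", "example.bedgraph", and "chr1" are placeholders.
import pysam

bam = pysam.AlignmentFile("example.bam", "rb")
contig = "chr1"
length = bam.get_reference_length(contig)

# count_coverage returns four arrays (A, C, G, T counts); summing them
# gives the total depth at every position of the contig.
a, c, g, t = bam.count_coverage(contig, 0, length)
depth = [a[i] + c[i] + g[i] + t[i] for i in range(length)]

# Collapse runs of identical depth into bedGraph intervals (0-based, half-open).
with open("example.bedgraph", "w") as out:
    start, current = 0, depth[0]
    for pos in range(1, length):
        if depth[pos] != current:
            if current > 0:
                out.write(f"{contig}\t{start}\t{pos}\t{current}\n")
            start, current = pos, depth[pos]
    if current > 0:
        out.write(f"{contig}\t{start}\t{length}\t{current}\n")
```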
Thanks Sean. Actually my question is not about the format itself, but about what these files contain and how they are created. What do the values in wig files, for example, represent for the reads aligned to a genome? We talk about density, but density of what? How are these densities calculated, and why are they not normalized, like probabilities, so that one can know, for example, whether a value of 3e4 is meaningful, and compared to what background?
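To illustrate what I mean by normalization, here is a rough sketch (my own example, not taken from any tool) of scaling raw per-base depth to counts per million mapped reads, so that a value could at least be compared across libraries of different size. The variable names and numbers are purely illustrative.

```python
# Rough sketch: scale raw per-base depth to counts-per-million (CPM) so that
# values are comparable between libraries of different sequencing depth.
# "depth" is a list of raw per-base counts; "total_mapped_reads" is the
# number of mapped reads in the library (both are placeholders here).
def to_cpm(depth, total_mapped_reads):
    scale = 1_000_000 / total_mapped_reads
    return [d * scale for d in depth]

# A raw depth of 3e4 means something very different in a 10M-read library
# than in a 1B-read library:
print(to_cpm([30000], 10_000_000))     # [3000.0]
print(to_cpm([30000], 1_000_000_000))  # [30.0]
```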
In which tracks are you interested? Perhaps you can edit your original question to include a link to your UCSC session. You could also consider writing to UCSC (or the original source of the files) for details. There is no standard way of creating wig, bigWig, or bedGraph files from NGS data.
Thanks Sean, I have no specific track in mind; my question concerns these files in general. To make it simple, let's put it this way: what information do these files contain? What do the values they contain refer to? How are they created?
It helps, in a sense, to know that there is no standardization for these data, which can lead to problems when comparing different experiments. Thanks for your answers.
The tradeoff for lack of a standard is total flexibility. I agree that this can lead to confusion.