Hi,
Question 1:
I was wondering about the efficiency of fixedStep vs. variableStep for sparse data.
I have a program that generates a wiggle file of scores that I compute for each position in a gene, i.e., I have a vector whose length is the length of the gene; some entries are zero and some are nonzero.
I don't need to present the zeros, so it seems more efficient to use the variableStep wiggle and print only the nonzero positions to the text file, e.g.
variableStep chrom=chr19 span=1
13411 1.2
13412 7.5
13416 3.4
13417 11.12
I was wondering whether it would be more efficient to print the entire vector (including zeros) and to use the fixedStep. i.e. for the example above:
fixedStep chrom=chr19 start=13411 step=1 span=1
1.2
7.5
0
0
0
3.4
11.12
A few notes:
- For one gene that I checked, only 1,100 of 13,600 entries were nonzero.
- I need to carry out this process for each gene in the human genome, and each gene will have 4 such vectors.
- I'm running wigToBigWig after generating the wiggle text file.
- I ask because of the following paragraph from the wiggle documentation page:
Caution for sparse variableStep data
The wiggle format was designed for quickly displaying data that is quite dense. The variableStep format, in particular, becomes very inefficient when there are only a few data points per 1,024 bases. If variableStep data points (i.e., chromStarts) are greater than about 100 bases apart, it is advisable to use BedGraph format.
(But I can't use BedGraph; I must use wiggle.)
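To make the two layouts in Question 1 concrete, here is a minimal Python sketch that writes the same score vector both ways. The function names and the `start` argument are illustrative assumptions, not part of any wiggle tool. Note that in fixedStep the declaration line carries `start` and `step`, and the data lines hold only values.

```python
def variable_step_lines(chrom, start, scores):
    """Build a variableStep block that lists only the nonzero positions."""
    lines = [f"variableStep chrom={chrom} span=1"]
    for offset, score in enumerate(scores):
        if score != 0:
            lines.append(f"{start + offset} {score}")
    return lines


def fixed_step_lines(chrom, start, scores):
    """Build a fixedStep block: one value per line, zeros included."""
    lines = [f"fixedStep chrom={chrom} start={start} step=1 span=1"]
    lines.extend(str(score) for score in scores)
    return lines


# The example vector from the question, starting at position 13411.
scores = [1.2, 7.5, 0, 0, 0, 3.4, 11.12]
variable = variable_step_lines("chr19", 13411, scores)
fixed = fixed_step_lines("chr19", 13411, scores)

print("\n".join(variable))
print("\n".join(fixed))
# Compare the text sizes of the two encodings for this vector.
print(len("\n".join(variable)), "characters as variableStep")
print(len("\n".join(fixed)), "characters as fixedStep")
```

Running the same comparison on a real gene vector (for instance the one with 1,100 nonzero entries out of 13,600) would show which text encoding is smaller for that data; it says nothing by itself about the size of the resulting bigWig.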
Question 2:
When writing a wiggle file (either variableStep or fixedStep), should the data for the same chromosome be contiguous? Does it matter, efficiency-wise, if we write
variableStep chrom=chr19 span=1
13411 1.2
13412 7.5
13416 3.4
13417 11.12
variableStep chrom=chrY span=1
pos1 score1
pos2 score2
...
variableStep chrom=chr19 span=1
34567 1.2
34580 6.5
34597 13.4
instead of:
variableStep chrom=chr19 span=1
13411 1.2
13412 7.5
13416 3.4
13417 11.12
34567 1.2
34580 6.5
34597 13.4
variableStep chrom=chrY span=1
pos1 score1
pos2 score2
...
I'd appreciate any guidance!
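For Question 2, if the per-gene blocks are produced in arbitrary order, one way to get the second (grouped) layout is to collect all records first and emit a single sorted block per chromosome. The sketch below assumes the records fit in memory. The `merge_blocks` name and the chrY values are hypothetical, since the question only gives placeholders for chrY. Whether wigToBigWig actually needs this ordering is not settled in this thread.

```python
from collections import defaultdict


def merge_blocks(records):
    """Group (chrom, position, score) records into one sorted
    variableStep block per chromosome and return the wiggle lines."""
    by_chrom = defaultdict(list)
    for chrom, pos, score in records:
        by_chrom[chrom].append((pos, score))

    lines = []
    for chrom in sorted(by_chrom):  # plain string order of chromosome names
        lines.append(f"variableStep chrom={chrom} span=1")
        for pos, score in sorted(by_chrom[chrom]):  # ascending position
            lines.append(f"{pos} {score}")
    return lines


# Records in the interleaved order of the first layout in the question.
records = [
    ("chr19", 13411, 1.2), ("chr19", 13412, 7.5),
    ("chr19", 13416, 3.4), ("chr19", 13417, 11.12),
    ("chrY", 100, 2.0),  # hypothetical chrY data point
    ("chr19", 34567, 1.2), ("chr19", 34580, 6.5), ("chr19", 34597, 13.4),
]
merged = merge_blocks(records)
print("\n".join(merged))
```

Sorting by chromosome name here is plain string order; a different chromosome order could be substituted if a downstream tool expects one.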
I suspect that, since you are converting to bigWig, whatever advice you read about text wiggle files will not apply to the binary bigWig files.
It is even quite possible that both fixedStep and variableStep entries get written in the same form in the bigWig format.