Hello! I am trying to convert wig files to bigwig using UCSC kent module -
grep -hv 'track' in.wig > 1.wig
sed '1d' 1.wig > 2.wig
wigToBigWig 2.wig -clip chrom.sizes 2.bw
I get the error - hashMustFindVal: 'chr2CEN' not found
I don't think this is a genome version error, I tried the most current and a previous version and still get the error.
I already looked for answers in -
Not sure, but could this be a contig (like chrUN) in the wiggle file that has no match in the reference? If so, I would just delete all occurrences of it with awk/sed.
It seems it is not just chr2CEN: when I delete all occurrences of it, I get the same error with a different chromosome name. If it is not the genome version, could it be a difference between UCSC and EMBL annotations?
It could be, but the chr prefix seems to indicate that this is likely the UCSC version. Is that what you used for the original alignments? You can't mix and match these files.
The analysis was done by someone else and I am just trying to use their published wig files for some analysis. I can verify with the authors the genome build they used, thanks!
There is no issue at all. This isn't a mistake in the files. As the years go by, the reference genomes are updated. This updating involves changing the coordinates of many genes, such as their start or end positions. As I understand it, one result is that some new or old parts of the genome do not have enough evidence to be included in the main chromosomes (INT/X/Y/M), so rather than being deleted from the reference they are added as an additional "contig" with its own scaffold name, e.g. chrUN and other variations.
As I said, this is as I understand it. I'm sure genomax would have a better explanation.
Thank you for the explanation, that makes a lot of sense. I am still trying to figure out the genome build of the files so I can convert them to the new build, instead of deleting the corresponding lines from the files.
You need to make sure that all of the chromosome names in your wiggle file are accounted for in the chrom.sizes file.
For instance, if my wiggle file looks like this:
variableStep chrom=chr2CEN
3003560 0
And my chrom.sizes file looks like this:
chr2CEN 242193529
Then I can still run wigToBigWig just fine:
wigToBigWig test1.wig test.chrom.sizes out1.bw
Now whether that wiggle will actually display in the genome browser is a different story, but it seems to me that your wiggle just has incorrect chromosome names and needs to be fixed.
If you have further questions about UCSC data or tools feel free to send your question to one of the below mailing lists:
Hi Chris,
I am not sure if that is the issue in my case. This is how my wig file format looks -
0
track type=wiggle_0
variableStep chrom=chr2L
I removed the 1st line and track line before running wigToBigWig and I used fetchChromSizes to get the chrom.sizes file. Am I missing something here? Thanks for your help!
Yes, but what are the other chrom lines like in the wiggle file? Do all the chromosome names correspond to what is in the chrom.sizes file? Try grepping for 'chr2' in your wiggle file and see what shows up. Or something like this to get only the chromosome names:
$ grep chrom userWig2.wig | cut -d'=' -f2
You can also try something like this to find chromosomes in the wiggle that aren't in the chrom.sizes file, because you mentioned it fails on different chromosome names if you remove a particular one:
If that doesn't output anything then it would help if you could share a link to the wiggle file you are trying to convert. If the file is private, the genome-www address I mentioned in my previous response will only be seen by UCSC Genome Browser staff.
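One way to do such a check, as a minimal sketch and not necessarily the exact command meant here: the function name `missing_chroms` is made up for illustration, and it assumes the chromosome names appear as `chrom=...` on `variableStep`/`fixedStep` lines and in the first column of a non-empty chrom.sizes file.

```shell
# List chromosome names used in a wiggle file that are absent from chrom.sizes.
# Usage: missing_chroms in.wig chrom.sizes
# (missing_chroms is a hypothetical helper name, not a kent utility.)
missing_chroms() {
    wig="$1"
    sizes="$2"
    awk '
        # First file (chrom.sizes): remember every name in column 1.
        NR == FNR { known[$1] = 1; next }
        # Second file (wiggle): inspect only the declaration lines.
        /^(variableStep|fixedStep)/ {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^chrom=/) {
                    name = substr($i, 7)   # text after "chrom="
                    if (!(name in known) && !(name in seen)) {
                        seen[name] = 1     # report each name once
                        print name
                    }
                }
            }
        }
    ' "$sizes" "$wig"
}
```

Calling `missing_chroms 2.wig chrom.sizes` should print one unmatched name per line, and nothing at all if every name in the wiggle is accounted for.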
In this case the positions 1000-1004 are supposed to have the value 0.56, but then on the next line positions 1001-1005 are supposed to have the value 0.55, and since a single position (in this example coordinates 1001-1004) can't have more than one value, you get an error.
You will have to decide for yourself whether it is a good idea to remove these redundant lines. Getting in contact with whoever made the file and figuring out how it was made is probably the best option, especially so you can also find out how the strange chromosome names got into the file.
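To see how many lines are affected before deciding, the overlap check described above can be sketched roughly like this. It is only a sketch under assumptions: `find_overlaps` is a made-up name, only `variableStep` blocks are examined, the width comes from `span=` on the declaration line (1 if absent), and positions within a block are assumed to be in ascending order.

```shell
# Print "lineNumber: line" for every data line that starts inside the span
# covered by the previous data line of the same variableStep block.
# Usage: find_overlaps in.wig
# (find_overlaps is a hypothetical helper name, not a kent utility.)
find_overlaps() {
    awk '
        /^variableStep/ {
            in_var = 1; span = 1; prev_end = 0
            for (i = 2; i <= NF; i++)
                if ($i ~ /^span=/) span = substr($i, 6) + 0   # text after "span="
            next
        }
        # Any other declaration, track, browser or comment line ends the block.
        /^(fixedStep|track|browser|#)/ { in_var = 0; next }
        in_var && NF == 2 {
            start = $1 + 0
            if (start <= prev_end)
                print FNR ": " $0
            prev_end = start + span - 1   # last base covered by this line
        }
    ' "$1"
}
```

With `span=5`, a line starting at 1000 covers 1000-1004, so a following line starting at 1001 is reported.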
Yeah....I usually just delete all of those from the file....so basically run an if loop ..
you are not going to be able to use those contigs anyway
just check the printed contig.file to make sure you are not deleting anything important
There will only be a few lines (<30 I expect)
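For anyone wanting to try this, here is a rough sketch of that kind of filter, written with awk rather than a literal loop. It is an illustration under assumptions, not the exact code used: `drop_unknown_chroms` is a made-up name, and a "block" is taken to be a `variableStep`/`fixedStep` line plus the data lines that follow it.

```shell
# Keep only wiggle blocks whose chromosome is listed in chrom.sizes and
# write the names of the dropped chromosomes to a separate file for review.
# Usage: drop_unknown_chroms in.wig chrom.sizes filtered.wig contig.file
# (drop_unknown_chroms is a hypothetical helper name, not a kent utility.)
drop_unknown_chroms() {
    : > "$4"   # make sure the review file exists even if nothing is dropped
    awk -v dropped="$4" '
        # First file (chrom.sizes): remember every name in column 1.
        NR == FNR { known[$1] = 1; next }
        # Declaration line: decide whether the block that follows is kept.
        /^(variableStep|fixedStep)/ {
            name = ""
            for (i = 2; i <= NF; i++)
                if ($i ~ /^chrom=/) name = substr($i, 7)
            keep = (name in known)
            if (!keep && !(name in seen)) { seen[name] = 1; print name > dropped }
        }
        # Print lines of kept blocks, plus track/browser/comment lines.
        keep || /^(track|browser|#)/
    ' "$2" "$1" > "$3"
}
```

After running it, contig.file holds one dropped chromosome name per line, so it is quick to check that nothing important was removed before converting filtered.wig.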
Hello @kennethcondon2007, thank you I will try that. Would you happen to know the reason for this issue with wigs?