Question

wig files to bigwig using UCSC kent

6.8 years ago

varsha619 ▴ 90

Hello! I am trying to convert wig files to bigwig using UCSC kent module -

grep -hv 'track' in.wig > 1.wig

sed '1d' 1.wig > 2.wig

wigToBigWig 2.wig -clip chrom.sizes 2.bw

I get the error - hashMustFindVal: 'chr2CEN' not found

I don't think this is a genome version error: I tried the most current and a previous version and still get the error. I already looked for answers in -

https://biostar.usegalaxy.org/p/11115/

bedGraphToBigWig conversion, xxx is not found in chromosome files

Has anyone else faced this? Please let me know, thank you for your help!

wigtobigwig • 5.7k views

ADD COMMENT • link updated 6.8 years ago by genecats.ucsc ▴ 580 • written 6.8 years ago by varsha619 ▴ 90

Not sure, but could this be a contig (like chrUN) in the wiggle file that has no match in the reference? If so, I would just delete all occurrences of it with awk/sed.

ADD REPLY • link 6.8 years ago by BioinfGuru ★ 2.1k

It seems it is not just chr2CEN: when I delete occurrences of it, I end up getting the same error with a different chromosome. If it is not the genome version, could it be a difference between UCSC and EMBL annotations?

ADD REPLY • link 6.8 years ago by varsha619 ▴ 90

It could be, but the chr prefix suggests that this is likely the UCSC version. Is that what you used for the original alignments? You can't mix and match these files.

ADD REPLY • link 6.8 years ago by GenoMax 147k

The analysis was done by someone else and I am just trying to use their published wig files for some analysis. I can verify with the authors the genome build they used, thanks!

ADD REPLY • link 6.8 years ago by varsha619 ▴ 90

Yeah, I usually just delete all of those from the file. Basically, run an if statement over the lines:

if (the line does NOT contain chrINT|chrX|chrY)
{
print the line to contig.file
delete the line from the wiggle file
}

You are not going to be able to use those contigs anyway.

Just check the printed contig.file to make sure you are not deleting anything important.

There will only be a few lines (<30 I expect)
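The pseudocode above can be sketched in awk. One caveat: in a wiggle file only the variableStep/fixedStep declaration lines carry the chromosome name, so the filter has to keep or drop whole sections instead of individual lines. The function name `filter_wig_by_chrom` and the output file names are hypothetical, and the sketch assumes a tab- or space-separated chrom.sizes file with the chromosome name in the first column. It is a minimal illustration, not part of the kent tools.

```shell
# Hypothetical helper (not a kent utility): split a wiggle file into the
# sections whose chromosome appears in chrom.sizes and the sections that don't.
# usage: filter_wig_by_chrom chrom.sizes in.wig kept.wig dropped.wig
filter_wig_by_chrom() {
    awk -v kept="$3" -v dropped="$4" '
        BEGIN { keep = 1 }                      # lines before the first declaration are kept
        NR == FNR { known[$1] = 1; next }       # first file: remember chrom.sizes names
        /^(variableStep|fixedStep)/ {           # a declaration line starts a new section
            keep = 0
            for (i = 2; i <= NF; i++)
                if ($i ~ /^chrom=/) {
                    name = substr($i, 7)        # text after "chrom="
                    keep = (name in known)
                }
        }
        { print > (keep ? kept : dropped) }     # route the line with its section
    ' "$1" "$2"
}
```

For example, `filter_wig_by_chrom dm6.chrom.sizes in.wig kept.wig dropped.wig` writes the unmatched sections to dropped.wig, which can be reviewed before converting kept.wig.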

ADD REPLY • link 6.8 years ago by BioinfGuru ★ 2.1k

Hello @kennethcondon2007, thank you, I will try that. Would you happen to know the reason for this issue with wigs?

ADD REPLY • link 6.8 years ago by varsha619 ▴ 90

There is no issue at all; this isn't a mistake in the files. As the years go by, the reference genomes are updated. This updating involves changing the coordinates of many genes, such as their start or end positions. As I understand it, one result is that some new or old parts of the genome do not have enough evidence to be included in the main chromosomes (INT/X/Y/M), so rather than being deleted from the reference, they are added as an additional "contig" with its own scaffold name, e.g. chrUN and other variations.

As I said, this is as I understand it - I'm sure genomax would have a better explanation.

ADD REPLY • link 6.8 years ago by BioinfGuru ★ 2.1k

Thank you for the explanation, that makes a lot of sense. I am still trying to figure out the genome build of the files so I can convert them to the new build, instead of deleting the corresponding lines from the files.

ADD REPLY • link 6.8 years ago by varsha619 ▴ 90

score 0 · Answer 1 · 2018-01-18

6.8 years ago

genecats.ucsc ▴ 580

You need to make sure that all of the chromosome names in your wiggle file are accounted for in the chrom.sizes file.

For instance, if my wiggle file looks like this:

variableStep chrom=chr2CEN
3003560 0

And my chrom.sizes file looks like this:

chr2CEN 242193529

Then I can still run wigToBigWig just fine:

wigToBigWig test1.wig test.chrom.sizes out1.bw

Now whether that wiggle will actually display in the genome browser is a different story, but it seems to me that your wiggle just has incorrect chromosome names and needs to be fixed.

If you have further questions about UCSC data or tools feel free to send your question to one of the below mailing lists:

General questions: genome@soe.ucsc.edu
Questions involving private data: genome-www@soe.ucsc.edu
Questions involving mirror sites: genome-mirror@soe.ucsc.edu

ChrisL from the UCSC Genome Browser

ADD COMMENT • link 6.8 years ago by genecats.ucsc ▴ 580

Hi Chris, I am not sure if that is the issue in my case. This is how my wig file format looks -

0

track type=wiggle_0

variableStep chrom=chr2L

I removed the 1st line and track line before running wigToBigWig and I used fetchChromSizes to get the chrom.sizes file. Am I missing something here? Thanks for your help!

ADD REPLY • link 6.8 years ago by varsha619 ▴ 90

Yes, but what are the other chrom lines like in the wiggle file? Do all the chromosome names correspond to what is in the chrom.sizes file? Try grepping for 'chr2' from your wiggle file and see what shows up. Or something like this to get only the chromosome names:

$ grep chrom userWig2.wig  | cut -d'=' -f2

You can also try something like this to find chromosomes in the wiggle that aren't in the chrom.sizes file, because you mentioned it fails on different chromosome names if you remove a particular one:

$ grep -v -Fwf <(cut -f1 dm6.chrom.sizes) <(grep chrom userWig2.wig  | cut -d'=' -f2 | sort -u )

If that doesn't output anything, then it would help if you could share a link to the wiggle file you are trying to convert. If the file is private, messages to the genome-www address I mentioned in my previous response will only be seen by UCSC Genome Browser staff.
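The name check above can also be sketched in awk, which is convenient when declaration lines carry extra fields: with a line such as `variableStep chrom=chr2L span=5`, `cut -d'=' -f2` returns `chr2L span` and not just the name. The function name `wig_missing_chroms` is hypothetical, and the sketch assumes the chromosome name is the first column of chrom.sizes.

```shell
# Hypothetical helper (not a kent utility): list chromosome names that occur
# in a wiggle file but are absent from a chrom.sizes file.
# usage: wig_missing_chroms chrom.sizes in.wig
wig_missing_chroms() {
    awk '
        NR == FNR { known[$1] = 1; next }       # first file: remember chrom.sizes names
        /^(variableStep|fixedStep)/ {           # only declaration lines name a chromosome
            for (i = 2; i <= NF; i++)
                if ($i ~ /^chrom=/) {
                    name = substr($i, 7)        # text after "chrom=", without span= etc.
                    if (!(name in known)) print name
                }
        }
    ' "$1" "$2" | sort -u                       # report each missing name once
}
```

Running `wig_missing_chroms dm6.chrom.sizes userWig2.wig` prints one unmatched name per line; empty output means every declared chromosome is present in the sizes file.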

ADD REPLY • link 6.8 years ago by genecats.ucsc ▴ 580

Hello @genecats.ucsc, I was able to grep out the chrom values that did not match the chrom.sizes file using -

grep -vE '(track|chr2CEN|chr3CEN|...|chrU)'

Now when I run wigToBigWig, I get the error - Overlap on chr3. Please remove overlaps and try again.

This makes me worry a little since I am not sure if removing the redundant chr location lines is a good idea. Please advise.

ADD REPLY • link 6.8 years ago by varsha619 ▴ 90

This error happens when you have wiggle lines like the following:

variableStep chrom=chromName span=5
1000 0.56
1001 0.55

In this case the positions 1000-1004 are supposed to have the value 0.56, but then on the next line positions 1001-1005 are supposed to have the value 0.55, and since a single position (in this example coordinates 1001-1004) can't have more than one value, you get an error.

You will have to decide for yourself whether it is a good idea to remove these redundant lines. Getting into contact with whoever made the file and figuring out how the file was made is probably the best option, especially so you can figure out how the strange chromosome names got into the file as well.
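To see where such overlaps occur before deciding what to do with them, a check along these lines may help. It is only a sketch and makes several assumptions: the function name `wig_find_overlaps` is hypothetical, only variableStep sections are examined, positions are sorted within each section, and overlaps between two separate sections of the same chromosome are not detected. It is not a reimplementation of the check that wigToBigWig performs.

```shell
# Hypothetical helper (not a kent utility): report variableStep data lines
# whose interval starts inside the interval of the previous data line.
# usage: wig_find_overlaps in.wig
wig_find_overlaps() {
    awk '
        /^variableStep/ {                       # new section: reset state, read chrom and span
            span = 1; prev_end = 0; chrom = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^chrom=/) chrom = substr($i, 7)
                if ($i ~ /^span=/)  span  = substr($i, 6) + 0
            }
            in_var = 1; next
        }
        /^(fixedStep|track|browser|#)/ { in_var = 0; next }   # not checked in this sketch
        in_var && NF == 2 {                     # data line: "position value"
            start = $1 + 0
            if (start <= prev_end)
                print chrom ":" start " overlaps previous interval ending at " prev_end
            prev_end = start + span - 1         # interval covers start .. start+span-1
        }
    ' "$1"
}
```

On the two-line example above (span=5, positions 1000 and 1001), the second line is reported because the first interval runs through position 1004.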