bedGraphToBigWig: Converting INSDC chromosome names to UCSC names
2.8 years ago

Hello,

I am following this guide (bedGraphToBigWig Tutorial and Report) to try and convert my bedGraph files to bigwig, but keep coming across errors like this:

GL456211.1 is not found in chromosome sizes file

After looking through UCSC's goldenPath downloads, I found the mapping between these names and the UCSC names, for example:

chr1_GL456210_random    0   169725  GL456210.1
chr1_GL456211_random    0   241735  GL456211.1
chr1_GL456212_random    0   153618  GL456212.1
chr1_GL456213_random    0   39340   GL456213.1
chr1_GL456221_random    0   206961  GL456221.1

Here, GL456211.1 would be converted to chr1_GL456211_random.

However, since the bedGraph file is 1.8GB, I'm wondering if there is a way to scan through the entire file and convert many different names in a single pass. I've already run sed twice, but the file seems to contain many more names that still need converting.

Thank you!

Cheers, Connor

UCSC bedgraphtobigwig ChIPSeq • 2.1k views
2.8 years ago
kashiff007 ★ 1.9k

Keep it simple and format your 'genome size' file as follows:

Chr1    30427671
Chr2    19698289
Chr3    23459830
Chr4    18585056
Chr5    26975502

Make sure the chromosome names in your bedGraph file exactly match those in the 'genome size' file (the comparison is case-sensitive).

To convert your existing four-column table into a simple two-column genome size file (name, then size), use:

awk '{print$4"\t"$3}' your_genome_size.txt > simple_genome_size.txt

Thank you for the reply! This makes sense to me, but my main issue is that within my bedgraph file there are lines that contain other chromosome names such as GL456211.1, and I'd need to either convert or get rid of those.

So even if I quickly converted the genome_size.txt file, there would still be mislabeled coordinates in the bedgraph file.
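
For the "get rid of those" option, one possible approach is to drop every bedGraph line whose chromosome is missing from the chromosome sizes file. This is only a sketch: the function name `filter_known_chroms` is made up for illustration, and it assumes both files are whitespace-delimited with the chromosome name in column 1.

```shell
# Hypothetical helper (not part of any UCSC tool).
# Keeps only bedGraph lines whose chromosome (column 1) also appears in
# column 1 of the chromosome sizes file. Anything else, including
# track/browser header lines, is dropped.
# Usage: filter_known_chroms chrom.sizes in.bedGraph > out.bedGraph
filter_known_chroms() {
    awk 'NR == FNR { known[$1] = 1; next }   # first file: remember valid names
         $1 in known                         # second file: print matching lines
        ' "$1" "$2"
}
```

Note that this discards the signal on the unplaced contigs rather than renaming them, so it only fits if those regions are not needed downstream.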


Why don't you replace each mislabeled chromosome name with the one you want?

sed 's/old_chr_name/new_chr_name/g' your_genome_size.txt > simple_genome_size.txt

Is there a way to streamline this? I would need to do this to around 20 unique chr names per file, and have around 100 files to do this for. That's the main issue.
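
One way this could be streamlined is to let awk load the whole name table once and rename every matching line in a single pass, instead of running one sed per name. The sketch below assumes the table has the four-column layout shown in the question (UCSC name in column 1, INSDC name in column 4); the function name `rename_chroms` and the output file suffix are made up for illustration.

```shell
# Hypothetical helper (not part of any UCSC tool).
# Usage: rename_chroms alias_table.txt in.bedGraph > out.bedGraph
rename_chroms() {
    awk 'BEGIN { OFS = "\t" }                  # rewritten lines are tab-delimited
         NR == FNR { ucsc[$4] = $1; next }     # table: INSDC name -> UCSC name
         ($1 in ucsc) { $1 = ucsc[$1] }        # bedGraph: swap column 1 if known
         { print }                             # unknown names pass through as-is
        ' "$1" "$2"
}

# Example loop over many files (paths are placeholders):
# for f in *.bedGraph; do
#     rename_chroms alias_table.txt "$f" > "${f%.bedGraph}.ucsc.bedGraph"
# done
```

Because the lookup is an exact match on column 1, a name such as GL456211.1 cannot accidentally match inside a longer string, which is a risk with an unanchored sed substitution.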

2.8 years ago
Luis Nassar ▴ 670

Hello,

We offer a utility called chromToUcsc that should be able to convert these for you. You can download it here:

http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/chromToUcsc

Or here for MacOS:

http://hgdownload.soe.ucsc.edu/admin/exe/macOSX.x86_64/chromToUcsc

You can run it without any arguments for a help message, but in essence it searches most UCSC format files for NCBI or Ensembl chromosome names and converts any it finds to the UCSC convention. It does so by using our chromAlias tables.

First you want to download that chromAlias file, for example hg19:

chromToUcsc --get hg19

Then run the command on your file for the conversions:

chromToUcsc -i in.bedGraph -o out.bedGraph -a hg19.chromAlias.tsv

After that you should be able to convert to bigWig without issue.
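
For a large batch of files, the two steps above could be wrapped in a loop along the following lines. This is a rough sketch, not an official workflow: the function name `convert_all` and the output file names are made up, the chromosome sizes file is assumed to already use UCSC names, and the sort step is included on the assumption that renaming may disturb the chromosome-then-start order that bedGraphToBigWig expects.

```shell
# Hypothetical wrapper around the chromToUcsc call shown above.
# Usage: convert_all ALIAS_TSV CHROM_SIZES FILE.bedGraph...
convert_all() {
    alias_tsv=$1
    chrom_sizes=$2
    shift 2
    for bg in "$@"; do
        base=${bg%.bedGraph}
        # 1. Rename chromosomes to UCSC convention (flags as in the post above).
        chromToUcsc -i "$bg" -o "$base.ucsc.bedGraph" -a "$alias_tsv" || return 1
        # 2. Re-sort by chromosome, then numerically by start position.
        LC_ALL=C sort -k1,1 -k2,2n "$base.ucsc.bedGraph" \
            > "$base.sorted.bedGraph" || return 1
        # 3. Build the bigWig from the sorted, renamed bedGraph.
        bedGraphToBigWig "$base.sorted.bedGraph" "$chrom_sizes" "$base.bw" || return 1
    done
}

# Example (placeholder paths):
# convert_all hg19.chromAlias.tsv hg19.chrom.sizes *.bedGraph
```

The loop stops at the first failing file so that a problem with one input is not buried under output from the rest.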

If you have any follow-up questions, our public help desk can always be reached at genome@soe.ucsc.edu. You may also send questions to genome-www@soe.ucsc.edu if they contain sensitive data. For any Genome Browser questions on Biostars, the UCSC tag is the best way to ensure visibility by the team.
