Question

bedGraphToBigWig error - end coordinate bigger than chr

7.5 years ago

varsha619 ▴ 90

Does anyone know a way to fix the bedGraphToBigWig error "end coordinate bigger than chr"? My input is a bedGraph generated by MACS2. This thread suggests using bedClip: https://groups.google.com/forum/embed/#!topic/macs-announcement/gXdf115Xy5Q. But I would like to know whether there is a command-line option to fix it. Thank you for your help.
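Before choosing a fix, it can help to see exactly which records trigger the error. A minimal awk check compares each end coordinate with the chromosome size. This sketch assumes a two-column `<chrom> <size>` file (for example one produced by UCSC's fetchChromSizes) and chrom/start/end in columns 1-3; the function name `report_offchrom` is hypothetical.

```shell
# Hypothetical helper: list bedGraph records whose end coordinate exceeds
# the chromosome size, or whose chromosome is absent from chrom.sizes.
# Output columns: line number, chrom, start, end, reason (tab-separated).
report_offchrom() {
    local sizes="$1" bedgraph="$2"
    awk 'BEGIN { OFS = "\t" }
         NR == FNR { size[$1] = $2; next }   # first file: remember chromosome sizes
         !($1 in size) { print FNR, $1, $2, $3, "unknown chromosome"; next }
         $3 + 0 > size[$1] + 0 { print FNR, $1, $2, $3, "end > " size[$1] }' \
        "$sizes" "$bedgraph"
}
```

For example, `report_offchrom my.chrom.sizes input.bedGraph` prints nothing if every record fits inside its chromosome.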

bedGraphToBigWig macs2 • 4.8k views

updated 7.5 years ago by Alex Reynolds 36k • written 7.5 years ago by varsha619 ▴ 90

Fixed it with bedClip, thank you for your help!

7.4 years ago by varsha619 ▴ 90

score 1 · Answer 1 · 2017-07-03

bedClip is a command line program to do this:

bedClip input.bed http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes output.bed

You can download bedClip from the directory appropriate to your operating system within our directory of utilities.
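As far as I know, bedClip's default behavior is to drop lines that refer to off-chromosome locations rather than edit them. If the UCSC binaries can't be installed, a rough awk approximation of that idea is sketched below. It assumes a locally downloaded two-column chrom.sizes file; the function name `drop_offchrom` is hypothetical, and edge cases (such as chromosomes missing from chrom.sizes) may not match bedClip exactly.

```shell
# Hypothetical stand-in for the idea behind bedClip (not its source code):
# keep only records that lie entirely within [0, size) of a chromosome
# listed in chrom.sizes; everything else is dropped. Kept lines are
# printed unchanged.
drop_offchrom() {
    local sizes="$1" bedgraph="$2"
    awk 'NR == FNR { size[$1] = $2; next }   # first file: chrom.sizes
         ($1 in size) && $2 + 0 >= 0 && $3 + 0 <= size[$1] + 0' \
        "$sizes" "$bedgraph"
}
```

Dropping loses the signal in the last partial bin of each affected chromosome; truncating the end coordinate instead keeps it.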

If you have questions about running bedClip, feel free to send a question to one of our mailing lists:

genome@soe.ucsc.edu for general questions
genome-mirror@soe.ucsc.edu for questions involving mirrors or Genome Browser in a Box (GBiB) installations
genome-www@soe.ucsc.edu for questions involving private data

ChrisL from the UCSC Genome Browser

score 0 · Answer 2 · 2017-07-03

Generate an awk script that clips your bedGraph records to the chromosome sizes (replace hg19 with your assembly). The fourth column, the signal value, is kept so that the output is still a valid bedGraph:

mysql --user=genome -N --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e 'select chrom,size from chromInfo' |\
awk '{printf("($1==\"%s\") {L=%d;B=int($2);E=int($3);B=(B>=L?L:B);E=(E>=L?L:E);printf(\"%s\\t%%d\\t%%d\\t%%s\\n\",B,E,$4);next;}\n",$1,$2,$1);}' > script.awk

$ head script.awk
($1=="chr1") {L=249250621;B=int($2);E=int($3);B=(B>=L?L:B);E=(E>=L?L:E);printf("chr1\t%d\t%d\t%s\n",B,E,$4);next;}
($1=="chr2") {L=243199373;B=int($2);E=int($3);B=(B>=L?L:B);E=(E>=L?L:E);printf("chr2\t%d\t%d\t%s\n",B,E,$4);next;}
($1=="chr3") {L=198022430;B=int($2);E=int($3);B=(B>=L?L:B);E=(E>=L?L:E);printf("chr3\t%d\t%d\t%s\n",B,E,$4);next;}
($1=="chr4") {L=191154276;B=int($2);E=int($3);B=(B>=L?L:B);E=(E>=L?L:E);printf("chr4\t%d\t%d\t%s\n",B,E,$4);next;}
($1=="chr5") {L=180915260;B=int($2);E=int($3);B=(B>=L?L:B);E=(E>=L?L:E);printf("chr5\t%d\t%d\t%s\n",B,E,$4);next;}
($1=="chr6") {L=171115067;B=int($2);E=int($3);B=(B>=L?L:B);E=(E>=L?L:E);printf("chr6\t%d\t%d\t%s\n",B,E,$4);next;}
($1=="chr7") {L=159138663;B=int($2);E=int($3);B=(B>=L?L:B);E=(E>=L?L:E);printf("chr7\t%d\t%d\t%s\n",B,E,$4);next;}
($1=="chrX") {L=155270560;B=int($2);E=int($3);B=(B>=L?L:B);E=(E>=L?L:E);printf("chrX\t%d\t%d\t%s\n",B,E,$4);next;}
($1=="chr8") {L=146364022;B=int($2);E=int($3);B=(B>=L?L:B);E=(E>=L?L:E);printf("chr8\t%d\t%d\t%s\n",B,E,$4);next;}
($1=="chr9") {L=141213431;B=int($2);E=int($3);B=(B>=L?L:B);E=(E>=L?L:E);printf("chr9\t%d\t%d\t%s\n",B,E,$4);next;}

Then run the generated script on your file:

awk -f script.awk input.bedGraph > clipped.bedGraph
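If the UCSC MySQL server is not reachable, the same clipping can be done in a single awk pass against a local chrom.sizes file (for example from fetchChromSizes). A sketch under that assumption, with a hypothetical function name `clip_to_sizes`: it truncates end coordinates to the chromosome size, keeps every column after the third, and drops records that would become empty.

```shell
# Hypothetical helper: truncate each record's end to its chromosome size,
# reading chrom.sizes instead of querying MySQL. Columns after the third
# are kept as-is. Records on chromosomes missing from chrom.sizes are
# dropped, as are records left empty after clipping (start >= end).
clip_to_sizes() {
    local sizes="$1" bedgraph="$2"
    awk 'BEGIN { OFS = "\t" }                 # rebuilt lines are tab-separated
         NR == FNR { size[$1] = $2; next }   # first file: chrom.sizes
         !($1 in size) { next }              # unknown chromosome: drop
         {
             L = size[$1] + 0
             if ($3 + 0 > L) $3 = L          # truncate the end coordinate
             if ($2 + 0 >= $3 + 0) next      # nothing left inside the chromosome
             print
         }' "$sizes" "$bedgraph"
}
```

Reading chrom.sizes at run time avoids generating one awk rule per chromosome and works for any assembly you have a sizes file for.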

score 0 · Answer 3 · 2017-07-03

To solve this problem more generically, make a BED file and use that as a mask with BEDOPS bedops --element-of:

$ fetchChromSizes hg38 | awk '{ print $1"\t0\t"$2; }' | sort-bed - > hg38.bed
$ bedops --element-of 1 in.bedGraph hg38.bed > masked.in.bedGraph

Then convert the masked bedGraph file to bigWig with bedGraphToBigWig.
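One caveat worth checking: `--element-of 1` reports any record that overlaps the mask by at least 1 bp. A record that starts inside a chromosome but ends past it still overlaps the whole-chromosome interval, so as far as I can tell it would pass through the mask unchanged and may still need trimming (for example with bedClip). The awk sketch below, a hypothetical helper that does not call BEDOPS, contrasts a 1 bp overlap test with full containment on half-open [start, end) intervals.

```shell
# Hypothetical illustration (not BEDOPS): filter records against the
# whole-chromosome interval [0, size) in one of two modes:
#   overlap1  - keep records overlapping it by >= 1 bp
#   contained - keep only records lying entirely inside it
filter_by_mask() {
    local mode="$1" sizes="$2" bedgraph="$3"
    awk -v mode="$mode" '
        NR == FNR { size[$1] = $2; next }    # first file: chrom.sizes
        !($1 in size) { next }               # chromosome not in the mask
        mode == "overlap1"  && $2 + 0 < size[$1] + 0 && $3 + 0 > 0 { print }
        mode == "contained" && $2 + 0 >= 0 && $3 + 0 <= size[$1] + 0 { print }
    ' "$sizes" "$bedgraph"
}
```

On a record like `chr1 900 1050` with a 1000 bp chr1, the `overlap1` mode keeps it and the `contained` mode drops it.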

Mainly, though, I'd be concerned about signal being generated in regions that don't or shouldn't exist. That might point to a data problem or a code smell somewhere.