Output bedGraph from genomeCoverageBed is missing "chr"
1
1
Entering edit mode
6.2 years ago
m98 ▴ 420

I am trying to make a UCSC Genome Browser hub using bigWig files. My starting file format are BAM files. Briefly, my script does as follows:

  • Scale the BAM file according to a specificic pre-determined scaling factor (using samtools).
  • From scaled BAM, make a bedgraph representing coverage (using genomeCoverageBed)
  • From beGgraph file, make bigwig file (using bedGraphToBigwig)

My script looks as follows:

samtools view -s 0.75 -b file.bam > file.scaled.bam;
genomeCoverageBed -bg -split -ibam file.scaled.bam; -g hg38.chrom.sizes > file.bedgraph;
sort -k1,1 -k2,2n file.bedgraph > file.sorted.bedgraph;
bedGraphToBigWig file.sorted.bedgraph hg38.chrom.sizes file.bw

I am having an issue displaying my bigwig file and noticed this is most likely due to my bedGraph file not having a "chr" in front of each chromosome name (so I have "1" instead of "chr1"). I am confused as how this happened.

Initially, when first testing genomeCoverageBed, I kept getting the following warning:

*****WARNING: Genome (-g) files are ignored when BAM input is provided.

I therefore removed the -g argument in the genomeCoverageBed. However, I just noticed (I should have noticed before) that the "chr" is missing in my bedgraph file. Presumably, this is what is causing my bigwig file to then be ill-formatted.

I am confused, am I missing something? Why is genomeCoverageBed not outputting a bedGraph file with "chr" in it, like it should be doing?

Thanks

RNA-Seq ucsc genomeCoverageBed bedGraph bigWig • 3.7k views
ADD COMMENT
2
Entering edit mode

Output of samtools view -H file.bam ? Bet your reference genome simply was like 1,2,3 and not chr1,chr2,chr3.

ADD REPLY
1
Entering edit mode
6.2 years ago

Are they there in your SAM file? It's possible whatever reference it was aligned to just didn't have them. You can easily add them back via awk:

awk '{print "chr" $0}' file.bedgraph > newfile.bedgraph

Also your second line has an erroneous semicolon.

ADD COMMENT
0
Entering edit mode

Yeah it seems to be the case. The BAM headers have the following tags:

@SQ SN:1     
@SQ SN:2     
etc

I tried doing this awk command you suggested but can't get it to work in the context of a shell script at the moment. Not sure how to get awk to recognize $0 as the whole line.

ADD REPLY
0
Entering edit mode

What is the error? The command should work fine on the bedGraph.

ADD REPLY
0
Entering edit mode

I sorted it out I think. I was using this command within a shell script so instead of $0 being the whole line of the bedgraph, it was outputting the name of the script. I used this instead:

 awk -v LINE="$0" '{print "chr" $LINE}' $BEDGRAPH
ADD REPLY
0
Entering edit mode

Don't make simple things complicated ;-D

ADD REPLY
0
Entering edit mode

What do you mean? The awk command suggested by jared.andrews07 will not work within a script. I'm guessing this is maybe a simpler way than my awk command? I'm not very comfortable changing a BedGraph like this to be honest, I wish there was a way out outputting a bedGraph with what I want directly.

ADD REPLY
1
Entering edit mode

There is, add the 'chr' to the headers of your reference genome FASTA file if desired.

ADD REPLY

Login before adding your answer.

Traffic: 1975 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6