Converting Bam To Bedgraph For Viewing On Ucsc?
7
10
Entering edit mode
11.8 years ago
user ▴ 950

I'm trying to go from a BAM file to a representation viewable in UCSC, ideally bedGraph. I am trying to use Bedtools's genomeCoverage like this:

genomeCoverageBed -ibam accepted_hits.sorted.bam -bg -trackline -split -g ... > mytrack.bedGraph

I'm not sure what the -g argument is supposed to be or how to generate it. The documentation does not explicitly say what it is supposed to be, though it gives an example where it is some sort of BED file. I am simply looking for a bedGraph or other UCSC-friendly compact representation that will allow me to visualize read densities using UCSC from the BAM. EDIT When I generate a bedGraph and put it in UCSC, I get tracks that look like this:

enter image description here

not a histogram. How can I make it a histogram? How can I generate the genome file for use with genomeCoverageBed? Also Is this the best way to get a UCSC viewable file with Bedtools? To clarify, I want to visualize the BAM as a histogram. I'm not sure this is possible with bedGraph? Thank you.

bedtools bam bed ucsc genome-browser • 34k views
ADD COMMENT
1
Entering edit mode

It looks like your track is still in "dense" visualization mode, which encodes density as grayscale. Try changing it to "full", and hopefully you'll see the histogram representation you're looking for.

ADD REPLY
8
Entering edit mode
11.8 years ago
Ryan Dale 5.0k

Using pybedtools, you can get a bedgraph, complete with track line, like this:

import pybedtools
x = pybedtools.BedTool('path/to/bam')
x.genome_coverage(bg=True, genome='hg19', split=True)\
    .saveas('path/to/bedgraph', trackline='track name="test track" visibility="full" type=bedGraph')

In that case however, the y-axis scale is in reads, so it's difficult to compare across libraries of different sizes in the genome browser. Ideally you'd want to scale your files so they have a uniform y-axis, like reads per million mapped reads (RPMMR). The bam_to_bigwig function does this:

from pybedtools.contrib.bigwig import bam_to_bigwig
bam_to_bigwig(bam='path/to/bam', genome='hg19', output='path/to/bigwig')

So this has the advantage of 1) counting how many mapped reads in your BAM file and scaling the bigWig such that it has units of reads per million mapped reads (RPMMR), 2) takes care of format conversions between bedGraph and bigWig (bigWIg is a much better format in the long run, in my opinion) and 3) handles the genome for you (-g argument for BEDTools, as long as you're working with a commonly-used assembly like hg19).

EDIT: while the bedGraph configuration is in the first line of the file, for bigWigs the configuration happens when you write a track line in the custom track upload text box. No configuration is stored in the file. Once you have uploaded your bigWig to a publicly-accessible location, use something like this in the upload text box:

track name='test' type=bigWig visibility=full viewLimits=0:25 autoScale=off bigDataUrl=http://your/url/here

Using the same viewLimits when uploading another track that was configured in the same way will show it on the same y-axis limits. See also the bigWig help and configuration settings for wig files

  • (edit: added type=bedGraph to track line)
  • (edit 2: added info about configuring bigWig)
  • (edit 3: added autoScale=off to trackline)
ADD COMMENT
0
Entering edit mode

this works great but how can you set the same y-axis for all tracks when you use bam_to_bigwig?

ADD REPLY
0
Entering edit mode

When you make the track file, you can set it there which contains the description about the bigwig.

ADD REPLY
0
Entering edit mode

I edited my answer above to address this.

ADD REPLY
0
Entering edit mode

Everything you wrote works for me except autoscaling if i do: track type=bigWig name="track1" bigDataUrl=http://file1.bigWig visibility=full viewLimits=0.0:25.0 track type=bigWig name="track2" bigDataUrl=http://file2.bigWig color=255,0,0 visibility=full viewLimits=0.0:25.0 then it does not give an error but does not respect viewLimits. tried making the lower:upper values integers too.

ADD REPLY
1
Entering edit mode

add autoScale=off inline :)

ADD REPLY
0
Entering edit mode

works thank you!

ADD REPLY
4
Entering edit mode
11.8 years ago

Get a WIG file directly from a sorted BAM file using SAMtools (code snippet from http://www.ecseq.com/NGSsnippets.html)

samtools mpileup -BQ0 run.sorted.bam | perl -pe '($c, $start, undef, $depth) = split;if ($c ne $lastC || $start != $lastStart+1) {print "fixedStep chrom=$c start=$start step=1 span=1\n";}$_ = $depth."\n";($lastC, $lastStart) = ($c, $start);' | gzip -c > run.wig.gz

To get a bedGraph file, you can either create a bigWig file with wigToBigWig and then use bigWigToBedGraph to end up with a bedGrapg file. Or you change the code above (which is much faster). But the probably simplest way is to directly upload the wig or bigWig file to UCSC. :)

ADD COMMENT
0
Entering edit mode

with this code snippet, is it possible to retain strand information?

ADD REPLY
3
Entering edit mode
11.8 years ago

Perhaps try BEDOPS bam2bed and sort-bed to convert your BAM data to UCSC BED?

$ bam2bed < foo.bam | sort-bed - | cut -f1-6 > foo.bed

Then make a custom track from it, using the full, pack or squish display modes. If you zoom out far enough, you can get a rough picture of read density.


Addendum: If you want to count reads within windows, take a look at this usage case which bins BAM reads ("tags") within a sliding 20 bp window. The result is a compressed BED file that contains binned score values that (when extracted from Starch to BED) can also be brought into a UCSC Genome Browser instance. Because it will contain binned scores, this rendition is probably going to be closer to the histogram visualization you are after, than a simple display of individual tags.


Explanation of binning script

I'll walk through the process of using the BEDOPS-based binning script to generate a histogram of binned reads on a UCSC Genome Browser.

(1) Get the hg19 version of the chromInfo table from the UCSC Genome Browser

Visit the UCSC Table Browser. With the All Tables group selected, for example, select the hg19 database and the chromInfo table. Output all fields to a text file. (This step can also be performed with mysql commands, if this needs automating.)

(2) Edit this text file (e.g. run awk on it to put in the start coordinate) and pipe it to sort-bed to turn it into a sorted BED file. Here's a ready-to-use example for hg19 that I just made: https://dl.dropbox.com/u/31495717/chrList.bed Again, this can be automated.

(3) Bin the read data. For example, the following makes a 75 bp-windowed read count spaced in 20 bp bins, written to a Starch-formatted archive called result.starch:

$ binReads.sh myReads.bam $PWD/result.starch 75 20 chrList.bed

The Starch file is just a very highly-compressed BED file. We made this format so that we could make the best use of our lab's storage capabilities. You can edit the binReads.sh script to remove the starch - call if you don't want the BED data to be compressed, which lets you skip step 4. Otherwise, we go on to the next step:

(4) Extract the binned result to a BED file:

$ unstarch result.starch > result.bedGraph

(5) Edit the result.bedGraph file to add the track type. All you need to do is insert track type=bedGraph on its own line at the top of the file, although you can add various parameters to customize the display and look, etc.

(6) Place the modified result.bedGraph on a public-facing web site and copy the URL — or otherwise load a local copy — into a UCSC Genome Browser instance via the Custom Track page (Genomes > manage custom track). The Genome Browser will recognize it as a bedGraph file and render it accordingly.

Browser snapshot

That's all there is to it. All these steps can be automated, once you have the process down.

ADD COMMENT
0
Entering edit mode

Thanks but Can UCSC accept regular bed files from track URL definitions, i.e. with bigDataUrl=? I thought it does not

ADD REPLY
0
Entering edit mode

Not with bigDataUrl=, but if you could just put the URL in directly into the custom track URL field. That seemed to work for me on our local genome browser instance, at least.

ADD REPLY
0
Entering edit mode

I want to use a track file with track so that I can put all my samples in one file pointing to all the relevant samples' URLs and see it. I don't want to add one BED at a time, so I prefer to use bedGraph. When I add a bedGraph track to UCSC, it appears as vertical bars - not a histogram - see my edit above. Also, the options for the track are dense and full, there's no pack.

ADD REPLY
0
Entering edit mode

But Alex, thats different from a bedGraph, which is essentially a coverage file as compared to the read locations, a bed file !!!!

ADD REPLY
0
Entering edit mode

Okay, please see the addendum I added. I think this addresses your concern?

ADD REPLY
1
Entering edit mode

Thats good Alex, is there already a utility present in BedOps, or can the code be modified to calculate the density/coverage for a specific region of interest provided as file such as genes. Consider an example of dividing 100 genes of different sizes into 20 bins and then calculating the protein coverage (binding intensity binned average) for each gene in that gene specific bin and averaging all bins per gene later on, to have an averaged profile. Then, one could plot the average intensity of protein binding in a single plot, like here which is also called as a composite profile. Plotting Read Density/ Distribution/ Composite Profiles [Chip-Seq].

Cheers

ADD REPLY
2
Entering edit mode

Sure. One might use the binned output from this usage case as a map or signal file within the bedmap application in BEDOPS, using regions-of-interest (e.g., "genes") as the reference file. That way, you can use numerical operators like --mean, --max, etc. to quantify properties of the bins/signal over regions-of-interest. This was done repeatedly for our portion of the ENCODE project, for example, where we compared read (DNase-seq, ChIP-seq, etc.) densities in sites, say, proximal and distal to promoters, etc. One can apply this bedmap approach to any number of categories of BED-formatted region and signal data.

ADD REPLY
0
Entering edit mode

Sounds good Alex, I'll try it out !!

ADD REPLY
0
Entering edit mode

that looks very promising thanks! but is there an easy way to get a bedgraph at the end so that it can be put into a file defining multiple tracks? Or can the Starched format be put in a UCSC tracks file in track type= lines?

ADD REPLY
0
Entering edit mode

Let's assume that you used the binning script and you set the output to a file called result.starch.

All you have to do is this:

  1. Run unstarch result.starch > result.bed to extract the result to a BED file.
  2. Edit the result.bed file with a text editor, adding type=bedGraph on its own line at the top of the result BED file.
  3. Load it into a UCSC Genome Browser instance as a custom track.

That's all there is to it. I tried it now with a small BAM file and was able to get a read histogram out of this. If you want to see that example BED file, load this URL into your genome browser as a custom track: https://dl.dropbox.com/u/31495717/density.bed and zoom out a couple times.

Once you do this once or twice, it's easy to automate.

ADD REPLY
0
Entering edit mode

Hey Alex, I got acquanited to BedOps suite though few things are off the track. I have posted a new question here, its easy to answer and track. Binning Over Genes And Calculating The Coverage [Bedops/Bedmap]

ADD REPLY
0
Entering edit mode
ADD REPLY
3
Entering edit mode
11.8 years ago

In the help menu, its the first point, though it might not be clear.

Notes: (1) The genome file should tab delimited and structured as follows:

<chromName><TAB><chromSize>
For example, Human (hg19):
chr1    249250621
chr2    243199373
...
chr18_gl000207_random    4262

Its the tab delimited file of length of each chromosome of your organism of interest. Use fetchChromSizes to pull the sizes.

For mm9, this is how it looks

chr1    197195432
chr2    181748087
chrX    166650296
chr3    159599783
chr4    155630120
chr5    152537259
chr7    152524553
chr6    149517037
chr8    131738871
chr10    129993255
chr14    125194864
chr9    124076172
chr11    121843856
chr12    121257530
chr13    120284312
chr15    103494974
chr16    98319150
chr17    95272651
chr18    90772031
chr19    61342430
chrY_random    58682461
chrY    15902555
chrUn_random    5900358
chrX_random    1785075
chr1_random    1231697
chr8_random    849593
chr17_random    628739
chr9_random    449403
chr13_random    400311
chr7_random    362490
chr5_random    357350
chr4_random    160594
chr3_random    41899
chrM    16299
chr16_random    3994

Once saved used it like

genomeCoverageBed -ibam accepted_hits.sorted.bam -bg -g ~/src/useFul/ucsc/genomeIndex/mm9.chrom > file.bedGraph

Then, try reading this on how to visualize it efficiently. Visualizing Chip-Seq Data Using Ucsc [Bigwig]

Cheers

ADD COMMENT
0
Entering edit mode

I read your tutorial on visualization but I am still confused. Remember that I am starting from a BAM file. It seems unnecessary to go from bam to BedGraph to yet another format, Wig/bigWig. There has to be a better way. I just want to plot histogram like distributions (similar to the way phastCons or the Chip-Seq tracks show) using a BAM, and a bedGraph should have all the information to do this it seems, so I don't want to have yet another file format.

ADD REPLY
0
Entering edit mode

Use the bedgraph then and replace the bigDataUrl with url!!

ADD REPLY
0
Entering edit mode

This is exactly what I've done. It does not work - the track appears as a bunch of vertical bars, not as a histogram like the phastCons score or other ChipSeq tracks that are built into UCSC. I want a histogram, not vertical bars or pileups of reads

ADD REPLY
0
Entering edit mode
5.3 years ago

Use deepTools bamCoverage:

bamCoverage --bam Your.bam -o Your.bw --normalizeUsing None

It can also generate BedGraph files, if wanted, or you can choose various options to normalize across different samples and so on.

ADD COMMENT
0
Entering edit mode
5.3 years ago
ATpoint 85k

For bedGraph, the currently fastest option would be https://github.com/brentp/mosdepth.

ADD COMMENT
0
Entering edit mode
3.1 years ago

Actually, there is no need to convert your file at all. Copy/paste the BAM file URL into the custom track box, click "submit", then click on the track (or right-click > configure), and on the configuration page there is a checkbox at the end of the other options: "Show as density graph". It's slower, because it has to download all the reads in the current range, but works fine.

ADD COMMENT

Login before adding your answer.

Traffic: 1692 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6