How Can I Convert Bam/Sam To Wiggle
7
19
Entering edit mode
14.2 years ago
Stew ★ 1.4k

I would like to convert BAM files into wiggle files for viewing in IGB and UCSC. I am aware that I can open BAM files directly in IGB and IGV, but I would really like to make smaller wiggle files for quick viewing of the data, additionally lots of the people I do analysis for prefer to upload wiggles into UCSC, so even if this isn't the best method I need to do it.

I am currently using this one liner to generate a wiggle file, with a value roughly every 10bp and with a minimum depth of 3.

samtools pileup fileName.bam | \
  perl -ne '
    BEGIN { print "track type=wiggle_0 name=fileName description=fileName\n" };
    ($c, $start, undef, $depth) = split;
    if ($c ne $lastC) { print "variableStep chrom=$c\n"; };
    $lastC=$c;
    next unless $. % 10 ==0;
    print "$start\t$depth\n" unless $depth<3;' > fileName.wig

This is based on a solution I found here. But adapted to give smaller files and a header track line. It works, and generates an 11Mb wig file from a 728M bam file, but it takes it's time about it.

Does anyone have a more elegant and perhaps faster solution?

wiggle perl samtools • 53k views
ADD COMMENT
11
Entering edit mode
14.2 years ago

Here's a Python script which converts BAM files to wiggle, then BigWig format.

This uses pysam to get a pileup via samtools -- essentially the same approach as your one liner -- so the speed will be comparable. The script allows specification of specific chromosome regions so is useful for examining a region in depth without having to skip over any bases.

We upload BigWig files to a local Galaxy instance. It can serve out sections of the file on-demand, which gives you the advantage of wiggle display at UCSC without the upload times.

ADD COMMENT
0
Entering edit mode

I am getting the following error If I use the bam_to_wiggle.py script

Traceback (most recent call last):
  File "bam_to_wiggle.py", line 142, in <module>
    main(*args, **kwargs)
  File "bam_to_wiggle.py", line 68, in main
    convert_to_bigwig(wig_file, chr_sizes, config, outfile)
  File "bam_to_wiggle.py", line 115, in convert_to_bigwig
    subprocess.check_call(cl)
  File "/usr/local/lib/python2.7/subprocess.py", line 499, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/usr/local/lib/python2.7/subprocess.py", line 486, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/usr/local/lib/python2.7/subprocess.py", line 672, in __init__
    errread, errwrite)
  File "/usr/local/lib/python2.7/subprocess.py", line 1201, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

The command I am using is:

python bam_to_wiggle.py /data1/goutham/dDocentGATK/alignment/1_method/BAM/sort.sample_EC1.bam -o sort.sample_EC1

The file /usr/local/lib/python2.7/subprocess.py exists

ADD REPLY
1
Entering edit mode

You need to have the `wigToBigWig` executable available in your PATH. The script header has a bit more documentation: https://github.com/chapmanb/bcbb/blob/796cc2a1179df6f5030212fd5e7f36540351abdb/nextgen/scripts/bam_to_wiggle.py#L23 Hope this helps get it running.

ADD REPLY
0
Entering edit mode

Thanks. I have downloaded the file and added to PATH but forgot to reload the current working terminal. Its working now.

ADD REPLY
9
Entering edit mode
12.6 years ago

For me wiggle representation is generally for the coverage, so I'll just compute the coverage first using bedtools utility as genomeCoverageBed -i file.bed -bg -g my.genome > sample.cov, where file.bed can be obtained using bamToBed utility bamToBed -i file.bam > file.bed and then would use bedGraphToBigWig utility for ucsc to convert the coverage file to bigwig, which is better than wig as you can place on the local server and pass the text file containing links of these bigwigs to collaborators etc. for viewing with UCSC and its safe as only you having rights can change the file content. Another option would be using pyicos convert tool as pyicos convert my_experiment.bam my_experiment.wig -f SAM -F variable_wig -O, this will give you wig directly.

Cheers

ADD COMMENT
0
Entering edit mode

May I ask you a question about bedGraphToBigWig? Can you talk about how to set the parameter of blockSize and itemsPerSlot? Because the default 256 and 1024 can lose lots of reads, which value do you think is the point that can balance the size of coverage file and sensitivity of reads? Thank you!

ADD REPLY
0
Entering edit mode

I am not sure, I always use the defaults, may be you can test using different paramters to confirm which one suits you!!

ADD REPLY
0
Entering edit mode

Is the bigwig file a binary file? How can I look at it directly except by IGV?

ADD REPLY
0
Entering edit mode

Its an indexed binary format. You can put the file on a webserver and import it via link in IGV.

ADD REPLY
0
Entering edit mode

I got error

Traceback (most recent call last):
  File "bam_wiggle.py", line 31, in <module>
  from bcbio.pipeline.config_utils import load_config, get_program
ImportError: No module named bcbio.pipeline.config_utils

Is there anything else required other than pysam and wigToBigWig?

Thank you

ADD REPLY
0
Entering edit mode

I don't know why are you getting this error, could you elaborate more, what command you used?

ADD REPLY
3
Entering edit mode
14.2 years ago

I've been using Findpeaks which is actually a ChIP software, but there's an option (-dist_type 3) to make it only count reads...which is what you would want.

It can also break down the track on a per chromosome basis, or one track with all the chromosomes.

http://sourceforge.net/apps/mediawiki/vancouvershortr/index.php?title=FP4Parameters

ADD COMMENT
0
Entering edit mode

I have added this as the accepted answer as I was looking for something faster than my perl hack, and this seems to fit that bill. It is also part of my chIP seq pipleine already so I do not need to install any additional tools or dependencies.

ADD REPLY
2
Entering edit mode
14.2 years ago
Rm 8.3k

veggie's bam2ucsc might be useful to visualize it in the ucsc browser.

Script can be found here.

ADD COMMENT
2
Entering edit mode
10.8 years ago
cmdcolin ★ 4.0k

In recent versions, the samtools pileup command has been removed

[main] The 'pileup' command has been removed. Please use 'mpileup' instead

So, I found a solution that uses samtools depth instead

samtools depth fileName.bam | perl -ne 'BEGIN{ print "track type= print wiggle_0 \
name=fileName description=fileName\n"}; ($c, $start, $depth) = split; \
if ($c ne $lastC) { print "variableStep chrom=$c span=10\n"; };$lastC=$c; \
next unless $. % 10 ==0;print "$start\t$depth\n" unless $depth<3' > fileName.wig

It isn't necessarily faster or more elegant but it works with latest versions of samtools. Also, there's a similar thread here that talks about using samtools mpileup: convert bam to wig

Edit: If you are converting to bigwig and receive the error "chromosome xxx has nnnn bases but ends at nnnn", then it's probably because the added span parameter goes beyond the end of the genome. Therefore, use the -clip option when doing wigToBigWig

Edit 2016: I would use bedtools coverageBed and then bedGraphToBigWig instead of this method

ADD COMMENT
1
Entering edit mode

Here's something interesting about this solution: samtools depth has an arbitrary cutoff of 8000! Anything above that is just discarded. I have no idea why was it made default, and still kind of cannot believe it. If you want your wig file to report true coverage from the BAM file, you would have to use '-d 0' key.

Edit 2016: I would use bedtools coverageBed and then bedGraphToBigWig instead of this method

Unfortunately, for some tools you need a properly formatted wig file, which bedGraphToBigWig - bigWigToWig combo does not generate. The solution above works quite well though.

ADD REPLY
0
Entering edit mode

This works great! What are the output values though for say ChIP-seq data thats been trimmed, bowtie2 aligned, duplicates removed and indexed? I know that the first column is the position, but do 5,7,12,and 15 represent raw read counts or rpkm values? Thanks! 10030 5 10040 7 10050 12 10060 15

ADD REPLY
0
Entering edit mode
13.0 years ago
wm ▴ 570

Is there anyway to make pileup include duplicates marked reads in the coverage?

ADD COMMENT
0
Entering edit mode
9.6 years ago
catherine ▴ 250

The python script showed above doesn't work well for me. So I would recommend samtools mpileup as well.

ADD COMMENT

Login before adding your answer.

Traffic: 1970 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6