Question

Tools To Calculate Average Coverage For A Bam File?

102

Entering edit mode

13.8 years ago

Biomed 5.0k

I would like to get the average coverage of all the captured bases in a bam file. What would be the best way to do this? What I am looking is a simple one number like 40X. Given that there may be millions of bases sequenced in a next gen study I would like to get the overall average coverage for all these bases as a single number.

Thanks

coverage bam sequencing • 252k views

ADD COMMENT • link updated 3 months ago by GenoMax 147k • written 13.8 years ago by Biomed 5.0k

5

Entering edit mode

People seem to be equally devided between GATK, Samtools and BedTools.

ADD REPLY • link 13.8 years ago by Biomed 5.0k

1

Entering edit mode

The Depth of Coverage is depreciated in GATK4. It's only available in GATK3. See here

ADD REPLY • link 5.0 years ago by rohitsatyam102 ▴ 920

0

Entering edit mode

DepthOfCoverage is available as beta https://gatk.broadinstitute.org/hc/en-us/articles/360041851491-DepthOfCoverage-BETA-#--intervals

This is what I ended up using, after trying bedtools

ADD REPLY • link 2.3 years ago by SplitInf ▴ 20

3

Entering edit mode

Most answers seems to be very old and hence would like to have updated suggestions.

I have 6 bam files and I have used samtools depth to calculate chromosome wise depth for all chromosomes and then plotted in R. While looking at the plots at 2-3 places, depth shows upto 200-3500 and hence I would like to calculate average read depth of each chromosome from each bam file. For e.g.: 1) average depth of chr1 from bam1, average depth of chr1 from bam2, ...until bam6. 2) average depth of chr1 from all 6 bam together.

Kindly guide.

Thanks.

ADD REPLY • link 7.4 years ago by bioinfo8 ▴ 230

1

Entering edit mode

Try the solutions if they don't work then you need to reframe the question with your commands and write a new query in a new post. This is advised. And yes paired end should work.

ADD REPLY • link 7.4 years ago by ivivek_ngs ★ 5.2k

0

Entering edit mode

in addition, if you have already plotted depths it means that you have got a temporal R data.frame (or similar type) in there containing chromosome, position and depth information. splitting such data.frame by chromosome and calculating average values should be straight-forward. I would just suggest to take into account empty values not reported by samtools depth dividing each chromosome depths' sum by the size of the chromosome, and not by the number of depth values.

ADD REPLY • link 7.4 years ago by Jorge Amigo 14k

1

Entering edit mode

Why is it wrong to take the number of sequenced bases and divide by the genome size?

ADD REPLY • link 12.1 years ago by Ketil 4.1k

5

Entering edit mode

because you cannot align all the bases on the genome, some reads are optical duplicates, etc...

ADD REPLY • link 12.1 years ago by Pierre Lindenbaum 164k

1

Entering edit mode

Okay, contamination is a fair point, but even if some reads fail to match, that could still be a sign of genetic divergence, and should still count towards genome coverage. Do any of the other suggestions deal with duplicates implicitly?

ADD REPLY • link 12.1 years ago by Ketil 4.1k

0

Entering edit mode

Qualimap is very fast and excellent tool. thanks

ADD REPLY • link 11.3 years ago by anuragkuleuven • 0

0

Entering edit mode

Hi,

I have multiple paired-end bam files from RNA-Seq data, already aligned and computed depth with samtools depth > bam.depth. Would any one please guide for any straight forward/latest way to compute coverage plots, avg. coverage plots and other metrics from depth files?

Picard CollectRNAseqmetrics, RSeQC and QoRTs have not been helpful. :-(

Thanks.

ADD REPLY • link 7.1 years ago by bioinfo8 ▴ 230

0

Entering edit mode

Have you run any of the other tools mentioned in this thread (e.g. Qualimap, pileup from BBMap)?

ADD REPLY • link 7.1 years ago by GenoMax 147k

1

Entering edit mode

@genomax My bam files are very big and Qualimap has not been helpful, so as bbmap which is Java-based (memory issues like Picard and QoRTs) and hence I used samtools depth. My depth files are also in GB.

ADD REPLY • link 7.1 years ago by bioinfo8 ▴ 230

0

Entering edit mode

Both programs will happily accept larger allocations of RAM and can open very large files. If you don't have access to a machine with more RAM then that is a different problem.

ADD REPLY • link 7.1 years ago by GenoMax 147k

Ram · Answer 1 · 2013-04-01

58

Entering edit mode

11.6 years ago

russdurrett ▴ 540

if you have samtools and awk, you can keep it simple with this:

samtools depth  *bamfile*  |  awk '{sum+=$3} END { print "Average = ",sum/NR}'

and with stdev:

samtools depth *bamfile*  |  awk '{sum+=$3; sumsq+=$3*$3} END { print "Average = ",sum/NR; print "Stdev = ",sqrt(sumsq/NR - (sum/NR)**2)}'

ADD COMMENT • link updated 6.3 years ago by Ram 44k • written 11.6 years ago by russdurrett ▴ 540

6

Entering edit mode

Not sure this works. It calculates coverage over regions but does not account for regions that were not covered with a read. Then it divides all the sum with the number of rows that were reported... but the positions without a read hit are not included. Basically - it gives a too high result.

ADD REPLY • link 10.8 years ago by nbartonicek ▴ 160

16

Entering edit mode

samtools depth -a

Using the '-a' flag outputs all positions (including zero depth), which should solve this problem.

ADD REPLY • link 7.9 years ago by Ian 6.1k

4

Entering edit mode

[FYI] Only available from v1.3, I couldn't find that parameter in my version

https://github.com/samtools/samtools/issues/374

ADD REPLY • link 7.9 years ago by Sukhi Singh 11k

6

Entering edit mode

That's true. To be fair, the denominator needs to be the size of genome where the BAM file was generated against, rather than the number of bases covered at least once (that's what NR is in this context). To get the genome size from the BAM file:

samtools view -H *bamfile* | grep -P '^@SQ' | cut -f 3 -d ':' | awk '{sum+=$1} END {print sum}'

ADD REPLY • link updated 6.3 years ago by Ram 44k • written 10.5 years ago by sung ▴ 60

0

Entering edit mode

Please correct me if I am wrong, but none of these answers seem to work with nanopore data.

ADD REPLY • link 9.0 years ago by biology1234 • 0

1

Entering edit mode

why would it not?

ADD REPLY • link 9.0 years ago by Michael 55k

0

Entering edit mode

Hi. So when I try to do it in your method, I'm getting a result of 1.6-1.8 from my tumor BAM files. So the average coverage is 1.6X? It's not supposed to be this low.

ADD REPLY • link 2.3 years ago by Hasib ▴ 20

1

Entering edit mode

In the first command, its $3 for me

ADD REPLY • link 11.6 years ago by Sukhi Singh 11k

1

Entering edit mode

I agree, and edited Russ' answer accordingly to avoid confusing anyone else.

ADD REPLY • link 11.6 years ago by Peter 6.0k

0

Entering edit mode

To anyone who uses this: Please note that these commands ignore positions with 0 coverage. If you are working with a genome that has limited coverage, the average and stdev coverages given here may be incorrect. If so, use the "-a" option with samtools depth to output all positions, e.g.:

samtools depth -a  *bamfile* | *the rest of the command*

ADD REPLY • link 3 months ago by Shruteek • 0

1

Entering edit mode

samtools depth is the most straightforward way to do this

ADD REPLY • link 11.6 years ago by Lee Katz ★ 3.2k

Ram · Answer 2 · 2011-01-28

42

Entering edit mode

13.8 years ago

Casey Bergman 18k

Try the genomeCoverageBed tool in the BEDtools package, which takes a BAM or BED file as input and:

computes a histogram of feature coverage (e.g., aligned sequences) for a given genome. Optionally, by using the –d option, it will report the depth of coverage at each base on each chromosome in the genome ﬁle (-g).

The way we run this is to first make a tab-delimited genome.txt file with "contig_name contig_size" information for all contigs in your genome and then run genomeCoverageBed on a sorted BAM file:

$ genomeCoverageBed -ibam sortedBamFile.bam -g genome.txt > coverage.txt

This will give you a summary histogram of coverage across each contig and for the entire genome, from which you can obtain the modal or mean value (the single fold-coverage value you are looking for).

ADD COMMENT • link 13.8 years ago by Casey Bergman 18k

0

Entering edit mode

Direct link: http://bedtools.readthedocs.org/en/latest/content/tools/genomecov.html

ADD REPLY • link updated 2.2 years ago by Ram 44k • written 9.2 years ago by scchess ▴ 640

0

Entering edit mode

With your command, I have different value for the same contig. For instance:

jcf7180001864843    2   4   137928  2.90006e-05
jcf7180001864843    3   28  137928  0.000203004
jcf7180001864843    4   10  137928  7.25016e-05

What does it mean? How can I have one value for each contig?

ADD REPLY • link updated 2.2 years ago by Ram 44k • written 8.4 years ago by Picasa ▴ 650

3

Entering edit mode

That's because it reports a histogram. From the documentation: The values in the output are:

chromosome (or entire genome) [eg jcf7180001864843]
depth of coverage from features in input file [eg. 2]
number of bases on chromosome (or genome) with depth equal to column 2. [eg. 4]
size of chromosome (or entire genome) in base pairs [eg. 137928]
fraction of bases on chromosome (or entire genome) with depth equal to column 2. [2.90006e-05]

[all values taken from your first row]

If you want to get the average coverage: add up the product of bases per coverage [2* 4+3* 28+4* 10+...] and divide by the total number of bases [137928].

ADD REPLY • link updated 2.2 years ago by Ram 44k • written 8.3 years ago by ronald.jaepel ▴ 30

0

Entering edit mode

Hello Ronald,

I have a question about what you are saying. So for the average coverage you are stating, that would be per scafffold correct? I attempted to open a topic recently and was referred to this one :)

Cheers

ADD REPLY • link 6.3 years ago by ggman ▴ 90

0

Entering edit mode

The example given by @ronald.jaepel demonstrates calculating mean coverage for one scaffold (jcf7180001864843). You can perform this calculation on a per-scaffold basis by, e.g., subsetting on the scaffolds in the tabular text file. You can also calculate averages (or other summary statistics) for multiple or all scaffolds.

ADD REPLY • link 15 months ago by kalavattam ▴ 280

1

Entering edit mode

Is your BAM file sorted (as noted on the help doc page linked by @student-t above)?

ADD REPLY • link 8.4 years ago by GenoMax 147k

0

Entering edit mode

what is content of genome.txt

ADD REPLY • link 4.7 years ago by svp ▴ 680

0

Entering edit mode

From Casey's original reply above

...tab-delimited genome.txt file with "contig_name contig_size" information for all contigs in your genome...

ADD REPLY • link 3.0 years ago by Yee Mey • 0

1

Entering edit mode

To add to this for future readers, those genome.txt files are so called "chromosome size files", formatted as "contig name <TAB> length"; e.g. like this:

chr1    248956422
chr2    242193529
chr3    198295559

Those files can be downloaded for example with fetchChromSizes from UCSC, found here for linux (navigate back for Mac version): https://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/

ADD REPLY • link 2.7 years ago by rosikiewicz ▴ 10

0

Entering edit mode

... as the link points out, the chromosome size file (ie genome.txt) can be retrieve directly through this URL:

  http://hgdownload.soe.ucsc.edu/goldenPath/<db>/bigZips/<db>.chrom.sizes

where: <db> - name of UCSC database, e.g.: hg38, hg18, mm9, etc ...

However! If you supply bam as input, the -g option is ignored so there is no need for this file in the first place.

ADD REPLY • link 2.3 years ago by SplitInf ▴ 20

Ram · Answer 3 · 2011-01-28

26

Entering edit mode

13.8 years ago

Michael 55k

And the mandatory R solution. The Rsamtools package can be used to read in the BAM file.

Here is a code example:

library(Rsamtools)
bamcoverage <- function (bamfile) {
  # read in the bam file
  bam <- scanBam(bamfile)[[1]] # the result comes in nested lists
  # filter reads without match position
  ind <- ! is.na(bam$pos)
  ## remove non-matches, they are not relevant to us
  bam <- lapply(bam, function(x) x[ind])
  ranges <- IRanges(start=bam$pos, width=bam$qwidth, names=make.names(bam$qname, unique=TRUE))
  ## names of the bam data frame:
  ## "qname"  "flag"   "rname"  "strand" "pos"    "qwidth"
  ## "mapq"   "cigar"  "mrnm"   "mpos"   "isize"  "seq"    "qual"
  ## construc: genomic ranges object containing all reads
  ranges <- GRanges(seqnames=Rle(bam$rname), ranges=ranges, strand=Rle(bam$strand), flag=bam$flag, readid=bam$rname )
  ## returns a coverage for each reference sequence (aka. chromosome) in the bam file
  return (mean(coverage(ranges)))      
}

And the output:

> bamcoverage("~/test.bam")
gi|225184640|emb|AL009126.3| 
                    15.35720

The good thing about this is you can do much more with the genomic ranges object than computing the coverage, the downside is, that the code is not that efficient.

ADD COMMENT • link updated 6.3 years ago by Ram 44k • written 13.8 years ago by Michael 55k

0

Entering edit mode

@Michael : What maximum size of input bam file that can be handled by this solution ? It slurps the whole bamfile in memory ??

ADD REPLY • link 13.8 years ago by toni ★ 2.2k

2

Entering edit mode

Sorry for the late reply. Yes it does, and it is not very memory efficient. For a 850MB BAM file with approximately 17 million alignments the R process uses up to 15GB of memory during processing.

ADD REPLY • link 10.7 years ago by Michael 55k

6

Entering edit mode

A reply four years after.....

ADD REPLY • link 9.2 years ago by scchess ▴ 640

0

Entering edit mode

@Micheal: Will it work for paired end BAM files?

ADD REPLY • link 7.8 years ago by bioinfo8 ▴ 230

0

Entering edit mode

Yes, it will work.

ADD REPLY • link 15 months ago by kalavattam ▴ 280

0

Entering edit mode

Hi! I used this R code with Rsamtools with my bam file and the result was:

6503.159

What could have gone wrong?

ADD REPLY • link 5.5 years ago by macielrodriguez2 ▴ 50

0

Entering edit mode

You might have incredibly high coverage :) I would try a different tool, now 8 years later. Honestly, this code was only intended to show it can be done in R too. But I am not even sure it works correctly.

ADD REPLY • link 5.5 years ago by Michael 55k

0

Entering edit mode

Thanks!!! I'll look for another tools :) genomecoverage from bedtools looks good too. I'll try that one. Thanks for the advice!

ADD REPLY • link 5.5 years ago by macielrodriguez2 ▴ 50

Ram · Answer 4 · 2011-01-27

16

Entering edit mode

13.8 years ago

Pierre Lindenbaum 164k

did you try the GATK "DepthOfCoverage" ? http://www.broadinstitute.org/gsa/wiki/index.php/Depth_of_Coverage_v3.0

or you can run 'samtools pileup' and calculate the mean value for the coverage column.

ADD COMMENT • link 13.8 years ago by Pierre Lindenbaum 164k

2

Entering edit mode

I have used the samtools method (which excludes bases with 0 coverage) and the BEDTools version (which includes bases with 0 coverage). The former will give a higher mean coverage. This will have an impact on genome where there are large areas that are not sequenced.

ADD REPLY • link 13.7 years ago by Ian 6.1k

1

Entering edit mode

Sorry I am new to bioinformatics. I have used 'samtools mpileup', but how can I get mean value for the depth column. Thanks.

ADD REPLY • link 8.1 years ago by jjaimi921129 ▴ 10

1

Entering edit mode

I tried GATK DepthOfCoverage last July with no luck - complicated java.lang.RuntimeExceptions. I did find success with BEDtools however (see answer below/above)

ADD REPLY • link 13.8 years ago by Casey Bergman 18k

1

Entering edit mode

I use this method to calculate % of sites with at least a coverage of 20x in a list of sites from a bed. It provides that number directly as output.

ADD REPLY • link 13.5 years ago by Wjeck ▴ 490

0

Entering edit mode

That's the gte_20 value from the sample_cumulative_coverage_proportions file, correct?

ADD REPLY • link updated 2.2 years ago by Ram 44k • written 9.1 years ago by stianlagstad ★ 1.1k

0

Entering edit mode

I was interested in this issue too, but I wanted to know the coverage for certain genes. after looking to this tool's wiki I've realized that I can feed it with a list of genes intervals (which of course I'll have to retrieve from UCSC or Ensembl) and I'll surely sort my problem out! look what an unexpected great piece of information I've just found! once again thanks a lot, Pierre.

ADD REPLY • link 13.8 years ago by Jorge Amigo 14k

0

Entering edit mode

Thanks Pierre. I haven't tried Depthof Coverage but I tried Samtools. Samtools would give you a base by base coverage, and I use it successfully like Jorge explained but I 'll look at GATK for sure.

ADD REPLY • link 13.8 years ago by Biomed 5.0k

Ram · Answer 5 · 2011-01-28

14

Entering edit mode

13.8 years ago

Michael.James.Clark ▴ 560

Here's a very simple perl script I wrote that does exactly this that you're free to use:

($num,$den)=(0,0);
while ($cov=<STDIN>) {
    $num=$num+$cov;
    $den++;
}
$cov=$num/$den;
print "Mean Coverage = $cov\n";

Just pipe samtools pileup to it like so:

/path/to/samtools pileup in.bam | awk '{print $4}' | perl mean_coverage.pl

It will also work easily for genomic regions. Just index the BAM file and use samtools view like so:

/path/to/samtools view -b in.bam <genomic region> | /path/to/samtools pileup - | awk '{print $4}' | perl mean_coverage.pl

Hope someone finds this useful.

ADD COMMENT • link updated 6.3 years ago by Ram 44k • written 13.8 years ago by Michael.James.Clark ▴ 560

4

Entering edit mode

One more comment on this pipeline... In my data, where I have relatively low coverage there are quite a few genomic positions not covered at all. These will not show up in the pileup command, so I guess that the mean coverage calculated this way will be the mean coverage of the covered positions. Am I right? Is there a way (in the "samtools pileup" command) to output all positions even if they are not covered by any read?

ADD REPLY • link 13.5 years ago by Panos ★ 1.8k

0

Entering edit mode

looks promising. Thanks

ADD REPLY • link 13.8 years ago by Biomed 5.0k

0

Entering edit mode

I am certainly going to try this. Thanks for the answer.

ADD REPLY • link 13.8 years ago by Biomed 5.0k

0

Entering edit mode

Nice answer! Tried it and it's working! The only thing I'd like to mention is that I think that the BAM file needs to be sorted.

ADD REPLY • link 13.5 years ago by Panos ★ 1.8k

0

Entering edit mode

That's correct, panos. This simple solution is only if every base is covered at least once because samtools pileup doesn't report bases with zero coverage. Thanks for pointing that out!

ADD REPLY • link 13.4 years ago by Michael.James.Clark ▴ 560

0

Entering edit mode

can you give me a detailed description about $cov=[HTML] you used ? Thanks!

ADD REPLY • link 11.6 years ago by siyu ▴ 150

0

Entering edit mode

Very useful thanks! Used it on my own dataset (sorry to bump an old thread but still very handy)

ADD REPLY • link 10.8 years ago by Andrew Watson ▴ 40

0

Entering edit mode

I made a BioStars account to upvote your contribution. I've been trying to find something this simple and straightforward for some time now.. Thanks!

ADD REPLY • link 8.0 years ago by pvdam3 • 0

3

Entering edit mode

this is a very old answer. you would find then useful a simpler samtools depth based solution:

samtools depth [-b regions.bed] input.bam | awk '{c++;s+=$3}END{print s/c}'

ADD REPLY • link 8.0 years ago by Jorge Amigo 14k

0

Entering edit mode

@Jorge Amigo

samtools depth bamfile | awk '{sum+=$3} END { print "Average = ",sum/NR}'

by @russdurrett also gives same output for my paired-end bam file from RNA-Seq:-)

ADD REPLY • link 7.1 years ago by bioinfo8 ▴ 230

1

Entering edit mode

that's because NR is an awk variable that stores the number of rows, therefore the c variable on my code is not wrong, but it is superfluous.

ADD REPLY • link 7.1 years ago by Jorge Amigo 14k

0

Entering edit mode

I ran:

samtools depth zm_cp.bam | awk '{sum+=$3} END { print "Average = ",sum/NR}'

and the result was:

Average =  6495.19

:O is this okay?

ADD REPLY • link 5.5 years ago by macielrodriguez2 ▴ 50

0

Entering edit mode

Does this also works with mpileup? I ran:

samtools mpileup zm_cp.bam | awk '{print $4}' | perl mean_coverage.pl

and this was the result:

[mpileup] 1 samples in 1 input files
<mpileup> Set max per-file depth to 8000

Mean Coverage = 5107.87408859271

:O is this okay?

ADD REPLY • link 5.5 years ago by macielrodriguez2 ▴ 50

0

Entering edit mode

Hello, I know this post is quite old, but I am trying to use this as it seems to work wonderfully for everyone else. I am new to bioinformatics and genomics, so please be patient with me. I have run the command 'samtools mpileup A186944_mtgenome.srt.dup.bam | awk '{print $4}' | perl coverage.pl' where my bam file is a sorted bam with duplicates removed and I get the following error:

Illegal division by zero at coverage.pl line 6.

Can someone please help me figure out the error? I think it's maybe because there are 0's where unmapped reads occur? How do I obtain coverage for just the mapped reads, and eliminate the 0 denominator?

ADD REPLY • link 5.1 years ago by dgrace999 • 0

score 11 · Answer 6 · 2018-08-09

11

Entering edit mode

6.3 years ago

WouterDeCoster 47k

A modern, very fast, solution for this would be mosdepth.

ADD COMMENT • link 6.3 years ago by WouterDeCoster 47k

0

Entering edit mode

how do i install other than bioconda?

ADD REPLY • link 5.6 years ago by 1769mkc ★ 1.2k

1

Entering edit mode

You can install the executable from their github release page: https://github.com/brentp/mosdepth/releases

Latest version at time of commenting (0.2.9):

(Download executable) wget https://github.com/brentp/mosdepth/releases/download/v0.2.9/mosdepth

(Update permissions to run it) chmod a+x mosdepth

(Good to go!) <path_to_mosdepth> --help

ADD REPLY • link 4.6 years ago by oconnwald ▴ 20

0

Entering edit mode

Why not use bioconda?

More installation options and instructions can be found on the GitHub page.

ADD REPLY • link 5.6 years ago by WouterDeCoster 47k

Ram · Answer 7 · 2011-01-30

5

Entering edit mode

13.8 years ago

Heikki ▴ 360

@michael.james.clark, thank you. I wrote for myself a script that I can add to when the need arises:

#!/usr/bin/env perl

use strict;
use warnings;

# Usage: samtools pileup file.bam | bam_coverage.pl

my $num;        # per residue coverage
my $len;        # sequence length counter 
my $min = 1000; # minimum coverage
my $max = 0;    # maximum coverage

while (<>) {
    my @a = split /\t/;
    $num += $a[3];
    $min = $a[3] if $min > $a[3];
    $max = $a[3] if $max < $a[3];
    $len++;
}

printf  "Mean coverage  : %.1f\n", $num/$len;
printf  "Coverage range : %d - %d\n", $min, $max;

ADD COMMENT • link updated 6.3 years ago by Ram 44k • written 13.8 years ago by Heikki ▴ 360

1

Entering edit mode

Heiki thanks for the codes. It is working for me

ADD REPLY • link 12.2 years ago by Kwame ▴ 50

0

Entering edit mode

This doesn't work because mpileup does not report the bases with coverage of zero

ADD REPLY • link 11.6 years ago by Raygozak ★ 1.4k

Ram · Answer 8 · 2011-07-01

5

Entering edit mode

13.4 years ago

Rm 8.3k

use

samtools pileup accepted_hits.bam | awk '{ count++ ; SUM += $4 } END { print "Total: " SUM "\t" "Nucleotides: " count "\t" "Average_coverage: " SUM/count }'

(use $8 instead; if pileup with -cf option)

ADD COMMENT • link updated 6.3 years ago by Ram 44k • written 13.4 years ago by Rm 8.3k

0

Entering edit mode

I used the above command to extract the average coverage on my aligned bam files and my statistics are: Total: 8011581243 Nucleotides: 510234050 Average_coverage: 15.7018

I want to know that the average_coverage is stating that on a distribution each of my base is read atleast 15 time right? What does the total signifies and the Nucleotides count: 510234050. Is this the total no of bases that are in the exonic region? then how will I calculate the number of reads from here that actually got mapped on in the exome region? Can we do this from the nucleotides count? I know for each reads consist of 100 bases so if I divide this 510234050 bases with 100 it gives me the reads but that will not give me the reads that actually got mapped in the exome region. How t extract that?

ADD REPLY • link 11.0 years ago by ivivek_ngs ★ 5.2k

score 5 · Answer 9 · 2013-08-04

5

Entering edit mode

11.3 years ago

William ★ 5.3k

How about just running qualimap in GUI mode or to create a QC PDF? Your can run it on the whole genome or on 1 or multiple regions using a bed file.

global coverage stats

coverage histogram

percentage of genome covered x times plot

http://qualimap.bioinfo.cipf.es/

ADD COMMENT • link 11.3 years ago by William ★ 5.3k

0

Entering edit mode

I tried using QualiMap2 on HPC. It just doesn't work.

ADD REPLY • link 5.0 years ago by rohitsatyam102 ▴ 920

Ram · Answer 10 · 2015-05-27

3

Entering edit mode

9.5 years ago

5heikki 11k

This is for people who have a sam/bam file consisting of reads mapped to multiple contigs (samtools needs to be in $PATH):

BBMap: http://sourceforge.net/projects/bbmap/

bbmap/pileup.sh -h

Written by Brian Bushnell
Last modified May 19, 2015

Description:  Calculates per-scaffold coverage information from an unsorted sam file.

Usage:  pileup.sh in=<input> out=<output>

Input may be stdin or a SAM file, compressed or uncompressed.
Output may be stdout or a file.

Input Parameters:
in=<file>           The input sam file; this is the only required parameter.
ref=<file>          Scans a reference fasta for per-scaffold GC counts, not otherwise needed.
fastaorf=<file>     An optional fasta file with ORF header information in PRODIGAL's output format.  Must also specify 'outorf'.
unpigz=t            Decompress with pigz for faster decompression.

..

ADD COMMENT • link updated 6.3 years ago by Ram 44k • written 9.5 years ago by 5heikki 11k

0

Entering edit mode

does it work with a bam file ?

ADD REPLY • link 8.1 years ago by Picasa ▴ 650

1

Entering edit mode

Yes it does. Use the latest version.

$ pileup.sh
Written by Brian Bushnell
Last modified September 1, 2016
Description:  Calculates per-scaffold coverage information from an unsorted sam or bam file.

ADD REPLY • link 8.1 years ago by GenoMax 147k

0

Entering edit mode

Could you introduce a bed file, say exome, and get the coverage stats?

ADD REPLY • link 6.2 years ago by sm.hashemin ▴ 90

score 2 · Answer 11 · 2011-01-28

2

Entering edit mode

13.8 years ago

Mdperry ▴ 40

BioPerl has a library named Bio-Samtools which provides a familiar Perl/BioPerl access to samtools. I heard the author, Lincoln Stein, give a talk on on how he wove the two together use XS. As I recall, one of his examples was exactly the question you posed

ADD COMMENT • link 13.8 years ago by Mdperry ▴ 40

score 2 · Answer 12 · 2021-08-31

2

Entering edit mode

3.2 years ago

MSRS ▴ 590

Now the new release of Samtools for coverage analysis.

samtools coverage input.bam

ADD COMMENT • link 3.2 years ago by MSRS ▴ 590

0

Entering edit mode

Does it provide depth per chromosome or for entire genome?

ADD REPLY • link 3.2 years ago by Kash ▴ 110

0

Entering edit mode

Better check documentation before asking.

ADD REPLY • link 3.2 years ago by Jorge Amigo 14k

0

Entering edit mode

I saw it can give coverage for a specific proportion. But I am looking to find the coverage for entire bam file. I was wondering if there a way to use this tool for entire genome. I am working in university cluster and they have an older version of samtools. So I do not have access to samtools coverage to test this. Anyway thank you.

ADD REPLY • link 3.2 years ago by Kash ▴ 110

0

Entering edit mode

Samtools gives coverage both specific portion and entire genome. See here

ADD REPLY • link 3.2 years ago by MSRS ▴ 590

score 1 · Answer 13 · 2024-08-06

1

Entering edit mode

3 months ago

GenoMax 147k

To make this thread a little more complete. adding pandepth as a current tool : https://github.com/HuiyangYu/PanDepth

ADD COMMENT • link 3 months ago by GenoMax 147k

score 0 · Answer 14 · 2021-09-24

0

Entering edit mode

3.2 years ago

WouterDeCoster 47k

Also tiwih meandepth

ADD COMMENT • link 3.2 years ago by WouterDeCoster 47k

score 0 · Answer 15 · 2021-09-24

Genozip calculates coverage a bit differently, based on the number of bases rather than reads - ignoring hard-clipped and soft-clipped bases, as well as unmapped, secondary, supplementary and duplicate alignments. It can also calculate approximate coverage directly on a FASTQ file.

 $ genocat --coverage my-sample.bam.genozip

--coverage for: my-sample.bam.genozip
Contig         LN        Reads        Bases       Bases   Depth
1              249.3 Mb  61,215,549   9.0 Gb      7.1 %   36.10
2              243.2 Mb  63,342,488   9.3 Gb      7.3 %   38.26
3              198.0 Mb  50,819,357   7.5 Gb      5.9 %   37.76
4              191.2 Mb  48,131,726   7.1 Gb      5.6 %   37.04
5              180.9 Mb  46,495,225   6.8 Gb      5.4 %   37.81
6              171.1 Mb  43,659,445   6.4 Gb      5.0 %   37.53
7              159.1 Mb  41,171,496   6.0 Gb      4.7 %   38.01
8              146.4 Mb  38,081,676   5.6 Gb      4.4 %   38.27
9              141.2 Mb  31,755,198   4.7 Gb      3.7 %   33.05
10             135.5 Mb  34,954,121   5.1 Gb      4.0 %   37.90
11             135.0 Mb  35,234,129   5.2 Gb      4.1 %   38.38
12             133.9 Mb  34,523,219   5.1 Gb      4.0 %   37.91
13             115.2 Mb  24,526,664   3.6 Gb      2.8 %   31.32
14             107.3 Mb  23,525,557   3.5 Gb      2.7 %   32.21
15             102.5 Mb  22,613,925   3.3 Gb      2.6 %   32.41
16             90.4 Mb   23,675,707   3.5 Gb      2.7 %   38.41
17             81.2 Mb   22,148,439   3.2 Gb      2.5 %   40.01
18             78.1 Mb   19,542,517   2.9 Gb      2.3 %   36.81
19             59.1 Mb   16,374,766   2.4 Gb      1.9 %   40.50
20             63.0 Mb   16,530,922   2.4 Gb      1.9 %   38.52
21             48.1 Mb   9,811,790    1.4 Gb      1.1 %   29.93
22             51.3 Mb   10,238,901   1.5 Gb      1.2 %   29.26
X              155.3 Mb  20,421,365   3.0 Gb      2.3 %   19.29
Y              59.4 Mb   3,971,007    582.0 Mb    0.5 %    9.80
MT             16.6 Kb   105,906      15.5 Mb     0.0 %  937.41
Other contigs            28,989,258   4.1 Gb      3.2 %
-----
All contigs              771,860,353  113.3 Gb    88.8%   36.60
Soft clip                             1.2 Gb      0.9 %
Unmapped                 17,071,958   2.4 Gb      1.9 %
Supplementary            1,028,655    52.2 Mb     0.0 %
Duplicate                71,068,615   10.6 Gb     8.3 %
TOTAL                    861,029,581  127.5 Gb    100.0%

Documentation: https://genozip.com/coverage.html

Installing: https://genozip.com/installing.html

Publication: https://www.researchgate.net/publication/349347156_Genozip_-_A_Universal_Extensible_Genomic_Data_Compressor

score 0 · Answer 16 · 2022-03-29

0

Entering edit mode

2.7 years ago

Mansi • 0

Nice tool to get coverage stats: https://github.com/brentp/goleft/tree/master/covstats

ADD COMMENT • link 2.7 years ago by Mansi • 0