Tool: PanDepth, an ultra-fast and efficient genomic tool for coverage calculation
11 months ago
Huiyang
  1. PanDepth is a high-performance tool for calculating coverage from sequencing data. It outperforms other tools in speed on both BAM- and CRAM-format alignment files, regardless of read length.
  2. PanDepth accepts sorted or unsorted BAM- and CRAM-format alignment files, and computes coverage either over intervals given in GTF/GFF/BED format or over windows of a specified size.
  3. PanDepth is memory-efficient, making it an attractive choice for large-scale genomic data analysis.
  4. PanDepth's depth and coverage statistics are fully consistent with those reported by samtools.

You can get the PanDepth code and manual on GitHub.

Figure: Computation time of seven software tools for genome coverage calculation on 150 GB of sequencing reads, using different numbers of threads.

Can PanDepth output the base coverage for every position?

I regret to inform you that PanDepth does not support outputting the base coverage for every position, because this process is extremely time-consuming and produces a very large output file. As not every base position requires in-depth analysis, you can use the '-w 200' parameter to divide the whole genome into non-overlapping 200 bp windows. Then, based on the results for these windows, you can select the ones you are interested in and use a tool such as 'samtools depth' to output the per-base coverage within them.
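The window-then-zoom workflow described above can be sketched in Python. These are hypothetical helpers, not part of PanDepth: `make_windows` splits a chromosome of known length into non-overlapping windows (mirroring what '-w 200' is described to do), and `low_coverage_regions` keeps windows whose mean depth is below a threshold and converts them into the 1-based `chr:start-end` region strings accepted by 'samtools depth -r'. The per-window depths are illustrative input; the exact layout of PanDepth's window report is not described in this thread, so parsing it is left out.

```python
from typing import Dict, Iterator, List, Tuple


def make_windows(length: int, size: int = 200) -> Iterator[Tuple[int, int]]:
    """Yield non-overlapping windows as 0-based, half-open (start, end) pairs.

    The last window is truncated at the chromosome end.
    """
    for start in range(0, length, size):
        yield start, min(start + size, length)


def low_coverage_regions(
    chrom: str,
    window_mean_depth: Dict[Tuple[int, int], float],
    min_depth: float,
) -> List[str]:
    """Return 1-based inclusive region strings ("chr:start-end") for
    windows whose mean depth is below min_depth (hypothetical helper)."""
    regions = []
    for (start, end), depth in sorted(window_mean_depth.items()):
        if depth < min_depth:
            # Convert 0-based half-open [start, end) to 1-based inclusive.
            regions.append(f"{chrom}:{start + 1}-{end}")
    return regions


if __name__ == "__main__":
    windows = list(make_windows(1000, 200))
    # Illustrative per-window mean depths, e.g. parsed from a window report.
    depths = dict(zip(windows, [30.0, 2.5, 28.0, 0.0, 31.0]))
    for region in low_coverage_regions("chr1", depths, min_depth=5.0):
        # 'sample.bam' is a placeholder file name.
        print(f"samtools depth -a -r {region} sample.bam")
```

Here '-a' asks samtools to report zero-depth positions as well, which is usually what you want when inspecting poorly covered windows.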

There are data types where this kind of per-base coverage calculation would be useful. Even if it is time-consuming, if your toolkit scales to that kind of analysis, it would be preferable to, and more manageable than, samtools/pysam-based approaches.

We greatly appreciate your suggestion. In the forthcoming version, we will incorporate a feature to report the depth of coverage across all positions.

Thank you very much for your suggestion. The latest version of PanDepth (v2.21) now supports the output of depth for all positions.

Thank you for this tool; it served my purpose in a very short time. As I am new to bioinformatics, I do not fully understand the output file. I used BED and BAM files as input and got the result. What do total depth, coverage, and mean depth mean, and how are they calculated? It would be helpful to know this.

Thank you very much for using PanDepth. PanDepth calculates the coverage of alignment regions by extracting the chromosome name, alignment start position, and CIGAR string of each read from the alignment file, and then merging the coverage of all extracted reads to obtain the final output.
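The general idea of turning each read's start position and CIGAR string into per-base depth can be sketched with a difference array. This is an illustration, not PanDepth's code: it assumes, following the default behaviour of 'samtools depth', that only M, = and X operations add depth, that D and N advance along the reference without adding depth, and that I, S, H and P do not consume the reference. Positions are 0-based, and the reads in the example are made up.

```python
import re
from typing import List, Tuple

# One CIGAR operation: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that consume the reference and are counted as depth here.
DEPTH_OPS = set("M=X")
# Operations that consume the reference but are not counted (deletion, skip).
SKIP_OPS = set("DN")


def reference_blocks(start: int, cigar: str) -> List[Tuple[int, int]]:
    """Return 0-based half-open reference intervals covered by aligned bases."""
    blocks = []
    pos = start
    for length_str, op in CIGAR_RE.findall(cigar):
        length = int(length_str)
        if op in DEPTH_OPS:
            blocks.append((pos, pos + length))
            pos += length
        elif op in SKIP_OPS:
            pos += length
        # I, S, H and P do not move along the reference.
    return blocks


def depth_from_reads(reads: List[Tuple[int, str]], region_len: int) -> List[int]:
    """Merge per-read blocks into per-base depth using a difference array."""
    diff = [0] * (region_len + 1)
    for start, cigar in reads:
        for b_start, b_end in reference_blocks(start, cigar):
            b_start, b_end = max(b_start, 0), min(b_end, region_len)
            if b_start < b_end:
                diff[b_start] += 1
                diff[b_end] -= 1
    depth, running = [], 0
    for i in range(region_len):
        running += diff[i]
        depth.append(running)
    return depth


if __name__ == "__main__":
    # Made-up reads on one chromosome: (0-based start, CIGAR).
    reads = [(0, "5M"), (2, "2M1D3M"), (4, "2S3M")]
    print(depth_from_reads(reads, 10))  # [1, 1, 2, 2, 2, 2, 2, 1, 0, 0]
```

The difference array adds +1 at each block start and -1 at each block end, so a single running sum recovers the depth at every base in time proportional to the number of blocks plus the region length.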

"Total depth" refers to the sum of the sequencing depths of all bases in a given region.

"Coverage" is the proportion of bases in a genome or specific region that are covered by at least one sequencing read. It is typically expressed as a percentage; for example, a coverage of 95% for a region means that sequencing reads cover 95% of the bases in that region.

"Mean depth" denotes the average sequencing depth per position within a specified region. It is calculated by dividing the total depth by the number of covered positions.
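The three statistics defined above can be written out directly from a list of per-base depths. This sketch follows the definitions given in this reply, including mean depth computed over covered positions only; `region_stats` is a hypothetical helper for illustration and is not taken from the PanDepth source.

```python
from typing import List, Tuple


def region_stats(depths: List[int]) -> Tuple[int, int, float, float]:
    """Return (total_depth, covered_sites, coverage_percent, mean_depth).

    depths holds the depth at every base of the region, zeros included.
    """
    total_depth = sum(depths)
    covered_sites = sum(1 for d in depths if d > 0)
    # Coverage: share of bases with depth >= 1, as a percentage.
    coverage_percent = 100.0 * covered_sites / len(depths) if depths else 0.0
    # Mean depth as defined above: total depth / number of covered positions.
    mean_depth = total_depth / covered_sites if covered_sites else 0.0
    return total_depth, covered_sites, coverage_percent, mean_depth


if __name__ == "__main__":
    # Illustrative depths for a 10 bp region.
    depths = [1, 1, 2, 2, 2, 2, 2, 1, 0, 0]
    total, covered, cov, mean = region_stats(depths)
    print(f"TotalDepth={total} Covered={covered} Coverage={cov:.1f}% MeanDepth={mean:.3f}")
```

For this region the total depth is 13, eight of ten bases are covered (80% coverage), and the mean depth over covered positions is 13 / 8.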

9 months ago
Huiyang

The official release of PanDepth (v2.22) is now available. It fixes the following issues:

  1. Fixed a bug that dropped sites with depth > 65535.
  2. Added compatibility with CSI-format indexes, which are typically used for alignment files containing chromosomes longer than 512 Mb.
  3. Fixed a bug affecting reads whose alignments span more than 1 Mb.
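The 512 Mb figure comes from the BAI index format, which can only handle reference sequences up to 2^29 bp (536,870,912 bp); longer sequences need a CSI index, which 'samtools index -c' builds. The check below is a hypothetical helper, not part of PanDepth, and assumes the reference lengths are already known (for example from the LN fields of the @SQ header lines); the sequence names and lengths in the example are made up.

```python
from typing import Dict, List

# BAI handles reference sequences up to 2**29 bp (512 Mbp); longer ones need CSI.
BAI_MAX_LEN = 2 ** 29


def chromosomes_needing_csi(lengths: Dict[str, int]) -> List[str]:
    """Return names of reference sequences too long for a BAI index."""
    return [name for name, length in lengths.items() if length > BAI_MAX_LEN]


if __name__ == "__main__":
    # Made-up reference lengths in bp.
    lengths = {"chrA": 248_956_422, "chrB": 600_000_000}
    too_long = chromosomes_needing_csi(lengths)
    if too_long:
        print("BAI cannot index", too_long, "- build a CSI index (samtools index -c)")
    else:
        print("A BAI index is sufficient")
```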

The following features have also been added:

  1. The '-a' parameter outputs the depth of every site.
  2. PAF-format alignment files that include the CIGAR tag are now supported as input.
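For PAF input, the coverage information lives in the target columns and the CIGAR tag: column 6 is the target name, column 8 the 0-based target start, and minimap2 (when run with '-c') stores the CIGAR string in the 'cg:Z:' tag. The parser below is a hypothetical sketch that pulls out these three fields from one PAF record; it does not claim to reproduce how PanDepth reads PAF files, and the example record is made up.

```python
from typing import Optional, Tuple


def parse_paf_line(line: str) -> Optional[Tuple[str, int, str]]:
    """Return (target_name, target_start, cigar) from one PAF record.

    Returns None when the record carries no cg:Z: CIGAR tag.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("a PAF record has at least 12 mandatory columns")
    target_name = fields[5]          # column 6
    target_start = int(fields[7])    # column 8, 0-based
    # Optional SAM-style tags follow the 12 mandatory columns.
    for tag in fields[12:]:
        if tag.startswith("cg:Z:"):
            return target_name, target_start, tag[len("cg:Z:"):]
    return None


if __name__ == "__main__":
    # Made-up PAF record (tab-separated) with a CIGAR tag.
    record = "\t".join([
        "read1", "1000", "0", "1000", "+",
        "chr1", "50000", "100", "1102", "990", "1002", "60",
        "NM:i:12", "cg:Z:500M2D500M",
    ])
    print(parse_paf_line(record))  # ('chr1', 100, '500M2D500M')
```

The returned start and CIGAR are exactly the inputs a CIGAR-walking depth calculation needs, so BAM/CRAM and PAF records can feed the same coverage logic.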

Hi, thanks for developing PanDepth. I would like to ask, or suggest, whether you would consider adding the option to calculate coverage for several files in a single call, using a wildcard and aggregating everything into the same output file. Cheers,

Jordi

Thank you very much for your suggestion. I am not sure my understanding is correct, but it seems that you have sequencing data from one sample whose reads were split before alignment, resulting in multiple BAM/CRAM files for that sample. Because merging large BAM/CRAM files with 'samtools merge' is time-consuming, you plan to skip this step and use PanDepth directly to calculate coverage from the multiple BAM/CRAM files and merge the results into a single file. Is that correct?

Not really. However, I think this functionality would be useful when you have replicates or conditions, since it would let you compare the outputs directly instead of having to concatenate them.
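Until such an option exists, one workaround is to concatenate the per-sample outputs into a single table with a sample column. The script below is a hypothetical sketch, not a PanDepth feature: it assumes each output file is a tab-separated table (plain or gzip-compressed) whose header line begins with '#', takes the sample name from the file name up to the first '.', and the file names in the usage comment are placeholders.

```python
import gzip
import os
import sys
from typing import Dict, Iterable, List


def combine_tables(tables: Dict[str, Iterable[str]]) -> List[str]:
    """Merge tab-separated tables into one, prefixing each row with its sample.

    Assumes every table starts with the same header line beginning with '#';
    only the first header is kept.
    """
    header = None
    rows = []
    for sample, lines in tables.items():
        for line in lines:
            line = line.rstrip("\n")
            if not line:
                continue
            if line.startswith("#"):
                if header is None:
                    header = "#Sample\t" + line.lstrip("#")
                continue
            rows.append(f"{sample}\t{line}")
    return ([header] if header else []) + rows


def read_lines(path: str) -> List[str]:
    """Read a plain or gzip-compressed text file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return handle.readlines()


if __name__ == "__main__":
    # Usage (placeholder names): python combine.py repA.stat.gz repB.stat.gz > all.tsv
    # Files whose names share the same prefix before the first '.' would collide.
    paths = sys.argv[1:]
    tables = {os.path.basename(p).split(".")[0]: read_lines(p) for p in paths}
    for out_line in combine_tables(tables):
        print(out_line)
```

Keeping one header and a leading sample column lets replicates or conditions be compared with ordinary table tools (grouping or pivoting by chromosome and sample) without manual concatenation.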

I appreciate your explanation and we will carefully consider your suggestions. We will notify you here if these features are implemented.
