Question

Bam file depth of coverage and statstics

4

Entering edit mode

3.5 years ago

daewowo ▴ 80

I am looking for a tool that can provide a high level summary of coverage statistics of genomes used in an alignment.

I concatenated GCF_000002285.5_Dog10K_Boxer_Tasha_genomic.fna, GCF_000001405.39_GRCh38.p13_genomic.fna, and GCF_000001635.27_GRCm39_genomic.fna. Each of these animal genomes contains hundreds of chromosome sequences.

I have aligned RNASeq data to this combined dog-human-mouse reference and now I would like to calculate the total number of reads, coverage of each animal genome and depth of coverage. I am only interested in animal level summary, not chromosome coverage statstics.

Tools such as mosdepth, depth-cover, qualimap (V2.2.1) only provide coverage at a chromosome level, not a summary at full genome level (eg coverage 1x, 4x, 10x depths).

Update after a couple of days: Pehaps what I can do is get all the chromosome accessions for each of the genomes. Then extract from the .bam (forst convert to sam) result 3 seperate files containing only the results relevant to each genome. Then find a tool that will calculate summary statistics on each of the 3 sam files (total coverage at depth n, total number of reads mapped).

Bam • 3.8k views

ADD COMMENT • link 3.5 years ago by daewowo ▴ 80

0

Entering edit mode

Why do you say that the Qualimap does not give you statistics at the genome level?

qualimap bamqc -c -bam input.bam

The above command gives you a multi-section report and the "Coverage" section shows you depth at the genome level

ADD REPLY • link 3.5 years ago by reza ▴ 300

2

Entering edit mode

Maybe I am missing something - are you talking about that .html report output? The coverage section in that generates a single value for coverage

Coverage
Mean    0.0661
Standard Deviation  133.6669

plus a long table of chromosome stats and summary plots. Looking at the 'geneome_results.txt' file generates, I get also a 'Coverage' section, also with same single mean and std dev as above and a long list like this

There is a 0.03% of reference with a coverageData >= 1X

However it does not give me results per genome eg like this:

#Genome    DATA(%)   Avg depth      Median   Coverage%    Cov 4x %   Cov 10x %   Cov 30x %  Cov 100x %
human     90.00    28.41         23.0      99.42       96.91       83.98       34.05        2.58
mouse     5.00     28.41         23.0      99.42       96.91       83.98       34.05        2.58
dog   5.00     28.41         23.0      99.42       96.91       83.98       34.05        2.58

ADD REPLY • link 3.5 years ago by daewowo ▴ 80

0

Entering edit mode

For multi-sample bam file you can use qualimap2 but unfortunately, I do not have information about the multi-reference bam file

ADD REPLY • link 3.5 years ago by reza ▴ 300

0

Entering edit mode

I am using qualimap 2. I may be able to do this if I can generate GTF files for each genome and supply that to qualimap 2. Am trying to work out how to get GTF files

ADD REPLY • link 3.5 years ago by daewowo ▴ 80

1

Entering edit mode

Why not use pre-prepared GTF files (per genome) on the NCBI?

For example, you can find that for dog assembled genome here

ADD REPLY • link 3.5 years ago by reza ▴ 300

0

Entering edit mode

Pysam looks to be the way to go, counts per chromosome is easy but I need some time to spend to work out coverage per genome. Then just total counts for the full genome. No need to mess with GTF files

ADD REPLY • link 3.5 years ago by daewowo ▴ 80

3

Entering edit mode

3.5 years ago

cfos4698 ★ 1.1k

Check out bedtools genomecov: https://bedtools.readthedocs.io/en/latest/content/tools/genomecov.html

As per their example:

bedtools genomecov -i A.bed -g my.genome
chr1   0  980  1000  0.98
chr1   1  20   1000  0.02
chr2   1  500  500   1
genome 0  980  1500  0.653333
genome 1  520  1500  0.346667

ADD COMMENT • link 3.5 years ago by cfos4698 ★ 1.1k

0

Entering edit mode

I have 3 large genomes I am simultaneously aligning to: human, mouse, dog. The output of bedtools gives me alignments to 840,000 genomes. I am looking for a tool that can give me the summary at a mouse/human/dog level only, without showing alignment scores to individual chromosomes.

Both bamdst and bamstats can give %coverage at various read depths for human/mouse/dog but I cant use these as per above

ADD REPLY • link 3.5 years ago by daewowo ▴ 80

score 1 · Accepted Answer · 2021-11-07

1

Entering edit mode

3.5 years ago

daewowo ▴ 80

Will use Pysam to write code to do this

ADD COMMENT • link 3.5 years ago by daewowo ▴ 80