I am looking for a tool that can provide a high level summary of coverage statistics of genomes used in an alignment.
I concatenated GCF_000002285.5_Dog10K_Boxer_Tasha_genomic.fna, GCF_000001405.39_GRCh38.p13_genomic.fna, and GCF_000001635.27_GRCm39_genomic.fna. Each of these animal genomes contains hundreds of chromosome sequences.
I have aligned RNASeq data to this combined dog-human-mouse reference and now I would like to calculate the total number of reads, coverage of each animal genome and depth of coverage. I am only interested in animal level summary, not chromosome coverage statstics.
Tools such as mosdepth, depth-cover, qualimap (V2.2.1) only provide coverage at a chromosome level, not a summary at full genome level (eg coverage 1x, 4x, 10x depths).
Update after a couple of days: Pehaps what I can do is get all the chromosome accessions for each of the genomes. Then extract from the .bam (forst convert to sam) result 3 seperate files containing only the results relevant to each genome. Then find a tool that will calculate summary statistics on each of the 3 sam files (total coverage at depth n, total number of reads mapped).
Why do you say that the Qualimap does not give you statistics at the genome level?
The above command gives you a multi-section report and the "Coverage" section shows you depth at the genome level
Maybe I am missing something - are you talking about that .html report output? The coverage section in that generates a single value for coverage
plus a long table of chromosome stats and summary plots. Looking at the 'geneome_results.txt' file generates, I get also a 'Coverage' section, also with same single mean and std dev as above and a long list like this
However it does not give me results per genome eg like this:
For multi-sample bam file you can use qualimap2 but unfortunately, I do not have information about the multi-reference bam file
I am using qualimap 2. I may be able to do this if I can generate GTF files for each genome and supply that to qualimap 2. Am trying to work out how to get GTF files
Why not use pre-prepared GTF files (per genome) on the NCBI?
For example, you can find that for dog assembled genome here
Pysam looks to be the way to go, counts per chromosome is easy but I need some time to spend to work out coverage per genome. Then just total counts for the full genome. No need to mess with GTF files