Question

Is there a tool that quantifies “spatial coverage”, that is, what percentage of a reference assembly has reads mapped to it?

2.3 years ago

O.rka ▴ 740

I’m not sure if this is the appropriate term, but the only way I can think of doing this is to convert a BAM file to a BED file, make an array of length N where N is the size of the genome, add up the coverage at every position, and then take the fraction of positions that are nonzero. That sounds very memory intensive, so I’m wondering if there’s a better way.

I have the following files:

BAM files of reads mapped to a metagenome of contigs from different metagenome-assembled genomes (MAGs)
A table of identifiers [id_contig]<tab>[id_mag]
A fasta file with all of the contigs

I see that there is samtools coverage, but I don't know how to get coverage for only certain contigs in the BAM file. I also found bedtools genomecov, but it's a little confusing how I would adapt it to my data.

What I'm ultimately looking for is the following table:

             [mag_1] [mag_2] ... [mag_m]
[bam_file_1] 
[bam_file_2] 
...
[bam_file_n]

Each value in the matrix is the percentage of the MAG's genome that is covered by reads in the corresponding BAM file.

coverage genomics mapping assembly • 754 views

ADD COMMENT • link updated 2.3 years ago by GenoMax 148k • written 2.3 years ago by O.rka ▴ 740

You could run samtools coverage with the region option to limit the output to a specific contig or region:

-r, --region REG        show specified region. Format: chr:start-end.

ADD REPLY • link 2.3 years ago by GenoMax 148k
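
For the per-MAG table asked about in the question, one possible approach is to run samtools coverage once per BAM file without -r (it then reports every contig in the header) and aggregate the per-contig rows by MAG. The sketch below is only an illustration under some assumptions: it expects the default tabular output of samtools coverage (columns rname, startpos, endpos, numreads, covbases, ...), and the function names read_contig_to_mag and mag_breadth are made up for this example, not part of any tool.

```python
import csv
from collections import defaultdict


def read_contig_to_mag(lines):
    """Parse [id_contig]<tab>[id_mag] lines into a contig -> MAG dict."""
    mapping = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) >= 2:
            mapping[row[0]] = row[1]
    return mapping


def mag_breadth(coverage_lines, contig_to_mag):
    """Return {mag: percent of bases covered} from samtools coverage output.

    Assumes the default tabular layout, where column 1 is rname,
    columns 2-3 are startpos/endpos and column 5 is covbases.
    """
    covered = defaultdict(int)  # covered bases summed per MAG
    total = defaultdict(int)    # contig lengths summed per MAG
    for row in csv.reader(coverage_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip the "#rname ..." header line and blank lines
        rname, startpos, endpos, covbases = row[0], int(row[1]), int(row[2]), int(row[4])
        mag = contig_to_mag.get(rname)
        if mag is None:
            continue  # contig not assigned to any MAG
        covered[mag] += covbases
        total[mag] += endpos - startpos + 1
    return {mag: 100.0 * covered[mag] / total[mag] for mag in total}
```

A possible usage, with hypothetical file names: run `samtools coverage sample_1.bam > sample_1.cov.tsv`, then call `mag_breadth(open("sample_1.cov.tsv"), read_contig_to_mag(open("contig_to_mag.tsv")))` to get one row of the matrix. Summing covered bases and contig lengths per MAG, rather than averaging the per-contig percentages, keeps long and short contigs weighted by their size.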