Dear community,
I am trying to find coverage for the chr19 as "per-base". Instead I get coverage "per-base" for the whole genome, which is a time consuming operation. I have paired-end data which was filtered to proper aligned reads. The organism is mm10.
My command is as following:
bedtools genomecov -d -ibam infile.bam -g only_chr19.genome > output_file.txt
bedtools Version: v2.26.0
I found chr19 length for mm10 by using the command:
curl -s ftp://hgdownload.cse.ucsc.edu/goldenPath/mm10/database/chromInfo.txt.gz | gunzip -c
Content of the only_chr19.genome file:
chr19 61431566
Content of BAM input file, which is sorted by position and also filtered to "read mapped in proper pair" with the following command
samtools view -b -F 256 -o out.filt.bam infile.bam
NS500:6295 99 chr1 3047349 1 16M = 3047349 -16 AGGCACTGAAGAATTA AAAAAEEEAEEEEEEE AS:i:0 XS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:16 YS:i:0 YT:Z:CP
NS500:6295 147 chr1 3047349 1 16M = 3047349 -16 AGGCACTGAAGAATTA EEEEEEEEEEEAAAAA AS:i:0 XS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:16 YS:i:0 YT:Z:CP
NS500:12119 99 chr1 3086977 32 22M = 3086977 -22 AAGAGTGGATGCCTGAGCTTTC AAAAAEEEEEEEEEEEEAEEEE AS:i:0 XS:i:-5 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:22 YS:i:0 YT:Z:CP
NS500:18909 99 chr1 3086977 18 22M = 3086977 -22 AAGAGAGGATGCCTGAGCTTTC AAAAA//E/EEEE/E/EEE/EE AS:i:-3 XS:i:-8 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:5T16 YS:i:0 YT:Z:CP
NS500:12119 147 chr1 3086977 32 22M = 3086977 -22 AAGAGTGGATGCCTGAGCTTTC //AE/EEEEEE6AEEAE66AAA AS:i:0 XS:i:-4 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:22 YS:i:0 YT:Z:CP
I can't understand how bedtools can produce coverage for the whole genome if the genome file contains only one line of the required information (chr19)?
Another question I am having is, the manual says "-pc Calculates coverage for paired-end reads, coverage is calculated as the number of fragments covering each base pair"
What is understood by fragments? Does it mean "first in pair R1" and "second in pair R2" read? Or does it take each read (R1 and R2) and count coverage for each of them, and not as pair-end reads?
How could I found coverage considering both reads?
Thank you
thank you so much, Istvan. It's very clear!