bedtools genomecover fails to find coverage for the specific chr region
1
0
Entering edit mode
7.3 years ago
IrK ▴ 100

Dear community,

I am trying to find coverage for the chr19 as "per-base". Instead I get coverage "per-base" for the whole genome, which is a time consuming operation. I have paired-end data which was filtered to proper aligned reads. The organism is mm10.

My command is as following:

bedtools genomecov -d -ibam infile.bam -g only_chr19.genome > output_file.txt

bedtools Version: v2.26.0

I found chr19 length for mm10 by using the command:

curl -s ftp://hgdownload.cse.ucsc.edu/goldenPath/mm10/database/chromInfo.txt.gz | gunzip -c

Content of the only_chr19.genome file:

chr19    61431566

Content of BAM input file, which is sorted by position and also filtered to "read mapped in proper pair" with the following command

samtools view -b -F 256 -o out.filt.bam infile.bam

NS500:6295    99    chr1    3047349    1    16M    =    3047349    -16    AGGCACTGAAGAATTA    AAAAAEEEAEEEEEEE    AS:i:0    XS:i:0    XN:i:0    XM:i:0    XO:i:0    XG:i:0    NM:i:0    MD:Z:16    YS:i:0    YT:Z:CP
NS500:6295    147    chr1    3047349    1    16M    =    3047349    -16    AGGCACTGAAGAATTA    EEEEEEEEEEEAAAAA    AS:i:0    XS:i:0    XN:i:0    XM:i:0    XO:i:0    XG:i:0    NM:i:0    MD:Z:16    YS:i:0    YT:Z:CP
NS500:12119    99    chr1    3086977    32    22M    =    3086977    -22    AAGAGTGGATGCCTGAGCTTTC    AAAAAEEEEEEEEEEEEAEEEE    AS:i:0    XS:i:-5    XN:i:0    XM:i:0    XO:i:0    XG:i:0    NM:i:0    MD:Z:22    YS:i:0    YT:Z:CP
NS500:18909    99    chr1    3086977    18    22M    =    3086977    -22    AAGAGAGGATGCCTGAGCTTTC    AAAAA//E/EEEE/E/EEE/EE    AS:i:-3    XS:i:-8    XN:i:0    XM:i:1    XO:i:0    XG:i:0    NM:i:1    MD:Z:5T16    YS:i:0    YT:Z:CP
NS500:12119    147    chr1    3086977    32    22M    =    3086977    -22    AAGAGTGGATGCCTGAGCTTTC    //AE/EEEEEE6AEEAE66AAA    AS:i:0    XS:i:-4    XN:i:0    XM:i:0    XO:i:0    XG:i:0    NM:i:0    MD:Z:22    YS:i:0    YT:Z:CP

I can't understand how bedtools can produce coverage for the whole genome if the genome file contains only one line of the required information (chr19)?

Another question I am having is, the manual says "-pc Calculates coverage for paired-end reads, coverage is calculated as the number of fragments covering each base pair"

What is understood by fragments? Does it mean "first in pair R1" and "second in pair R2" read? Or does it take each read (R1 and R2) and count coverage for each of them, and not as pair-end reads?

How could I found coverage considering both reads?

Thank you

genomecov bedtools • 2.1k views
ADD COMMENT
1
Entering edit mode
7.3 years ago

The purpose of the chromosome file is not to subselect intervals but to indicate how far out to report coverages even if you don't have alignments covering those regions. Hence it is reporting all coverages.

Slice your BAM file to the region of interest (samtools view -b alignment.bam chr19 > small.bam) and run genome coverage on that.

A fragment typically means the region between the leftmost aligned base of a read pair to the rightmost aligned base of its mate.

If you want to count the coverage of both pairs as reads and not fragments then don't use the -pc option

ADD COMMENT
0
Entering edit mode

thank you so much, Istvan. It's very clear!

ADD REPLY

Login before adding your answer.

Traffic: 2473 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6