I am now trying to run bam-readcount. My sample is human, but it takes almost 2-3 days to run one BAM file. Is that normal? The size of my file is 8,000,000 KB, and I am a little worried that the time it takes is too long.
Above you are running the tool on the entire genome, which is unnecessary and slow; you only need the positions where you have variants. Here is a suggestion, given a BAM file called `your.bam` that is indexed by samtools and a list of variants `your.vcf`:
## Use the variant file to create a BED file with the variants for which `bam-readcount` extracts information:
grep -v '^#' your.vcf \
| awk 'OFS="\t" {print $1, $2-1, $2+1}' \
| sort -k1,1 -k2,2n \
| bedtools merge -i - > your.bed
## Use samtools to pass only genomic positions to bam-readcount where variants are located:
samtools view -b -L your.bed your.bam > your_subset.bam
bam-readcount (...command...) your_subset.bam
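The VCF-to-BED step above can be sanity-checked on a toy example. The VCF below is fabricated for illustration, and `bedtools merge` is omitted since these intervals don't overlap:

```shell
# Tiny fabricated VCF to check the coordinate shift.
# VCF POS is 1-based; BED is 0-based half-open, so a variant at
# position P becomes the padded interval [P-1, P+1).
cat > toy.vcf <<'EOF'
##fileformat=VCFv4.2
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 14539 . T C . PASS .
chr2 100 . G A . PASS .
EOF

grep -v '^#' toy.vcf \
  | awk 'OFS="\t" {print $1, $2-1, $2+1}' \
  | sort -k1,1 -k2,2n > toy.bed

cat toy.bed
```

Real VCFs are tab-separated; spaces are used here only for readability, and awk splits on any whitespace either way.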
Sorry to trouble you again, but I have run into an annoying problem. When I tried to run fpfilter from VarScan, I got results like:
171661 variants in input file
171466 had a bam-readcount result
47692 had reads1>=2
0 passed filters
171661 failed filters
195 failed because no readcounts were returned
425 failed minimim variant count < 4
34 failed minimum variant freq < 0.05
13936 failed minimum strandedness < 0.01
47716 failed minimum reference readpos < 0.1
171466 failed minimum variant readpos < 0.1
47716 failed minimum reference dist3 < 0.1
171466 failed minimum variant dist3 < 0.1
0 failed maximum reference MMQS > 100
0 failed maximum variant MMQS > 100
0 failed maximum MMQS diff (var - ref) > 50
29 failed maximum mapqual diff (ref - var) > 50
2209 failed minimim ref mapqual < 15
9102 failed minimim var mapqual < 15
1051 failed minimim ref basequal < 15
134 failed minimim var basequal < 15
0 failed maximum RL diff (ref - var) > 0.25
And when I run bam-readcount, I get a stream of warnings like:
Couldn't find the generated tag.
Couldn't find single-end mapping quality. Check to see if the SM tag is in BAM.
I have looked into these warnings, and found that they can be ignored as long as I don't use the single-end counts. But I am not sure whether these warnings are what is making my filter fail. Here is some of my bam-readcount output:
chr1 14539 T 4 =:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 A:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 C:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 G:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 T:4:18.25:31.50:0.00:4:0:0.00:0.02:0.00:0:0.00:0.00:0.00 N:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00
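Each per-base entry in that output packs colon-separated statistics. The field order below (base, count, avg mapping quality, avg base quality, avg single-end mapq, plus-strand count, minus-strand count, avg position as fraction, ...) is assumed from the bam-readcount README, so verify it against your version; a quick awk split pulls out the values fpfilter looks at:

```shell
# Split one bam-readcount per-base entry (the T field from the output above).
# Field order is assumed from the bam-readcount README -- check your version.
field='T:4:18.25:31.50:0.00:4:0:0.00:0.02:0.00:0:0.00:0.00:0.00'
echo "$field" | awk -F':' '{
  printf "base=%s count=%s avg_mapq=%s avg_baseq=%s avg_pos_frac=%s\n",
         $1, $2, $3, $4, $8
}'
# base=T count=4 avg_mapq=18.25 avg_baseq=31.50 avg_pos_frac=0.00
```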
Could you please give me some advice on the failed filtering?
I haven't used that tool in a while and actually gave up on it, because I never found proper documentation on things like this fpfilter. The tool is no longer maintained and is pretty slow unless one invests time in custom parallelization. It served its purpose for the community, but today I would use a more recent, maintained tool.
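As a rough sketch of the custom parallelization alluded to above: fan out one job per chromosome with plain shell background jobs. `run_readcount` here is a hypothetical stand-in doing trivial work; in practice it would wrap the real per-region bam-readcount invocation:

```shell
# Per-chromosome fan-out with shell background jobs.
# run_readcount is a placeholder; replace its body with the real
# bam-readcount call restricted to the chromosome passed as $1.
run_readcount() {
  echo "processed $1" > "out.$1.txt"   # stand-in for the real command
}

for chrom in chr1 chr2 chrX; do
  run_readcount "$chrom" &   # launch one job per chromosome
done
wait                         # block until all jobs finish

cat out.chr1.txt out.chr2.txt out.chrX.txt > out.all.txt
```

GNU parallel achieves the same fan-out with job-count control, which matters when the chromosome list is long.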
Maybe I'd better try more recent tools too. Luckily, I have just found a review of this area, and I think it will help a lot. I tried Mutect2 before, but maybe because I only provided two BAM files (a tumor BAM and a normal BAM), the results did not seem to exclude germline SNPs. I now guess it would do better if I provided the gnomAD resource as well, so I am going to try that tool again with the gnomAD files (they are genuinely hard to download from my region). Anyway, thank you for your help! It does help a lot!
Consider using mosdepth (https://github.com/brentp/mosdepth) instead. It is multi-threaded. 8 GB is not that large a BAM file, so 2-3 days sounds excessive. Is bam-readcount a single-threaded application?

I am not sure whether bam-readcount is single-threaded, but the time it takes is even longer than 2-3 days. I am going to try mosdepth now. Thanks for your advice!

mosdepth and varscan are not compatible, don't even try. Split the file into chromosomes and use GNU parallel to parallelize. See the answer below.

Oh! Is that so? Then I'll try GNU parallel. Thanks a lot! Actually, I am a little worried: the output I got from bam-readcount is as large as 200,000,000 KB from an input of 8,000,000 KB. I am not sure if that is normal. The code I use is quite simple, as: