I have created a fasta file with 300 simulated reads at a given SNP position (reads are 75mer, and I simulate every position in the + and - strand, hence 75x4=300 reads). Then, I map them back to the ref genome using bowtie and then use pileup to see the alignment at the position in question:
bf <- BamFile(bamfile)
param <- ScanBamParam(which=GRanges("chr10",IRanges(start=100189138, end=100189138)))
quickBamFlagSummary
shows that there are 300 reads
> quickBamFlagSummary(bf, param=param)
group | nb of | nb of | mean / max
of | records | unique | records per
records | in group | QNAMEs | unique QNAME
All records........................ A | 300 | 300 | 1 / 1
o template has single segment.... S | 300 | 300 | 1 / 1
o template has multiple segments. M | 0 | 0 | NA / NA
- first segment.............. F | 0 | 0 | NA / NA
- last segment............... L | 0 | 0 | NA / NA
- other segment.............. O | 0 | 0 | NA / NA
Note that (S, M) is a partitioning of A, and (F, L, O) is a partitioning of M.
Indentation reflects this.
Details for group S:
o record is mapped.............. S1 | 300 | 300 | 1 / 1
- primary alignment......... S2 | 300 | 300 | 1 / 1
- secondary alignment....... S3 | 0 | 0 | NA / NA
o record is unmapped............ S4 | 0 | 0 | NA / NA
Pileup only shows 262 reads (63+62+75+62)... Any idea why? I don't use any threshold on min_base_quality
and min_mapq
...
> pileup(bf, scanBamParam=param, min_base_quality=0,min_mapq=0)
seqnames pos strand nucleotide count which_label
1 chr10 100189138 + A 63 chr10:100189138-100189138
2 chr10 100189138 - A 62 chr10:100189138-100189138
3 chr10 100189138 + G 75 chr10:100189138-100189138
4 chr10 100189138 - G 62 chr10:100189138-100189138
I looked the region up with IGV, and also, IGV reports 300 reads.
I actually simulate several different positions, and the 262 reads (instead of 300) appear every time.. Not sure why. I am sure pileup is filtering the reads somehow..
Also, the same exact script works fine if I simulate 36mer reads (instead of 75...)! Any ideas why?
Could you post a BAM file just containing this region somewhere for debugging? Also, what's the output of
sessionInfo()
?Actually, note that
pileup()
takes not only ascanBamParam
option, but also apileupParam
option. Given that the latter defaults to some filtering, I wouldn't be surprised if that's the source of the problem.Thank you! it is solved by changing
max_depth
parameter withinPileupParam()
I was adding my arguments without PileupParam constructor, this is wrong:
This is what samtools pileup command (from samtools) gives:
Counting in python shows 150 reads in each allele - the reference ("A") and alternative ("G") (75 reads in each strand):