edit: Thanks for Devon Ryan's help. I solved this problem.
I am a green hand of bioinfomatics. I am trying to apply pybedtools and pysam to compute the read enrichment. I have a BAM file and a BED file. The BED records target regions. My output is over 80%. I feel it is too high. Since I also computed the percentage of bases enriched in the target regions which is just about 57%(Of course, it could be wrong also). Could you please help me to double-check my code and result? Thanks! PS: Sorry for my poor English.
Attached please find my data: https://www.google.com/url?q=https://www.dropbox.com/s/c2768qj1n2flcfg/TEST1.zip?dl%3D0&sa=D&ust=1527459500133000&usg=AFQjCNHqoYhvNp3OObNdyivI9I0sr8IT3w
My code for read enrichment
:
import pybedtools as bd
target = bd.BedTool(path_bed)
bam = bd.BedTool(path_bam)
bam = bam.bam_to_bed(stream=True)
target = target.remove_invalid().saveas()
temp = bam.remove_invalid().saveas()
ontag_reads = temp.coverage(target, counts=True).sort() ## Reads mapped on target regions
read_count = np.array([int(line[-1]) for line in ontag_reads]) ## Sum of reads on target regions
Per_ontag = read_count.sum()/samraw.mapped*100 ## Percentage of reads enriched in the target regions
print('Percentage of reads enriched in the target regions: {0:.2f}%'.format(Per_ontag))
My code of bases enriched
(Total Aligned Bases in Targeted Regions/Total Aligned Bases) :
samraw = py.AlignmentFile(path_bam, 'rb')
read_tag = target.coverage(path_bam, d=True)
ontag_base = np.array([int(base[5]) for base in read_tag]) ## Bases mapped on target
read_base = np.array([line.query_length for line in samraw]) ## Total mapped bases ##the length of the read.
Per_base = ontag_base.sum()/read_base.sum()*100
print('Percentage of bases enriched in the target regions: {0:.2f}%'.format(Per_base))
You could save yourself the hassle and use
plotEnrichment
from deepTools. I wrote it for this purpose.Thanks Devon. I tried
plotEnrichment
. It proves my output is correct. Thanks a lot.A quick question: I want to count the total aligned bases in BAM. Is there some way to do that? Do I just need to sum the length of each alignment in BAM?
You can basically get that from
plotFingerprint
. 1 - the x intercept is the fraction of covered bases.