Hi
I am using pysam pileup to find bases from all reads that map to a single base in the reference sequence (all bases at that genomic position)
I have this so far but it does not seem to be printing all the bases
vcf_reader = vcf.Reader(filename=input_vcf)
for record in vcf_reader:
position = record.POS
for pileupColumn in bam.pileup("chr22", position, position+1, stepper='samtools', min_base_quality=0, ignore_overlaps=False, ignore_orphans=False, truncate=True ):
for read in pileupColumn.pileups:
print(read.alignment.query_name, read.alignment.query_sequence[read.query_position], read.query_position)
For example, when I compare the results to allele counts generated from IGV, samtools mpileup and GATK CollectAllelicCounts, at chr22:16262728, there would be 70 Gs (reference nucleotide) and 49 As (alternate nucleotide) but the output from the code only shows:
('HS24_85:1:1201:7411:83378', 'G', 103)
('HS22_78:3:1214:9386:19864', 'G', 122)
('HS24_85:1:2101:12797:23657', 'G', 122)
('HS22_78:3:1315:8282:30946', 'G', 122)
('HS22_78:3:1101:14500:47773', 'G', 121)
('HS24_85:2:1216:5699:17766', 'G', 120)
('HS23_42:1:2111:4475:69608', 'G', 119)
('HS22_78:3:1202:14726:38119', 'G', 119)
('HS24_85:1:1115:9370:82669', 'G', 117)
('HS24_85:1:2307:2074:88160', 'G', 117)
('HS22_78:3:2314:10450:47392', 'G', 116)
('HS22_78:3:2308:15054:34133', 'G', 114)
('HS23_42:1:1305:13596:55163', 'G', 114)
('HS24_85:2:1111:1612:16820', 'G', 112)
('HS22_78:3:2216:2740:77764', 'C', 111)
('HS22_78:3:2302:9314:13921', 'G', 111)
('HS24_85:2:1211:4578:88656', 'G', 110)
('HS23_42:1:1101:20531:37762', 'G', 110)
('HS22_78:3:2310:8780:32097', 'G', 110)
('HS24_85:1:1115:10861:81726', 'G', 107)
('HS23_42:1:1107:6662:41114', 'G', 106)
('HS24_85:1:1212:20743:82452', 'G', 105)
('HS23_42:1:2313:11725:19964', 'G', 105)
('HS22_78:3:1310:13644:88843', 'G', 105)
('HS24_85:2:1107:9280:81457', 'G', 103)
('HS24_85:1:1212:18950:61905', 'G', 103)
('HS24_85:2:1101:17461:24965', 'G', 103)
('HS24_85:1:1116:9648:70745', 'G', 102)
('HS23_42:1:2107:13018:14721', 'G', 102)
('HS24_85:2:2306:11254:67768', 'G', 101)
('HS23_42:1:2104:6675:95474', 'G', 106)
('HS22_78:3:1305:2201:11591', 'G', 101)
('HS24_85:1:1206:14931:10360', 'G', 100)
('HS24_85:2:1315:18777:27046', 'G', 99)
('HS23_42:1:2206:20980:4951', 'G', 99)
('HS24_85:1:2110:10249:85104', 'G', 98)
truncated
pysam version:0.15.2 python version:2.7.16 Aligner used: minimap2 (2.15) Reference: Hg38
Thank you
Dollina
Please use the formatting bar (especially the
code
option) to present your post better. You can use backticks for inline code (`text` becomestext
), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.