Finding all bases at a specific genomic position
1
0
Entering edit mode
5.5 years ago
dodani • 0

Hi

I am using pysam pileup to find bases from all reads that map to a single base in the reference sequence (all bases at that genomic position)

I have this so far but it does not seem to be printing all the bases

vcf_reader = vcf.Reader(filename=input_vcf)
for record in vcf_reader:
   position = record.POS
   for pileupColumn in bam.pileup("chr22", position, position+1, stepper='samtools', min_base_quality=0, ignore_overlaps=False, ignore_orphans=False, truncate=True ):
      for read in pileupColumn.pileups:
         print(read.alignment.query_name, read.alignment.query_sequence[read.query_position], read.query_position)

For example, when I compare the results to allele counts generated from IGV, samtools mpileup and GATK CollectAllelicCounts, at chr22:16262728, there would be 70 Gs (reference nucleotide) and 49 As (alternate nucleotide) but the output from the code only shows:

('HS24_85:1:1201:7411:83378', 'G', 103)
('HS22_78:3:1214:9386:19864', 'G', 122)
('HS24_85:1:2101:12797:23657', 'G', 122)
('HS22_78:3:1315:8282:30946', 'G', 122)
('HS22_78:3:1101:14500:47773', 'G', 121)
('HS24_85:2:1216:5699:17766', 'G', 120)
('HS23_42:1:2111:4475:69608', 'G', 119)
('HS22_78:3:1202:14726:38119', 'G', 119)
('HS24_85:1:1115:9370:82669', 'G', 117)
('HS24_85:1:2307:2074:88160', 'G', 117)
('HS22_78:3:2314:10450:47392', 'G', 116)
('HS22_78:3:2308:15054:34133', 'G', 114)
('HS23_42:1:1305:13596:55163', 'G', 114)
('HS24_85:2:1111:1612:16820', 'G', 112)
('HS22_78:3:2216:2740:77764', 'C', 111)
('HS22_78:3:2302:9314:13921', 'G', 111)
('HS24_85:2:1211:4578:88656', 'G', 110)
('HS23_42:1:1101:20531:37762', 'G', 110)
('HS22_78:3:2310:8780:32097', 'G', 110)
('HS24_85:1:1115:10861:81726', 'G', 107)
('HS23_42:1:1107:6662:41114', 'G', 106)
('HS24_85:1:1212:20743:82452', 'G', 105)
('HS23_42:1:2313:11725:19964', 'G', 105)
('HS22_78:3:1310:13644:88843', 'G', 105)
('HS24_85:2:1107:9280:81457', 'G', 103)
('HS24_85:1:1212:18950:61905', 'G', 103)
('HS24_85:2:1101:17461:24965', 'G', 103)
('HS24_85:1:1116:9648:70745', 'G', 102)
('HS23_42:1:2107:13018:14721', 'G', 102)
('HS24_85:2:2306:11254:67768', 'G', 101)
('HS23_42:1:2104:6675:95474', 'G', 106)
('HS22_78:3:1305:2201:11591', 'G', 101)
('HS24_85:1:1206:14931:10360', 'G', 100)
('HS24_85:2:1315:18777:27046', 'G', 99)
('HS23_42:1:2206:20980:4951', 'G', 99)
('HS24_85:1:2110:10249:85104', 'G', 98)
truncated

pysam version:0.15.2 python version:2.7.16 Aligner used: minimap2 (2.15) Reference: Hg38

Thank you

Dollina

pysam pileup • 3.0k views
ADD COMMENT
1
Entering edit mode

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.
code_formatting

ADD REPLY
0
Entering edit mode
5.5 years ago
AK ★ 2.2k

Hi dodani,

Try (updated):

vcf_reader = vcf.Reader(filename=input_vcf)
for record in vcf_reader:
    position = record.POS
    for pileupColumn in bam.pileup("chr22", position-1, position, stepper='samtools', min_base_quality=0, ignore_overlaps=False, ignore_orphans=False, truncate=True):
        for pileupRead in pileupColumn.pileups:
            if not pileupRead.is_del and not pileupRead.is_refskip:
                print ('Base in read %s at position %s is %s' % (pileupRead.alignment.query_name, position, pileupRead.alignment.query_sequence[pileupRead.query_position]))
ADD COMMENT
0
Entering edit mode

Hi, thanks!

I tried that and still unable to see all bases that map to a particular position. For example, when I compare the allele count with results from IGV or GATK CollectAllelicCounts, at chr22:16262728 there would be 70 Gs (count of reference allele) and 49 As (count of alternate allele).

ADD REPLY
0
Entering edit mode

OK, it works in my case. Did you get any error, or just printing nothing?

ADD REPLY
0
Entering edit mode

no it prints only 1 of the two alleles. For example at chr22:16262728 I get:

Base in read HS24_85:1:1201:7411:83378 at position 16262728 is G
Base in read HS22_78:3:1214:9386:19864 at position 16262728 is G
Base in read HS24_85:1:2101:12797:23657 at position 16262728 is G
Base in read HS22_78:3:1315:8282:30946 at position 16262728 is G
Base in read HS22_78:3:1101:14500:47773 at position 16262728 is G
Base in read HS24_85:2:1216:5699:17766 at position 16262728 is G
Base in read HS23_42:1:2111:4475:69608 at position 16262728 is G
Base in read HS22_78:3:1202:14726:38119 at position 16262728 is G
Base in read HS24_85:1:1115:9370:82669 at position 16262728 is G
Base in read HS24_85:1:2307:2074:88160 at position 16262728 is G
Base in read HS22_78:3:2314:10450:47392 at position 16262728 is G
Base in read HS22_78:3:2308:15054:34133 at position 16262728 is G
Base in read HS23_42:1:1305:13596:55163 at position 16262728 is G
Base in read HS24_85:2:1111:1612:16820 at position 16262728 is G
Base in read HS22_78:3:2216:2740:77764 at position 16262728 is C
Base in read HS22_78:3:2302:9314:13921 at position 16262728 is G
Base in read HS24_85:2:1211:4578:88656 at position 16262728 is G
Base in read HS23_42:1:1101:20531:37762 at position 16262728 is G
Base in read HS22_78:3:2310:8780:32097 at position 16262728 is G
Base in read HS24_85:1:1115:10861:81726 at position 16262728 is G
Base in read HS23_42:1:1107:6662:41114 at position 16262728 is G
Base in read HS24_85:1:1212:20743:82452 at position 16262728 is G
Base in read HS23_42:1:2313:11725:19964 at position 16262728 is G
Base in read HS22_78:3:1310:13644:88843 at position 16262728 is G

. . . Is there a way to print all the bases at that position?

ADD REPLY
1
Entering edit mode

Can you try again by changing from position, position+1 to position-1, position (in bam.pileup)? It should work as within pysam coordinates are 0-based.

ADD REPLY
0
Entering edit mode

Thank you so much! That helps

ADD REPLY

Login before adding your answer.

Traffic: 1230 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6