Question

BAM File Export Coverage Values from IGV or using PySam/Samtools

0

Entering edit mode

6.6 years ago

EOX123 ▴ 10

Would anyone please be able to help me with the following, I am new to sequencing analysis:

Input Data: .bam file with thousands of reads .fasta file with my reference

When I align my .bam file reads to my reference using IGV I get a coverage track displaying the depth of the reads at each locus as a grey bar, and coloured bars with the proportion of each base (A, C, G, T), insertions and deletions, at "low quality" read sites. These are the values I am interested in.

What I would like now, would be to export these values so I can work in Python. How may I achieve this?

Or is there a more direct way of calculating the same values directly in python with say pysam that I have not found yet?

In IGV you can export the counts using IGVtools to .tdf or even .wig, but I am restricted to IGV to read them afterwards, and would prefer a more direct method.

IGV alignment sequencing samtools bam • 4.3k views

ADD COMMENT • link updated 6.3 years ago by christophersemail40 • 0 • written 6.6 years ago by EOX123 ▴ 10

0

Entering edit mode

What is your end goal with this?

ADD REPLY • link 6.6 years ago by Devon Ryan 104k

0

Entering edit mode

To have a file that gives me at each position the percent reads for each base (ACTGN), insertion, deletions but I just found out that the IGVtools command line does exactly what I want!

ADD REPLY • link 6.6 years ago by EOX123 ▴ 10

0

Entering edit mode

That's not an end goal. What do you actually want to do with that? I imagine your end goal isn't to start at lists of "percent A, percent C, ..." for each position all day.

ADD REPLY • link 6.6 years ago by Devon Ryan 104k

0

Entering edit mode

Could you kindly post your solution here? I also need that values.

ADD REPLY • link 5.9 years ago by xzhangxmu • 0

score 0 · Answer 1 · 2018-04-11

It seem that pysam can get the coverage values(as well as the proportion of each base (A, C, G, T), insertions and deletions), but it's not the best choice if there are some more convenient python packages.

for example:

SRR3239808.3377 16 chr2 812 60 13S29M * 0 0 CCGCGCCACCACCGCGGACTCCGCTCCCCGGCCGGGGCCGCC E/AEE///A/

This is a single-end read.

After pysam open a bam/sam file, this read1 is a < pysam.libcalignedsegment.AlignedSegment object >

get the read's SEQ .
read1.query_sequence
'CCGCGCCACCACCGCGGACTCCGCTCCCCGGCCGGGGCCGCC'
get the reference(which reads mapped)'s seq
read1.get_reference_sequence()
'aCaGcCTgtcCaCCgCGGCCGGGGCCGCC'
get the list of aligned read (query) and reference positions
read1.get_aligned_pairs(with_seq=True)#this seq is the reference seq
[(0, None, None), (1, None, None), (2, None, None), (3, None, None), (4, None, None), (5, None, None), (6, None, None), (7, None, None), (8, None, None), (9, None, None), (10, None, None), (11, None, None), (12, None, None), (13, 811, 'a'), (14, 812, 'C'), (15, 813, 'a'), (16, 814, 'G'), (17, 815, 'c'), (18, 816, 'C'), (19, 817, 'T'), (20, 818, 'g'), (21, 819, 't'), (22, 820, 'c'), (23, 821, 'C'), (24, 822, 'a'), (25, 823, 'C'), (26, 824, 'C'), (27, 825, 'g'), (28, 826, 'C'), (29, 827, 'G'), (30, 828, 'G'), (31, 829, 'C'), (32, 830, 'C'), (33, 831, 'G'), (34, 832, 'G'), (35, 833, 'G'), (36, 834, 'G'), (37, 835, 'C'), (38, 836, 'C'), (39, 837, 'G'), (40, 838, 'C'), (41, 839, 'C')]
get the mapped reference postions
read1.get_reference_positions()
[811, 812, 813, 814, 815, 816, 817, 818, 819, 820, 821, 822, 823, 824, 825, 826, 827, 828, 829, 830, 831, 832, 833, 834, 835, 836, 837, 838, 839]

More information in http://pysam.readthedocs.io/en/latest/api.html

So after counting this read, the result may like this:

reference postion 811: ['A':0,'G':1,'C':0,'T':0] #because read1.query_sequence[13] is G
reference postion 812: ['A':0,'G':0,'C':1,'T':0] #because read1.query_sequence[14] is C
......
insert or deletion: from which pos to which pos
you can store with a list of tuple from start to end such as [(5,8),(34,40)]

If nobody has built the python packege to get the coverage values, you can try pysam.

score 0 · Answer 2 · 2018-07-27

0

Entering edit mode

6.3 years ago

christophersemail40 • 0

Can you post solution i am currently trying to export my data in a similar way, and am unfamiliar with the works of IGVtools

Thanks

ADD COMMENT • link 6.3 years ago by christophersemail40 • 0

1

Entering edit mode

BamDeal can calculate depth of four bases in every locus of reference genome. It is a full featured software for operating bam files. Other functions may facilitate your research. https://github.com/BGI-shenzhen/BamDeal

ADD REPLY • link 5.7 years ago by jiaobingke ▴ 30

0

Entering edit mode

yes. ./BamDeal-0.21/bin/BamDeal_Linux statistics BasesCount

./BamDeal-0.21/bin/BamDeal_Linux   statistics  BasesCount

        Usage: BasesCount  -List  <bam.list>  -OutPut  <outfix>

                -InList    <str>     Input Bam/Sam File List
                -InFile    <str>     Input Bam/Sam File File[repeat]
                -OutPut    <str>     OutPut File prefix

                -Bed       <str>     Stat Coverage,MeanDepth for bed Regions 
                -MinQ      <int>     Ignore too low mapQ read[10]

                -help                Show this help [hewm2008]

if for [-bed ] file coordinate is start form 1, and this coordinate system will be updated in the next version to make it consistent with samtools starting from 0.

if you want the mapQ==0 to count the depth, you can set the [-MinQ -1 ]

ADD REPLY • link 5.7 years ago by hewm2008 ▴ 50