Question

Derive Chromatogram From Raw Fluorescence Data

13.3 years ago

drew.matteson ▴ 40

I'm trying to understand how raw data in an .abi file is transformed into the chromatogram I'm used to seeing. I've exported an abi file with bioedit that has the following snippet:

    Seq         Traces      Sequence  Peaks        Raw Data
                G     A     T     C                G      A      T      C
    ...
    2216  .     0   286    92   359      2216   1722     60    362    630
    2217  .     0   427    84   352      2217   2415     54    372    628
    2218  .     0   555    73   337      2218   3263     89    397    623
    2219  A     0   639    61   309      2219   4164     75    463    637
    2220  .     0   659    48   266      2220   5040     81    546    679
    2221  .     0   611    36   213      2221   5895    104    666    715
    2222  .     0   508    25   158      2222   6343    114    858    752
    2223  .     0   372    17   108      2223   6578     70   1184    756
    2224  .     0   238    10    67      2224   6298    142   1570    791
    2225  .     2   131     5    37      2225   5790    100   2094    792
    2226  .    20    56     2    18      2226   5002     89   2656    795
    2227  .    58    13     0     7      2227   4108     92   3230    798
    2228  .   117     0     0     2      2228   3116     95   3764    779
    2229  .   193     0     0     0      2229   2281     82   4125    751
    2230  .   281     0     0     0      2230   1640     73   4246    692
    2231  .   355     0     0     0      2231   1081     69   4137    676
    2232  G   398     0     0     0      2232    697     72   3747    608
    2233  .   400     0     0     0      2233    504     59   3222    557
    2234  .   358     0     0     0      2234    351     49   2622    523

In the right four columns is the raw data as titled. Using other software tools with abi files (biopython, abifpy), I can get at this data. The data on the left has been processed in some way. It's the points through which a typical chromatogram is drawn and bases are called. I'd like to understand how the processed data was generated, but can't seem to find an explanation online.
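To make "the points through which a chromatogram is drawn" concrete, here is a minimal sketch that looks for local maxima in the processed A and G columns of the snippet above. The function name `local_maxima` is my own, and this is only a naive peak finder for illustration, not the base-calling method the instrument software uses.

```python
def local_maxima(values, first_index=0):
    """Return (index, value) pairs for points strictly higher than both neighbours."""
    peaks = []
    for i in range(1, len(values) - 1):
        if values[i - 1] < values[i] > values[i + 1]:
            peaks.append((first_index + i, values[i]))
    return peaks


# Processed "Traces" columns for rows 2216-2234, copied from the snippet above.
trace_a = [286, 427, 555, 639, 659, 611, 508, 372, 238, 131,
           56, 13, 0, 0, 0, 0, 0, 0, 0]
trace_g = [0, 0, 0, 0, 0, 0, 0, 0, 0, 2,
           20, 58, 117, 193, 281, 355, 398, 400, 358]

print(local_maxima(trace_a, first_index=2216))  # [(2220, 659)]
print(local_maxima(trace_g, first_index=2216))  # [(2233, 400)]
```

In the snippet the A and G calls sit at rows 2219 and 2232, one row before each of these maxima.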

Any help would be appreciated.

--Andrew

biopython • 4.3k views

updated 9.4 years ago by kapil.joshi036 ▴ 80 • written 13.3 years ago by drew.matteson ▴ 40

score 3 · Answer 1 · 2012-08-16

After a day of research I found a workaround. The data shown in the far right columns are contained in a directory inside the abi file as data channels 1-4. The peak data on the left is stored in channels 9-12. You can read more about the abi specification here.

I was attempting to use the biopython library to access this. I couldn't get that to work, so I used abifpy, a library that is supposed to be included in biopython. It can be used to access the trace data (the data on the left) like so:

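    import abifpy

    # filename is the path to your .abi/.ab1 file
    test = abifpy.Trace(filename)
    # DATA9 is the first of the four processed trace channels (DATA9-DATA12)
    trace_data = test.tags['DATA9'].tag_data
    # prints the trace values for the first base channel, usually G for my data
    print(trace_data)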
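The same channels should also be reachable from Biopython itself. Below is a minimal sketch, assuming a Biopython version whose `SeqIO` "abi" parser exposes the tag directory as `record.annotations["abif_raw"]`. The helper names `split_channels` and `load_abif_raw` and the path `example.ab1` are my own placeholders, not part of any library.

```python
RAW_TAGS = ("DATA1", "DATA2", "DATA3", "DATA4")            # raw fluorescence
PROCESSED_TAGS = ("DATA9", "DATA10", "DATA11", "DATA12")   # processed traces


def split_channels(abif_raw):
    """Split a tag dictionary into (raw, processed) dicts of tag -> list of values.

    Tags missing from the input are skipped rather than raising an error.
    """
    raw = {tag: list(abif_raw[tag]) for tag in RAW_TAGS if tag in abif_raw}
    processed = {tag: list(abif_raw[tag])
                 for tag in PROCESSED_TAGS if tag in abif_raw}
    return raw, processed


def load_abif_raw(path):
    """Read an ABI trace file with Biopython and return its tag dictionary."""
    from Bio import SeqIO  # imported here so split_channels works without Biopython
    record = SeqIO.read(path, "abi")
    return record.annotations["abif_raw"]


# Usage (needs Biopython and a real trace file; "example.ab1" is a placeholder):
#     raw, processed = split_channels(load_abif_raw("example.ab1"))
#     print(processed["DATA9"][:10])
```

Which base each of DATA9-DATA12 corresponds to depends on the file's base order, so check that before labelling the channels.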

Hope this helps someone else.

--Andrew

score 0 · Answer 2 · 2016-06-17

Thanks for the help. You can also extract this data with the sangerseqR package in R:

    library(sangerseqR)
    hetsangerseq <- readsangerseq(system.file("extdata", "heterozygous.ab1", package = "sangerseqR"))

    hetcalls <- makeBasecalls(hetsangerseq, ratio = 0.2)
    peakAmpMatrix(hetcalls)

Columns 1, 2, 3 and 4 of the resulting matrix correspond to A, C, G and T.