Question

What do the fields DATA.9 - DATA.12 represent in an ab1 file?

10.6 years ago

anuragm ▴ 130

I am trying to better understand the information content of a .ab1 file. I would like to know what the four fields DATA.9 to DATA.12 mean. The R vignette for the sangerseqR package describes them as "Vectors containing signal intensities for each channel", whereas the Applied Biosystems document for .ab1 files calls them "Short Array holding analyzed color data".

Also, the ab1 files that I am currently using do not have the fields for the amplitude of the primary and secondary base signals, P1AM.1 and P1AM.2, respectively (I checked this in R). So how is the chromatogram built when I open a file in a viewer (Geneious or Finch)?

ab1 sequencing chromatogram • 5.6k views

score 0 · Answer 1 · 2014-12-12

10.6 years ago

Dan D 7.4k

Data fields 1 through 4 (DATA.1 to DATA.4) hold the raw data from each of the four color channels, and each color channel corresponds to one nucleotide. Fields 9 through 12 correspond to fields 1 through 4 but have a signal correction applied. These corrected fields are the primary data source for basecalling, and they are also what your viewer uses to draw the chromatogram. The higher the value at an array index, the taller the peak for the associated color.
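
To make the "taller peak" idea concrete, here is a minimal Python sketch, not a real ab1 parser. The four short lists are made-up numbers standing in for DATA.9 to DATA.12, and the base order "GATC" is an assumption for illustration; as far as I know, ABIF files record the actual channel order in their FWO_ field, so check that rather than trusting this order. The names `tallest_base`, `bases_at_peaks`, and the peak positions are hypothetical helpers and values, not part of any library.

```python
# Made-up stand-ins for DATA.9 .. DATA.12: one corrected intensity per scan
# point and per channel. A real file holds thousands of points per channel.
data_9_to_12 = [
    [0, 5, 40, 6, 1, 0, 2, 3, 1, 0, 1, 2],
    [1, 2, 3, 2, 4, 20, 55, 18, 3, 1, 0, 1],
    [2, 1, 0, 1, 2, 1, 3, 2, 6, 25, 60, 22],
    [1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 3, 1],
]

# ASSUMPTION: channel-to-base order. Read it from your own file instead.
base_order = "GATC"

# Pair each channel with the nucleotide it is assumed to represent.
channels = dict(zip(base_order, data_9_to_12))


def tallest_base(channels, index):
    """Return the base whose corrected signal is highest at one scan index."""
    return max(channels, key=lambda base: channels[base][index])


def bases_at_peaks(channels, peak_positions):
    """Read off the dominant base at each given scan index."""
    return "".join(tallest_base(channels, i) for i in peak_positions)


# Hypothetical peak positions (scan indices where a peak tops out).
peak_positions = [2, 6, 10]
print(bases_at_peaks(channels, peak_positions))  # GAT
```

A viewer does essentially the plotting half of this: it draws each of the four arrays as a colored line against the scan index, so a trace has many more points than called bases.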

Is the plot you attached a plot of the values returned when you access the DATA.9/10/11/12 fields? When I use R to extract any of DATA.9 to DATA.12, I get close to 20,000 data points for a sequence 100 bp long, so I am not sure which data points in the file correspond to which nucleotide.