Is there a way to access the data stored in a .ab1 file?
10.0 years ago
anuragm ▴ 130

I want to use the information in a .ab1 file to find, at each position, the ratio of the peak value of the nucleotide with the strongest signal to the peak value of the nucleotide with the second strongest signal, so that I can identify good sites to use for analysis. Is there a way to do this?

sequence • 8.4k views
10.0 years ago
Malcolm.Cook ★ 1.5k

To understand what the data looks like in an AB1 file, you will want to refer to the ABIF file format specification: http://www6.appliedbiosystems.com/support/software_community/ABIF_File_Format.pdf
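
As a rough illustration of the layout that document describes, here is a minimal Java sketch that lists the tagged records in an ABIF file using only the standard library. It assumes the commonly documented structure: the bytes "ABIF", a 2-byte version, then a 28-byte big-endian "tdir" entry at offset 6 pointing to the directory. The class and field names (AbifDirectory, Entry) are made up for this example; check the offsets against the specification before relying on them.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class AbifDirectory {
    /** One 28-byte directory entry, e.g. name "DATA" with number 9. */
    static final class Entry {
        final String name;
        final int number, elementType, elementSize, numElements, dataSize, dataOffset;

        Entry(String name, int number, int elementType, int elementSize,
              int numElements, int dataSize, int dataOffset) {
            this.name = name;
            this.number = number;
            this.elementType = elementType;
            this.elementSize = elementSize;
            this.numElements = numElements;
            this.dataSize = dataSize;
            this.dataOffset = dataOffset;
        }
    }

    /** Decodes the entry starting at byte position pos (assumed field layout). */
    static Entry readEntry(ByteBuffer buf, int pos) {
        byte[] tag = new byte[4];
        for (int i = 0; i < 4; i++) {
            tag[i] = buf.get(pos + i);
        }
        return new Entry(new String(tag, StandardCharsets.US_ASCII),
                buf.getInt(pos + 4),
                buf.getShort(pos + 8), buf.getShort(pos + 10),
                buf.getInt(pos + 12), buf.getInt(pos + 16), buf.getInt(pos + 20));
    }

    /** Returns every directory entry found in the raw bytes of an ABIF file. */
    static List<Entry> parse(byte[] file) {
        ByteBuffer buf = ByteBuffer.wrap(file).order(ByteOrder.BIG_ENDIAN);
        if (file.length < 34 || buf.get(0) != 'A' || buf.get(1) != 'B'
                || buf.get(2) != 'I' || buf.get(3) != 'F') {
            throw new IllegalArgumentException("not an ABIF file");
        }
        Entry root = readEntry(buf, 6); // the header's "tdir" entry
        List<Entry> entries = new ArrayList<>();
        for (int i = 0; i < root.numElements; i++) {
            entries.add(readEntry(buf, root.dataOffset + i * root.elementSize));
        }
        return entries;
    }
}
```

Calling AbifDirectory.parse(Files.readAllBytes(Paths.get("trace.ab1"))) (the file name is a placeholder) and printing each entry's name and number should show which fields, such as DATA 9 to 12 or P1AM 1, are actually present in a given file.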

If you are comfortable in R, you might reach for http://bioconductor.org/packages/release/bioc/html/sangerseqR.html and work with data from calling peakAmpMatrix (or traceMatrix)

Or, in Perl, there is http://search.cpan.org/~vita/Bio-Trace-ABIF-1.05/lib/Bio/Trace/ABIF.pm

However, the specifics of what you want to compute are not entirely clear to me.

Making reference to the vignette http://bioconductor.org/packages/release/bioc/vignettes/sangerseqR/inst/doc/sangerseq_walkthrough.pdf, given these definitions

P1AM.1 Amplitude of primary basecall peaks.
P2AM.1 (optional) Amplitude of the secondary basecall peaks.

Then "the ratio of the peak value of nucleotide with the strongest signal at one position to the peak value of the nucleotide with the second strongest signal" is simply P1AM.1 / P2AM.1, which can be computed for the demo AB1 file as follows:

> library(sangerseqR)
> x <- read.abif(system.file("extdata", "heterozygous.ab1", package = "sangerseqR"))
> x@data$P1AM.1 / x@data$P2AM.1

Note, however, that this value is a ratio of peak heights, not of the areas under the peaks. It is usually greater than one, but not always. You may want to reconsider exactly what you are trying to determine. Good luck.
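
If what you want is literally the strongest channel divided by the second strongest at each position, regardless of which base was called, that can be computed from a four-column amplitude matrix. The Java sketch below assumes the per-position A, C, G, T amplitudes have already been extracted into an array (for example from something like peakAmpMatrix); the class name PeakRatio and the sample values are made up for illustration.

```java
final class PeakRatio {
    /**
     * amps[i] holds the peak amplitudes of the four channels at position i.
     * Returns strongest / second strongest for each position. The result is
     * infinite when only one channel has signal and NaN when none does.
     */
    static double[] topTwoRatio(double[][] amps) {
        double[] out = new double[amps.length];
        for (int i = 0; i < amps.length; i++) {
            double first = Double.NEGATIVE_INFINITY;
            double second = Double.NEGATIVE_INFINITY;
            for (double a : amps[i]) {
                if (a > first) {
                    second = first;
                    first = a;
                } else if (a > second) {
                    second = a;
                }
            }
            if (second > 0) {
                out[i] = first / second;
            } else {
                out[i] = first > 0 ? Double.POSITIVE_INFINITY : Double.NaN;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up amplitudes: a clean site and a likely mixed site.
        double[][] amps = { {1200, 30, 15, 10}, {400, 380, 20, 5} };
        for (double r : topTwoRatio(amps)) {
            System.out.println(r);
        }
    }
}
```

Unlike P1AM.1 / P2AM.1, this ratio is never below one by construction, so a cutoff (whatever value suits your data) can be applied directly to pick clean sites.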


Could I get the values for the four different bases, in the form of some parameter, at each position?


I modified my answer to more fully answer your original question. This comment poses a new related but different question. If you get to the point where you think my answer addresses your original question, you will probably have learned enough to answer this new one. Give it a go!

x <- read.abif(system.file("extdata", "heterozygous.ab1", package = "sangerseqR"))

I need to import files from an external drive, so I modified it to

x <- read.abif(system.file("extdata", file.choose() , package = "sangerseqR"))

But I am getting an error message.

Error in readBin(fc, what = "raw", n = 1.2 * file.info(filename)$size) :
  can only read from a binary connection
In addition: Warning message:
In file(filename, open = "rb") :
  file("") only supports open = "w+" and open = "w+b": using the former

Any idea how to correct this? What else do I need to modify in the code?


You are getting quite far afield from your original question now... Try:

> x <- read.abif(file.choose())

I managed to import the file using the command you suggested, but I am unable to access its contents (the different parameters of the ab1 file, such as P1AM and P2AM, like the ones given here) using the $ operator. I guess it doesn't work for this class.


I managed to access the different parameters using x@data$, but x@data$P1AM.1 returns NULL. Is there a reason for this? Am I using the wrong code?

x <- read.abif(file.choose())
x@data$P1AM.1
NULL

I'm guessing: perhaps the basecaller was not run? Where did you get the file? Have you looked at it in a chromatogram viewer? If not, you should, perhaps with the free FinchTV.


I am using the files that I got from 1stBase sequencing. I have looked at the chromatograms in Geneious and I see signals for nucleotides.


I know this is a fairly old post, but I will try asking here anyway.

I am looking to do some fairly specific QC by looking into the signal information of an abi file. I have the data read into R using sangerseqR and can access the S4 data just fine. My question is: What does it mean? Specifically, the traceMatrix and DATA.9-DATA.12. The trace matrix is much larger than my sequence, but is the same length as the DATA fields. Is there a standard window for interpreting these values?
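
One likely reading, offered as an assumption rather than a confirmed answer: the traces are sampled at many scan points per base, so they are much longer than the sequence, and the basecaller's peak locations (stored in ABIF files as PLOC records) give the scan index for each called base. Under that assumption, the four channel values for each base call can be picked out as in the Java sketch below. The class name TraceSampler, the 0-based indexing of peak locations, and the toy numbers in the usage note are all assumptions for illustration.

```java
final class TraceSampler {
    /**
     * traces[c][s] is the signal of dye channel c at scan point s.
     * peakLoc[i] is the scan index of base call i (assumed 0-based).
     * Returns one row per base call holding the value of every channel there.
     */
    static int[][] sampleAtPeaks(int[][] traces, int[] peakLoc) {
        int[][] out = new int[peakLoc.length][traces.length];
        for (int i = 0; i < peakLoc.length; i++) {
            for (int c = 0; c < traces.length; c++) {
                out[i][c] = traces[c][peakLoc[i]];
            }
        }
        return out;
    }
}
```

For example, with four toy traces of six scan points each and peak locations {1, 4}, the result has two rows of four values, one row per called base. Sampling a small window around each peak location instead of a single point would be a reasonable variation if the peaks of the different channels are slightly offset.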

10.0 years ago

You can use io_lib from the Staden package.


I just downloaded Staden and I am a bit clueless about how to proceed. Could you provide a few pointers? Thanks.

10.0 years ago
m.koohi.m ▴ 120

You can use the org.biojava.bio.program.abi package to parse and access the AB1 data.

http://www.biojava.org/docs/api14/org/biojava/bio/program/abi/package-summary.html

I wrote a Java program to modify the peaks and remove some noise from the data; parts of the code are shown below. With this library you can work with AB1 data however you like!

public PeakTracer(File file) {
        try {
            // Open the same AB1 file with three BioJava readers.
            abiTrace = new ABITrace(file);
            abifChromatogram = ABIFChromatogram.create(file);
            abifParser = new ABIFParser(file);
        } catch (IOException e) {
            e.printStackTrace();
        } catch (UnsupportedChromatogramFormatException e) {
            e.printStackTrace();
        }
}

As the code above shows, the abiTrace and abifParser objects let you run different kinds of processing on your data.
