5.4 years ago
t-jim
Hello,
I'm trying to parse a FASTQ file with BioJava, and I need to get the quality score for every base of each sequence. So far I have this:
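import java.io.File;
import java.util.LinkedList;
import java.util.List;
// BioJava 4+ package names; BioJava 3 used org.biojava3.core.sequence / org.biojava3.sequencing.io.fastq
import org.biojava.nbio.core.sequence.DNASequence;
import org.biojava.nbio.sequencing.io.fastq.*;

FastqReader fastqReader = new SangerFastqReader();
List<DNASequence> sequences = new LinkedList<DNASequence>();
File in = new File("fastqfile.fastq");
// read(File) parses the whole file and throws IOException,
// so the enclosing method must declare or catch it
for (Fastq fastq : fastqReader.read(in)) {
    // build a DNASequence that carries the numeric quality scores as a feature
    DNASequence test = FastqTools.createDNASequenceWithQualityScores(fastq);
    sequences.add(test);
}
for (DNASequence seq : sequences) {
    String sequence = seq.getSequenceAsString();
    /* get score sequence */
}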
I looked through the API, and I know that the score is stored as a QualityFeature in the DNASequence, but I can't figure out how to get it. I would appreciate your help.
I have already tried that. It gives me the scores as ASCII characters, but I want them as numbers. I use the createDNASequenceWithQualityScores() method because it returns a DNASequence object, converts the ASCII scores into numbers and stores them in the object. I just need to figure out how to access the scores.
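If all you need is numbers, the ASCII-to-number step can also be done by hand. Below is a minimal, stdlib-only sketch, assuming the raw quality line is available as a String (for example from Fastq.getQuality()) and that it uses Phred+33 (Sanger) encoding. The class PhredDecoder and its methods are hypothetical, not part of BioJava. Depending on your BioJava version, FastqTools.qualityScores(fastq) may already return the decoded values; check the Javadoc for your release before relying on it.

```java
import java.util.Arrays;

/** Hypothetical helper: decodes a FASTQ quality string into numeric Phred scores. */
class PhredDecoder {

    /** Sanger / Illumina 1.8+ offset: score = ASCII code - 33. */
    static final int SANGER_OFFSET = 33;

    /**
     * Converts each quality character to its ASCII code and subtracts the offset.
     * Phred-scaled scores are never negative, so a negative result means the
     * offset is wrong (the old Solexa encoding is not handled here).
     */
    static int[] decode(String quality, int offset) {
        int[] scores = new int[quality.length()];
        for (int i = 0; i < quality.length(); i++) {
            int score = quality.charAt(i) - offset; // char widens to its ASCII code
            if (score < 0) {
                throw new IllegalArgumentException(
                        "Character '" + quality.charAt(i) + "' is below offset " + offset);
            }
            scores[i] = score;
        }
        return scores;
    }

    public static void main(String[] args) {
        // '!' is ASCII 33 -> Q0, '5' is 53 -> Q20, 'I' is 73 -> Q40
        System.out.println(Arrays.toString(decode("!5I", SANGER_OFFSET))); // prints [0, 20, 40]
    }
}
```

Returning an int[] keeps one score per base in the same order as the sequence string, so index i of the array lines up with base i of getSequenceAsString().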
See edit. Basically, convert each ASCII character to its decimal code, then subtract the appropriate offset to get the quality score. Here I subtract 33 to produce Sanger (Phred+33) scores. I don't think it is always possible to decide unambiguously, i.e. automatically, which offset should be used, although after reading a few sequences one should be able to tell which encoding has been used.
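The "read a few sequences and tell" idea can be sketched as a rough heuristic. This is only an illustration under stated assumptions: it takes ';' (59) as the lowest character the +64 encodings use and 'J' (74) as the usual ceiling for Phred+33 data. Both are common conventions, not guarantees. OffsetGuesser and guessOffset are hypothetical names. The heuristic can only rule an encoding out, so it returns -1 when the data fit both.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical heuristic: guess the FASTQ quality offset from observed quality lines. */
class OffsetGuesser {

    /** Returns 33, 64, or -1 when the characters seen so far are consistent with both. */
    static int guessOffset(Iterable<String> qualityLines) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (String q : qualityLines) {
            for (int i = 0; i < q.length(); i++) {
                char c = q.charAt(i);
                min = Math.min(min, c);
                max = Math.max(max, c);
            }
        }
        if (min == Integer.MAX_VALUE) {
            return -1; // no quality characters seen yet
        }
        if (min < 59) {
            return 33; // below ';' cannot occur in the +64 encodings
        }
        if (max > 74) {
            return 64; // above 'J' is unusual (not impossible) for Phred+33 data
        }
        return -1; // ambiguous: keep reading more records
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("!5I", "II#"); // '!' = 33 forces +33
        System.out.println(guessOffset(sample)); // prints 33
    }
}
```

Scanning more records only narrows the observed min/max range, so a caller can feed quality lines until the result stops being -1.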