FASTQ files with only "!" quality scores
4.3 years ago
pablo ▴ 310

Hello,

I got reads from PacBio sequencing. The reads are in BAM format. I converted them into FASTQ format.

I used both the bam2fastq and samtools fastq tools to do that.

The problem is that with both tools every base of every sequence gets the "!" score, which would mean that all the bases are wrong. That can't be right, because the Phred scores I got with FastQC are really good (and reads obtained with PacBio technology are usually good).

Any idea?

Best

fastq samtools • 3.1k views

Some more info that I recently found online:

Please note that raw data quality scores are the same for all bases of the Sequel raw data (PHRED 0, ASCII !). PacBio came to the conclusion that computing the quality scores for the raw data was a waste of time. Apparently the quality scores for the raw data cannot be reliably computed (and consequently these were also ignored for RSII data pipelines). However, usable PacBio quality scores can be generated from consensus data if the project allows (either by CCS or by other secondary analysis algorithms, e.g. all-vs-all alignments). In short, determining the quality of individual reads is up to the downstream analysis pipeline (e.g. the assembler).
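To make the "PHRED 0, ASCII !" remark concrete, here is a minimal Python sketch of how a FASTQ quality character maps to a Phred score and an error probability. It assumes the standard Phred+33 (Sanger) encoding; the function names are hypothetical and only for illustration, not part of any PacBio or samtools API.

```python
def phred_from_char(ch, offset=33):
    """Decode one FASTQ quality character into a Phred score (Phred+33 assumed)."""
    return ord(ch) - offset


def error_probability(q):
    """Error probability implied by a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def is_placeholder_quality(qual, placeholder="!"):
    """True if a non-empty quality string consists only of the placeholder character."""
    return len(qual) > 0 and set(qual) == {placeholder}


print(phred_from_char("!"))            # 0
print(error_probability(0))            # 1.0
print(is_placeholder_quality("!!!!"))  # True
```

Read literally, "!" would mean an error probability of 1 for every base, which is why a read made up entirely of "!" is best treated as "no quality information" rather than "all bases wrong".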


Base scores in PacBio FASTQ files have little to no meaning because of the specifics of the PacBio technique (at least that's how I remember it from older datasets; it may have changed for more recent ones), so I wouldn't worry too much about this.

If you use the conversion tools of the PacBio SMRT package, I think you can even specify what you want the scores to be.

Just use the data as it is without taking the scores into account.

4.3 years ago
GenoMax 147k

I got reads from PacBio sequencing. The reads are in BAM format.

Then use PacBio's utility bam2fastx to do the conversion.

The problem I have is that I got the " ! " score for all bases of the sequences with both tools, which means the bases are all wrong.

There is a thread on SeqAnswers about your observation. I will deep link one post. You can read the entire thread. Is your data older?


Actually, I used the bam2fastq tool, but as I said, I also got the "!" score for each base.

My data are pretty new.


I guess the ! has not been replaced with meaningful values as stated by user rhall (who works for PacBio) in another post in the thread I had linked above from SeqAnswers.

I would follow @lieven's advice above, or replace ! with something else using reformat.sh from the BBMap suite.
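If a downstream tool refuses reads with Phred 0 everywhere, the replacement idea can also be sketched in plain Python. This is not how reformat.sh works internally; it is a hypothetical helper that assumes unwrapped 4-line FASTQ records and Phred+33 encoding. The substitute character ("5", i.e. Phred 20) is an arbitrary choice and carries no real information about base accuracy.

```python
def rewrite_qualities(fastq_lines, old="!", new="5"):
    """Yield FASTQ lines, replacing quality strings made only of `old` with `new`.

    Assumes unwrapped 4-line records: header, sequence, '+', quality.
    """
    for i, line in enumerate(fastq_lines):
        text = line.rstrip("\n")
        # The quality string is the 4th line of each record (index 3, 7, 11, ...).
        if i % 4 == 3 and text and set(text) == {old}:
            text = new * len(text)
        yield text + "\n"


# Small in-memory demonstration with a made-up record.
record = ["@read1\n", "ACGT\n", "+\n", "!!!!\n"]
print("".join(rewrite_qualities(record)), end="")
```

For a real file, the same generator can be fed an open file handle and its output written to a new file; quality strings that already contain other values are left unchanged.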
