Question

Is 99.9999% of bases at Q30 normal?

Asked 8 months ago by Aki

I ran fastp on published FASTQ files of single-end RNA-seq data and got 99.9999% Q20 bases and 99.9999% Q30 bases. I have never seen scores this high. I am a beginner in bioinformatics, so I don't know whether this is normal. Could you give me any suggestions?

Detecting adapter sequence for read1...
No adapter detected for read1

Read1 before filtering:
total reads: 47471798
total bases: 4747179800
Q20 bases: 4747174600(99.9999%)
Q30 bases: 4747174600(99.9999%)

Read1 after filtering:
total reads: 47471746
total bases: 4557287616
Q20 bases: 4557287616(100%)
Q30 bases: 4557287616(100%)

Filtering result:
reads passed filter: 47471746
reads failed due to low quality: 52
reads failed due to too many N: 0
reads failed due to too short: 0
reads with adapter trimmed: 0
bases trimmed due to adapters: 0

Duplication rate (may be overestimated since this is SE data): 60.5205%

JSON report: ./report/SRR23031659_fastp.json
HTML report: ./report/SRR23031659_fastp.html

fastp -i ./SRR23031659.fastq.gz -3 -o out_SRR23031659.fq.gz --html ./report/SRR23031659_fastp.html -j ./report/SRR23031659_fastp.json -q 15 -n 10 -t 1 -T 1 -l 20 
fastp v0.23.4, time used: 96 seconds
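
The counts in this log can be cross-checked with a little arithmetic. The Python sketch below only re-uses numbers printed above. It shows that every read is 100 bp, that no bases fall between Q20 and Q30, and that the 5,200 sub-Q20 bases equal 52 reads × 100 bp. One possible reading, not verified here, is that quality scores were binned into a few fixed values and that the low-quality bases sit in the 52 reads fastp discarded. It also assumes fastp counts a "Q20 base" as one with Q >= 20.

```python
# Counts copied verbatim from the "Read1 before filtering" section of the log.
total_reads = 47_471_798
total_bases = 4_747_179_800
q20_bases = 4_747_174_600
q30_bases = 4_747_174_600
failed_low_quality = 52  # "reads failed due to low quality"

# If total_bases divides evenly by total_reads, every read has the same length.
assert total_bases % total_reads == 0
read_length = total_bases // total_reads

below_q20 = total_bases - q20_bases        # bases with Q < 20 (assuming Q20 means Q >= 20)
between_q20_q30 = q20_bases - q30_bases    # bases with 20 <= Q < 30

print(f"read length: {read_length} bp")
print(f"Q30 fraction: {q30_bases / total_bases:.7%}")
print(f"bases below Q20: {below_q20}")
print(f"bases with 20 <= Q < 30: {between_q20_q30}")
print(f"failed reads x read length: {failed_low_quality * read_length}")
```

An empty Q20 to Q30 range is unusual for raw, unbinned scores, which fits the binning idea raised later in the thread.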

Thanks in advance.

Tags: RNA-seq, fastp

Question written 8 months ago by Aki; thread last updated 8 months ago by LauferVA

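A Q20 base is one whose Phred quality score is at least 20, meaning an estimated error probability of at most 1 in 100 (99% accuracy). Likewise, a Q30 base has a score of at least 30, an estimated error probability of at most 1 in 1,000 (99.9% accuracy), which represents very high confidence in the base call. Please check out this blog.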
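
The Q20/Q30 thresholds come from the Phred scale, Q = -10 * log10(P), where P is the estimated probability that a base call is wrong. The minimal Python sketch below converts between the two and decodes a FASTQ quality character. It assumes the Phred+33 encoding used by modern FASTQ files; the function names are illustrative, not from any library.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Phred score Q -> probability the base call is wrong: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Inverse conversion: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def phred_from_fastq_char(ch: str, offset: int = 33) -> int:
    """Decode one FASTQ quality character, assuming Phred+33 encoding."""
    return ord(ch) - offset


for q in (10, 20, 30, 40):
    p = phred_to_error_prob(q)
    print(f"Q{q}: error probability {p:g}, accuracy {1 - p:.4%}")
```

For example, the quality character `?` decodes to Q30 and `I` to Q40 under Phred+33.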

Reply by bk11, 8 months ago

Is this from an AVITI sequencer? They do have quite high quality scores.

Reply by jkim, 8 months ago

Thanks, jkim. They appear to have used an MGISEQ-2000RS (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM6925047). Do you have any information on this model?

Reply by Aki, 8 months ago

I have no idea. Good luck!

Reply by jkim, 8 months ago

Thank you!

Reply by Aki, 8 months ago

Some companies may convert quality values into a few fixed values to save storage. Do you know whether they did something like that? This is just my guess.

Reply by MatthewP, 8 months ago
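
This guess can be checked directly on the FASTQ file. If quality values were binned, a histogram of scores shows only a handful of distinct values instead of a broad spread. Below is a minimal Python sketch. It assumes plain 4-line FASTQ records (optionally gzipped) with Phred+33 encoding; the function names and the script name `qual_hist.py` are hypothetical.

```python
import gzip
import sys
from collections import Counter
from collections.abc import Iterable


def histogram_from_lines(lines: Iterable[str], max_reads: int = 1_000_000,
                         offset: int = 33) -> Counter:
    """Count each Phred score in the first max_reads 4-line FASTQ records."""
    counts: Counter = Counter()
    for i, line in enumerate(lines):
        if i // 4 >= max_reads:
            break
        if i % 4 == 3:  # the 4th line of each record holds the qualities
            counts.update(ord(c) - offset for c in line.rstrip("\r\n"))
    return counts


def quality_histogram(path: str, max_reads: int = 1_000_000) -> Counter:
    """Open a (possibly gzipped) FASTQ file and tally its quality scores."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return histogram_from_lines(handle, max_reads)


if __name__ == "__main__" and len(sys.argv) > 1:
    hist = quality_histogram(sys.argv[1])
    total = sum(hist.values())
    for q in sorted(hist):
        print(f"Q{q}\t{hist[q]}\t{hist[q] / total:.4%}")
```

Usage: `python qual_hist.py SRR23031659.fastq.gz`. Sampling the first million reads keeps it fast. Two or three distinct scores would strongly suggest binning, while dozens would argue against it.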

Illumina certainly does this. They recently raised one of these values from 22 to 25 for certain calls. They derive the values from aggregated data and then update the priors.

Reply by LauferVA, 8 months ago

The bottom line is that if compression is a concern, they will lump together scores in the 20s into a single value such as 22 or 25, whichever is the closest fit.
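
The lumping described above is a lookup from score ranges to one representative value. The Python sketch below illustrates the idea. Its bin table is hypothetical, loosely modeled on published 8-level schemes, and is not any vendor's actual table (Illumina's, MGI's, or otherwise).

```python
from bisect import bisect_right

# HYPOTHETICAL bin table for illustration only: (lower edge, reported value).
# Not any sequencer vendor's actual binning scheme.
BINS = [(0, 2), (2, 6), (10, 15), (20, 22), (25, 27), (30, 33), (35, 37), (40, 40)]
_EDGES = [lo for lo, _ in BINS]


def bin_quality(q: int) -> int:
    """Map a raw Phred score to the representative value of its bin."""
    if q < 0:
        raise ValueError(f"Phred score must be non-negative, got {q}")
    idx = bisect_right(_EDGES, q) - 1  # last bin whose lower edge is <= q
    return BINS[idx][1]


raw = [12, 18, 21, 23, 24, 26, 31, 34, 38, 41]
binned = [bin_quality(q) for q in raw]
print(binned)  # [15, 15, 22, 22, 22, 27, 33, 33, 37, 40]
print(f"{len(set(raw))} distinct raw scores -> {len(set(binned))} after binning")
```

Fewer distinct symbols compress much better, which is the storage motivation mentioned in this thread. The cost is that downstream tools only see the representative values.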

Regarding third-generation sequencing, Nanopore also reports estimated rather than empirical quality scores in certain cases (though recent comparisons have supported the estimates). This implies similar practices, although I can't comment specifically on the most recent practices, which are changing fast. I don't know enough about PacBio to say.