Understanding the textual output file of FastQC - Per Base N Content
4.3 years ago

Hi,

I know that the fastqc_report.html generated by FastQC shows the Per Base N Content module as percentages, but the text output file, fastqc_data.txt, shows the data for the same module as follows:

Per base N content  pass
#Base   N-Count
1   0.005880142888664921
2   7.156360513588139E-5
3   1.19272675226469E-5
4   0.0039121437474281835
5   0.009816141171138397
6   0.011378613216605141
7   0.02527387988048878
8   0.025476643428373778
9   0.024379334816290264
10  0.03693874751763745

Does the N-Count column in the text data file represent percentages, or does it represent the number of Ns counted at that base position across the reads?


Well, since counts should be integers and FastQC output floating-point numbers, I would say it's a percentage.

It does not make sense to have 0.1 N counts at a given base.
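
To make the arithmetic concrete, here is a rough sketch of what the first value from the question would mean if it is read as a percentage. The total of 1,000,000 reads is a made-up number for illustration only; the real read count of that run is not given in the thread.

```python
# Value reported for base 1 in the question's fastqc_data.txt
n_value_base_1 = 0.005880142888664921

# Hypothetical library size, assumed only for this illustration
total_reads = 1_000_000

# Read as a percentage, the value implies this many reads with an N at base 1
expected_n_reads = n_value_base_1 / 100 * total_reads
print(round(expected_n_reads, 1))
```

So a small fractional value is what you would expect from a percentage over a large number of reads, while a fractional raw count at a single position would not make sense.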

4.3 years ago

Hi,

It is the percentage of N (ambiguous) bases found at each bp position.

You can test it by creating some fake fastq data:

@fake_1
ATGCAGTCGATGTGCTANNN
+
ATGCAGTCGATGTGCTA!!!
@fake_2
ATGCAGTCGATGTGCTNNNN
+
ATGCAGTCGATGTGCT!!!!
@fake_3
ATGCAGTCGATGTGCTATGT
+
ATGCAGTCGATGTGCTATGC
@fake_4
NNGCAGTCGATGTGCTATGC
+
!!GCAGTCGATGTGCTATGT

The result is this:

>>Per base N content    fail
#Base   N-Count
1   25.0
2   25.0
3   0.0
4   0.0
5   0.0
6   0.0
7   0.0
8   0.0
9   0.0
10  0.0
11  0.0
12  0.0
13  0.0
14  0.0
15  0.0
16  0.0
17  25.0
18  50.0
19  50.0
20  50.0
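
The same numbers can be reproduced by hand. The sketch below is not FastQC's own code; it is a minimal Python check, with a hypothetical helper name `per_base_n_percent`, that counts the Ns at each position of the four fake reads and divides by the number of reads covering that position.

```python
# The four fake reads from the FASTQ records above (sequence lines only)
reads = [
    "ATGCAGTCGATGTGCTANNN",
    "ATGCAGTCGATGTGCTNNNN",
    "ATGCAGTCGATGTGCTATGT",
    "NNGCAGTCGATGTGCTATGC",
]


def per_base_n_percent(seqs):
    """Return the percentage of N calls at each position (hypothetical helper)."""
    length = max(len(s) for s in seqs)
    percents = []
    for i in range(length):
        # Only reads long enough to reach position i contribute to it
        covering = [s for s in seqs if len(s) > i]
        n_calls = sum(1 for s in covering if s[i] in "Nn")
        percents.append(100.0 * n_calls / len(covering))
    return percents


n_percent = per_base_n_percent(reads)
for pos, pct in enumerate(n_percent, start=1):
    print(pos, pct)
```

With four reads, one N at a position gives 25.0 and two Ns give 50.0, which matches the N-Count column in the result above.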

António
