What does "total read count" mean in FastQC output, and how is it helpful for analysis?
3.1 years ago
Fizzah ▴ 30

Hello there, I am working on RNA-seq data and I am confused about the read count in a FASTQ file. Can anyone explain what "Total read count" means? Does it mean we are counting everything in the FASTQ file (everything written in each 4-line read)? Also, why is it necessary to count the total number of reads? How is it going to help in the analysis?

fastqc • 2.2k views
3.1 years ago

Total read count is the number of reads contained in the analyzed FASTQ file. Each read takes 4 lines.

You need this number to get an idea of the coverage (sequencing depth) you obtained in the sequencing run.
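As a rough sketch of how the read count feeds into a coverage estimate: total sequenced bases are reads times read length, and dividing by the size of the target gives an average depth. All three numbers below are hypothetical placeholders, not values from this thread. For RNA-seq the read count is usually used directly as the library size rather than turned into a genome coverage figure.

```shell
#!/bin/sh
# Hypothetical inputs, chosen only for illustration.
reads=1000000         # total read count, e.g. as reported by FastQC
read_len=100          # read length in bases
target_size=5000000   # size of the sequenced target in bases

total_bases=$((reads * read_len))
depth=$((total_bases / target_size))   # integer division, so this is rounded down

echo "total bases: $total_bases"
echo "approx. mean depth: ${depth}x"
```

With these placeholder values the sketch prints 100000000 total bases and a depth of 20x.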


I use this command to find the total read count in my FASTQ file: zcat 19213R-08-01_S16_L002_R1_001.fastq.gz | awk 'NR % 4 == 2 {print;}' | wc -c

Please confirm whether this command will give me the total read count. If not, which command will? (I personally think it gives the total number of sequences in the FASTQ file.)


Yes, the command returns the read count; you could also use these commands to check the total number of reads:

$ n=$(zcat file.fq.gz | wc -l) && echo $((n/4)) 
$ seqkit stat file.fq.gz
$ fastqc file.fq.gz # find the html file: file_fastqc.html
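
The line-count approach assumes every record is exactly 4 lines, so it can be worth checking that the line count is a multiple of 4. A minimal sketch follows. `count_reads` is a hypothetical helper name, the toy file is made up for the demo, and `gzip -dc` is used as a portable equivalent of `zcat`.

```shell
#!/bin/sh
# Count reads in a gzipped FASTQ and warn if the file is not made of whole 4-line records.
# count_reads is a hypothetical helper name, not part of any tool.
count_reads() {
    lines=$(gzip -dc "$1" | wc -l)
    if [ $((lines % 4)) -ne 0 ]; then
        echo "warning: line count is not a multiple of 4" >&2
    fi
    echo $((lines / 4))
}

# Toy input: two 4-line records written to a temporary directory.
tmp=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | gzip > "$tmp/toy.fq.gz"

n_reads=$(count_reads "$tmp/toy.fq.gz")
echo "$n_reads"   # 2

rm -rf "$tmp"
```

A warning here usually points to a truncated file or a FASTQ with wrapped sequence lines, in which case dividing the line count by 4 is not reliable.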

Using this command: $ n=$(zcat file.fq.gz | wc -l) && echo $((n/4))

the output I get is: 32144537

while using this command:

zcat 19213R-08-01_S16_L002_R1_001.fastq.gz | awk 'NR % 4 == 2 {print;}' | wc -c

the output I get is: 4885969624

Why are the two outputs different? If both commands return the total read count, the outputs should be the same.

  1. You should use the same *.fq.gz file for both commands.
  2. In the second command, use wc -l instead of wc -c.

I used the same file and also used the wc -c command, but the outputs are still different:

/mnt/e/fizza data/S017679/raw/raw$ n=$(zcat 19213R-08 01_S16_L002_R1_001.fastq.gz | wc -l) && echo $((n/4))

2977122658

fixu@DESKTOP-KJMSKGU:/mnt/e/fizza data/S017679/raw/raw$ zcat 19213R-08-01_S16_L002_R1_001.fastq.gz | awk 'NR % 4 == 2 {print;}' | wc -l

4885969624

If both commands give the read count, why are the outputs different? Please advise.


According to man wc, wc -c prints the number of bytes, not the number of lines.

You need to count the number of lines with wc -l, as each read uses 4 lines.

The command n=$(zcat file.fq.gz | wc -l) && echo $((n/4)) provides you with the right answer. You have 32144537 reads in that fastq file in particular.
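
To make the difference concrete, here is a small sketch on a made-up 3-read FASTQ (the file contents are invented for the demo). It counts the sequence lines both ways, then applies the same arithmetic to the two numbers quoted in this thread.

```shell
#!/bin/sh
# Toy FASTQ: 3 reads, each with a 5-base sequence.
fq=$(mktemp)
printf '@r1\nACGTA\n+\nIIIII\n@r2\nTTGAC\n+\nIIIII\n@r3\nGGCAT\n+\nIIIII\n' > "$fq"

seq_lines=$(awk 'NR % 4 == 2' "$fq" | wc -l)   # one line per read
seq_bytes=$(awk 'NR % 4 == 2' "$fq" | wc -c)   # bases plus one newline per read
rm -f "$fq"

echo "reads: $((seq_lines))"                       # 3
echo "bytes: $((seq_bytes))"                       # 18 = 3 * (5 + 1)
echo "bytes per read: $((seq_bytes / seq_lines))"  # 6

# Same arithmetic on the two outputs quoted in this thread.
echo $((4885969624 / 32144537))   # 152
echo $((4885969624 % 32144537))   # 0
```

The byte count divides evenly by the read count, giving 152 bytes per sequence line. That would be consistent with 151-base reads plus a newline each, though the read length is an inference from the numbers and is not stated anywhere in the thread.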
