3.2 years ago
Fizzah
Hello there, I am working on RNA-seq data and I am confused about the read count in a FASTQ file. Can anyone explain what "total read count" means? Does it mean we are counting everything in the FASTQ file (everything written in each 4-line read record)? Also, why is it necessary to count the total number of reads? How will it help in the analysis?
I use this command to find total read count in my fastq file: zcat 19213R-08-01_S16_L002_R1_001.fastq.gz | awk 'NR % 4 == 2 {print;}' | wc -c
Please confirm whether this command gives the total read count. If not, which command does? (I personally think it gives the total number of sequences in the FASTQ file.)
Yes, the command returns the read count; you could also use these commands to check total reads.

By using this command:
$ n=$(zcat file.fq.gz | wc -l) && echo $((n/4))
output I get is ...32144537
while using this command:
zcat 19213R-08-01_S16_L002_R1_001.fastq.gz | awk 'NR % 4 == 2 {print;}' | wc -c
output I get is :4885969624
Why are the two outputs different? If both commands return the total read count, the outputs should be the same.
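A quick arithmetic check on the two numbers reported above hints at what each command is measuring. This is only a sketch using the values quoted in this thread; the variable names are made up for illustration.

```shell
# Values copied from the outputs quoted in this thread.
reads=32144537        # (number of lines) / 4
seq_bytes=4885969624  # wc -c on the sequence lines

# Integer division and remainder (bash uses 64-bit arithmetic).
bytes_per_read=$((seq_bytes / reads))
remainder=$((seq_bytes % reads))

echo "bytes per sequence line: $bytes_per_read (remainder $remainder)"
```

The second number divides evenly by the first, giving 152 bytes per sequence line. That would be consistent with 151-base reads plus one newline character each, assuming all reads have the same length. So the larger number looks like a byte count of the sequences, not a count of reads.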
Use the same *.fq.gz file for the two commands, and use wc -l instead of wc -c.
I used the same file and also used the wc -c command, but the outputs are different:
/mnt/e/fizza data/S017679/raw/raw$ n=$(zcat 19213R-08 01_S16_L002_R1_001.fastq.gz | wc -l) && echo $((n/4))
2977122658
fixu@DESKTOP-KJMSKGU:/mnt/e/fizza data/S017679/raw/raw$ zcat 19213R-08-01_S16_L002_R1_001.fastq.gz | awk 'NR % 4 == 2 {print;}' | wc -l
4885969624
If both commands are for read counts, then why is the output different? Please guide.
According to man wc, wc -c prints the byte count, not the number of reads. You need to count the number of lines with wc -l instead and, since each read uses 4 lines, divide the result by 4. The command
n=$(zcat file.fq.gz | wc -l) && echo $((n/4))
gives you the right answer. You have 32144537 reads in that particular FASTQ file.
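To see the difference between wc -l and wc -c on something small enough to check by eye, here is a minimal sketch on a toy FASTQ file. The two reads and the file name toy.fastq.gz are invented for illustration, and zcat is assumed to behave as it does on a typical Linux system.

```shell
# Build a tiny gzipped FASTQ with two made-up reads (4 lines per read).
tmp=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCCA\n+\nIIIII\n' | gzip > "$tmp/toy.fastq.gz"

# Read count: total number of lines divided by 4.
lines=$(zcat "$tmp/toy.fastq.gz" | wc -l)
reads=$((lines / 4))

# Same count, taken from the sequence lines only (line 2 of each record).
seq_lines=$(zcat "$tmp/toy.fastq.gz" | awk 'NR % 4 == 2' | wc -l)

# wc -c counts bytes on those lines (bases plus newlines), not reads.
seq_bytes=$(zcat "$tmp/toy.fastq.gz" | awk 'NR % 4 == 2' | wc -c)

echo "reads=$reads seq_lines=$seq_lines seq_bytes=$seq_bytes"
rm -rf "$tmp"
```

Here both line-based counts give 2 reads, while wc -c gives 11: 4 + 5 bases plus 2 newline characters. The byte count grows with read length, which is why it is so much larger than the read count on a real file.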