Hi, I have R1 and R2 FASTQ files for my sample_2. These have already been through quality control (QC), and I want to compare the number of nucleotides before and after QC.
I found a script online that counts the total number of reads and nucleotides in each FASTQ file (see below). My question: to get the total number of nucleotides for the whole sample (not R1 or R2 alone), do I simply add the R1 and R2 totals (10,438,428,166 + 10,398,217,322)? Or do I have to merge R1 and R2 first and then run zgrep on the merged file?
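# gzip -cd (rather than "zgrep .") keeps empty lines, so a read trimmed to length 0
# cannot shift the 4-line FASTQ records that NR%4==2 depends on
gzip -cd sample2_QUALITY_PASSED_R1.00.0_0.cor.fastq.gz |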
awk 'NR%4==2{c++; l+=length($0)}
END{
print "Number of reads: "c;
print "Number of bases in reads: "l
}'
Number of reads: 70,369,761
Number of bases in reads: 10,438,428,166
gzip -cd sample2_QUALITY_PASSED_R2.00.0_0.cor.fastq.gz |
awk 'NR%4==2{c++; l+=length($0)}
END{
print "Number of reads: "c;
print "Number of bases in reads: "l
}'
Number of reads: 70,369,761
Number of bases in reads: 10,398,217,322
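R1 and R2 are separate reads of the same fragments, so the sample total is the sum of the two per-file totals: 10,438,428,166 + 10,398,217,322 = 20,836,645,488 bases, from 2 × 70,369,761 = 140,739,522 reads. Concatenating the two files and counting once gives the same number, so no merge step is needed. (This assumes "merge" means concatenation; merging overlapping pairs into single reads with a pair-merging tool would count each overlap once and give a smaller total.) The sketch below uses a hypothetical helper, count_fastq, that counts one or more gzipped FASTQ files in a single pass. It assumes standard 4-line FASTQ records with no wrapped sequence lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): count reads and bases
# across one or more gzipped FASTQ files in a single pass.
# Assumes standard 4-line FASTQ records (no multi-line sequences).
count_fastq() {
  # gzip -cd decompresses each file in turn and writes the streams back to back,
  # which is the same as concatenating ("merging") R1 and R2 before counting.
  gzip -cd "$@" |
    awk 'NR % 4 == 2 { reads++; bases += length($0) }   # line 2 of each record = sequence
         END {
           # %.0f prints large totals as plain integers in any awk
           printf "Number of reads: %.0f\n", reads
           printf "Number of bases in reads: %.0f\n", bases
         }'
}
```

Usage: count_fastq sample2_QUALITY_PASSED_R1.00.0_0.cor.fastq.gz sample2_QUALITY_PASSED_R2.00.0_0.cor.fastq.gz should report the sum of the per-file counts. printf with %.0f is used instead of print because some awk implementations print integers above 2^31 in scientific notation.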
Thank you in advance for your time and help!
Use seqkit stats to save time; it reports the number of reads (num_seqs) and total bases (sum_len) for each file.
The output looks something like this: