Hi all,
I'm trying to write some code to generate mean read length data from a range of fastq files. awk '{if(NR%4==2) print NR"\t"$0"\t"length($0)}' HG1.fastq > readLength.txt
i've got as far as here from looking through other posts and trying to improve but i'm stuck on a couple of things. This command only works on a single file and will report the length of each read within that file separately.
I want to run a single command so the mean and Standard Dev of read lengths from all .fastq files within a folder are reported in a single .txt file, one sample per line. I gues SD might be difficult to calculate in a command so even just the mean read length.
e.g.the first 5 files in my folder are: ru1.fastq ru2.fastq hg3.fastq hg25.fastq ru7.fastq
obviously i'm a bit of a novice at this so all help would be appreciated !!
thanks a lot
Hello I'm so grateful for your help. I tried running your pipeline on a metagenomic file in the fastq format, and I got a very high stddev. Can you help me with this? total 84855052 avg=160.729787 stddev=2220.571952