Hi biostars,
I have a long-read FASTQ file containing 57,523,865 reads. I am trying to subsample it with seqtk, but the output contains zero reads. Can someone help with this issue?
wc -l ALL1807_RW0588_051220_LiveGuppy.fastq
230095460 ALL1807_RW0588_051220_LiveGuppy.fastq
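As a sanity check on these numbers: a FASTQ record normally spans four lines, so the read count is the line count divided by 4 (230095460 / 4 = 57,523,865). Here is a minimal sketch of that arithmetic, assuming no sequence or quality string is wrapped across several lines. The file demo.fastq is a hypothetical toy file created only for illustration.

```shell
# Create a hypothetical three-read FASTQ file (illustration only).
printf '@r1\nACGT\n+\nIIII\n@r2\nACGTA\n+\nIIIII\n@r3\nAC\n+\nII\n' > demo.fastq

# Assumption: every record occupies exactly four lines.
lines=$(wc -l < demo.fastq)
reads=$((lines / 4))
echo "$reads"
```

The same count can be run on the subsampled output file to confirm whether it is really empty.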
Here is the seqtk command line I used.
./seqtk sample ~/ALL1807_RW0588_051220_LiveGuppy.fastq 19473944 > ALL1807_RW0588_051220_LiveGuppy_sub.fastq
When I calculate the mean read length, I get a non-integer value. Does a fractional mean read length make sense? I have never seen one before. Could this be why seqtk sample is not working?
awk '{if(NR%4==2) {count++; bases += length} } END{print bases/count}' ALL1807_RW0588_051220_LiveGuppy.fastq
869.649
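On the decimal point: the mean is the total number of bases divided by the number of reads, so it is rarely a whole number and a value such as 869.649 is expected. A small sketch using the same awk approach on a hypothetical two-read file (demo_mean.fastq is made up for illustration), with reads of length 4 and 5:

```shell
# Create a hypothetical two-read FASTQ file (illustration only).
printf '@r1\nACGT\n+\nIIII\n@r2\nACGTA\n+\nIIIII\n' > demo_mean.fastq

# Sum the sequence-line lengths (line 2 of each 4-line record) and divide by the read count.
# The "if (count)" guard avoids a division by zero on an empty file.
mean=$(awk 'NR % 4 == 2 { count++; bases += length($0) } END { if (count) print bases / count }' demo_mean.fastq)
echo "$mean"
```

Here 9 bases over 2 reads gives a mean of 4.5, even though each individual read length is an integer.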
Solved! Thanks