Question

Counting occurrence of character in fastq file

7.9 years ago

fiona.newberry ▴ 80

I want to count the number of times 1.1 appears within my fastq file. It should only appear once every 4 lines (on the first line). I have been using:

grep -o '1.1' ./seqtk_1/subsample_1/new_sub_NC_001539_1.fq.gz |wc -l

This is telling me it occurs 1046902 times, which is 46902 more than I expected.

It appears to be including these strings in its count: 101, 111, 121, 131, 141, 151, 161, 171, 181, 191, 1/1

How do I search the file for specifically 1.1? Or search just the first line of each set of 4?

I have tried using -v on grep

Thanks

grep • 2.6k views

ADD COMMENT • link updated 7.9 years ago by Devon Ryan 105k • written 7.9 years ago by fiona.newberry ▴ 80

I have used this code:

awk 'NR%4==1' ./seqtk_1/subsample_1/new_sub_NC_001539_1.fq.gz -exec grep -o "1.1" {} \; | wc -l

This returns the number I wanted. But it also says:

awk: (FILENAME=./seqtk_1/subsample_1/new_sub_NC_001539_1.fq.gz FNR=4000000) fatal: cannot open file `-exec' for reading (No such file or directory)

Is it doing what I want it to do?

ADD REPLY • link 7.9 years ago by fiona.newberry ▴ 80

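No. The -exec part is not valid awk syntax (awk tries to open it as a file, hence the error). Pipe the header lines into grep instead:

awk 'NR%4==1' ./seqtk_1/subsample_1/new_sub_NC_001539_1.fq.gz | grep -c '1\.1'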

ADD REPLY • link 7.9 years ago by Devon Ryan 105k

Try -F (fixed-string matching) instead of -o, with -c to count matching lines:

grep -Fc 1.1 ./seqtk_1/subsample_1/new_sub_NC_001539_1.fq.gz

ADD REPLY • link 7.9 years ago by cpad0112 21k

score 4 · Accepted Answer · 2017-08-11

7.9 years ago

Devon Ryan 105k

zgrep -c '1\.1' ./seqtk_1/subsample_1/new_sub_NC_001539_1.fq.gz

In a regular expression, . means "any character", hence the need to escape it.
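To make the effect of the dot concrete, here is a minimal sketch on a few made-up strings (the sample lines are hypothetical and not taken from the file in the question). It compares an unescaped pattern, an escaped pattern, and grep's fixed-string mode:

```shell
# Hypothetical sample lines; only the first contains a literal "1.1".
sample=$(printf '%s\n' 'read1.1' 'read101' 'read1/1' 'read2.2')

# Unescaped dot: "." matches any single character, so 101 and 1/1 also match.
unescaped=$(printf '%s\n' "$sample" | grep -c '1.1')

# Escaped dot: only a literal period matches.
escaped=$(printf '%s\n' "$sample" | grep -c '1\.1')

# Fixed-string mode (-F): the whole pattern is taken literally.
fixed=$(printf '%s\n' "$sample" | grep -Fc '1.1')

echo "unescaped=$unescaped escaped=$escaped fixed=$fixed"
# prints: unescaped=3 escaped=1 fixed=1
```

Note that -c counts matching lines, whereas -o piped to wc -l counts individual matches, so the two can differ when a line contains the pattern more than once.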

Note, however, that 1.1 is a valid substring of the quality string in any recent FASTQ file, so you shouldn't be surprised if it appears more than once per entry.
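If quality lines are a concern, one option is to restrict the count to header lines only. The sketch below uses a made-up two-record FASTQ (the read names and the temporary file are hypothetical) in which one quality line happens to contain 1.1:

```shell
# Hypothetical two-record FASTQ; the second record's quality line contains "1.1".
fq=$(mktemp)
printf '%s\n' \
  '@read_a.1.1' 'ACGT' '+' 'IIII' \
  '@read_b.1.1' 'ACGT' '+' '1.1I' > "$fq"

# Whole-file count of lines with a literal "1.1": both headers plus one quality line.
all_lines=$(grep -Fc '1.1' "$fq")

# Header-only count: keep lines 1, 5, 9, ... and count literal matches among them.
header_lines=$(awk 'NR % 4 == 1' "$fq" | grep -Fc '1.1')

echo "all_lines=$all_lines header_lines=$header_lines"
# prints: all_lines=3 header_lines=2

rm -f "$fq"
```

This assumes four-line FASTQ records with no wrapped sequence lines. For a file that really is gzip-compressed, decompress it into the pipeline first, for example with gzip -cd file.fq.gz | awk 'NR % 4 == 1' | grep -Fc '1.1'.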