Finding length of multiple fastq.gz files
3.5 years ago
brisbio ▴ 30

Hi,

I need to determine the length (number of lines) of each fastq file in a directory. I know this can be done with find . | xargs wc -l, but my files are gzip-compressed and need to be decompressed before the lines can be counted. I know zcat handles the decompression. What I am struggling with is how to combine the two in one line of code, so that I get the length of every file in the directory after decompression.

zcat Fastq • 794 views

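To get the total number of lines in each fastq.gz file:

$ find . -type f -name "*.gz" -exec zgrep -c "$" {} \;

Since these are fastq files, you may want the number of reads rather than the number of lines:

$ find . -type f -name "*.gz" -exec zgrep -c "^@" {} \;

This assumes that no quality line in any file begins with @. That is not guaranteed, because @ is a valid quality character in Phred+33 encoding. For standard four-line records, dividing the line count by 4 is the safer way to count reads.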
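Because a quality line can legitimately start with @, one way around the assumption is to count lines and divide by four. The following is only a sketch: count_fastq_reads is a made-up helper name, it assumes every record is exactly four lines (no wrapped sequences), and it uses gzip -dc, which is equivalent to zcat. It also prints the file name next to each count, which the one-liners above do not.

```shell
# Hypothetical helper: print "<file><TAB><reads>" for every .gz file
# under the given directory (default: current directory).
# Assumption: well-formed FASTQ with exactly 4 lines per read.
count_fastq_reads() {
    find "${1:-.}" -type f -name "*.gz" -exec sh -c '
        for f in "$@"; do
            # Decompress to stdout and count lines without writing to disk.
            lines=$(gzip -dc "$f" | wc -l)
            # Four lines per record, so reads = lines / 4.
            printf "%s\t%d\n" "$f" "$(( $lines / 4 ))"
        done
    ' sh {} +
}
```

Something like count_fastq_reads /path/to/fastq_dir should then list one file per line with its read count.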

3.5 years ago
4galaxy77 2.9k

I used GNU parallel because I don't know xargs well, but I think it works the same way.

find . -type f -name "*.gz" | parallel -I{} "zcat {} | wc -l"
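For anyone who would rather stay with xargs, a rough equivalent might look like the sketch below. The function name count_lines_xargs is hypothetical, the -P 4 job count is arbitrary, and -print0, -0 and -P are GNU/BSD extensions rather than POSIX, so treat it as an illustration under those assumptions. Each output line carries the file name, since results from parallel jobs can arrive in any order.

```shell
# Hypothetical helper: print "<file><TAB><lines>" for every .gz file
# under the given directory, decompressing up to 4 files at once.
count_lines_xargs() {
    find "${1:-.}" -type f -name "*.gz" -print0 |
        xargs -0 -n 1 -P 4 sh -c '
            # Some xargs versions run the command once even with no input.
            [ "$#" -gt 0 ] || exit 0
            n=$(gzip -dc "$1" | wc -l)
            printf "%s\t%d\n" "$1" "$n"
        ' sh
}
```

Calling count_lines_xargs . would cover the current directory, as in the question.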


That’s worked great, thank you!


If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.