Hi there,
I would like to know the GC content of a FASTQ file. I have this script:
This gives the character count for all characters except As, Ts and Ns (and newlines):
cat file.fasta | grep -v ">" | tr -d aAtTnN"\n" | wc -c
24642235
This gives the character count for all letters except Ns:
cat file.fasta | grep -v ">" | tr -d nN"\n" | wc -c
49100855
Mean overall GC content is therefore 24642235 / 49100855 = 0.502, i.e. about 50.2%
But this only works for FASTA files.
How can I modify it for .fastq files?
Thank you
Hi, in case you don't know it already, I can recommend the tool FastQC, which gives you the distribution of GC content in your reads, the distribution of read lengths, the average Phred score per base and much more!
Edit: In case you are looking at multiple FASTQ files at once, give the tool MultiQC a try; it summarizes FastQC reports.
Thanks! I had already run FastQC on my files.
I just wanted to know a way of computing the percentage with bash commands :)
Ok, that's what I thought :) I'll move my answer to the comment section
Simply convert FASTQ to FASTA. There are plenty of posts on that, both on Biostars and elsewhere on the web.
Yes. I have been looking for this too, but I only found answers for Python and nothing for bash :(
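For what it's worth, here is a minimal bash/awk sketch that works on the FASTQ file directly, without converting to FASTA first. It assumes the usual 4-lines-per-record layout (header, sequence, "+", qualities) with unwrapped sequences, so only every 4th line starting from line 2 is counted. The function name fastq_gc is just a hypothetical helper for illustration, not part of any tool. As in the FASTA version above, Ns are left out of the denominator.

```shell
# Hypothetical helper: print the GC percentage of a FASTQ file.
# Assumption: 4 lines per record, sequence on the 2nd line of each record.
# Usage: fastq_gc file.fastq
fastq_gc() {
  awk 'NR % 4 == 2 {
         seq = toupper($0)              # treat lower- and upper-case bases alike
         gc += gsub(/[GC]/, "", seq)    # gsub returns the number of matches removed
         at += gsub(/[AT]/, "", seq)    # Ns and any other symbols are ignored
       }
       END {
         if (gc + at > 0) printf "%.2f\n", 100 * gc / (gc + at)
         else print "NA"                # no A/C/G/T bases found
       }' "$1"
}
```

If you prefer to stay close to the original pipeline, selecting the sequence lines with awk 'NR % 4 == 2' in place of the grep -v ">" step should give the same two counts, under the same 4-line assumption. For gzipped files, the awk body could read from zcat output instead of a file name.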