Determining read count
0
0
Entering edit mode
8.7 years ago

Hello everyone,

I have run my reads through trimmomatic. I would like to know the number of reads in my raw reads and then the number of reads that were left after quality trimming/filtering in trimmomatic. I have generated "trimlogs" but these seem to be of no help as I do not quite understand them. Does anybody know how (or the code/command i can use) to go about counting the number of raw reads and then counting the number of reads left after filtering?

Thanks! Nikelle

trimmomatic rna-seq trinity • 9.3k views
ADD COMMENT
4
Entering edit mode

Trimmomatic should be anything like that of the cutadpat where you are using a custom tool to remove the adapter sequences. When you received your raw reads you must have some Fastq.gz raw files with FastQC report files. You can check there for the number of reads for the raw sequences. Once you have the output of timmomatic you can run again the FASTQC , genomax2 is correct in saying that, however for pre/post fastq or fastq.gz you can always do this

cat file.fastq | echo $((`wc -l`/4))

if compressed

zcat file.fastq.gz | echo $((`wc -l`/4))
ADD REPLY
0
Entering edit mode

Thank you!

Is it correct if I run this command:

[npetrill@trogdor AMMA_transcripts]$ awk '{s++}END{print s/4}' R1V29_B1F9St_CGATGT_L001_R1_001.fastq
31118843

On a fastq file that looks like this?

@DF8F08P1:347:C5YT5ACXX:1:1101:1385:2060 1:N:0:CGATGT
CACCTCNTTTCAATCCTCCTGGACCAGGTTCGCCGGATACTCCTCTTGTTCCTGTGAAACCTCTTAATCCTGATGGGCCGCTCTTGCCTGAAATTCCAGA
+
CCCFFF#2AFHHHJJJJJJJJIGIJJJJIHIJJJJIDGHJJJJJJJJJJJIJHGHGFHGFFFFFE@DEECEDDDDA?BBDDDDDDDDDDD?CDDDCCDDD
@DF8F08P1:347:C5YT5ACXX:1:1101:1498:2065 1:N:0:CGATGT
AAAGAAAGTGGCCTTCAGTGTACGTCCCATGATCAGCTGGAACGGGTCGACGGGGATACCCCGGTCGTCTAATGACACGCCAACAATTGCACACGTTCCC
+
CCCFFFFFFFHHHJJJIJJJJJJJJJJJJJJIJGIIIHJJIJJJHIHEIDIJHHF>BECDABBD??B8@DBDDCC>ACBDDDDDDDDDDDDDDDBABDDD
@DF8F08P1:347:C5YT5ACXX:1:1101:1399:2123 1:N:0:CGATGT
TTGCACTTTGGCTAACCAACAAGCAGTATTGCCCAACGCCATCCCACAGAAAAACATTAAAAACACCCAACAATTATCAATCATGTCACCGTATGTCAAA
+
@@CDDDDFHHHDHGIJI@EFAFFGGGHIGIJJGEEGCFHHJDCAABHGE@FD3CFE@EE@;AAEED;BAC=;;>@>C:>;@CCCDDCCDC@DBACCDDDA
@DF8F08P1:347:C5YT5ACXX:1:1101:1609:2103 1:N:0:CGATGT
CTGCAATGTTCCATTCACAAGTATGGTTCATCCTTCGACAGCTCCCATTTTCGGGGAGGTACTCCTGGTAAGACGGGTTTTGTTTCAATAGGTGCAACCT
+
CCCFFFFFHHHHHJJJJJJJHCHIIJHIJJJIIIJJJJJIJIJIJJIJIJJIJJJJHIJ?AEHHFEFFBDCCCED>B,8<@BBDDCCCDCDDCCCCDCDD
@DF8F08P1:347:C5YT5ACXX:1:1101:1726:2231 1:N:0:CGATGT
TGCAAAACTACTTCCTCCTCCCATGCTGTAGGAGCCACCAGCGCCGTATCCAGCACCACCACCACCTCCAGCTCCAGCTCCAGCTCCACCTCCACCACCG
+

Will that command give me the correct amount of reads?

ADD REPLY
0
Entering edit mode

Looks like it did give you the number of reads: 31118843
Confirm with the command @vchris_ngs gave in the post above yours.

ADD REPLY
0
Entering edit mode

Great, that works. Thank you both!

ADD REPLY
0
Entering edit mode

do both of the command gives you same read output? 31118843

ADD REPLY
0
Entering edit mode

They do, thanks so so much!

ADD REPLY
3
Entering edit mode

If you have FastQC installed then run it on pre-/post- set of files.
There would be grep (and other ways of doing this) but we would need to know what machine identifier your read headers have to make it fool proof.
Or this: How to count fastq reads

ADD REPLY
0
Entering edit mode

Thanks genomax,

I have the fast X toolkit installed, could I use this to count reads?

ADD REPLY

Login before adding your answer.

Traffic: 2004 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6