How to get summary statistics out of bbduk.sh?
3.1 years ago

I am using bbduk to quality-trim my raw sequencing reads, but I would like to get more summary statistics out of it than the default. I can see many options for pre-made histograms of the statistics I want, but none for getting the raw numbers, and I would rather generate the plots myself. Is there a way to do this that I am missing?

3.1 years ago
GenoMax 147k

Capture the stderr stream from your bbduk jobs. You should get output like the following:

Input:                          12202091 reads          610104550 bases.
KTrimmed:                       9227056 reads (75.62%)  254394193 bases (41.70%)
Total Removed:                  1328490 reads (10.89%)  254394193 bases (41.70%)
Result:                         10873601 reads (89.11%)         355710357 bases (58.30%)
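If the stderr stream has been saved to a file (for example with a shell redirect such as `2> bbduk.log`), the summary lines above can be turned into numbers with a few lines of Python. This is only a minimal sketch: the function name `parse_bbduk_summary` is made up for illustration, and it assumes the summary lines keep the `Label: N reads ... N bases` layout shown above.

```python
import re

# Matches summary lines such as
#   "KTrimmed:   9227056 reads (75.62%)  254394193 bases (41.70%)"
# Assumption: every line of interest has a label, a read count and a base count.
SUMMARY_LINE = re.compile(r"^\s*([^:\n]+):\s+(\d+)\s+reads\b.*?(\d+)\s+bases")


def parse_bbduk_summary(log_text):
    """Return {label: {"reads": int, "bases": int}} for each summary line found."""
    stats = {}
    for line in log_text.splitlines():
        match = SUMMARY_LINE.match(line)
        if match:
            label, reads, bases = match.groups()
            stats[label.strip()] = {"reads": int(reads), "bases": int(bases)}
    return stats


if __name__ == "__main__":
    example = (
        "Input:                          12202091 reads          610104550 bases.\n"
        "Result:                         10873601 reads (89.11%)         355710357 bases (58.30%)\n"
    )
    for label, counts in parse_bbduk_summary(example).items():
        print(label, counts["reads"], counts["bases"])
```

Lines that do not fit the pattern (timing information, the echoed command line, and so on) are simply skipped, so the whole log can be passed in as-is.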

I am capturing the stderr, which does give some nice stats! But I was hoping for more detailed stats, like those indicated in the help:

Histogram output parameters:
bhist=<file>        Base composition histogram by position.
qhist=<file>        Quality histogram by position.
qchist=<file>       Count of bases with each quality value.
aqhist=<file>       Histogram of average read quality.
bqhist=<file>       Quality histogram designed for box plots.
lhist=<file>        Read length histogram.
phist=<file>        Polymer length histogram.
gchist=<file>       Read GC content histogram.
enthist=<file>      Read entropy histogram.
ihist=<file>        Insert size histogram, for paired reads in mapped sam.
gcbins=100          Number gchist bins.  Set to 'auto' to use read length.
maxhistlen=6000     Set an upper bound for histogram lengths; higher uses
                    more memory.  The default is 6000 for some histograms
                    and 80000 for others.

Then provide a file name on the command line for whichever plot/stat you want, e.g. bhist=myhist
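As a rough illustration of what that looks like in practice, the sketch below assembles a bbduk.sh command line that requests several histogram files at once. The helper names, the input/output file names and the `<prefix>.<name>.txt` naming scheme are all hypothetical; only `in=`, `out=` and the histogram parameters themselves come from bbduk. Add your usual trimming options to the list as needed.

```python
import subprocess


def build_bbduk_command(reads_in, reads_out, prefix,
                        hists=("bhist", "qhist", "aqhist", "lhist", "gchist")):
    """Build the argument list; each histogram goes to <prefix>.<name>.txt."""
    cmd = ["bbduk.sh", f"in={reads_in}", f"out={reads_out}"]
    cmd += [f"{name}={prefix}.{name}.txt" for name in hists]
    return cmd


def run_bbduk(cmd):
    """Run bbduk and return its stderr text (where the summary is printed).

    Assumes bbduk.sh is on PATH; raises CalledProcessError on failure.
    """
    result = subprocess.run(cmd, stderr=subprocess.PIPE, text=True, check=True)
    return result.stderr


if __name__ == "__main__":
    # Hypothetical file names, printed only so the command can be inspected.
    print(" ".join(build_bbduk_command("reads.fq.gz", "trimmed.fq.gz", "sample1")))
```

Building the command as a list avoids shell quoting problems, and the same list can be printed for a log or handed to `run_bbduk`.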


Ohhhh, they are text-based histograms, okay! Sorry, I assumed they were pre-compiled plots.
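Since the files are plain text, they can be loaded for custom plotting with very little code. The sketch below assumes the layout is tab-delimited, with `#`-prefixed header lines where the last `#` line before the data holds the column names and any earlier ones are summary values. That layout is an assumption, so check it against one of your own output files. The function name `parse_text_histogram` is hypothetical.

```python
def _to_number(value):
    """Convert a field to int or float where possible, else leave it as text."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def parse_text_histogram(text):
    """Split a tab-delimited histogram into (summary, header, rows).

    Assumed layout: '#'-prefixed lines first, the last of which names the
    columns; every other non-empty line is one row of data.
    """
    comments, rows = [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            comments.append(line[1:].split("\t"))
        else:
            rows.append([_to_number(field) for field in line.split("\t")])
    header = comments[-1] if comments else None
    summary = comments[:-1]
    return summary, header, rows


if __name__ == "__main__":
    # "sample1.lhist.txt" is a placeholder for a file written via lhist=<file>.
    with open("sample1.lhist.txt") as handle:
        summary, header, rows = parse_text_histogram(handle.read())
    print(header, len(rows))
```

The returned rows can then be passed to whatever plotting library you prefer.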
