Hi all,
I am working on data mining of NCBI transcriptome data. For quality control I am using BBDuk with the following command:
bbduk.sh in1=$l.fastq out=$l\_SR.fastq ref=adapters qtrim=lr \
trimq=10 overwrite=true ktrim=r qskip=4 ways=$NSLOTS ftm=5 \
maq=10 minlen=20 trimpolya=10 trimpolyg=10 trimpolyc=10
I have tried changing the values of several parameters, but for most of the FASTQ files it always removes 100% of the reads. If you have any suggestions for improving this command, please let me know. Many thanks!
This is an example of part of the output:
Filtered by header: 0 reads (0.00%) 0 bases (0.00%)
Low quality discards: 0 reads (0.00%) 0 bases (0.00%)
Total Removed: 2041 reads (0.02%) 51814939 bases (8.20%)
Filtered by header: 0 reads (0.00%) 0 bases (0.00%)
Low quality discards: 0 reads (0.00%) 0 bases (0.00%)
Total Removed: 2214 reads (0.02%) 51939725 bases (8.21%)
Filtered by header: 54054137 reads (100.00%) 2702706850 bases (100.00%)
Low quality discards: 54054137 reads (100.00%) 2702706850 bases (100.00%)
Total Removed: 54054137 reads (100.00%) 2702706850 bases (100.00%)
Filtered by header: 21331917 reads (100.00%) 1066595850 bases (100.00%)
Low quality discards: 21331917 reads (100.00%) 1066595850 bases (100.00%)
Total Removed: 21331917 reads (100.00%) 1066595850 bases (100.00%)
Filtered by header: 22840621 reads (100.00%) 1142031050 bases (100.00%)
Low quality discards: 22840621 reads (100.00%) 1142031050 bases (100.00%)
Total Removed: 22840621 reads (100.00%) 1142031050 bases (100.00%)
Filtered by header: 24084611 reads (100.00%) 1204230550 bases (100.00%)
Low quality discards: 24084611 reads (100.00%) 1204230550 bases (100.00%)
Total Removed: 24084611 reads (100.00%) 1204230550 bases (100.00%)
Filtered by header: 27595642 reads (100.00%) 1379782100 bases (100.00%)
Low quality discards: 27595642 reads (100.00%) 1379782100 bases (100.00%)
Total Removed: 27595642 reads (100.00%) 1379782100 bases (100.00%)
Filtered by header: 7218527 reads (100.00%) 353707823 bases (100.00%)
Low quality discards: 7218527 reads (100.00%) 353707823 bases (100.00%)
Total Removed: 7218527 reads (100.00%) 353707823 bases (100.00%)
Filtered by header: 7059269 reads (100.00%) 345904181 bases (100.00%)
Low quality discards: 7059269 reads (100.00%) 345904181 bases (100.00%)
Total Removed: 7059269 reads (100.00%) 345904181 bases (100.00%)
Filtered by header: 7607918 reads (100.00%) 372787982 bases (100.00%)
Low quality discards: 7607918 reads (100.00%) 372787982 bases (100.00%)
Total Removed: 7607918 reads (100.00%) 372787982 bases (100.00%)
Filtered by header: 7262556 reads (100.00%) 355865244 bases (100.00%)
Low quality discards: 7262556 reads (100.00%) 355865244 bases (100.00%)
Total Removed: 7262556 reads (100.00%) 355865244 bases (100.00%)
Filtered by header: 7371616 reads (100.00%) 361209184 bases (100.00%)
Low quality discards: 7371616 reads (100.00%) 361209184 bases (100.00%)
Total Removed: 7371616 reads (100.00%) 361209184 bases (100.00%)
Filtered by header: 7270371 reads (100.00%) 356248179 bases (100.00%)
Low quality discards: 7270371 reads (100.00%) 356248179 bases (100.00%)
Total Removed: 7270371 reads (100.00%) 356248179 bases (100.00%)
Filtered by header: 7007499 reads (100.00%) 343367451 bases (100.00%)
Low quality discards: 7007499 reads (100.00%) 343367451 bases (100.00%)
Total Removed: 7007499 reads (100.00%) 343367451 bases (100.00%)
Filtered by header: 7447287 reads (100.00%) 364917063 bases (100.00%)
Low quality discards: 7447287 reads (100.00%) 364917063 bases (100.00%)
Total Removed: 7447287 reads (100.00%) 364917063 bases (100.00%)
Filtered by header: 7322620 reads (100.00%) 358808380 bases (100.00%)
Low quality discards: 7322620 reads (100.00%) 358808380 bases (100.00%)
Total Removed: 7322620 reads (100.00%) 358808380 bases (100.00%)
Filtered by header: 7218751 reads (100.00%) 353718799 bases (100.00%)
Low quality discards: 7218751 reads (100.00%) 353718799 bases (100.00%)
Total Removed: 7218751 reads (100.00%) 353718799 bases (100.00%)
Filtered by header: 16728309 reads (100.00%) 836415450 bases (100.00%)
Low quality discards: 16728309 reads (100.00%) 836415450 bases (100.00%)
Total Removed: 16728309 reads (100.00%) 836415450 bases (100.00%)
Filtered by header: 17878193 reads (100.00%) 893909650 bases (100.00%)
Low quality discards: 17878193 reads (100.00%) 893909650 bases (100.00%)
Total Removed: 17878193 reads (100.00%) 893909650 bases (100.00%)
Filtered by header: 20845499 reads (100.00%) 1042274950 bases (100.00%)
Low quality discards: 20845499 reads (100.00%) 1042274950 bases (100.00%)
Total Removed: 20845499 reads (100.00%) 1042274950 bases (100.00%)
Filtered by header: 15829049 reads (100.00%) 791452450 bases (100.00%)
Low quality discards: 15829049 reads (100.00%) 791452450 bases (100.00%)
Total Removed: 15829049 reads (100.00%) 791452450 bases (100.00%)
Filtered by header: 17183836 reads (100.00%) 859191800 bases (100.00%)
Low quality discards: 17183836 reads (100.00%) 859191800 bases (100.00%)
Total Removed: 17183836 reads (100.00%) 859191800 bases (100.00%)
Filtered by header: 19239225 reads (100.00%) 961961250 bases (100.00%)
Low quality discards: 19239225 reads (100.00%) 961961250 bases (100.00%)
Total Removed: 19239225 reads (100.00%) 961961250 bases (100.00%)
Filtered by header: 16969394 reads (100.00%) 848469700 bases (100.00%)
Low quality discards: 16969394 reads (100.00%) 848469700 bases (100.00%)
Total Removed: 16969394 reads (100.00%) 848469700 bases (100.00%)
Filtered by header: 17612386 reads (100.00%) 880619300 bases (100.00%)
Low quality discards: 17612386 reads (100.00%) 880619300 bases (100.00%)
Total Removed: 17612386 reads (100.00%) 880619300 bases (100.00%)
Filtered by header: 15393894 reads (100.00%) 769694700 bases (100.00%)
Low quality discards: 15393894 reads (100.00%) 769694700 bases (100.00%)
Total Removed: 15393894 reads (100.00%) 769694700 bases (100.00%)
Filtered by header: 16096774 reads (100.00%) 804838700 bases (100.00%)
Low quality discards: 16096774 reads (100.00%) 804838700 bases (100.00%)
Total Removed: 16096774 reads (100.00%) 804838700 bases (100.00%)
Filtered by header: 4819322 reads (100.00%) 221688812 bases (100.00%)
Low quality discards: 4819322 reads (100.00%) 221688812 bases (100.00%)
Total Removed: 4819322 reads (100.00%) 221688812 bases (100.00%)
Filtered by header: 3969200 reads (100.00%) 182583200 bases (100.00%)
Low quality discards: 3969200 reads (100.00%) 182583200 bases (100.00%)
Total Removed: 3969200 reads (100.00%) 182583200 bases (100.00%)
Filtered by header: 6211484 reads (100.00%) 304362716 bases (100.00%)
Low quality discards: 6211484 reads (100.00%) 304362716 bases (100.00%)
Total Removed: 6211484 reads (100.00%) 304362716 bases (100.00%)
Filtered by header: 5898028 reads (100.00%) 289003372 bases (100.00%)
Low quality discards: 5898028 reads (100.00%) 289003372 bases (100.00%)
Total Removed: 5898028 reads (100.00%) 289003372 bases (100.00%)
Filtered by header: 5870395 reads (100.00%) 287649355 bases (100.00%)
Low quality discards: 5870395 reads (100.00%) 287649355 bases (100.00%)
Total Removed: 5870395 reads (100.00%) 287649355 bases (100.00%)
Filtered by header: 6088303 reads (100.00%) 298326847 bases (100.00%)
Low quality discards: 6088303 reads (100.00%) 298326847 bases (100.00%)
Total Removed: 6088303 reads (100.00%) 298326847 bases (100.00%)
Filtered by header: 0 reads (0.00%) 0 bases (0.00%)
Low quality discards: 0 reads (0.00%) 0 bases (0.00%)
Total Removed: 1332119 reads (7.12%) 55294533 bases (8.44%)
Filtered by header: 0 reads (0.00%) 0 bases (0.00%)
Low quality discards: 0 reads (0.00%) 0 bases (0.00%)
Total Removed: 1555910 reads (8.35%) 63889275 bases (9.80%)
Filtered by header: 0 reads (0.00%) 0 bases (0.00%)
Low quality discards: 0 reads (0.00%) 0 bases (0.00%)
Total Removed: 2015331 reads (10.86%) 82945814 bases (12.77%)
Total Removed: 5711034 reads (100.00%) 199886190 bases (100.00%)
Filtered by header: 2854429 reads (100.00%) 99905015 bases (100.00%)
Low quality discards: 2854429 reads (100.00%) 99905015 bases (100.00%)
Total Removed: 2854429 reads (100.00%) 99905015 bases (100.00%)
Filtered by header: 5759018 reads (100.00%) 201565630 bases (100.00%)
Low quality discards: 5759018 reads (100.00%) 201565630 bases (100.00%)
Total Removed: 5759018 reads (100.00%) 201565630 bases (100.00%)
Filtered by header: 20446366 reads (100.00%) 2065082966 bases (100.00%)
Low quality discards: 20446366 reads (100.00%) 2065082966 bases (100.00%)
Total Removed: 20446366 reads (100.00%) 2065082966 bases (100.00%)
Filtered by header: 0 reads (0.00%) 0 bases (0.00%)
Low quality discards: 0 reads (0.00%) 0 bases (0.00%)
Total Removed: 3864 reads (0.02%) 44301031 bases (2.20%)
Filtered by header: 4295725 reads (100.00%) 223377700 bases (100.00%)
Low quality discards: 4295725 reads (100.00%) 223377700 bases (100.00%)
Total Removed: 4295725 reads (100.00%) 223377700 bases (100.00%)
Filtered by header: 6081361 reads (100.00%) 316230772 bases (100.00%)
Low quality discards: 6081361 reads (100.00%) 316230772 bases (100.00%)
Total Removed: 6081361 reads (100.00%) 316230772 bases (100.00%)
Filtered by header: 4721831 reads (100.00%) 245535212 bases (100.00%)
Low quality discards: 4721831 reads (100.00%) 245535212 bases (100.00%)
Total Removed: 4721831 reads (100.00%) 245535212 bases (100.00%)
Filtered by header: 5467768 reads (100.00%) 191371880 bases (100.00%)
Low quality discards: 5467768 reads (100.00%) 191371880 bases (100.00%)
Total Removed: 5467768 reads (100.00%) 191371880 bases (100.00%)
Filtered by header: 15141684 reads (100.00%) 529958940 bases (100.00%)
Low quality discards: 15141684 reads (100.00%) 529958940 bases (100.00%)
Total Removed: 15141684 reads (100.00%) 529958940 bases (100.00%)
Filtered by header: 15345206 reads (100.00%) 537082210 bases (100.00%)
Low quality discards: 15345206 reads (100.00%) 537082210 bases (100.00%)
Total Removed: 15345206 reads (100.00%) 537082210 bases (100.00%)
Filtered by header: 7991117 reads (100.00%) 279689095 bases (100.00%)
Low quality discards: 7991117 reads (100.00%) 279689095 bases (100.00%)
Total Removed: 7991117 reads (100.00%) 279689095 bases (100.00%)
Filtered by header: 7977917 reads (100.00%) 279227095 bases (100.00%)
Low quality discards: 7977917 reads (100.00%) 279227095 bases (100.00%)
Total Removed: 7977917 reads (100.00%) 279227095 bases (100.00%)
Filtered by header: 14359040 reads (100.00%) 717952000 bases (100.00%)
Low quality discards: 14359040 reads (100.00%) 717952000 bases (100.00%)
Total Removed: 14359040 reads (100.00%) 717952000 bases (100.00%)
Filtered by header: 12269903 reads (100.00%) 613495150 bases (100.00%)
Low quality discards: 12269903 reads (100.00%) 613495150 bases (100.00%)
Total Removed: 12269903 reads (100.00%) 613495150 bases (100.00%)
Filtered by header: 14601226 reads (100.00%) 730061300 bases (100.00%)
Low quality discards: 14601226 reads (100.00%) 730061300 bases (100.00%)
Total Removed: 14601226 reads (100.00%) 730061300 bases (100.00%)
Filtered by header: 13348244 reads (100.00%) 667412200 bases (100.00%)
Low quality discards: 13348244 reads (100.00%) 667412200 bases (100.00%)
Total Removed: 13348244 reads (100.00%) 667412200 bases (100.00%)
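In these logs, every run that loses all of its reads loses them as "Low quality discards", which suggests a quality filter (such as maq=) rather than adapter or poly-X trimming. Before tuning thresholds, it may help to look at which quality characters the affected FASTQ files actually contain. The function below is a minimal sketch of that check. The name qual_summary is hypothetical, and it assumes an uncompressed FASTQ with 4-line records and Phred+33 encoding:

```shell
# Hypothetical helper: list the distinct quality characters in the first N reads
# of an uncompressed FASTQ file, the Phred score each one maps to
# (assumes Phred+33 encoding and 4-line records), and how often each occurs.
qual_summary() {
  local fq="$1" n="${2:-1000}"
  head -n $((4 * n)) "$fq" \
    | awk 'NR % 4 == 0 && length($0) > 0' \
    | fold -w1 \
    | LC_ALL=C sort \
    | LC_ALL=C uniq -c \
    | while read -r count char; do
        # printf "'X" yields the character code of X; subtract 33 for Phred+33
        printf '%s\tQ%d\t%s\n' "$char" $(( $(printf '%d' "'$char") - 33 )) "$count"
      done
}
```

Usage: qual_summary "$l.fastq" 1000 prints one line per character (character, Phred score, count). If nearly every character maps to a score below the maq= value, every read would be discarded regardless of the other settings.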
Hi GenoMax,
Many thanks for your reply. In the papers I have searched, the parameters are always roughly:
qtrim=lr trimq=20 (or more) maq=20 (or more) minlen=20 (or more)
but when I used the following values instead, the results were better:
qtrim=lr trimq=6 maq=10 minlen=15
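Since maq= discards whole reads based on their average quality, it can be useful to estimate how many reads a given cutoff would remove before running BBDuk. The awk-based sketch below counts reads whose mean Phred score falls below a threshold. count_low_mean_q is a hypothetical helper; it assumes Phred+33 by default and uses a plain arithmetic mean over the untrimmed read, so the count is only approximate (BBDuk applies maq after trimming and may average quality differently):

```shell
# Hypothetical helper: count reads whose mean Phred score falls below a cutoff.
# Prints "<reads below cutoff><TAB><total reads>".
# Assumes an uncompressed FASTQ with 4-line records; offset defaults to 33.
# Plain arithmetic mean over the untrimmed read, so only an approximation of maq=.
count_low_mean_q() {
  local fq="$1" thr="${2:-10}" off="${3:-33}"
  awk -v thr="$thr" -v off="$off" '
    BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }  # char -> code
    NR % 4 == 0 {
      n = length($0); sum = 0
      for (j = 1; j <= n; j++) sum += ord[substr($0, j, 1)] - off
      total++
      if (n == 0 || sum / n < thr) low++   # empty quality strings count as low
    }
    END { printf "%d\t%d\n", low + 0, total + 0 }
  ' "$fq"
}
```

Usage: count_low_mean_q "$l.fastq" 10 compares, for example, the maq=20 setting from the papers with maq=10 on the same file.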
Regarding your question: I have fixed that error now. That parameter was meant to set the number of processors, but I am not sure how it was actually working.
I would be very happy to hear what you think about it.
Many thanks again
ways= should be threads= if you want to use more than one core.
Many thanks, GenoMax! You are the best!
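For reference, here is the command from the original post with threads= in place of ways=, wrapped in a small function so the sample prefix and core count are explicit. This is a sketch: run_bbduk is a hypothetical name, all other parameters are kept exactly as in the original post, and ${l}_SR.fastq is simply the brace form of $l\_SR.fastq:

```shell
# Hypothetical wrapper: $1 = sample prefix (the original loop's $l),
# $2 = number of cores (e.g. the scheduler's $NSLOTS; defaults to 1).
run_bbduk() {
  local l="$1" nslots="${2:-1}"
  # threads= sets the number of CPU threads; the original ways= setting is dropped.
  bbduk.sh in1="$l.fastq" out="${l}_SR.fastq" ref=adapters \
    ktrim=r qtrim=lr trimq=10 qskip=4 ftm=5 maq=10 minlen=20 \
    trimpolya=10 trimpolyg=10 trimpolyc=10 \
    threads="$nslots" overwrite=true
}
```

Usage: run_bbduk "$l" "$NSLOTS". The relaxed values from the follow-up (trimq=6 maq=10 minlen=15) can be substituted in the same place.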