Hi all,
I have been trying to use the ViromeScan software (https://sourceforge.net/projects/viromescan/) to detect viral sequences in RNA-Seq samples. The pipeline runs the Perl script trimBWAstyle.usingBam.pl with the parameters -q 3 -o 65 -l 60. According to the documentation for trimBWAstyle.usingBam.pl (https://github.com/genome/genome/blob/master/lib/perl/Genome/Site/TGI/Hmp/HmpSraProcess/trimBWAstyle.usingBam.pl), -q is the quality threshold (the default is 20, but to trim q2, set -q to 3), -l 60 sets the length cutoff for trimmed reads to 60, and -o is the FASTQ offset value ("change to 65 for Illumina"). My questions are:
It is my understanding that most FASTQ files use Phred+33, so I'm confused about why the -o parameter is set to 65. Is there a specific reason it was set this way, or should I change it to -o 33 after double-checking that my data really does use Phred+33?
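For the double-checking part, this is the kind of rough check I had in mind. It is only a heuristic sketch: the function name and the cutoffs are my own assumptions (based on the usual ASCII ranges for Phred+33 and Phred+64 encodings), not anything taken from ViromeScan or trimBWAstyle.usingBam.pl.

```python
def guess_phred_offset(quality_strings):
    """Guess the FASTQ quality offset from a sample of quality strings.

    Hypothetical helper, heuristic only:
    - any character below ASCII 59 (';') cannot occur in a +64 encoding,
      so the data is taken to be Phred+33;
    - if every character is at or above ASCII 64 ('@'), Phred+64 is the
      likely encoding (although very high-quality Phred+33 data could
      look the same on a small sample);
    - anything in between is reported as ambiguous (None).
    """
    lowest = min(ord(ch) for qual in quality_strings for ch in qual)
    if lowest < 59:
        return 33
    if lowest >= 64:
        return 64
    return None


# Example with made-up quality strings: '#' is ASCII 35, so this is Phred+33.
print(guess_phred_offset(["IIIIHH#", "IIIGGF5"]))
```

In practice I would feed it the quality lines of the first few thousand reads rather than a handful, since a small sample can easily miss the low-quality characters.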
What does it mean to trim q2? Does setting -q to 3 just mean that the quality threshold is 3, or does q2 have a specific meaning that differs from BWA-style trimming? As I understand BWA-style trimming, you start from the right end of the read and accumulate a badness sum, or "area": the area grows at each position whose quality is below your -q parameter and shrinks at each position whose quality is above it. You then trim at the position where the area was greatest. (Explanation from here: http://seqanswers.com/forums/showthread.php?t=6251)
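To make my understanding concrete, here is a minimal sketch of BWA-style trimming as I described it above. The function name and defaults are my own, and I have not confirmed that trimBWAstyle.usingBam.pl behaves exactly like this (for example, it may handle minimum read length differently), so please correct me if the logic is off.

```python
def bwa_trim_length(qual, threshold, offset=33):
    """Return how many bases to keep from the left end of a read.

    Hypothetical sketch of BWA-style trimming: walk from the right end,
    accumulate (threshold - quality) as the "area", stop once the area
    goes negative, and cut where the area was largest.
    """
    area = 0
    best_area = 0
    keep = len(qual)  # default: keep the whole read
    for i in range(len(qual) - 1, -1, -1):
        area += threshold - (ord(qual[i]) - offset)
        if area < 0:
            break  # good bases have outweighed the bad ones seen so far
        if area > best_area:
            best_area = area
            keep = i  # cut just before position i
    return keep


# Made-up Phred+33 read: four Q40 bases ('I') followed by two Q2 bases ('#').
read_qual = "IIII##"
print(bwa_trim_length(read_qual, threshold=3))
```

If this is right, then with -q 3 only a trailing run of bases with quality below 3 (that is, Q2 and lower) contributes positive area, and the -l 60 cutoff would then be applied to whatever length remains.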
When I run trimBWAstyle.usingBam.pl on the BAM file produced by the preceding step in the ViromeScan pipeline (FastqToSam run on a FASTQ file output by bmtagger.sh), I always get 3 output files: *trimmed.1.fastq, *trimmed.2.fastq, and *trimmed.singleton.fastq, even when the BAM file contains only single-end reads. How does trimBWAstyle.usingBam.pl decide which reads to put in which file?
I appreciate any help anyone can provide!
Best, Elaine