Hi!
At first, I ran Trimmomatic with an average quality score per 4 bp window of >= 30 (i.e. SLIDINGWINDOW:4:30) and made a BAM file with bwa mem. However, I now have to redo the analysis with a window quality score of >= 20.
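For reference, the two steps were roughly as follows; this is only a sketch, and the sample/file names, thread count, and Trimmomatic jar path are placeholders:

    # trim paired reads: 4 bp sliding window, average quality >= 30
    java -jar trimmomatic-0.39.jar PE -threads 8 \
        sample_R1.fastq.gz sample_R2.fastq.gz \
        sample_R1.P.fq.gz sample_R1.U.fq.gz \
        sample_R2.P.fq.gz sample_R2.U.fq.gz \
        SLIDINGWINDOW:4:30

    # map the surviving pairs and sort (ref.fa must be bwa-indexed first)
    bwa mem -t 8 ref.fa sample_R1.P.fq.gz sample_R2.P.fq.gz \
        | samtools sort -@ 8 -o sample.q30.bam -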
To save time, I'm thinking of making a BAM file from only the reads with quality >= 20 and < 30, then merging the two BAM files ( bam1 { 20 <= quality < 30 } + bam2 { 30 <= quality } ). To do this, I need a FASTQ file containing only reads with 20 <= quality score < 30.
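The merge step itself would just be samtools merge on the two coordinate-sorted BAMs (file names here are hypothetical):

    # combine the two sorted BAMs and index the result
    samtools merge -@ 8 merged.bam bam1.q20-30.bam bam2.q30.bam
    samtools index merged.bam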
Can I make this fastq file with Trimmomatic, or is there another method?
Thanks!
I think just rerunning everything from the start with q20 for Trimmomatic is going to turn out much more efficient than trying to patch together a solution.
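The only change to the command above is the window threshold, something like this (same placeholder file names, just a sketch):

    # identical to the q30 run, only the required window quality drops to 20
    java -jar trimmomatic-0.39.jar PE -threads 8 \
        sample_R1.fastq.gz sample_R2.fastq.gz \
        sample_R1.P.fq.gz sample_R1.U.fq.gz \
        sample_R2.P.fq.gz sample_R2.U.fq.gz \
        SLIDINGWINDOW:4:20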
Does your analysis take that long?
Yes, my data is big. I have 6 FASTQ files ( 3 paired-end sets ), and each FASTQ file is about 25 GB. Also, when I did the analysis with q30 reads, the final BAM file made by merging the 3 BAMs was about 100 GB.
I routinely work with FASTQ files that are > 500 GB, and I would still repeat the whole analysis ^^ Putting every command into a script doesn't take that long, after all, if they execute one after the other.
Wow! I see. Could you give me some advice about handling data to save time and memory?
Keep everything you need inside scripts that you can re-run at any time, and save all the logs you can. That is really 90% of the efficiency, imho! Every time something like this happens, you just change your script's parameters and re-run it (see the sketch below). Even if it took a long time to get where you are, re-running won't take as long because there is no decision making involved!
And if you're into these things: https://jupyter.org/
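As a sketch of what that looks like in practice, the whole pipeline can sit in one script whose only tunable is the quality threshold, with each step's stderr kept as a log. Everything here is hypothetical: the sample names set1-set3, the reference ref.fa, the jar path, and the thread count.

    #!/usr/bin/env bash
    # Re-runnable pipeline sketch: change QUAL and re-run the whole thing.
    set -euo pipefail

    QUAL=20        # Trimmomatic sliding-window quality threshold
    THREADS=8
    REF=ref.fa     # must be bwa-indexed beforehand (bwa index ref.fa)

    for SAMPLE in set1 set2 set3; do
        # trim: 4 bp sliding window, average quality >= $QUAL
        java -jar trimmomatic-0.39.jar PE -threads "$THREADS" \
            "${SAMPLE}_R1.fastq.gz" "${SAMPLE}_R2.fastq.gz" \
            "${SAMPLE}_R1.P.fq.gz" "${SAMPLE}_R1.U.fq.gz" \
            "${SAMPLE}_R2.P.fq.gz" "${SAMPLE}_R2.U.fq.gz" \
            "SLIDINGWINDOW:4:${QUAL}" \
            2> "${SAMPLE}.trimmomatic.q${QUAL}.log"

        # map the surviving pairs and sort; keep bwa's stderr as a log
        bwa mem -t "$THREADS" "$REF" \
            "${SAMPLE}_R1.P.fq.gz" "${SAMPLE}_R2.P.fq.gz" \
            2> "${SAMPLE}.bwa.q${QUAL}.log" \
            | samtools sort -@ "$THREADS" -o "${SAMPLE}.q${QUAL}.bam" -
    done

    # merge the three per-sample BAMs into the final one and index it
    samtools merge -@ "$THREADS" "merged.q${QUAL}.bam" \
        "set1.q${QUAL}.bam" "set2.q${QUAL}.bam" "set3.q${QUAL}.bam"
    samtools index "merged.q${QUAL}.bam"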
Thanks Macspider and lieven.sterck! I'm a better bioinformatician than I was 24 hours ago!