I have performed GATK's Base Quality Score Recalibration (BQSR) on my BAM files.
The BQSR has made the per base sequence quality look very bad on FastQC. Is this normal? Should I use the BAM files that have been through BQSR and therefore have worse FastQC reports?
The FastQC per base sequence quality goes from being in the green zone
to being all in the lower red zone
The next step I will do is SNP calling. This is genomic data.
(The BAM files contain only the data that is paired, matches uniquely, and has the duplicates marked by Picard). The FastQC images are from just one of my samples but all samples have the same pattern of being all green or almost all in the green zone, to being entirely in the red zone after BQSR.
How to add images to a Biostars post
That is extremely weird. How did you run BQSR?
I did it like this:
I would then c&p the output of the cmds into my command line. I did this because R uses java12 not java1.8 so it didn't work directly through R.
That's an... unusual way of running stuff.
Yes it is annoying, usually I can just do system(cmd) in the loop to have it run on the command line through R. But java was causing problems so I just did it through c&p each print because I couldn't think how else to do it.
What was your problem with JAVA. BTW, why use R to run a shell-script (BQSR)?
It is because if I run command line though R Studio using "system(cmd)" it makes an error saying:
I googled it and apparently it's because GATK's BQSR was designed to be run on java 1.8.
So there's no error when I run directly on command line when:
But there is an error when I when on command line through using system() on R. Because R uses:
I just use R just because I am familiar with it and I like how R Studio works and how scripts can be run easily, whereas I am new to using the command line.
To solve this, you can always use the complete path of the correct JAVA (inside R or outside also), something like
/usr/local/bin/java
. To know the correct path, writewhich java
on commandline.Did you encounter any error? Could you just run BQSR on a single sample using commandline directly (wihout R)?
It didn't flag up any errors when I ran them on the command line which is why I was so surprised at the FastQC.
For example, I would run this on a sample for the first command to make the table:
The end of the output of this command says this (I would post all the output but it exceeds the character limit):
For the second command I would write:
The end of the output that command (to create the new recalibrated quality BAM files) was:
<4000 reads? My guess is this is the wrong input file