Hey everyone
I'm having some .sam issues at the moment. I'm revisiting a BWA/GATK workflow I used a couple of months ago and would like to use GATK's base quality recalibration tools (CountCovariates / TableRecalibration).
Now, my SOLiD reads were converted to (fake) basespace prior to mapping with BWA. The resulting .bam can't be used with the GATK recalibration tools [despite my having reordered / sorted / indexed it to conform to GATK's other requirements] because the original colourspace read data isn't present in the .bam.
So my question is: given a .csfasta file and a .sam file, is there a particularly efficient way to look up each read in the former and add its colourspace read data (CS:Z:... and CQ:Z:...) to the latter, iterating over all of the reads in the .sam?
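For anyone finding this later, here is a rough sketch of the lookup-and-append approach in Python. It assumes a few things that you should check against your own files: the SAM QNAME matches the .csfasta header name exactly (if names were changed during the basespace conversion, normalise them first), the colour qualities come from the matching _QV.qual file (the .csfasta itself holds only the colour calls), and negative QV values can be clamped to 0. The function names and the script name add_cs_tags.py are hypothetical, and the exact strand and length conventions GATK expects for CS/CQ are not checked here.

```python
import sys
from typing import Dict, Iterator, List, TextIO, Tuple


def parse_solid_records(handle: TextIO) -> Iterator[Tuple[str, List[str]]]:
    """Yield (read name, data lines) from a .csfasta or _QV.qual file.

    Lines starting with '#' are skipped as comments. The read name is the
    header text after '>' up to the first whitespace.
    """
    name = None
    lines: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, lines
            name = line[1:].split()[0]
            lines = []
        else:
            lines.append(line)
    if name is not None:
        yield name, lines


def load_colour_reads(handle: TextIO) -> Dict[str, str]:
    """Map read name -> colour-space string (primer base left in place)."""
    return {name: "".join(lines) for name, lines in parse_solid_records(handle)}


def load_colour_quals(handle: TextIO) -> Dict[str, str]:
    """Map read name -> Phred+33 string; values are clamped to 0..93."""
    quals = {}
    for name, lines in parse_solid_records(handle):
        values = " ".join(lines).split()
        quals[name] = "".join(chr(min(max(int(v), 0), 93) + 33) for v in values)
    return quals


def add_colour_tags(sam_in: TextIO, sam_out: TextIO,
                    reads: Dict[str, str], quals: Dict[str, str]) -> int:
    """Append CS:Z (and CQ:Z if available) to alignment lines; return count tagged."""
    tagged = 0
    for line in sam_in:
        if line.startswith("@"):
            sam_out.write(line)  # header lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        name = fields[0]
        if name in reads:
            fields.append("CS:Z:" + reads[name])
            if name in quals:
                fields.append("CQ:Z:" + quals[name])
            tagged += 1
        sam_out.write("\t".join(fields) + "\n")
    return tagged


# Hypothetical usage: python add_cs_tags.py reads.csfasta reads_QV.qual < in.sam > out.sam
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as cs_fh, open(sys.argv[2]) as qv_fh:
        colour_reads = load_colour_reads(cs_fh)
        colour_quals = load_colour_quals(qv_fh)
    add_colour_tags(sys.stdin, sys.stdout, colour_reads, colour_quals)
```

The dictionary lookup is simple but holds every read in memory, which may be a lot for a full SOLiD run. If the SAM is still in the same read order as the .csfasta (i.e. before sorting), walking both files in step would avoid that; you could then sort and index afterwards.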
All the best, Russ
Oh hi, thanks for your reply. I also did the mapping with SHRiMP and was keen to compare the two. All the best.
I haven't used SHRiMP myself, but if computational resources aren't an issue for you, compare your results to those from a colorspace-aware mapper. My intuition is that the colorspace-aware algorithm will be superior.