A collaborator has given me BAM files that have already been processed, and I want to incorporate this data set into a larger analysis for which I have a standard protocol that takes files from fastq through somatic variant calling (roughly similar to GATK's best practices). My plan is to convert the BAM files back to fastq using Picard's SamToFastq program and then run the resulting files through the usual protocol. I am curious what issues this might raise, given that the base quality scores in the BAMs may now differ from those in the original fastqs.
FYI, the only processing details I have for the BAM files are that the original fastqs were “aligned to the hg19 human genome build using BWA (v0.7.5) [and then] subjected to mark duplication, realignment, and recalibration using the Picard tool and GATK software tools”.
Unfortunately, I don't know more about the origin of these files than that, but any general insights as to how the previous processing of the BAMs might affect how the fastqs I will generate are treated would be appreciated!
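To make the quality-score concern concrete, here is a minimal Python sketch of what reverting a single SAM record to a fastq record involves. It is only an illustration, not what SamToFastq does internally. It assumes the recalibration step kept the pre-recalibration qualities in the optional `OQ:Z:` tag, which I have not verified for these files; if the tag is absent, the recalibrated qualities are used as they stand. The function name `sam_line_to_fastq` is hypothetical.

```python
# Sketch only: convert one SAM text line into a FASTQ record.
# Assumption: original qualities, if kept at all, are in an OQ:Z: tag.

# Translation table used to complement bases of reverse-strand reads.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_line_to_fastq(line):
    """Return a FASTQ record for one SAM line, or None if it is skipped."""
    fields = line.rstrip("\n").split("\t")
    name, flag = fields[0], int(fields[1])
    seq, qual = fields[9], fields[10]

    # Secondary (0x100) and supplementary (0x800) alignments repeat a read
    # that already has a primary record, so they are skipped here.
    if flag & 0x900:
        return None

    # Optional tags start at column 12; prefer the original qualities
    # when an OQ tag is present.
    for tag in fields[11:]:
        if tag.startswith("OQ:Z:"):
            qual = tag[5:]
            break

    # Reverse-strand alignments (0x10) store the reverse complement of the
    # sequenced read, so undo that to recover the read as sequenced.
    if flag & 0x10:
        seq = seq.translate(COMPLEMENT)[::-1]
        qual = qual[::-1]

    # Mark first (0x40) and second (0x80) reads of a pair.
    if flag & 0x40:
        name += "/1"
    elif flag & 0x80:
        name += "/2"

    return f"@{name}\n{seq}\n+\n{qual}\n"


if __name__ == "__main__":
    example = "r1\t83\tchr1\t100\t60\t4M\t=\t50\t-54\tAACG\tIIII\tOQ:Z:ABCD"
    print(sam_line_to_fastq(example), end="")
```

If the BAMs turn out not to carry `OQ` tags, the fastqs would contain the recalibrated qualities, and my protocol would then recalibrate them a second time.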
Since you were given the files, could you ask the person who provided the BAM files for more information, or even for the original fastq files? Or is that out of the question?
It's not necessarily out of the question, but it is difficult. I wasn't the one directly given the data, and my supervisor, who received the files, is having difficulty reaching the people who generated the data. I was hoping to move forward with the analysis, but I wanted to get a general feel for how feasible it is to work with fastqs generated from processed BAMs.