Question

Is it okay to realign and subsequently process fastq files that were converted from processed BAM files?

7.5 years ago

mary.a.wood.91 ▴ 10

I have been given BAM files from a collaborator that have already been processed, and I want to incorporate this data set into a larger analysis. For that analysis I have a standard protocol that takes files from fastq through somatic variant calling (roughly similar to GATK's best practices). I am hoping to convert the BAM files back to fastq using Picard's SamToFastq program and then run them through the usual protocol. What potential issues might this raise, given that the quality scores may now differ from those in the original fastqs?

FYI, the specifics for the BAM files were that the original fastqs were “aligned to the hg19 human genome build using BWA (v0.7.5) [and then] subjected to mark duplication, realignment, and recalibration using the Picard tool and GATK software tools”

Unfortunately, I don't know more about the origin of these files than that, but any general insights as to how the previous processing of the BAMs might affect how the fastqs I will generate are treated would be appreciated!

alignment fastq bam bwa recalibration • 2.2k views

updated 7.5 years ago by Istvan Albert 102k • written 7.5 years ago by mary.a.wood.91 ▴ 10

Since you were given the files, could you ask the person who provided the BAM files for more information, or even for the original fastq files? Or is that out of the question?

7.5 years ago by h.mon 35k

It's not necessarily out of the question, but it is difficult. I wasn't the one directly given the data, and my supervisor, who received the files, is having difficulty reaching the people who generated them. I was hoping to move forward with the analysis, but I wanted to get a general feel for how feasible it is to work with fastqs generated from processed BAMs.

7.5 years ago by mary.a.wood.91 ▴ 10

score 3 · Accepted Answer · 2017-07-11

7.5 years ago

Istvan Albert 102k

Unless the reads were hard-clipped, the sequence information is unaltered and can be recovered into its original format.

The samtools fastq command can also perform the back conversion.
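
To make the back conversion concrete, here is a minimal Python sketch of what it involves for a single SAM text record (for example, one line of `samtools view` output). The function name `sam_line_to_fastq` is hypothetical and only for illustration; it is not how samtools or Picard are implemented, and it ignores read pairing. The key points are that reverse-strand reads are stored reverse-complemented and have to be flipped back, and that hard-clipped records cannot be fully restored.

```python
# Sketch only: convert one SAM text record back into a FASTQ record.
# Assumes tab-separated SAM fields in the standard order; pairing is ignored.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def sam_line_to_fastq(line):
    """Return a 4-line FASTQ string, or None for records that should be skipped."""
    fields = line.rstrip("\n").split("\t")
    name, flag, cigar = fields[0], int(fields[1]), fields[5]
    seq, qual = fields[9], fields[10]

    # Secondary (0x100) and supplementary (0x800) alignments repeat a read
    # that already appears elsewhere, so this sketch skips them.
    if flag & 0x900:
        return None

    # Hard clipping (H in the CIGAR) removes bases from the record itself,
    # so the original read cannot be rebuilt from this line alone.
    if "H" in cigar:
        raise ValueError(f"{name}: hard-clipped, original sequence not recoverable")

    # Reads aligned to the reverse strand (0x10) are stored reverse-complemented;
    # undo that and reverse the quality string to match.
    if flag & 0x10:
        seq = seq.translate(COMPLEMENT)[::-1]
        qual = qual[::-1]

    return f"@{name}\n{seq}\n+\n{qual}\n"
```

In practice you would let `samtools fastq` or Picard's SamToFastq do this, since they also take care of pairing and read ordering.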

To add to this: the BAM file may contain only a subset of the original data, though (for example, if unmapped reads were dropped along the way).

7.5 years ago by Istvan Albert 102k

Thanks Istvan! You don't think there will be any issues with the quality scores, for example when it comes time to call variants?

7.5 years ago by mary.a.wood.91 ▴ 10

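It will reconstitute the quality scores as well, provided the recalibration step kept the original qualities in the OQ tag and the converter is told to use them (for example with the -O option of samtools fastq).

In that case the FASTQ file will be just as fresh, fluffy and untouched as if it just rolled off of an instrument.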
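
Whether the pre-recalibration scores are available can be checked directly. Recalibration rewrites the QUAL column, and the SAM specification reserves the optional `OQ:Z:` tag for the original base qualities. Below is a small sketch, again on SAM text records, that picks the original qualities when the tag is present and reports which source it used. The function name `pick_qualities` is made up for this example, and whether your collaborator's BAMs carry OQ tags at all is an assumption you would need to verify.

```python
# Sketch only: prefer the original (pre-recalibration) qualities if the
# record carries an OQ:Z: tag, otherwise fall back to the QUAL column.

def pick_qualities(line):
    """Return (quality_string, came_from_oq_tag) for one SAM text record."""
    fields = line.rstrip("\n").split("\t")
    # Optional tags start at the 12th column and look like TAG:TYPE:VALUE.
    for tag in fields[11:]:
        if tag.startswith("OQ:Z:"):
            return tag[len("OQ:Z:"):], True
    # No OQ tag: only the (possibly recalibrated) QUAL column is available.
    return fields[10], False
```

If none of the records return `True`, the regenerated fastqs will carry the recalibrated scores rather than the instrument's original ones. The strings are in alignment orientation either way, so reverse-strand reads still need their qualities reversed for the fastq.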