Hello, I am new to sequencing analysis and about to start working on BAMs of tumor-normal pairs. I would like to know how to find out which programs have been run on a BAM file. Is that what the @PG lines in the header are for?
Many thanks!
If the BAM file has @PG lines then yes, those indicate that a certain program was run to generate that file. The main problem is that not all programs actually add @PG lines.
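As a rough illustration of reading those records, here is a minimal Python sketch that pulls the @PG lines out of header text, such as what `samtools view -H your.bam` prints. The function name `parse_pg_lines` and the example header are made up for illustration; the tags (ID, PN, VN, CL, PP) are the standard @PG fields, but which ones are present depends on the tool that wrote them.

```python
def parse_pg_lines(header_text):
    """Return one dict per @PG header line, mapping each tag (ID, PN, ...) to its value."""
    records = []
    for line in header_text.splitlines():
        if not line.startswith("@PG\t"):
            continue  # skip @HD, @SQ, @RG, @CO lines
        record = {}
        for field in line.split("\t")[1:]:
            # Split on the first colon only, since CL values may contain colons.
            tag, _, value = field.partition(":")
            record[tag] = value
        records.append(record)
    return records


# Hypothetical header, for illustration only (not taken from a real BAM).
example_header = (
    "@HD\tVN:1.4\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:248956422\n"
    "@PG\tID:bwa\tPN:bwa\tVN:0.7.12\tCL:bwa mem ref.fa r1.fq r2.fq\n"
    "@PG\tID:MarkDuplicates\tPN:MarkDuplicates\tPP:bwa\n"
)

for rec in parse_pg_lines(example_header):
    print(rec.get("ID"), rec.get("PN"), rec.get("VN"), rec.get("CL"))
```

To use it on a real file, save the output of `samtools view -H your.bam` to a text file and pass its contents to the function.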
I think Picard and GATK both add @PG lines; at least the recent versions seem to. I don't think redoing BQSR or indel realignment will do much if they've already been done. Note that with remotely recent data you don't need to bother with either of these steps (assuming you use the HaplotypeCaller rather than the UnifiedGenotyper).
Thanks so much! There are no BQSR or realignment records in the @PG lines of my BAMs, and I have been hesitating over whether to run both programs. But I plan to use the UnifiedGenotyper for variant calling. Will the UnifiedGenotyper cause errors and increase the false positive rate?
From the comparisons I've seen, yes, the UnifiedGenotyper has poorer calling characteristics (I don't recall whether it was false positives or false negatives; have a look at Brad Chapman's blog for a LOT of high-quality variant caller comparisons). The HaplotypeCaller also allows you to skip some steps, so it seems more sensitive and also faster (not to mention that it's a bit more straightforward to do samples in batches and combine them later).
Do you mean which tools were used to generate that BAM? Could you elaborate? And also check this.
Thank you for your reply. I copied the BAMs from others and was wondering how they were generated, and whether MarkDuplicates, indel realignment or BQSR had been run on them. The format specification says that @PG is "Programs used for processing the read group". Does that mean it records how the BAMs were generated?
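One hedged way to check for those three steps is to search the @PG lines of the header for tool names. The sketch below does that in Python. The keyword lists are assumptions about how Picard and GATK typically label themselves and may need adjusting for the versions that produced your files, and the function name `steps_recorded` is made up for illustration. A `False` only means no matching @PG record was found, not that the step was never run, since not every program writes @PG lines.

```python
# Assumed keywords (lowercase); adjust to the tool versions used on your data.
STEP_KEYWORDS = {
    "mark duplicates": ["markduplicates"],
    "indel realignment": ["indelrealigner"],
    "BQSR": ["bqsr", "baserecalibrator"],
}


def steps_recorded(header_text, keywords=STEP_KEYWORDS):
    """Report, per step, whether any of its keywords appears in an @PG line."""
    pg_text = "\n".join(
        line for line in header_text.splitlines() if line.startswith("@PG\t")
    ).lower()
    return {
        step: any(name in pg_text for name in names)
        for step, names in keywords.items()
    }


# Hypothetical header, for illustration only (not taken from a real BAM).
example_header = (
    "@HD\tVN:1.4\tSO:coordinate\n"
    "@PG\tID:bwa\tPN:bwa\n"
    "@PG\tID:MarkDuplicates\tPN:MarkDuplicates\n"
    "@PG\tID:GATK IndelRealigner\tCL:knownAlleles=[] targetIntervals=x.intervals\n"
)

for step, found in steps_recorded(example_header).items():
    print(step, "recorded in @PG" if found else "no @PG record found")
```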