programs on BAM file
8.3 years ago
zhoub • 0

Hello, I am new to sequencing analysis and am about to work on BAMs from tumor-normal pairs. How can I find out which programs have been run on a BAM file? Is that what the @PG lines in the header record?

Many thanks!

SNP sequencing bam • 1.3k views

Do you mean which tools were used to generate that BAM? Could you elaborate? Also, check this.


Thank you for your reply. I copied the BAMs from someone else and was wondering how they were generated, and whether MarkDuplicates, indel realignment, or BQSR had been run on them. The format specification says that @PG describes the "Programs used for processing the read group". Does that mean it records how the BAMs were generated?

8.3 years ago

If the BAM file has @PG lines, then yes: each one indicates that a certain program was run to generate or process that file. The main problem is that not all programs actually add @PG lines.
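To see what is recorded, you can dump the header with `samtools view -H your.bam` and look at the @PG lines. Below is a minimal sketch of reading them in Python, assuming you already have the header as text; the example header is made up for illustration. Each @PG line is a set of tab-separated TAG:VALUE fields (ID, PN, VN, PP, CL), and PP points to the previous program, so you can follow the chain of processing steps.

```python
"""Parse @PG (program) lines from a SAM/BAM header.

Assumes the header text comes from something like `samtools view -H tumor.bam`.
"""


def parse_pg_lines(header_text):
    """Return one dict per @PG line, mapping each tag (ID, PN, VN, PP, CL) to its value."""
    records = []
    for line in header_text.splitlines():
        if not line.startswith("@PG\t"):
            continue
        record = {}
        # Header fields are tab-separated TAG:VALUE pairs; split on the first
        # colon only, because values such as CL often contain colons themselves.
        for field in line.split("\t")[1:]:
            tag, _, value = field.partition(":")
            record[tag] = value
        records.append(record)
    return records


if __name__ == "__main__":
    # Hypothetical header, for illustration only.
    example_header = "\n".join([
        "@HD\tVN:1.6\tSO:coordinate",
        "@PG\tID:bwa\tPN:bwa\tVN:0.7.17\tCL:bwa mem ref.fa r1.fq r2.fq",
        "@PG\tID:MarkDuplicates\tPN:MarkDuplicates\tPP:bwa\tCL:MarkDuplicates INPUT=aln.bam OUTPUT=dedup.bam",
    ])
    for pg in parse_pg_lines(example_header):
        # PP (previous program) links each step to the one that ran before it.
        print(pg.get("ID"), "<-", pg.get("PP", "(start)"), "|", pg.get("CL", ""))
```

Keep in mind that an absent @PG line only tells you that nothing was recorded, not that a step was never run.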


Thank you for your reply! Do Picard tools and GATK BQSR and IndelRealigner add @PG lines? Also, how would repeating BQSR and realignment on the same BAM file affect variant calling?


I think Picard and GATK both add @PG lines; at least recent versions seem to. I don't think redoing BQSR or indel realignment will change much if they've already been done. Note that with reasonably recent data you don't need to bother with either of these steps (assuming you use the HaplotypeCaller rather than the UnifiedGenotyper).
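If you want a quick yes/no per step, a rough keyword search over the @PG lines is often enough. This is only a sketch: the keyword table below is my own assumption (the exact ID/PN/CL strings differ between Picard, samtools, and GATK versions), and a step whose program did not write an @PG line will not show up.

```python
"""Rough check for common preprocessing steps recorded in @PG header lines.

The keyword lists are assumptions: the exact ID/PN/CL strings differ between
tool versions, and a step whose program wrote no @PG line is invisible here.
"""

# Hypothetical keyword table: lowercase substrings searched for in each @PG line.
STEP_KEYWORDS = {
    "duplicate marking": ("markdup",),          # e.g. Picard MarkDuplicates, samtools markdup
    "indel realignment": ("indelrealigner",),   # e.g. GATK3 IndelRealigner
    "BQSR": ("bqsr", "baserecalibrator"),       # e.g. GATK4 ApplyBQSR, GATK3 PrintReads with BQSR
}


def steps_in_header(header_text):
    """Map each step name to True if any @PG line mentions one of its keywords."""
    pg_lines = [line.lower() for line in header_text.splitlines() if line.startswith("@PG")]
    return {
        step: any(keyword in line for line in pg_lines for keyword in keywords)
        for step, keywords in STEP_KEYWORDS.items()
    }


if __name__ == "__main__":
    # Hypothetical header, for illustration only.
    example_header = "\n".join([
        "@HD\tVN:1.6\tSO:coordinate",
        "@PG\tID:bwa\tPN:bwa\tCL:bwa mem ref.fa r1.fq r2.fq",
        "@PG\tID:MarkDuplicates\tPN:MarkDuplicates\tPP:bwa",
    ])
    print(steps_in_header(example_header))
    # {'duplicate marking': True, 'indel realignment': False, 'BQSR': False}
```

A plain substring search can give false positives (for example a file name containing "bqsr" in some other command line), so it is worth eyeballing the CL fields it matched.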


Thanks so much! There are no BQSR or realignment records in the @PG lines of my BAMs, and I have been hesitating over whether to run both programs. But I plan to use the UnifiedGenotyper for variant calling. Will the UnifiedGenotyper cause errors and increase the false-positive rate?
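When @PG lines are missing, duplicate marking at least leaves a trace in the reads themselves: marked reads have FLAG bit 0x400 ("PCR or optical duplicate" in the SAM spec) set, and `samtools flagstat` reports a duplicates count. Here is a small sketch that computes the same thing from SAM text, assuming records such as the output of `samtools view your.bam | head -n 100000`; the example records are made up. A non-zero fraction suggests duplicates were marked, while zero does not prove they weren't (they may have been removed instead). Realignment and BQSR leave no equivalent flag, so this only covers duplicate marking.

```python
"""Estimate how many reads carry the SAM duplicate flag (0x400).

Assumes SAM text records, e.g. from `samtools view tumor.bam | head -n 100000`.
"""

DUPLICATE_FLAG = 0x400  # SAM spec: read is a PCR or optical duplicate


def duplicate_fraction(sam_lines):
    """Return the fraction of alignment records whose FLAG has the 0x400 bit set."""
    total = duplicates = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t", 2)[1])  # FLAG is the second column
        total += 1
        if flag & DUPLICATE_FLAG:
            duplicates += 1
    return duplicates / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical records: r2 is a duplicate pair (1123 = 99 + 1024, 1171 = 147 + 1024).
    example = [
        "r1\t99\tchr1\t100\t60\t10M\t=\t200\t110\tACGTACGTAC\tIIIIIIIIII",
        "r1\t147\tchr1\t200\t60\t10M\t=\t100\t-110\tACGTACGTAC\tIIIIIIIIII",
        "r2\t1123\tchr1\t100\t60\t10M\t=\t200\t110\tACGTACGTAC\tIIIIIIIIII",
        "r2\t1171\tchr1\t200\t60\t10M\t=\t100\t-110\tACGTACGTAC\tIIIIIIIIII",
    ]
    print(duplicate_fraction(example))  # 0.5
```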


From the comparisons I've seen, yes, the UnifiedGenotyper has poorer calling characteristics (I don't recall whether it was false positives or false negatives; have a look at Brad Chapman's blog for a LOT of high-quality variant caller comparisons). The HaplotypeCaller also lets you skip some steps, so it seems more sensitive and also faster (not to mention that it's a bit more straightforward to process samples in batches and combine them later).
