I've been mapping reads with bwa to produce BAM files, and had bwa add read groups during the mapping. As a consequence, the exact command line that I used to run bwa is included in the @PG headers of all the BAM files, and that includes the read groups I specified.
Since then, I've used Picard to replace all of the read groups with new values. As a result, the information in the @PG headers is now incorrect, and could mislead other researchers if I publish the BAM files.
1) How can I safely remove these @PG headers from the BAM files? I figure I might as well just strip them all out if they contain incorrect data.
2) Is this even necessary if I don't publish the BAM files and I'm the only one who could be confused? Could @PG headers affect downstream analyses such as variant calling? I don't think GATK uses them at all, but I don't know whether other pipelines or programs rely on that data.
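For question 1, what I have in mind is roughly the following: dump the header as text, drop every record whose type is @PG, and write the filtered header back. This is only a sketch of the filtering step. The function name `strip_pg_lines` is my own, and it assumes the header arrives as plain SAM text with tab-separated fields.

```python
def strip_pg_lines(header_text: str) -> str:
    """Return SAM header text with every @PG record removed.

    Assumes plain SAM header text: one record per line, fields separated
    by tabs, record type (@HD, @SQ, @RG, @PG, @CO) in the first field.
    """
    kept = [
        line
        for line in header_text.splitlines()
        # Compare the record type only, so an "@PG" string that appears
        # inside another record (e.g. a @CO comment) is left alone.
        if line.split("\t", 1)[0] != "@PG"
    ]
    # Restore the trailing newline that splitlines() dropped.
    return "\n".join(kept) + "\n" if kept else ""
```

I would feed it the output of `samtools view -H in.bam` and pass the filtered header to `samtools reheader` to produce a new BAM (`in.bam` is a placeholder name). One thing I'm unsure about: if any reads carry a `PG:Z:` tag, that tag would then point at an ID that no longer exists in the header, and I don't know whether validators or downstream tools object to that.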
Thank you!
Don't all Picard tools add @PG lines? I ran AddOrReplaceReadGroups and it added no @PG line for itself, but running MarkDuplicates did add one.