To use GATK for SNP calling, we need read group info in the header. But what if we have no such info?
I've got several BAM files from a collaborator, with no info on, say, which lanes or libraries the sequences come from. In such circumstances, can we simply treat all alignments in a BAM file as coming from the same lane of the same library?
And can we just insert a line into the header like:
@RG ID:filename SM:filename LB:filename PL:Illumina
Thanks.
The read group has to be added to every read as a tag (RG:Z:your_RG_ID), not only to the header, so the entire file has to be reprocessed. Per-read read groups allow a single BAM file to contain multiple RGs, but they mean extra work when fixing up BAMs with missing information.
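To make the two steps concrete (one @RG header line plus an RG:Z: tag on every record), here is a minimal sketch in plain Python that works on SAM text lines. The function name add_read_group and the example read are made up for illustration, and putting the filename-style ID into SM and LB is just the assumption from the question, not a recommendation. Note that the fields of a real @RG line are tab-separated. In practice you would more likely use an existing tool such as Picard AddOrReplaceReadGroups or samtools addreplacerg; this is only meant to show what such a tool has to do and why it touches every read.

```python
def add_read_group(sam_lines, rg_id, sample, library, platform="ILLUMINA"):
    """Yield SAM lines with one @RG header line added and every read tagged.

    Hypothetical helper for illustration only. It assumes the input has no
    @RG line yet and that all reads belong to the same read group.
    """
    # @RG header fields are tab-separated in the SAM format.
    rg_header = "\t".join(
        ["@RG", f"ID:{rg_id}", f"SM:{sample}", f"LB:{library}", f"PL:{platform}"]
    )
    header_written = False
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("@"):
            yield line  # pass existing header lines through unchanged
            continue
        if not header_written:
            # First alignment record reached: emit @RG at the end of the header.
            yield rg_header
            header_written = True
        fields = line.split("\t")
        # Columns 1-11 are mandatory; optional tags follow. Drop any old RG tag.
        tags = [t for t in fields[11:] if not t.startswith("RG:Z:")]
        yield "\t".join(fields[:11] + tags + [f"RG:Z:{rg_id}"])
    if not header_written:
        yield rg_header  # input contained a header but no reads


# Small in-memory demo with a made-up read.
demo = [
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0",
]
for out_line in add_read_group(demo, "sampleA", "sampleA", "sampleA"):
    print(out_line)
```

Because every alignment record gets rewritten, the cost scales with the size of the file, which is why there is no quick header-only fix for a large BAM.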
Yeah, it's working. But my BAM file is 100 GB, and it seems to be rebuilding the whole file, which takes a long time.