I have 9 .bam files, produced from 2x75 bp paired-end Illumina reads (RNA-Seq) and aligned with STAR to the Ensembl rat reference genome. Each file has one @RG line with only two entries, ID and SM. For sample s01, the @RG line looks as follows: @RG ID:s01 SM:s01. I have not included any library information (LB:) in the @RG line.
When I run bamUtil's dedup to mark duplicates, I get the following warning for each of the 9 .bam files:
WARNING: Cannot find library information in the header line @RG ID:s01 SM:s01 .
Using empty string for library name
I'm a beginner here. As far as I can tell, the duplicate marking worked correctly.
Should I be concerned that the input .bam files did not have a library defined? If I need to define a library for each .bam file, could you point me to some guidance on what to use as the library? For example, should I just set the library to the sample name, so that across the 9 .bam files I will have 9 different libraries?
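To make the "library = sample name" option concrete, here is a minimal Python sketch of the header change it would involve. It only edits header text: the function name add_library_to_rg is hypothetical, it assumes the header fields are tab-separated as in a standard SAM header, and it assumes that reusing SM as LB is acceptable, which is exactly the point in question.

```python
def add_library_to_rg(header_text):
    """Return SAM header text with LB:<sample> added to @RG lines that lack LB.

    Hypothetical helper: assumes tab-separated header fields and that the
    sample name (SM) is an acceptable stand-in for the library name.
    """
    out = []
    for line in header_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "@RG":
            # Collect the two-letter tags already present on this @RG line.
            tags = {f[:2]: f[3:] for f in fields[1:] if len(f) > 3 and f[2] == ":"}
            # Only add LB when it is missing and a sample name is available.
            if "LB" not in tags and "SM" in tags:
                fields.append("LB:" + tags["SM"])
            line = "\t".join(fields)
        out.append(line)
    return "".join(l + "\n" for l in out)


# Example header resembling the one described above (sample s01).
header = "@HD\tVN:1.4\tSO:coordinate\n@RG\tID:s01\tSM:s01\n"
print(add_library_to_rg(header), end="")
```

The header text could presumably be dumped with samtools view -H and the edited version applied with samtools reheader, but I have not tried that on these files.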
Thanks,
skhan