I'm analyzing whole exome sequencing (WES) data and using Picard to perform some QC on my aligned, sorted, duplicate-removed .bam files. When running picard CollectSequencingArtifactMetrics, I receive the following error:
Exception in thread "main" picard.PicardException: Record contains library that is missing from header: UnknownLibrary
WES data processing and analysis is not my main area of expertise, and I'm not very familiar with the .sam / .bam formats. Does anyone have any idea what could be causing this? I've run other picard functions, such as SortSam and MarkDuplicates, without errors, so I'm pretty confused.
So I ran picard ValidateSamFile and did receive an error.
The .bam file was generated automatically as output from bwa mem, first in .sam format and then converted to .bam using picard SamFormatConverter. So I'm guessing that I didn't declare the read group in the header (unless it's done automatically), and I'm not sure how to check whether I have a read without the RG attribute.
You should (in fact, must) have specified some read groups. https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups
Specified them at what step? I ran
samtools view -H | grep '^@RG'
and got nothing in return, which I'm guessing means I failed to specify them at some point when I was supposed to.
EDIT: Might this have something to do with how I combined the R[1-2]_L001.fastq and R[1-2]_L002.fastq files? I used:
to combine the lanes for both paired reads.
better/faster:
bwa can read gzipped fastq files directly.
Furthermore, you can parallelize things by mapping each lane's fastq files separately and merging the resulting BAM files later.
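The per-lane approach can be sketched roughly as follows. This is only an illustration, not the poster's actual pipeline: make_rg and map_lane are hypothetical helper names, the file names in the usage comments are placeholders, and PL:ILLUMINA is an assumption about the sequencing platform. Each lane gets its own read group ID while sharing the sample (SM) and library (LB), which is the usual convention.

```shell
# Hypothetical helper: build a read group string for one lane of one sample.
# ID and PU are unique per lane; SM and LB are shared so that the merged BAM
# is treated as one sample and one library. PL:ILLUMINA is an assumption.
make_rg() {
    local sample=$1 lane=$2
    printf '@RG\\tID:%s.%s\\tSM:%s\\tLB:%s\\tPL:ILLUMINA\\tPU:%s.%s' \
        "$sample" "$lane" "$sample" "$sample" "$sample" "$lane"
}

# Hypothetical helper: align one lane and write a coordinate-sorted BAM.
# bwa mem reads gzipped fastq directly, so no decompression step is needed.
map_lane() {
    local ref=$1 rg=$2 r1=$3 r2=$4 out=$5
    bwa mem -R "$rg" "$ref" "$r1" "$r2" | samtools sort -o "$out" -
}

# Usage (commented out; needs bwa and samtools on PATH, placeholder file names):
# map_lane ref.fa "$(make_rg sample1 L001)" s1_R1_L001.fastq.gz s1_R2_L001.fastq.gz s1_L001.bam &
# map_lane ref.fa "$(make_rg sample1 L002)" s1_R1_L002.fastq.gz s1_R2_L002.fastq.gz s1_L002.bam &
# wait
# samtools merge sample1.bam s1_L001.bam s1_L002.bam
```

make_rg emits a literal backslash followed by t rather than a real tab; bwa mem converts that sequence into a tab when it writes the header.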
At the time of the original alignment. You could add them now; see "Adding Read Groups To Bam Files".
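For an existing BAM, one way to do this is Picard's AddOrReplaceReadGroups. The sketch below is a minimal example under assumptions: add_read_group is a hypothetical wrapper name, the file names are placeholders, the sample name is reused for every field, and RGPL=ILLUMINA is a guess at the platform. Setting RGLB is presumably what matters for the UnknownLibrary error, since that message is about a library missing from the header.

```shell
# Hypothetical helper: add one read group to every read in an existing BAM.
# The sample name is reused for ID, SM, LB and PU; PL=ILLUMINA is an assumption.
add_read_group() {
    local in_bam=$1 out_bam=$2 sample=$3
    picard AddOrReplaceReadGroups \
        I="$in_bam" O="$out_bam" \
        RGID="$sample" RGSM="$sample" RGLB="$sample" \
        RGPL=ILLUMINA RGPU="$sample"
}

# Usage (commented out; needs picard and samtools on PATH):
# add_read_group input.bam with_rg.bam sample1
# samtools view -H with_rg.bam | grep '^@RG'
```

After this step the grep on the header should print one @RG line instead of nothing.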
Plain cat works for combining lane-specific files.
Thank you so much; I am a little bit confused about what string I should use as the read group argument. Can it be an arbitrary name, or does it need to follow a certain naming convention?
It can be any string. You can use the sample name.
This worked; I would add, though, that to use the -R flag, one needs to enclose the string in single quotes, like '@RG\tID:$samplename', instead of double quotes, like "@RG\tID:$samplename"; otherwise it will not work.
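How the quoting behaves can depend on the shell and on how the command is launched, so it may help to print the string before handing it to bwa mem -R. The sketch below assumes bash and a placeholder value for samplename. Note that inside single quotes the shell does not expand $samplename, so the mixed form is one way to keep \t literal while still substituting the variable.

```shell
samplename=sample1   # placeholder sample name

# Single quotes: nothing is expanded, so $samplename stays literal.
single='@RG\tID:$samplename'

# Double quotes: $samplename is expanded; \t remains backslash + t.
double="@RG\tID:$samplename"

# Mixed: single-quote the fixed part, double-quote the variable.
mixed='@RG\tID:'"$samplename"

# printf '%s\n' prints each string as-is, without interpreting backslashes.
printf '%s\n' "$single" "$double" "$mixed"
# prints:
# @RG\tID:$samplename
# @RG\tID:sample1
# @RG\tID:sample1
```

If the printed string still contains the text $samplename, the header will carry that literal text as the read group ID rather than the sample name.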