Picard CollectSequencingArtifactMetrics error
4.5 years ago
jrleary ▴ 210

I'm analyzing whole exome sequencing (WES) data and using Picard to perform some QC on my aligned, sorted, duplicate-removed .bam files. When running Picard CollectSequencingArtifactMetrics, I receive the following error:

Exception in thread "main" picard.PicardException: Record contains library that is missing from header: UnknownLibrary

WES data processing / analysis is not my main area of expertise at all, and I'm not super familiar with the .sam / .bam formats. Does anyone have any idea what could be causing this? I've run other picard functions, such as SortSam and MarkDuplicates without errors, so I'm pretty confused.

picard WES • 2.2k views
4.5 years ago

This error happens when a read is not associated with a read group. Two things to check:

1) Did you declare the read group in the header (an @RG line)?

2) Do you have any reads without the RG attribute?


So I ran picard ValidateSamFile and did receive the error:

ERROR:MISSING_READ_GROUP

The .bam file was generated automatically as output from bwa mem, first in .sam format and then converted to .bam using picard SamFormatConverter. So, I'm guessing that I didn't declare the read group in the header (unless it's done automatically) and I'm not sure how to check if I have a read without the RG attribute.
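Both checks can be sketched on the SAM text itself. The example below is a minimal sketch that uses a tiny hand-written SAM file (hypothetical data) so it runs without samtools; on a real file you would pipe in the output of `samtools view -h your.bam` instead, where `your.bam` is a placeholder name.

```shell
# Toy SAM text standing in for `samtools view -h your.bam` (hypothetical data).
sam=$(mktemp)
printf '@HD\tVN:1.6\tSO:coordinate\n' > "$sam"
printf '@SQ\tSN:chr1\tLN:1000\n' >> "$sam"
printf 'read1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$sam"
printf 'read2\t0\tchr1\t5\t60\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:sampleA\n' >> "$sam"

# 1) Count @RG lines in the header (0 means no read group was declared).
n_rg_header=$(grep -c '^@RG' "$sam" || true)

# 2) Count alignment records with no RG:Z: tag among the optional fields
#    (optional fields start at column 12 of a SAM record).
n_no_rg=$(awk -F'\t' '!/^@/ {
  has = 0
  for (i = 12; i <= NF; i++) if ($i ~ /^RG:Z:/) has = 1
  if (!has) n++
} END { print n + 0 }' "$sam")

echo "header @RG lines: $n_rg_header"
echo "reads without an RG tag: $n_no_rg"
rm -f "$sam"
```

A count of zero @RG header lines, or any reads without an RG tag, would be consistent with the MISSING_READ_GROUP error reported by ValidateSamFile.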


You should have specified read groups. See https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups


Specified them at what step? I ran samtools view -H | grep '^@RG' and got nothing in return, which I'm guessing means I failed to specify them at some point when I was supposed to.

EDIT: might this have something to do with how I combined the R[1-2]_L001.fastq & R[1-2]_L002.fastq files? I used:

zcat sample_R1_L001.fastq.gz sample_R1_L002.fastq.gz >  sample_R1.fastq

to combine the lanes for both paired reads.


better/faster:

cat sample_R1_L001.fastq.gz sample_R1_L002.fastq.gz >  sample_R1.fastq.gz

bwa can read gzipped FASTQ files directly.

Furthermore, you can parallelize things by mapping each FASTQ separately and merging the BAMs later.
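Plain `cat` works here because concatenated gzip streams decompress to the concatenation of their contents. A minimal sketch with two tiny made-up FASTQ files (the read names and sequences are hypothetical):

```shell
tmp=$(mktemp -d)

# Two one-read FASTQ files standing in for the two lanes (hypothetical data).
printf '@r1\nACGT\n+\nIIII\n' | gzip -c > "$tmp/sample_R1_L001.fastq.gz"
printf '@r2\nTTTT\n+\nIIII\n' | gzip -c > "$tmp/sample_R1_L002.fastq.gz"

# Concatenate the compressed files directly, without decompressing first.
cat "$tmp/sample_R1_L001.fastq.gz" "$tmp/sample_R1_L002.fastq.gz" > "$tmp/sample_R1.fastq.gz"

# The merged file decompresses to both lanes, in order.
n_lines=$(gzip -dc "$tmp/sample_R1.fastq.gz" | wc -l | tr -d ' ')
n_reads=$(gzip -dc "$tmp/sample_R1.fastq.gz" | grep -c '^@r')
echo "$n_lines lines, $n_reads reads"

rm -rf "$tmp"
```

The same lane order must be used for R1 and R2 so that the paired files stay in sync.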


At the time of the original alignment. You could also add them now: see "Adding Read Groups To Bam Files".

Plain cat works for combining lane-specific files.
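For adding read groups to an existing BAM, one option is Picard's AddOrReplaceReadGroups. The sketch below only builds and prints the command (a dry run). The file names, the `picard.jar` path, and all read group values (`lib1`, `unit1`, `sampleA`) are placeholders, not values from this thread.

```shell
# Placeholder names; substitute your own files and metadata.
sample="sampleA"
in_bam="sampleA.dedup.bam"
out_bam="sampleA.dedup.rg.bam"

# RGLB sets the library, which is the field the original error complained about.
picard_cmd="java -jar picard.jar AddOrReplaceReadGroups I=$in_bam O=$out_bam RGID=$sample RGLB=lib1 RGPL=ILLUMINA RGPU=unit1 RGSM=$sample"

# Dry run: print the command instead of executing it.
echo "$picard_cmd"
```

Once the printed command looks right, run it as shown, then confirm with `samtools view -H` that an @RG line is now present in the output BAM.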


Thank you so much. I am a little confused about what string I should use as the read group argument. Can it be an arbitrary name, or does it need to follow a certain naming convention?


It can be any string. You can use the sample name.


This worked; I would add though that to use the -R flag, one needs to enclose the string in single quotes, like '@RG\tID:$samplename' instead of double quotes, like "@RG\tID:$samplename", otherwise it will not work.
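A note of caution on the quoting, as a sketch assuming a POSIX shell such as bash: single quotes keep `\t` literal but also stop `$samplename` from being expanded, whereas double quotes expand the variable and still leave `\t` as a literal backslash-t for bwa to interpret. The `SM:` field and the file names in the commented command are assumptions added for illustration.

```shell
samplename="sampleA"

# Single quotes: nothing is expanded, so the ID is the literal text $samplename.
rg_single='@RG\tID:$samplename\tSM:$samplename'

# Double quotes: the variable is expanded; \t stays as two characters (backslash, t).
rg_double="@RG\tID:$samplename\tSM:$samplename"

# printf '%s' prints the strings exactly, without interpreting the backslashes.
printf '%s\n' "$rg_single"
printf '%s\n' "$rg_double"

# Hypothetical alignment call using the expanded string (not executed here):
# bwa mem -R "$rg_double" ref.fa sample_R1.fastq.gz sample_R2.fastq.gz > sample.sam
```

If the double-quoted form failed in practice, something else may have been involved, for example a different shell or an unset variable, so it is worth printing the string as above before passing it to bwa.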
