Picard-tools mark duplicates error, missing @RG
9.5 years ago
kezcleal ▴ 160

Hi,

I'm trying to mark duplicates using picard tools but have come across this error:

Exception in thread "main" net.sf.samtools.SAMFormatException: SAM validation error: ERROR: Record 37, Read name FCC2CCMACXX:5:1101:6198:22031#, RG ID on SAMRecord not found in header: 2_DB31

The first line of my .bam file reads:

FCC2CCMACXX:4:1101:13561:50127#    99    chrM    1    15    49S51M    =    340    439    "sequence here"    "q score here"    NM:i:1    AS:i:46    XS:i:59    RG:Z:1_DB31

And if I look at the output of samtools view -H, I see lines such as:

@HD    VN:1.3    SO:coordinate
@SQ    SN:chrM    LN:16571
@SQ    SN:chr1    LN:249250621

Do I need to modify my original .bam file somehow?

EDIT: The reason I ask is that I am trying to feed this data into GATK. If I use something like samtools rmdup to remove duplicates instead, will the dataset still work with GATK?

next-gen sequence • 3.3k views
9.5 years ago
bruce.moran ▴ 970

You could add read groups using Picard, or try adding 'VALIDATION_STRINGENCY=LENIENT' to your command, which will pass over this kind of error. Be aware that the underlying problem still exists, though: GATK requires read groups for calling variants, so downstream you will wish you had added them. For RNA-seq, where you just want to call counts, you should be OK without read groups.
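To see which read groups are affected, it may help to compare the RG:Z tags on the records with the @RG lines in the header; the error in the question means a record carries an RG ID that the header does not declare. Below is a minimal sketch in plain Python that works on SAM text lines (for example the output of samtools view -h). The function names and the SM placeholder are hypothetical, made up for illustration, and are not part of Picard, samtools or GATK.

```python
def missing_read_groups(sam_lines):
    """Return RG IDs used on records but not declared in any @RG header line."""
    declared, used = set(), set()
    for line in sam_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("@"):
            # Header line: collect the ID field of every @RG entry.
            if fields[0] == "@RG":
                for field in fields[1:]:
                    if field.startswith("ID:"):
                        declared.add(field[3:])
        else:
            # Alignment record: optional tags start at the 12th column.
            for field in fields[11:]:
                if field.startswith("RG:Z:"):
                    used.add(field[5:])
    return sorted(used - declared)


def rg_header_lines(rg_ids, sample="SAMPLE_PLACEHOLDER"):
    """Build one @RG line per ID. The SM value is a placeholder (assumption)."""
    return ["@RG\tID:{0}\tSM:{1}".format(rg_id, sample) for rg_id in rg_ids]


# Toy input modelled on the question: a header without @RG lines and two
# records tagged with different read groups (sequences are made up).
sam = [
    "@HD\tVN:1.3\tSO:coordinate",
    "@SQ\tSN:chrM\tLN:16571",
    "r1\t99\tchrM\t1\t15\t4M\t=\t340\t343\tACGT\tIIII\tNM:i:1\tRG:Z:1_DB31",
    "r2\t99\tchrM\t5\t15\t4M\t=\t344\t343\tACGT\tIIII\tRG:Z:2_DB31",
]

missing = missing_read_groups(sam)
for header_line in rg_header_lines(missing):
    print(header_line)
```

The printed lines could then be appended to the header from samtools view -H and written back with samtools reheader, after replacing the SM placeholder with the real sample name. Treat this as one possible route, not the only one: as far as I know, Picard's AddOrReplaceReadGroups assigns every read to a single new read group, so it is the simpler choice only if you do not need to keep 1_DB31 and 2_DB31 separate.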
