Error parsing SAM header. @RG line missing SM tag
7.1 years ago

I have a BAM file called test.bam with a malformed @RG line. The @RG line has only one field, as follows:

@RG ID:foo

I want to add fields such as SM, LB and PL to the @RG line, but I am having difficulty doing so.

I already tried samtools, as follows:

samtools view -H test.bam | sed 's,^@PG.*,@PG\tID:None\tSM:None\tLB:None\tPL:Illumina,g' |  samtools reheader - test.bam > test.rg.bam

However, it gives an error like this:

[E::sam_hdr_error] Missing tab at line 198: "@PGtID:NonetSM:NonetLB:NonetPL:Illumina"

I also tried Picard's AddOrReplaceReadGroups, but it also failed with an error:

java -jar picard.jar AddOrReplaceReadGroups I=test.bam O=test.out.bam RGID=4 RGLB=lib1 RGPL=illumina RGPU=unit1 RGSM=20

Exception in thread "main" htsjdk.samtools.SAMFormatException: Error parsing SAM header. @RG line missing SM tag. Line: @RG ID:foo

Please help!

Thank you in advance

alignment samtools BAM • 6.4k views

Thank you! The samtools command does not work, but the VALIDATION_STRINGENCY=LENIENT option works! Cheers

7.1 years ago

In Picard/htsjdk, an @RG line must have at least one tag besides ID (the error here asks for SM), and fields must be separated by tab characters.

Try to fix this with:

samtools view -h in.bam | sed 's/^\(@RG\tID:.*\)/\1\tSM:foo/' | samtools view -b -o out.bam -
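
The error message in the question shows a literal `t` where each tab should be, which suggests that the sed in use did not expand `\t` (some implementations, such as BSD sed, do not). The pattern in the question also matched `@PG` rather than `@RG`. A minimal sketch of a more portable header edit with awk follows. It assumes the header has first been dumped to a text file; here `printf` stands in for `samtools view -H test.bam`, and the file names and the values `foo`, `lib1` and `ILLUMINA` are placeholders, not values taken from a real file.

```shell
# Stand-in for `samtools view -H test.bam > header.sam`:
# a two-line header whose fields are separated by real tabs.
printf '@HD\tVN:1.6\tSO:coordinate\n@RG\tID:foo\n' > header.sam

# Append SM, LB and PL to every @RG line that has no SM tag yet.
# OFS is a real tab, so nothing depends on how sed treats "\t".
awk 'BEGIN { FS = OFS = "\t" }
     $1 == "@RG" && $0 !~ /\tSM:/ { $0 = $0 OFS "SM:foo" OFS "LB:lib1" OFS "PL:ILLUMINA" }
     { print }' header.sam > header.fixed.sam

cat header.fixed.sam
```

The corrected header could then be applied with `samtools reheader header.fixed.sam test.bam > test.rg.bam`.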

Another option: use AddOrReplaceReadGroups with the option VALIDATION_STRINGENCY=LENIENT.
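
As a rough illustration, this is the Picard command from the question with that option appended. With LENIENT, Picard should report validation problems as warnings instead of aborting. The wrapper function name `add_read_groups_lenient` is hypothetical, the read-group values are the placeholders from the question, and picard.jar is assumed to be in the current directory.

```shell
# Hypothetical wrapper: $1 is the input BAM, $2 is the output BAM.
# The RG* values are the placeholders used in the question.
add_read_groups_lenient() {
    java -jar picard.jar AddOrReplaceReadGroups \
        I="$1" O="$2" \
        RGID=4 RGLB=lib1 RGPL=illumina RGPU=unit1 RGSM=20 \
        VALIDATION_STRINGENCY=LENIENT
}
```

It would be called as `add_read_groups_lenient test.bam test.out.bam`.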

Sorry about the off-topic comment but you crossed 100K points landmark today. WooHoo!

not really :-) Pierre Lindenbaum 99550

Upward rounding of the BioStars code already put you at 100K :-)

I edited the SM tags of several (multi-read-group) BAM files this way. After reindexing, they seem to work fine, i.e. I can open them in IGV without errors. However, when I run Picard ValidateSamFile I get the following error:

## HISTOGRAM    java.lang.String
Error Type      Count
ERROR:INVALID_INDEX_FILE_POINTER        1

I thought this wasn't a big deal, because they open fine. However, when I try to use other GATK tools to further process these files I'm often given an error like:

htsjdk.samtools.util.RuntimeIOException: my.bam has invalid uncompressedLength: -1998504602
    at htsjdk.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:543)
        ...
        at org.broadinstitute.hellbender.Main.main(Main.java:289)

or

htsjdk.samtools.SAMFormatException: Invalid GZIP header
        at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:121)
        ...
        at org.broadinstitute.hellbender.Main.main(Main.java:289)

Is there anything else I can do to fix this other than regenerate the BAMs from source?

Hi! I have the same problem: htsjdk.samtools.util.RuntimeIOException: my.bam has invalid uncompressedLength

Did you eventually solve this problem, and if so, how? Thanks.

This is because the .bam.bai index no longer corresponds to the BAM file. After you change the @RG line of a BAM file, you need to rebuild the index, for example:

samtools index xx.bam
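
Putting the reheader and reindex steps together, a sketch might look like the following. The function name `reheader_and_index` and its argument order are hypothetical. `samtools quickcheck` is added as a basic sanity check of the header and end-of-file block and is not part of the advice above. Whether this clears the GATK errors reported earlier in the thread is not confirmed here.

```shell
# Hypothetical helper: $1 = input BAM, $2 = corrected header (SAM text), $3 = output BAM.
reheader_and_index() {
    # Write a copy of the BAM that carries the corrected header.
    samtools reheader "$2" "$1" > "$3" &&
    # Build a fresh index for the new file rather than reusing the old .bai.
    samtools index "$3" &&
    # Exits non-zero if the file looks truncated or has no valid header.
    samtools quickcheck "$3"
}
```

It would be called as `reheader_and_index in.bam header.fixed.sam out.bam`.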

good luck!

Any ideas, anyone? Pierre Lindenbaum? Using samtools reheader + sed to change the SM tag for multiple read groups at once (something that I don't think is possible with AddOrReplaceReadGroups) is a very elegant solution, but if GATK complains about an invalid file it's not a viable one.
