How to fix a .bam file diagnosed as "MISSING_READ_GROUP" by Picard ValidateSamFile
6.7 years ago
freddiejung ▴ 60

Hi, I am using Pilon to polish and elongate my PacBio assembly.

However, Pilon always fails with the error shown at the bottom of this post.

I suspected that something was wrong with one of my .bam files, so I ran Picard ValidateSamFile on it.

The program reported that this .bam file is missing a read group (↓).

## HISTOGRAM    java.lang.String
Error Type  Count
ERROR:MISSING_READ_GROUP    1
WARNING:RECORD_MISSING_READ_GROUP   22324694

How can I fix this .bam file? Any comments would be appreciated.

Best, Jung


Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.simontuffs.onejar.Boot.run(Boot.java:340)
at com.simontuffs.onejar.Boot.main(Boot.java:166)
Caused by: java.lang.UnsupportedOperationException: Cannot query stream-based BAM file
at htsjdk.samtools.BAMFileReader.query(BAMFileReader.java:410)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:498)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:503)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:365)
at org.broadinstitute.pilon.BamFile.readsInRegion(BamFile.scala:329)
at org.broadinstitute.pilon.BamFile.recruitBadMates(BamFile.scala:357)
at org.broadinstitute.pilon.GapFiller$$anonfun$recruitJumps$1.apply(GapFiller.scala:380)
at org.broadinstitute.pilon.GapFiller$$anonfun$recruitJumps$1.apply(GapFiller.scala:379)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.broadinstitute.pilon.GapFiller.recruitJumps(GapFiller.scala:379)
at org.broadinstitute.pilon.GapFiller.recruitReads(GapFiller.scala:391)
at org.broadinstitute.pilon.GapFiller.assembleAcrossBreak(GapFiller.scala:51)
at org.broadinstitute.pilon.GapFiller.fixBreak(GapFiller.scala:45)
at org.broadinstitute.pilon.GenomeRegion$$anonfun$identifyAndFixIssues$4.apply(GenomeRegion.scala:383)
at org.broadinstitute.pilon.GenomeRegion$$anonfun$identifyAndFixIssues$4.apply(GenomeRegion.scala:381)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.broadinstitute.pilon.GenomeRegion.identifyAndFixIssues(GenomeRegion.scala:381)
at org.broadinstitute.pilon.GenomeFile$$anonfun$processRegions$4.apply(GenomeFile.scala:120)
at org.broadinstitute.pilon.GenomeFile$$anonfun$processRegions$4.apply(GenomeFile.scala:109)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.parallel.ParIterableLike$Foreach.leaf(ParIterableLike.scala:972)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:49)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:51)
at scala.collection.parallel.ParIterableLike$Foreach.tryLeaf(ParIterableLike.scala:969)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.internal(Tasks.scala:169)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.internal(Tasks.scala:443)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:149)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:443)
at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Hi Pierre and other experts,

I ran Picard ValidateSamFile and got the following errors about mates not being found and a missing read group.

I used TopHat instead of BWA. Is there any difference between TopHat-derived and BWA-derived BAM files?

Also, what should I do about MATE_NOT_FOUND in paired-end reads? I would have expected a warning instead of an error. Does this error have any significance for an RNA-Seq TopHat BAM file?

Thanks Adrian

## HISTOGRAM    java.lang.String
Error Type      Count
ERROR:MATE_NOT_FOUND    3157873
ERROR:MISSING_READ_GROUP        1
WARNING:RECORD_MISSING_READ_GROUP       140564013

Hello oriolebaltimore,

You should know that the old 'Tuxedo' pipeline of TopHat(2) and Cufflinks is no longer the advisable tool for RNA-seq analysis. The software is deprecated or in low maintenance and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. There are also other alternatives, including alignment with STAR or BBMap, or pseudo-alignment using Salmon.

6.7 years ago

You need to add a read group to your BAM, either with your original mapper or with Picard: https://broadinstitute.github.io/picard/command-line-overview.html#AddOrReplaceReadGroups
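As a rough sketch of the Picard route, the snippet below only builds the AddOrReplaceReadGroups command line so it can be inspected before running. The file names, the `picard.jar` path, and all read-group values (`RGID`, `RGLB`, `RGPL`, `RGPU`, `RGSM`) are placeholders, not values from this thread; replace them with whatever describes your library.

```python
import shlex


def add_read_group_cmd(in_bam, out_bam, sample, rgid="1", library="lib1",
                       platform="ILLUMINA", unit="unit1",
                       picard_jar="picard.jar"):
    """Build a Picard AddOrReplaceReadGroups command as an argv list.

    All default values are illustrative placeholders.
    """
    return [
        "java", "-jar", picard_jar, "AddOrReplaceReadGroups",
        f"I={in_bam}",        # BAM without read groups
        f"O={out_bam}",       # new BAM with one read group on every record
        f"RGID={rgid}",       # read group ID
        f"RGLB={library}",    # library name
        f"RGPL={platform}",   # sequencing platform
        f"RGPU={unit}",       # platform unit
        f"RGSM={sample}",     # sample name
    ]


# Hypothetical file and sample names, for illustration only.
cmd = add_read_group_cmd("frags.bam", "frags.rg.bam", sample="sample1")
print(shlex.join(cmd))
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)`. Re-running ValidateSamFile on the output should then show whether the two read-group messages are gone.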

Furthermore, this error:

Caused by: java.lang.UnsupportedOperationException: Cannot query stream-based BAM file

means that you're trying to randomly access a stream (stdin) to fetch a specific region, which is not possible.
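One possible way around this, assuming the BAM can be written to disk, is to give the tool a coordinate-sorted, indexed file rather than a pipe, since region queries need an index. The sketch below only builds the two samtools commands; the file names are placeholders, and whether a missing index is the actual cause in this case is an assumption.

```python
def index_bam_cmds(in_bam, sorted_bam):
    """Build samtools commands that sort a BAM by coordinate and index it.

    Region queries (random access) need a coordinate-sorted BAM with an index.
    """
    return [
        ["samtools", "sort", "-o", sorted_bam, in_bam],  # coordinate sort
        ["samtools", "index", sorted_bam],               # write the index
    ]


# Hypothetical file names, for illustration only.
cmds = index_bam_cmds("frags.rg.bam", "frags.rg.sorted.bam")
for c in cmds:
    print(" ".join(c))
```

Each list can be executed with `subprocess.run(c, check=True)`. The sorted file, with its index next to it, is then the one to pass to Pilon as a file path.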

6.7 years ago
mittu1602 ▴ 200

Hi Jung, missing read groups occur either because your machine could not capture the reads or because the quality cutoffs were so high that those reads were removed. You can try lowering your quality cutoff.

This is to the best of my knowledge.


Hi mittu1602,

Thank you for contributing, but I have the impression you are confused. There is no connection between SAM-format read groups and the sequencing machine or quality cutoffs.

Cheers,
Wouter
