Dear all,
I have checked some sorted BAM files with Picard 2.18.13 and I got this error:
## HISTOGRAM java.lang.String
Error Type Count
ERROR:INVALID_VERSION_NUMBER 1
I am using this version of Java on an Ubuntu 18 machine:
$ java -version
java version "10.0.2" 2018-07-17
Java(TM) SE Runtime Environment 18.3 (build 10.0.2+13)
Java HotSpot(TM) 64-Bit Server VM 18.3 (build 10.0.2+13, mixed mode)
According to one thread, the problem might be Picard rather than the BAM file. But according to another, the issue seems to lie with the headers: "Does not match any of the acceptable versions". The headers I got are:
$ samtools view -h file1.bam | head -n 5
@HD VN:1.6 SO:coordinate
@SQ SN:21 LN:46709983
@RG ID:C3MF6ACXX.L001 SM:501 PL:ILLUMINA LB:lib-501 PU:C3MF6ACXX.1.NoIndex
@PG ID:bwa PN:bwa VN:0.7.17-r1188 CL:bwa mem -t 10 -R @RG\tID:C3MF6ACXX.L001\tSM:501\tPL:ILLUMINA\tLB:lib-501\tPU:C3MF6ACXX.1.NoIndex ./ref/GRCh38-21.fa 501N-1_1.fq.gz 501N-1_2.fq.gz -o aln/501N_bwa.sam
HWI-ST1437:64:C3UM1ACXX:4:2214:15629:46172 163 21 2145 0 52S19M30S = 2145 19 ATATACATATACAAACATTCATAACAAAATAAGGAATATTTATATAATAATTACAGTCCTCATGTTAATAACTGGTCACATGCTTATAGCAGGTATTTATA CCCFFFFFHHHHHJJJJJJJJJJJJIJJJJJIJJJJJJJJJJJJJJIJJJGIJJIJHIJJJJJJIIIHIJJJJJJIIJIIJJJJJHHHHHGGFDFFFFFEE NM:i:0 MD:Z:19 MC:Z:27S19M55S AS:i:19 XS:i:19 RG:Z:C3MF6ACXX.L001
$ samtools view -h file2.bam | head -n 5
@HD VN:1.6 SO:coordinate
@SQ SN:21 LN:46709983
@RG ID:C3MF6ACXX.L001 SM:502T PL:ILLUMINA LB:lib-502 PU:C3MF6ACXX.1.NoIndex
@PG ID:bwa PN:bwa VN:0.7.17-r1188 CL:bwa mem -t 10 -R @RG\tID:C3MF6ACXX.L001\tSM:502T\tPL:ILLUMINA\tLB:lib-502\tPU:C3MF6ACXX.1.NoIndex ./ref/GRCh38-21.fa 502N-1_1.fq.gz 502N-1_2.fq.gz -o aln/502N_bwa.sam
HWI-1103R:166:C3UJNACXX:1:1216:10747:27634 163 21 2359 0 48S19M34S = 2359 19 CCCCTGGGTTAATCTACTTCTCATTATAAACGATGACCCATGATGATGTGTTTTCTGAACCACTCATCTTGCATAGAGTGCCACAATGTGGAACAGCCCTA C@CFFFFFHHGHHJIIIJJJIGIIJJJJJJJJIIGGIIJIHIGIJIJIFHGHIJJJJIJIJJIJIJGGIIJIFGHHHGEHEFEFFCCECDCECCBDDDDCD NM:i:0 MD:Z:19 MC:Z:9S19M73S AS:i:19 XS:i:19 RG:Z:C3MF6ACXX.L001
The problem is there even if I align with HISAT2:
$ samtools view -h fle2HST.bam | head -n 5
@HD VN:1.0 SO:coordinate
@SQ SN:21 LN:46709983
@RG ID:C3UJNACXX.L001 SM:502 PL:ILLUMINA LB:lib-502 PU:C3UJNACXX.1.NoIndex
@PG ID:hisat2 PN:hisat2 VN:2.1.0 CL:"/home/gigiux/src/hisat2/hisat2-align-s --wrapper basic-0 --rg ID:C3UJNACXX.L001 --rg SM:502 --rg PL:ILLUMINA --rg LB:lib-502 --rg PU:C3UJNACXX.1.NoIndex -p 10 -q -x ref/GRCh38-21.fa -S aln/502N_hst.sam -1 /tmp/14444.inpipe1 -2 /tmp/14444.inpipe2"
HWI-1103R:166:C3UJNACXX:1:2307:16976:65381 163 21 5010021 1 101M = 5010148 228 CTAAAGTGCTGGGATTACAGGTGTTAGCCACCACGTCCAGCTGTTAATTTTTATTTAATAAGAATGACAGAGTGAGGGCCATCACTGTTAATGAAGCCAGT CCCFFFDDHHHHHJJJJJJJJCFHIIJJJJJJIJJHIIJJJJJIJJIJJJJJJJJJIJJJIIJJJIJIJJJJCHHHHFFFDEEEEECDCDEEDDDDDDDD: AS:i:0 ZS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:101 YS:i:0 YT:Z:CP RG:Z:C3UJNACXX.L001 NH:i:2
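Since the error is about a version number, the field of interest in the dumps above is VN on the @HD line. A minimal sketch for pulling out just that value, assuming the real header is tab-separated as in the SAM format (the dumps show spaces only because of pasting); `hd_version` is a made-up helper name, not part of samtools or Picard:

```shell
# hd_version: hypothetical helper. Reads SAM header text on stdin and prints
# the VN value of the @HD line (prints nothing if there is no @HD line or no VN tag).
hd_version() {
    awk -F'\t' '$1 == "@HD" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^VN:/) print substr($i, 4)   # strip the leading "VN:"
    }'
}

# Usage (assumes samtools is on PATH; -H prints the header only):
#   samtools view -H file1.bam | hd_version
```

This only reports what the file declares; which values a given Picard release accepts is not something the sketch knows.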
Would you know how I can solve it?
Thank you.
To call picard overly picky would be an understatement. Whenever possible use something else.
I could not agree more :)
picard: the command-line arguments are very unpleasant (nevertheless, you can read them from a file) but otherwise I like it. (let's start a troll with picard :-) )
I have seen a lot of complaints about Picard on the web, but what is the alternative then? I understand Picard is required as a preparatory step for GATK. Is there something better? I have been told sambamba, but this looks to me more like a replacement for samtools.
If you need read groups then have your aligner add them. If you need to sort or mark duplicates, that's what sambamba or samtools are for. Picard is then no longer needed.
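As a rough sketch of the samtools route for sorting and marking duplicates: the function name and the intermediate file names are made up for illustration, and it assumes a samtools release recent enough to have `markdup` and `fixmate -m`.

```shell
# markdup_with_samtools: hypothetical wrapper, not a samtools command.
#   $1 = aligned input BAM (any sort order)
#   $2 = output BAM, coordinate-sorted with duplicates flagged
# The body runs in a subshell so its variables do not leak into the caller.
markdup_with_samtools() (
    in=$1
    out=$2
    tmp=${out%.bam}   # prefix for intermediate files (illustrative naming)

    samtools sort -n -o "$tmp.nsort.bam" "$in" &&                  # group mates by read name
    samtools fixmate -m "$tmp.nsort.bam" "$tmp.fixmate.bam" &&     # add the mate tags markdup relies on
    samtools sort -o "$tmp.csort.bam" "$tmp.fixmate.bam" &&        # back to coordinate order
    samtools markdup "$tmp.csort.bam" "$out" &&                    # flag duplicates
    samtools index "$out" &&
    rm -f "$tmp.nsort.bam" "$tmp.fixmate.bam" "$tmp.csort.bam"
)

# Usage:
#   markdup_with_samtools aln/sample.bam aln/sample.markdup.bam
```

sambamba has its own `sort` and `markdup` subcommands if you prefer that tool; the steps above are just one way to do it with samtools alone.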
OK then, I'll try sambamba. Do I need to run an error check? I haven't seen a command similar to ValidateSamFile in sambamba.
You shouldn't normally use that command; it's only appropriate when you get an error, and even then it's easier to use samtools quickcheck, which ensures the basic file is intact and doesn't do all of the absurd stuff picard tries to do.
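For a batch of files, something along these lines would do. `quickcheck_all` is a made-up helper name; the only real command it relies on is `samtools quickcheck`, which exits non-zero when a file fails its basic checks.

```shell
# quickcheck_all: hypothetical helper, not a samtools command.
# Runs `samtools quickcheck` on each argument, reports failures on stderr,
# and returns non-zero if at least one file failed.
quickcheck_all() {
    status=0
    for bam in "$@"; do
        if ! samtools quickcheck "$bam"; then
            echo "FAILED: $bam" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage:
#   quickcheck_all file1.bam file2.bam && echo "all ok"
```

quickcheck also takes several files at once, and with `-v` it prints the names of the ones that fail, so `samtools quickcheck -v *.bam` gets you much the same result without a wrapper.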