9.4 years ago
bioguy24
I am getting an error in Picard when using a modified BED file that was generated with the commands below. I apologize for the long post; I am just trying to be thorough, as I cannot figure it out. Thank you :).
samtools view -H IonXpress_009_150603.bam > header.sam
cat header.sam your_file.bed > new_file.bed
Example Bed file for BI=
@HD VN:1.4 GO:none SO:coordinate
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
@SQ SN:chr8 LN:146364022
@SQ SN:chr9 LN:141213431
@SQ SN:chr10 LN:135534747
@SQ SN:chr11 LN:135006516
@SQ SN:chr12 LN:133851895
@SQ SN:chr13 LN:115169878
@SQ SN:chr14 LN:107349540
@SQ SN:chr15 LN:102531392
@SQ SN:chr16 LN:90354753
@SQ SN:chr17 LN:81195210
@SQ SN:chr18 LN:78077248
@SQ SN:chr19 LN:59128983
@SQ SN:chr20 LN:63025520
@SQ SN:chr21 LN:48129895
@SQ SN:chr22 LN:51304566
@SQ SN:chrX LN:155270560
@SQ SN:chrY LN:59373566
@SQ SN:chrM LN:16569
@RG ID:8AH6U.IonXpress_009 PL:IONTORRENT PU:Unspecified/P1.1.17/IonXpress_009 FO:TACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACG DT:2015-06-03T14:03:02-0700 SM:E1 PG:tmap KS:TCAGTGAGCGGAACGAT CN:TorrentServer/Proton
@RG ID:8AH6U.IonXpress_009.1 PL:IONTORRENT PU:Unspecified/P1.1.17/IonXpress_009 FO:TACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACG DT:2015-06-03T13:27:05-0700 SM:E1 PG:tmap KS:TCAGTGAGCGGAACGAT CN:TorrentServer/Proton
@RG ID:8AH6U.IonXpress_009.10 PL:IONTORRENT PU:Unspecified/P1.1.17/IonXpress_009
Example Bed file for TI=
@HD VN:1.4 GO:none SO:coordinate
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
@SQ SN:chr8 LN:146364022
@SQ SN:chr9 LN:141213431
@SQ SN:chr10 LN:135534747
@SQ SN:chr11 LN:135006516
@SQ SN:chr12 LN:133851895
@SQ SN:chr13 LN:115169878
@SQ SN:chr14 LN:107349540
@SQ SN:chr15 LN:102531392
@SQ SN:chr16 LN:90354753
@SQ SN:chr17 LN:81195210
@SQ SN:chr18 LN:78077248
@SQ SN:chr19 LN:59128983
@SQ SN:chr20 LN:63025520
@SQ SN:chr21 LN:48129895
@SQ SN:chr22 LN:51304566
@SQ SN:chrX LN:155270560
@SQ SN:chrY LN:59373566
@SQ SN:chrM LN:16569
@RG ID:8AH6U.IonXpress_009 PL:IONTORRENT PU:Unspecified/P1.1.17/IonXpress_009 FO:TACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACG DT:2015-06-03T14:03:02-0700 SM:E1 PG:tmap KS:TCAGTGAGCGGAACGAT CN:TorrentServer/Proton
@RG ID:8AH6U.IonXpress_009.1 PL:IONTORRENT PU:Unspecified/P1.1.17/IonXpress_009 FO:TACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACG DT:2015-06-03T13:27:05-0700 SM:E1 PG:tmap KS:TCAGTGAGCGGAACGAT CN:TorrentServer/Proton
@RG ID:8AH6U.IonXpress_009.10 PL:IONTORRENT PU:Unspecified/P1.1.17/IonXpress_009
Error:
dnascopev@ubuntu:/media/C2F8EFBFF8EFAFB9$ picard-tools CalculateHsMetrics BI=sam_sort_5column_xgen_probes.bed TI=sam_sort_5column_xgen_targets.bed I=IonXpress_009_150603.bam O=IonXpress_009_150603_all_IDT.CalculateHSmetrics
[Tue Jun 23 14:07:49 CDT 2015] net.sf.picard.analysis.directed.CalculateHsMetrics BAIT_INTERVALS=sam_sort_5column_xgen_probes.bed TARGET_INTERVALS=sam_sort_5column_xgen_targets.bed INPUT=IonXpress_009_150603.bam OUTPUT=IonXpress_009_150603_all_IDT.CalculateHSmetrics TMP_DIR=/tmp/dnascopev VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Tue Jun 23 14:07:50 CDT 2015] net.sf.picard.analysis.directed.CalculateHsMetrics done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=138412032
Exception in thread "main" java.lang.IllegalArgumentException: Program record with group id bc already exists in SAMFileHeader!
at net.sf.samtools.SAMFileHeader.addProgramRecord(SAMFileHeader.java:197)
at net.sf.samtools.SAMTextHeaderCodec.parsePGLine(SAMTextHeaderCodec.java:150)
at net.sf.samtools.SAMTextHeaderCodec.decode(SAMTextHeaderCodec.java:90)
at net.sf.picard.util.IntervalList.fromReader(IntervalList.java:181)
at net.sf.picard.util.IntervalList.fromFile(IntervalList.java:152)
at net.sf.picard.analysis.directed.HsMetricsCalculator.<init>(HsMetricsCalculator.java:83)
at net.sf.picard.analysis.directed.CalculateHsMetrics.doWork(CalculateHsMetrics.java:83)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:158)
at net.sf.picard.analysis.directed.CalculateHsMetrics.main(CalculateHsMetrics
Does your header have any @PG lines?
The bam file was from Ion Torrent and was used to grab the headers used in the SAMTools conversion. Does that help? Thank you :).
No, just look and see if any lines start with @PG. If there are any, post them all.
As Devon said, the problem is with the @PG tag. The error says that the group id "bc" has been duplicated for the PG tags. All you need to do is run
samtools view -H IonXpress_009_150603.bam | grep "^@PG" | cut -f2 | sort | uniq -c
and see whether the frequency for any particular ID is greater than 1 (just to warn you, a tmap-aligned BAM file will have plenty of PG IDs). The problem is interesting: I thought Picard only cared about RG IDs being unique, but it also requires PG IDs to be unique. Have you tried setting VALIDATION_STRINGENCY to LENIENT? It may or may not help. Otherwise, you can just remove the @PG tags from the header of the BAM file and the BED files and rerun the command.
So try: (sorry, I am still learning). Thank you :).
or to remove all occurrences of @PG
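Both steps can be tried on a header saved as text. The following is a minimal sketch, not necessarily the commands posted in this thread. The sample header and its PN: values are invented for illustration; on real data header.sam would come from samtools view -H. It also assumes ID: is the first tag on each @PG line, as the cut -f2 command above does.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)

# Invented header with a duplicated @PG ID ("bc"); a real one would come from:
#   samtools view -H IonXpress_009_150603.bam > header.sam
printf '@HD\tVN:1.4\tSO:coordinate\n'  > "$workdir/header.sam"
printf '@SQ\tSN:chr1\tLN:249250621\n' >> "$workdir/header.sam"
printf '@PG\tID:bc\tPN:BaseCaller\n'  >> "$workdir/header.sam"
printf '@PG\tID:bc\tPN:BaseCaller\n'  >> "$workdir/header.sam"
printf '@PG\tID:tmap\tPN:tmap\n'      >> "$workdir/header.sam"

# List only the @PG IDs that occur more than once
# (assumes ID: is the second tab-separated field).
dup_ids=$(grep '^@PG' "$workdir/header.sam" | cut -f2 | sort | uniq -d)
echo "duplicated PG IDs: $dup_ids"

# Write a copy of the header with every @PG line removed.
grep -v '^@PG' "$workdir/header.sam" > "$workdir/header_noPG.sam"

# grep -c exits non-zero when the count is 0, hence the "|| true".
pg_left=$(grep -c '^@PG' "$workdir/header_noPG.sam" || true)
echo "@PG lines remaining: $pg_left"
```

Using uniq -d instead of uniq -c prints only the repeated IDs, which is easier to read when a tmap header carries many @PG lines.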
Hey, don't be sorry. This is how you learn. Actually, this problem is new to me too. I would go with the first option, that is, running with VALIDATION_STRINGENCY=LENIENT. If it still throws the same error, go with VALIDATION_STRINGENCY=SILENT. Let me know if it doesn't work.
I tried both and got the same error:
Thank you for your help :).
OK, so we need to remove the @PG tags from the header of the SAM file; the error is in the SAM header. The easiest way would be to write the current header to a text file and then remove the lines that start with @PG. Then use samtools reheader to create a new BAM file. See samtools reheader. Another thing: why are you appending the header of the BAM file to your BI and TI BED files? A BED file doesn't need a SAM header. It should just follow chr start end strand, so I would remove the SAM header from the BED files. First try deleting the SAM header from the BED files and run the command. If that doesn't work, try removing the @PG tags from the BAM file as I suggested above. Let me know.
I removed all occurrences of @PG in the header and tried running Picard
Both gave the same error:
It's a new error, so it looks like @PG was removed; hopefully it is closer. Thank you :).
That would seem to indicate that you have duplicate @SQ lines. Why not reproduce this on a ~1000 line file and post that to GitHub so we can have a look at a working (well, not working) example?
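One quick way to confirm duplicate @SQ lines on a header saved as text is to count how often each SN: name appears. This is a minimal sketch, assuming the header has already been dumped to a file; the sample header is invented and deliberately lists chr1 twice.

```shell
workdir=$(mktemp -d)

# Invented header in which chr1 appears twice, as can happen when a header
# is pasted into a file that already contains one.
printf '@SQ\tSN:chr1\tLN:249250621\n'  > "$workdir/header.txt"
printf '@SQ\tSN:chr2\tLN:243199373\n' >> "$workdir/header.txt"
printf '@SQ\tSN:chr1\tLN:249250621\n' >> "$workdir/header.txt"

# Pull the SN: value out of every @SQ line, then report names seen more than once.
dup_sq=$(awk -F'\t' '$1 == "@SQ" {
             for (i = 2; i <= NF; i++)
                 if ($i ~ /^SN:/) print substr($i, 4)
         }' "$workdir/header.txt" | sort | uniq -d)
echo "duplicated sequence names: $dup_sq"
```

An empty result would mean every sequence name is unique in that file.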
I am just repeating what Devon has already indicated. It seems that there are duplicates (chr1) in your @SQ lines. Check the header of your new BAM file.
I am not sure how to post on GitHub (watching tutorials now).
The header.txt below was created using this command:
then I removed the unwanted text,
followed by this to create the BI= and TI= files:
header.txt
Thank you :)
GitHub was just an example; you can use anything else (Dropbox, Google Drive, Pastebin, etc.). We just need a small example that completely reproduces all the error messages. That means we need at least a few alignments too.
Well, this is not what I asked you to do. I didn't ask you to put the new header in the BED files, but in the original BAM file, creating a new BAM file. Also, you don't need a header in the BED files. Your BED file should follow this format (https://genome.ucsc.edu/FAQ/FAQformat.html#format1). To reiterate:
Is the reheader below right (or close)?
It is meant to create a new BAM without the @PG lines (header.txt has the @PG lines removed).
The original format of the bed file was:
I sorted that file and added a 5th column concatenating chr:start-end
Should I have just used the original?
Thank you :).
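For readers following along, the sorting and extra-column step described above might look like the sketch below. The input rows and file names are invented, and this is one possible approach, not necessarily the commands used in the thread.

```shell
workdir=$(mktemp -d)

# Invented 4-column BED rows (chrom, start, end, name), deliberately unsorted.
printf 'chr2\t500\t620\tprobeC\n'  > "$workdir/targets.bed"
printf 'chr1\t900\t1020\tprobeB\n' >> "$workdir/targets.bed"
printf 'chr1\t100\t220\tprobeA\n'  >> "$workdir/targets.bed"

# Sort by chromosome name, then numerically by start, and append a fifth
# column of the form chr:start-end.
sort -k1,1 -k2,2n "$workdir/targets.bed" |
    awk 'BEGIN { FS = OFS = "\t" } { print $0, ($1 ":" $2 "-" $3) }' \
    > "$workdir/sorted_5column.bed"

cat "$workdir/sorted_5column.bed"
```

Note that sort -k1,1 orders chromosome names as text, so chr10 sorts before chr2; whether that matters depends on the order the downstream tool expects.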
Your BED files look fine. You can have more than 4 columns. The BED files should be sorted, which you did. So you can use the original BED files. About the header, you can do it step by step. A one-step answer would be:
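One possible one-step form, offered as a sketch and not necessarily the command given in the thread: the function below filters @PG lines out of a header stream, and the commented samtools lines show where it could slot in. The output name no_PG.bam is hypothetical, and passing - to samtools reheader so that it reads the header from standard input is an assumption to check against your samtools version.

```shell
# Drop every @PG line from a SAM header read on standard input.
strip_pg() {
    grep -v '^@PG'
}

# Small self-check on an invented three-line header.
filtered=$(printf '@HD\tVN:1.4\n@PG\tID:bc\n@SQ\tSN:chr1\tLN:249250621\n' | strip_pg)
echo "$filtered"

# With samtools installed, the same filter could sit in a pipeline such as
# the following (no_PG.bam is a hypothetical output name):
#   samtools view -H IonXpress_009_150603.bam | strip_pg |
#       samtools reheader - IonXpress_009_150603.bam > no_PG.bam
#   samtools index no_PG.bam
```

Keeping the filter in its own function makes it easy to test on a text header before touching the BAM file.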
I am getting closer now, thanks to your patience and expertise:
Is this an error in the way I invoke it? Should it be
or
Thank you :)
The first one is correct. I think you should spend some time going through the Picard manual. Make sure all the files are sorted and indexed.
I sorted and indexed the bed files and posted the headers of them below. I am trying to read the manual but the site is blocked by our firewall so I have to use my phone. Thank you :).
when I run:
to put in the sam header I get:
sort_index_5column_xgen_probes.bed
sam_sort_index_5column_xgen_probes.bed
Perhaps you have a blank line at the end of your BED file (the error pertains to that file)?
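A blank-line check like this can be done with standard tools. This is a minimal sketch on an invented file; the file names are hypothetical, and the cleaned copy is written separately so the original is left untouched.

```shell
workdir=$(mktemp -d)

# Invented BED file that ends with an empty line and a whitespace-only line.
printf 'chr1\t100\t220\tprobeA\n'  > "$workdir/probes.bed"
printf 'chr1\t900\t1020\tprobeB\n' >> "$workdir/probes.bed"
printf '\n'                        >> "$workdir/probes.bed"
printf '  \n'                      >> "$workdir/probes.bed"

# Count lines that are empty or contain only whitespace.
# grep -c exits non-zero when the count is 0, hence the "|| true".
blank_count=$(grep -c '^[[:space:]]*$' "$workdir/probes.bed" || true)
echo "blank lines found: $blank_count"

# Write a cleaned copy with those lines removed.
sed '/^[[:space:]]*$/d' "$workdir/probes.bed" > "$workdir/probes.clean.bed"
clean_lines=$(wc -l < "$workdir/probes.clean.bed" | tr -d ' ')
echo "lines after cleaning: $clean_lines"
```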
Thank you for all your help, it is working now :).