I'm trying to extract FASTQ files from BAM files. Picard's SamToFastq should be able to do this: according to its documentation, the tool accepts either a BAM or a SAM file as input.
But when I run it, it extracts only one read and then exits. The command and error message are below. Any help is appreciated.
[davy@xxxx picard-tools-1.70]$ java -jar SamToFastq.jar I=/home/davy/xxx_trio_data/xxxx-1.bam F=/home/davy/xxx_trio_data/1005-1.fastq
[Wed Jun 20 14:14:21 BST 2012] net.sf.picard.sam.SamToFastq INPUT=/home/davy/xxx_trio_data/xxxx-1.bam FASTQ=/home/davy/xxxx_trio_data/xxxx-1.fastq OUTPUT_PER_RG=false RE_REVERSE=true INCLUDE_NON_PF_READS=false READ1_TRIM=0 READ2_TRIM=0 INCLUDE_NON_PRIMARY_ALIGNMENTS=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Wed Jun 20 14:14:21 BST 2012] Executing as davy@xxxxx.xxxx.xxxx.xx.xx on Linux 2.6.34.9-69.fc13.x86_64 amd64; OpenJDK 64-Bit Server VM 1.6.0_18-b18; Picard version: 1.70(1215)
[Wed Jun 20 14:14:21 BST 2012] net.sf.picard.sam.SamToFastq done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=2029715456
FAQ: http://sourceforge.net/apps/mediawiki/picard/index.php?title=Main_Page
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:571)
at java.util.ArrayList.get(ArrayList.java:349)
at net.sf.picard.sam.SamToFastq.doWork(SamToFastq.java:156)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
at net.sf.picard.sam.SamToFastq.main(SamToFastq.java:118)
Edit: First few alignments from SAM file:
C010CACXX110731:5:1301:17327:162246 99 1 10025 0 76M = 10045 95 TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC A?DFFFD@DGGGE?CHGGD@DEFGD@DGFHD@EHHGB@FGEGF@EHHEE@DEHHF@CGHEE@EGGGDA;CDEC?DE MD:Z:76 PG:Z:bwa.2 RG:Z:C010C.5 AM:i:0 NM:i:0 SM:i:0 MQ:i:20 OQ:Z:@@CFFFFFHHHHGFIJJIIJIGIIGIGIIJGGJJJI@GEHGIEIJIJGHIGGIJGJEHIGEHHHFFDF9>AAAAC= UQ:i:0
C010CACXX110731:3:2208:5779:47927 163 1 10028 0 76M = 10343 362 CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT ;CEB=BDEFC=AEGFD?EDFFE<EGFHC@EGFGC?BHHGC>FFFFE>:FCDD5=BEGB?=DGDE@CC7@C:<>ED# MD:Z:76 PG:Z:bwa.11 RG:Z:C010C.3 AM:i:0 NM:i:0 SM:i:0 MQ:i:0 OQ:Z:@@CFFFFFGGDBDHIFIGGIIIEHGGHDHEIIJDDDHHI@FHEBFHA;F==@38@AD>A9AEAHF>@2;>66;?B# UQ:i:0
C010CACXX110731:6:1104:11773:149879 99 1 10032 0 68M8S = 10178 188 AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC ABDEGC@EEEGE@DFGEE@CGFFD?EGFED>EHCFD@CGEGE@EGADC?@@DFE?DEDEA<ECFGC?######### MD:Z:68 PG:Z:bwa.19 RG:Z:C010C.6 AM:i:0 NM:i:0 SM:i:0 MQ:i:0 OQ:Z:@@@DFDFFDFHHHIIIGJIGIIIGEIIFGIGIJ@CFHGHGIGIGI>@BB=;=CGEGE@E>>CAED@D######### UQ:i:0
C010CACXX110731:5:1301:17327:162246 147 1 10045 20 76M = 10025 -95 ACCCTACCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCCTAACCCT #B?>=>%DFG?>3HHI@B@IIHAA<GGJAA@HHG>:1III?@?IHI>?>GAI>ABHGH?B@IGH=;GGH==;FCD: MD:Z:6A69 PG:Z:bwa.2 RG:Z:C010C.5 AM:i:0 NM:i:1 SM:i:20 MQ:i:0 OQ:Z:#?<5>9'@EA?;)FDFHC?HHD@@7CEHFBBIGD?9*IIHDD?HGF??:GAGGFCJIHEGFJIFD<HHFDB=F@@? UQ:i:4
B02FEACXX110730:1:1208:17291:87203 99 1 10050 0 56M20S = 10180 167 AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACA BBCFEB>CGEH>?=GFFD?EEFFD?FEFF/<;>C@D(5@@,>?DGH,F@FGGID?##################### MD:Z:56 PG:Z:bwa.6 RG:Z:B02FE.1 AM:i:0 NM:i:0 SM:i:0 MQ:i:0 OQ:Z:@@@DD?DDHBH;F;DEGBGGCHGHIICEE):9:?:D)0:9)9BDGH(=FHIFFCF##################### UQ:i:0
B02FEACXX110730:4:2204:14605:132091 163 1 10050 0 58M18S = 10180 174 AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCGTAACC ;ABBD?<?CA?C?CEGEC?DFE9;4@?C=8<AEGAC@E;BCB>DBCF@BADFE>@<E################### MD:Z:58 PG:Z:bwa.3 RG:Z:B02FE.4 AM:i:0 NM:i:0 SM:i:0 MQ:i:0 OQ:Z:?@@AD>D?FF=DFGGHAHIGGE88<A?C;3??FDABGF9????FB?F;@=BGG6F7C################### UQ:i:0
B02FEACXX110730:5:1302:15620:33259 99 1 10057 0 53M23S = 10182 174 ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAAACCCTGACCCTAAGCCTAACCC ACFGB@DEEEC=DEGGD@DGHGD@EFEEEAEDDEE?AEEFC@EFGGE@DHFB######################## MD:Z:53 PG:Z:bwa.4 RG:Z:B02FE.5 AM:i:0 NM:i:0 SM:i:0 MQ:i:0 OQ:Z:@CCFDFFDFDFAFGIIIJJIJJIJJGGEGCCFBFGA@FDFHCGGIIIIHIG?######################## UQ:i:0
So I can tentatively say it's not the BAM file. I managed to obtain an example BAM file from http://code.google.com/p/gasv/downloads/ and I still get the same error message. If anyone with a working version of Picard could try this file and let me know whether it works, I might be able to say whether the problem is my install of Picard or my Java install. Thanks for the help.
Since the error happens right away, you should also post the first few lines of your SAM file. It is very likely that your file is not right in some way.
Thanks Istvan, but as I mentioned, I am using a BAM file, so the first few lines of the file will not be human-readable. Maybe someone could point me in the direction of a sample BAM file that is known to work, and then I can verify whether the problem is with my system or with the file?
What I meant was to convert the BAM to SAM, then show that.
I converted the BAM file to a SAM file using BAMtools. I tried to include the header section of the SAM file and the first few alignments in my post above, but that exceeds the character limit. Instead I've included only the first few alignments: if there were a problem with the header, I would have expected Picard to fail before extracting the first read at all.
Something weird is going on with the formatting of my post above. Anyway, I downloaded another BAM file from here: http://code.google.com/p/gasv/downloads/list and ran that, but it still gave me the same error. So it's probably not the file. Any other ideas?
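One thing that can be checked directly from the alignments posted above is the FLAG column (the second field: 99, 163, 147, ...). Below is a minimal Python sketch that decodes those values using the bit definitions from the SAM specification. The `FLAG_BITS` table and the `decode_flag` helper are names made up for this illustration and are not part of Picard or any other library. If the flags show that the reads are paired, it may be worth checking whether SamToFastq expects a second output file for the mates (the Picard documentation lists a `SECOND_END_FASTQ` / `F2` option). That is a hypothesis to test, not a confirmed diagnosis of the exception.

```python
# Bit meanings of the SAM FLAG field, as defined in the SAM specification.
# FLAG_BITS and decode_flag are illustrative names, not a library API.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "fails_qc",
    0x400: "duplicate",
}


def decode_flag(flag):
    """Return the names of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


# FLAG values taken from the second column of the alignments in the post.
for flag in (99, 163, 147):
    print(flag, decode_flag(flag))
```

All three values have the `paired` bit set, and the read name `C010CACXX110731:5:1301:17327:162246` appears once with flag 99 (first in pair) and once with flag 147 (second in pair), so this BAM file appears to contain paired-end data.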