Hello
I posted this question to the Broad's GATK help forum, as suggested in the Picard documentation, but haven't gotten a response yet, so I'm posting it here for the good people of Biostars. I'm using ExtractIlluminaBarcodes (Picard version 2.18.7) for the first time and am encountering an error with this command:
java -jar picard.jar ExtractIlluminaBarcodes \
BASECALLS_DIR=/project/JIY3012/work/data/BaseCalls/ \
LANE=1 \
READ_STRUCTURE=250T8B250T \
BARCODE_FILE=/project/JIY3012/work/data/barcode_file \
METRICS_FILE=250T8B250T_metrics_output.txt \
NUM_PROCESSORS=36 \
MAX_MISMATCHES=0
This yields:
11:03:10.765 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/rosema1/BioInfo/bin/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Thu Jun 14 11:03:10 EDT 2018] ExtractIlluminaBarcodes BASECALLS_DIR=/project/JIY3012/work/data/BaseCalls LANE=1 READ_STRUCTURE=250T8B250T BARCODE_FILE=/project/JIY3012/work/data/barcode_file METRICS_FILE=250T8B250T_metrics_output.txt MAX_MISMATCHES=0 NUM_PROCESSORS=36 MIN_MISMATCH_DELTA=1 MAX_NO_CALLS=2 MINIMUM_BASE_QUALITY=0 MINIMUM_QUALITY=2 COMPRESS_OUTPUTS=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Thu Jun 14 11:03:10 EDT 2018] Executing as rosema1@usrebcs11.nafta.syngenta.org on Linux 2.6.32-696.18.7.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_60-b27; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.18.7-SNAPSHOT
INFO 2018-06-14 11:03:10 ExtractIlluminaBarcodes Processing with 36 PerTileBarcodeExtractor(s).
[Thu Jun 14 11:03:10 EDT 2018] picard.illumina.ExtractIlluminaBarcodes done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=2058354688
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" picard.PicardException: Expected CycledIlluminaFileMap to contain 8 cycles but only 0 were found!
at picard.illumina.parser.CycleIlluminaFileMap.assertValid(CycleIlluminaFileMap.java:66)
at picard.illumina.parser.IlluminaDataProviderFactory.makeParser(IlluminaDataProviderFactory.java:407)
at picard.illumina.parser.IlluminaDataProviderFactory.makeDataProvider(IlluminaDataProviderFactory.java:292)
at picard.illumina.ExtractIlluminaBarcodes$PerTileBarcodeExtractor.<init>(ExtractIlluminaBarcodes.java:750)
at picard.illumina.ExtractIlluminaBarcodes.doWork(ExtractIlluminaBarcodes.java:317)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:282)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
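The exception reports that none of the 8 expected barcode cycles were found. A quick way to see which cycles are actually on disk is sketched below. It assumes the classic per-cycle layout in which each lane directory (for example `L001`) under BaseCalls holds one `C<cycle>.1` directory per cycle; that layout is an assumption and may not match every instrument or run folder. The function name `list_cycle_numbers` is made up for this illustration.

```python
import re
from pathlib import Path


def list_cycle_numbers(basecalls_dir, lane):
    """Return the sorted cycle numbers that have a C<cycle>.1 directory.

    Assumes the layout <basecalls_dir>/L00<lane>/C<cycle>.1/ (an assumption).
    """
    lane_dir = Path(basecalls_dir) / f"L{lane:03d}"
    if not lane_dir.is_dir():
        return []
    cycles = []
    for entry in lane_dir.iterdir():
        match = re.fullmatch(r"C(\d+)\.1", entry.name)
        if match and entry.is_dir():
            cycles.append(int(match.group(1)))
    return sorted(cycles)


# Path and lane taken from the command in the question.
cycles = list_cycle_numbers("/project/JIY3012/work/data/BaseCalls", 1)
print(f"{len(cycles)} cycle directories found")

# With READ_STRUCTURE=250T8B250T the barcode occupies cycles 251 to 258.
missing = [c for c in range(251, 259) if c not in cycles]
print("barcode cycles without a directory:", missing)
```

If the lane directory or the barcode cycles are absent, it may be worth checking that BASECALLS_DIR points at the directory that directly contains the lane folders.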
Perhaps this has something to do with my READ_STRUCTURE string (250T8B250T). These libraries were sequenced with dual unique barcodes with UMIs. I would like to process them using a single index (hence my attempted use of 250T8B250T), dual unique indices (250T8B8B250T), and dual unique indices with UMIs (250T8B9M8B250T). I am not sure whether these READ_STRUCTURE values are correct, or whether they are the cause of the error. Note that I also tried the other two READ_STRUCTURE values and got similar errors.
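To compare the three candidate read structures, it can help to break each one into its segments and add up the cycles it describes. The sketch below is only an illustration: it assumes the operators T (template), B (barcode), M (molecular index) and S (skip), and the helper names are made up for this example rather than taken from Picard.

```python
import re


def parse_read_structure(read_structure):
    """Split a string such as '250T8B250T' into (length, operator) pairs."""
    # Assumed operators: T, B, M and S.
    if not re.fullmatch(r"(\d+[TBMS])+", read_structure):
        raise ValueError(f"Unrecognised read structure: {read_structure!r}")
    return [(int(n), op) for n, op in re.findall(r"(\d+)([TBMS])", read_structure)]


def cycles_by_operator(read_structure):
    """Total number of cycles assigned to each operator."""
    totals = {}
    for length, op in parse_read_structure(read_structure):
        totals[op] = totals.get(op, 0) + length
    return totals


for rs in ("250T8B250T", "250T8B8B250T", "250T8B9M8B250T"):
    total = sum(length for length, _ in parse_read_structure(rs))
    print(rs, cycles_by_operator(rs), "total cycles:", total)
```

The three strings describe 508, 516 and 525 cycles respectively. If the read structure is expected to account for every cycle of the run, which is my understanding but not something confirmed in this thread, then at most one of them can match a given run as written, and the unused index or UMI cycles would need to be covered some other way, for example with skips.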
Additionally, my barcode file looks like this:
barcode_sequence_1 barcode_sequence_2 barcode_name library_name
CTGATCGTNNNNNNNNN ATATGCGC Dual Index UMI Adapter 1 GAR2161A459
ACTCTCGANNNNNNNNN TGGTACAG Dual Index UMI Adapter 2 GAR2161A460
TGAGCTAGNNNNNNNNN AACCGTTC Dual Index UMI Adapter 3 GAR2161A461
GAGACGATNNNNNNNNN TAACCGGT Dual Index UMI Adapter 4 GAR2161A462
CTTGTCGANNNNNNNNN GAACATCG Dual Index UMI Adapter 5 GAR2161A463
TTCCAAGGNNNNNNNNN CCTTGTAG Dual Index UMI Adapter 6 GAR2161A464
CGCATGATNNNNNNNNN TCAGGCTT Dual Index UMI Adapter 7 GAR2161A465
ACGGAACANNNNNNNNN GTTCTCGT Dual Index UMI Adapter 8 GAR2161A466
CGGCTAATNNNNNNNNN AGAACGAG Dual Index UMI Adapter 9 9
ATCGATCGNNNNNNNNN TGCTTCCA Dual Index UMI Adapter 10 10
GCAAGATCNNNNNNNNN CTTCGACT Dual Index UMI Adapter 11 11
(etc.)
I included all 384 barcodes as I am interested in observing any cross-talk that occurs.
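One thing that can be checked mechanically is whether the barcode lengths in the file line up with the B segments of the read structure. The sketch below assumes that each `barcode_sequence_*` column is meant to be exactly as long as the corresponding B segment; that is an assumption about what the tool expects, not something established in this thread. The function name and the two sample rows (copied from the file above, with tabs between columns) are for illustration only.

```python
import csv
import io
import re


def barcode_length_mismatches(barcode_tsv_text, read_structure):
    """List rows whose barcode lengths differ from the B segment lengths."""
    expected = [int(n) for n in re.findall(r"(\d+)B", read_structure)]
    reader = csv.DictReader(io.StringIO(barcode_tsv_text), delimiter="\t")
    # Keep the barcode columns in header order.
    seq_columns = [c for c in (reader.fieldnames or []) if c.startswith("barcode_sequence")]
    problems = []
    for row in reader:
        observed = [len(row[c] or "") for c in seq_columns]
        if observed != expected:
            problems.append((row.get("barcode_name"), observed, expected))
    return problems


sample = (
    "barcode_sequence_1\tbarcode_sequence_2\tbarcode_name\tlibrary_name\n"
    "CTGATCGTNNNNNNNNN\tATATGCGC\tDual Index UMI Adapter 1\tGAR2161A459\n"
    "ACTCTCGANNNNNNNNN\tTGGTACAG\tDual Index UMI Adapter 2\tGAR2161A460\n"
)
for name, observed, expected in barcode_length_mismatches(sample, "250T8B8B250T"):
    print(f"{name}: barcode lengths {observed}, read structure expects {expected}")
```

In the rows shown, the first barcode column carries the 8-base index followed by the UMI positions written as Ns, so it is longer than an 8B segment. Whether that matters for this particular error is not clear from the output above.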
Thank you for your help
Mark
@Mark: You may want to look at a tool written specifically for handling UMIs (UMI-tools). deML may also be an option.
I was originally looking at UMI-tools but switched to the picard/fgbio approach because it is what IDT, the supplier of the unique dual index UMI adapters used in this study, recommends. If I can't get this approach to work, I will explore your suggestions further. Thanks.
That error seems to indicate that the format of your barcode file is incorrect. Is it a tab-delimited file?
Yes, it is tab-delimited. I have also tried a few different versions of it with no success, including using only 3 columns when demultiplexing on a single index, and reordering the columns to "barcode_name library_name barcode_sequence_1 barcode_sequence_2" as suggested in some online posts.
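Because barcode names such as "Dual Index UMI Adapter 1" contain spaces, it is easy for a file to look tab-delimited on screen without being so. A minimal sketch for confirming it is below: it checks that the header contains the expected column names and that every row splits into the same number of tab-separated fields as the header. The function name is made up, and the expected column names are simply those from the header shown in the question.

```python
def find_delimiter_problems(text, expected_columns):
    """Report header and field-count problems in a tab-delimited file."""
    lines = text.splitlines()
    if not lines:
        return ["file is empty"]
    problems = []
    header = lines[0].split("\t")
    missing = [name for name in expected_columns if name not in header]
    if missing:
        problems.append(f"line 1: header lacks columns {missing}")
    for number, line in enumerate(lines[1:], start=2):
        if not line.strip():
            continue  # ignore blank lines
        field_count = len(line.split("\t"))
        if field_count != len(header):
            problems.append(
                f"line {number}: {field_count} fields, header has {len(header)}"
            )
    return problems


columns = ["barcode_sequence_1", "barcode_sequence_2", "barcode_name", "library_name"]
# Path taken from the command in the question; adjust as needed.
try:
    with open("/project/JIY3012/work/data/barcode_file") as handle:
        for problem in find_delimiter_problems(handle.read(), columns):
            print(problem)
except FileNotFoundError:
    print("barcode file not found at the path used in this example")
```

An empty result only says the file is consistently tab-delimited with the expected header; it says nothing about whether the barcode sequences themselves are what the tool expects.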