RNG_Daemon · 2.6 years ago
I am demultiplexing an S4 sequencing run and I am running into the following error:
INFO 2022-04-08 17:31:08 ExtractIlluminaBarcodes Extracting barcodes for tile 2666
INFO 2022-04-08 17:31:08 ExtractIlluminaBarcodes Extracting barcodes for tile 2674
ERROR 2022-04-08 17:31:08 ExtractIlluminaBarcodes Error processing tile 2667
picard.PicardException: Unrecognized data type(Cbcl) found by IlluminaDataProviderFactory!
at picard.illumina.parser.IlluminaDataProviderFactory.makeParser(IlluminaDataProviderFactory.java:400)
at picard.illumina.parser.IlluminaDataProviderFactory.makeDataProvider(IlluminaDataProviderFactory.java:249)
at picard.illumina.parser.IlluminaDataProviderFactory.makeDataProvider(IlluminaDataProviderFactory.java:228)
at picard.illumina.ExtractIlluminaBarcodes$PerTileBarcodeExtractor.run(ExtractIlluminaBarcodes.java:355)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I checked the MD5 sums of the raw data several times, so I am not sure where the error could be. This is the command I use:
picard -Xmx64g -Xms64g ExtractIlluminaBarcodes \
-B /data/BHX2/Data/Intensities/BaseCalls \
-L 1 \
--NUM_PROCESSORS 8 \
-M metrices/barcode_metrices1.txt \
-TMP_DIR /data/tmp \
-BARCODE_FILE /my_dir/barcode1.csv \
-RS 148T8B9M8B148T
I also ran a check on the BaseCalls dir. My Picard version is 2.26.11. Here is the RunInfo:
<Read Number="1" NumCycles="148" IsIndexedRead="N"/>
<Read Number="2" NumCycles="17" IsIndexedRead="Y"/>
<Read Number="3" NumCycles="8" IsIndexedRead="Y"/>
<Read Number="4" NumCycles="148" IsIndexedRead="N"/>
It's dual-index data with UMIs.
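As a sanity check (not from the original thread), the READ_STRUCTURE should account for every cycle in RunInfo.xml; here the 17-cycle first index read splits into 8 barcode + 9 UMI cycles. A minimal Python sketch, using only the values quoted above:

```python
import re

def parse_read_structure(rs: str):
    """Split a Picard READ_STRUCTURE such as '148T8B9M8B148T' into (cycles, operator) pairs.

    Operators: T = template, B = sample barcode, M = molecular barcode (UMI), S = skip.
    """
    return [(int(n), op) for n, op in re.findall(r"(\d+)([TBMS])", rs)]

segments = parse_read_structure("148T8B9M8B148T")

# NumCycles from the RunInfo.xml in the question: 148 + 17 + 8 + 148.
runinfo_cycles = 148 + 17 + 8 + 148

total = sum(n for n, _ in segments)
print(segments)                  # [(148, 'T'), (8, 'B'), (9, 'M'), (8, 'B'), (148, 'T')]
print(total == runinfo_cycles)   # True: the 8B + 9M segments cover the 17-cycle index read
```

If the totals disagree, ExtractIlluminaBarcodes fails in far less obvious ways than a plain "wrong read structure" message, so this is worth ruling out first.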
EDIT: The first error is actually a "File not found". I checked; the file does exist, though:
picard.PicardException: File not found: (/data/BHX2/Data/Intensities/BaseCalls/L004/C275.1/L004_2.cbcl)
at picard.illumina.parser.readers.BaseBclReader.open(BaseBclReader.java:93)
at picard.illumina.parser.readers.CbclReader.readHeader(CbclReader.java:127)
at picard.illumina.parser.readers.CbclReader.readTileData(CbclReader.java:200)
at picard.illumina.parser.readers.CbclReader.advance(CbclReader.java:275)
at picard.illumina.parser.readers.CbclReader.hasNext(CbclReader.java:252)
at picard.illumina.parser.NewIlluminaDataProvider.hasNext(NewIlluminaDataProvider.java:125)
at picard.illumina.ExtractIlluminaBarcodes$PerTileBarcodeExtractor.run(ExtractIlluminaBarcodes.java:363)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: /data/BHX2/Data/Intensities/BaseCalls/L004/C275.1/L004_2.cbcl (Too many open files)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at picard.illumina.parser.readers.BaseBclReader.open(BaseBclReader.java:90)
I also ran the demultiplexing on a single core. Same error.
I assume you don't have access to bcl-convert or bcl2fastq, which is why you are using Picard? The process appears to be reading other cbcl files before it encounters this error.

Yes, I have to use picard since it is part of our pipeline. You are right, there is another error; I edited the question. Thanks for pointing that out!

You may need to check with your sysadmins about this. There are generic solutions (http://www.mastertheboss.com/java/hot-to-solve-the-too-many-open-files-error-in-java-applications/), but I am not sure they apply in your case, since you have tried using a single core and still get that error.
So the issue is really that Picard opens too many files. I monitored the open files of the process with lsof, and it quickly exceeds 120000 files, which is the maximum that I can set with ulimit -n. I also set the Picard parameter --MAX_RECORDS_IN_RAM 50000000 to limit the number of files written, but to no avail.

Is /data/tmp a real directory on a file system (from your command line)?

The dir exists, I have write permission and enough space. But I don't see any actual files written there other than libgkl_compressionXXXX.so. The bash variable $TMPDIR also points there. I added -Djava.io.tmpdir=/data/tmp/ to the picard call, but I still don't see any temporary files.

So in the end, I "solved" it by not using Picard. I used bcl2fastq to separate the samples into five different FASTQ files (two for the reads, two for the sample indexes, one for the UMI) and then put them back together with fgbio FastqToBam. I then sorted the uBAM files with picard SortSam by "queryname". From here, I could continue with my pipeline.
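The workaround above might look roughly like the following. This is only an illustration of the described steps; every path, sample name, and FASTQ file name is hypothetical, and the flags shown are the commonly used ones, so check each tool's documentation before running anything:

```python
# Assemble the three commands of the workaround; nothing is executed here,
# and all file/sample names are made up for illustration.
runfolder = "/data/BHX2"    # run folder from the question
fastq_dir = "/data/fastq"   # hypothetical output directory

# 1) bcl2fastq demultiplexes and, with --create-fastq-for-index-reads, also
#    writes the index reads (and hence the UMI cycles) as their own FASTQs.
bcl2fastq_cmd = [
    "bcl2fastq",
    "--runfolder-dir", runfolder,
    "--output-dir", fastq_dir,
    "--create-fastq-for-index-reads",
]

# 2) fgbio FastqToBam merges the five FASTQs per sample into one unmapped BAM,
#    with one read structure per input so the UMI read is stored as +M.
fgbio_cmd = [
    "fgbio", "FastqToBam",
    "--input", "R1.fq.gz", "I1.fq.gz", "UMI.fq.gz", "I2.fq.gz", "R2.fq.gz",
    "--read-structures", "+T", "+B", "+M", "+B", "+T",
    "--sample", "sample1", "--library", "lib1",
    "--output", "sample1.unmapped.bam",
]

# 3) picard SortSam orders the uBAM by queryname for the downstream pipeline.
sortsam_cmd = [
    "picard", "SortSam",
    "I=sample1.unmapped.bam", "O=sample1.qsorted.bam",
    "SORT_ORDER=queryname",
]

for cmd in (bcl2fastq_cmd, fgbio_cmd, sortsam_cmd):
    print(" ".join(cmd))
```

The uBAM route has the nice property that the UMI travels with the read as a tag instead of a separate file, which is usually what downstream consensus/dedup tools expect.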