Background:
Within a single experiment I have multiple FASTQ libraries,
where a library = library_i_I1.fastq.gz, library_i_R1.fastq.gz, library_i_R2.fastq.gz
Issue:
Half of them just run, but then I hit several where I get the following when cellranger reaches DETECT_COUNT_CHEMISTRY:
2024-06-29 07:28:21 [runtime] (failed) ID.Human_colon_16S8159182.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_CHEMISTRY_DETECTOR.DETECT_COUNT_CHEMISTRY
[error] Pipestance failed. Error log at:
Human_colon_16S8159182/SC_RNA_COUNTER_CS/SC_MULTI_CORE/MULTI_CHEMISTRY_DETECTOR/DETECT_COUNT_CHEMISTRY/fork0/chnk0-u026f7fb794/_errors
Log message:
The read lengths are incompatible with all the chemistries for Sample Human_colon_16S8159182 in "/workspaces/torch_ddsm/_data_pool1/E-MTAB-9536_5prime_raw_fetal".
- read1 median length = 25
- read2 median length = 25
- index1 median length = 8
The minimum read length for different chemistries are:
SFRP - read1: 26, read2: 30, index1: 0
SC5P-R2 - read1: 26, read2: 25, index1: 0
SC5P-PE - read1: 81, read2: 25, index1: 0
SC3Pv1 - read1: 25, read2: 10, index1: 14
SC3Pv2 - read1: 26, read2: 25, index1: 0
SC3Pv3 - read1: 26, read2: 25, index1: 0
SC3Pv3LT - read1: 26, read2: 25, index1: 0
SC3Pv3HT - read1: 26, read2: 25, index1: 0
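To make the mismatch explicit, here is a minimal sketch that checks the reported medians against the minimums printed in the log above. The table is copied from that log; the function name and the simple "observed >= minimum" comparison are my assumptions about what the check amounts to, not cellranger's actual code.

```python
# Minimum read lengths (read1, read2, index1), copied from the error log above.
MIN_LENGTHS = {
    "SFRP": (26, 30, 0),
    "SC5P-R2": (26, 25, 0),
    "SC5P-PE": (81, 25, 0),
    "SC3Pv1": (25, 10, 14),
    "SC3Pv2": (26, 25, 0),
    "SC3Pv3": (26, 25, 0),
    "SC3Pv3LT": (26, 25, 0),
    "SC3Pv3HT": (26, 25, 0),
}


def compatible_chemistries(read1, read2, index1, minimums=MIN_LENGTHS):
    """Return the chemistries whose minimum lengths are all satisfied.

    Assumption: a chemistry is compatible when every observed median
    is at least the listed minimum.
    """
    return [
        name
        for name, (min_r1, min_r2, min_i1) in minimums.items()
        if read1 >= min_r1 and read2 >= min_r2 and index1 >= min_i1
    ]


# Medians reported for the failing sample: read1 = 25, read2 = 25, index1 = 8.
print(compatible_chemistries(25, 25, 8))
```

With the medians from the log, nothing passes: read1 is one base short for every chemistry except SC3Pv1, and SC3Pv1 fails on the index1 length (8 versus 14).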
Digging:
For reference, I went back to try and compare these estimates to those computed for libraries that ran without error, however...
For libraries where the run proceeds, the log only shows:
2024-06-27 14:54:11 [runtime] (chunks_complete) ID.FCA_gut8015060.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_CHEMISTRY_DETECTOR.DETECT_COUNT_CHEMISTRY
Also, for these successful runs, I ran a full-text search for the term "length" in the logs, and there is no information about the parameters that were computed for library detection in either metrics_summary.csv or web_summary.html.
Does anyone know how to get this information?
Perhaps it is doing something simple initially, like comparing the lengths of the input reads. In the case shown above, the data do not seem to match any of the chemistries listed as valid.
Right. If the minimum read length for 5'v2 is 25, I expected the other libraries to have a median of 50 bp or so, which is why I wanted to know where these stats are logged for runs where the detection passes.
However, if the medians for the others are pretty close to the medians reported here, I might just try skipping the chemistry check.
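Since the successful runs do not seem to log these numbers, one way to get them is to compute the medians straight from the FASTQ files. This is a minimal sketch, assuming gzipped four-line FASTQ records and sampling only the first reads of each file; the function name and the sample size are hypothetical choices of mine and are not meant to reproduce how cellranger samples reads.

```python
import gzip
import itertools
import statistics


def median_read_length(fastq_gz_path, max_reads=100_000):
    """Median sequence length over the first `max_reads` records of a gzipped FASTQ."""
    lengths = []
    with gzip.open(fastq_gz_path, "rt") as handle:
        # Each FASTQ record is four lines; the sequence is the second line,
        # so take lines 1, 5, 9, ... (0-based) up to max_reads records.
        for seq in itertools.islice(handle, 1, max_reads * 4, 4):
            lengths.append(len(seq.rstrip("\r\n")))
    if not lengths:
        raise ValueError(f"no reads found in {fastq_gz_path}")
    return statistics.median(lengths)
```

Running this on the I1, R1 and R2 files of a library that passed and of one that failed (for example `median_read_length("library_i_R1.fastq.gz")`) should show whether the passing libraries really have longer reads or are just one base over the threshold.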
Skip the check (you should know what you're analyzing anyway) or use something like STARsolo. Regardless, why is the cDNA read only 25 bp? That's bad, as short reads are prone to multimapping. It should be 91 bp, as recommended by 10x.