I downloaded three SRA files. Each contains the same number of reads: one has an average read length of 8 bp, one 26 bp and one 98 bp. I assume that, in order, these are the I1, R1 and R2 files.
When I run these through Cell Ranger with the command:
cellranger count --id Sample_Output --transcriptome /transcriptome_dir/ --fastqs /fastq_dir/ --sample SRA --expect-cells 7000
with the files named in the format Sample1_S1_L001_I1_001.fastq.gz, Sample1_S1_L001_R1_001.fastq.gz and Sample1_S1_L001_R2_001.fastq.gz, the run completes with CellRanger version 3.1.0 (which is already installed on my institution's HPC). However, the number of genes in each cell type is much lower than in the original paper, which uses the default CellRanger pipeline.
I thought the issue may be the older version of the pipeline, so I've tried running this with the newest version (6.1.2) but this fails with the following output:
[error] Pipestance failed. Error log at:
Sample_Output/SC_RNA_COUNTER_CS/SC_MULTI_CORE/MULTI_CHEMISTRY_DETECTOR/_GEM_WELL_CHEMISTRY_DETECTOR/DETECT_COUNT_CHEMISTRY/fork0/chnk0-u65e35553ad/_errors

Log message:
FASTQ header mismatch detected at line 4 of input files
For FASTQ files, the fourth line is the quality score, so why would it matter if these are mismatched? For each of the 4 input samples the fourth line is:
- GGGGGIII
- #<<GAGIGGI.GGGIIIIGIGIGGGI
- AGGGGIIAGGG.GGGGG.AGGGIIGIGIIGIIIGIGIGGGGGGIGGGGG..AG..GGGIGGGGIGIGGGGGIAGGAGAGGGIGGIIGAGG<AGGGGGG
Is there something obvious I'm misunderstanding?
Many thanks,
Chris
What do the fastq headers look like for your sample files? Do you know what version of cellranger was used with the published dataset? You may need to specify the chemistry version if the new pipeline is unable to determine it.
Thank you for the fast reply GenoMax!
The fastq headers are as follows:
@SRR10027173.1 1/1
@SRR10027174.1 1/1
@SRR10027175.1 1/2
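For anyone wanting to check header consistency across the I1, R1 and R2 files, here is a minimal Python sketch. It assumes that the "header mismatch" concerns the read name (the text after `@` up to the first whitespace), which is a guess about what Cell Ranger compares rather than something confirmed in this thread. The helper names `read_name`, `first_headers` and `names_match` are made up for illustration.

```python
import gzip
from itertools import islice


def read_name(header: str) -> str:
    """Return the read identifier: text after '@' up to the first whitespace."""
    header = header.strip()
    if header.startswith("@"):
        header = header[1:]
    return header.split()[0]


def first_headers(path, n=3):
    """Return the first n header lines of a (optionally gzipped) FASTQ file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as fh:
        # In a four-line FASTQ record, lines 1, 5, 9, ... are headers.
        return [line.rstrip("\n") for line in islice(fh, 0, 4 * n, 4)]


def names_match(*headers: str) -> bool:
    """True if all header lines carry the same read identifier."""
    return len({read_name(h) for h in headers}) == 1


# The three first-record headers quoted in this post.
headers = [
    "@SRR10027173.1 1/1",
    "@SRR10027174.1 1/1",
    "@SRR10027175.1 1/2",
]
print([read_name(h) for h in headers])
print(names_match(*headers))
```

With the headers above, the three read names differ because each file carries its own SRR accession, so `names_match` returns `False`. To run the same check on real files, pass the first element of `first_headers(path)` for each of the three files to `names_match`.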
The paper doesn't specify which version they used, but given that it came out in 2019, it would have been a version prior to version 4. So maybe I should focus on why my pipeline isn't getting a similar number of genes to theirs, rather than on getting version 6.1 working.
Adding
--chemistry SC3Pv2
to the version 6.1 command does not fix the error. I have also added it to the version 3.1 command and will update if this improves the gene count.
Many thanks,
Chris
Unfortunately, specifying the chemistry with the above option in CellRanger v3.1.0 gave exactly the same output as not specifying it, so the number of genes is still below what the paper reports.
Many thanks,
Chris
Looking at the GEO entry for this accession I see the following:
Are you using the correct genome build? They also seem to have used an old cellranger pipeline.
Not sure if this applies in your case: Difference between number of cells predicted by two different versions of cellranger