Question

CellRanger works on older version but not new

3.3 years ago

Chris ▴ 10

I downloaded 3 SRA files. These had the same number of reads in each, one with an average length of 8bp, one with 26bp and one with 98bp. I assume in order these would be the I1, R1 and R2 files.
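
One way to sanity-check that assignment is to compute the mean read length of each file and map it to a read role. The sketch below is only illustrative: the function names and the length thresholds are assumptions chosen around the 8 bp, 26 bp and 98 bp values above (26 bp would be consistent with a 16 bp cell barcode plus a 10 bp UMI, as in 10x 3' v2 chemistry).

```python
import gzip
from itertools import islice


def mean_read_length(path, max_reads=10000):
    """Average sequence length over the first max_reads records of a FASTQ file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    total = count = 0
    with opener(path, "rt") as handle:
        # The sequence is the second line of every 4-line FASTQ record.
        for seq in islice(handle, 1, max_reads * 4, 4):
            total += len(seq.strip())
            count += 1
    return total / count if count else 0.0


def guess_role(mean_len):
    """Hypothetical thresholds: short = index, medium = barcode + UMI, long = cDNA."""
    if mean_len <= 10:
        return "I1"
    if mean_len <= 30:
        return "R1"
    return "R2"


for length in (8, 26, 98):
    print(length, guess_role(length))  # 8 I1, then 26 R1, then 98 R2
```

The thresholds would need adjusting for other chemistries or trimmed reads, so treat the result as a hint rather than a definitive assignment.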

When I run Cell Ranger on these with the command:

cellranger count --id Sample_Output --transcriptome /transcriptome_dir/ --fastqs /fastq_dir/ --sample SRA --expect-cells 7000

with the files named in the format Sample1_S1_L001_I1_001.fastq.gz, Sample1_S1_L001_R1_001.fastq.gz and Sample1_S1_L001_R2_001.fastq.gz, the run completes with CellRanger version 3.1.0 (which is already installed on my institution's HPC). However, the number of genes detected in each cell type is much lower than in the original paper, which used the default CellRanger pipeline.
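
For reference, the filename convention above can be split into its parts with a small parser. This is a minimal sketch: the pattern and the function name `parse_fastq_name` are my own and only cover the bcl2fastq-style names shown here. As far as I understand, the value passed to --sample is matched against the part of the filename before `_S1`, so it is worth confirming that the two agree.

```python
import re

# Pattern for names like <sample>_S<n>_L<lane>_<read>_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<sample_index>\d+)_L(?P<lane>\d{3})"
    r"_(?P<read>[IR][12])_001\.fastq(?:\.gz)?$"
)


def parse_fastq_name(filename):
    """Return the filename components as a dict, or None if the name does not fit."""
    match = FASTQ_NAME.match(filename)
    return match.groupdict() if match else None


info = parse_fastq_name("Sample1_S1_L001_R1_001.fastq.gz")
print(info["sample"], info["read"])  # Sample1 R1
```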

I thought the issue may be the older version of the pipeline, so I've tried running this with the newest version (6.1.2) but this fails with the following output:

[error] Pipestance failed. Error log at:
Sample_Output/SC_RNA_COUNTER_CS/SC_MULTI_CORE/MULTI_CHEMISTRY_DETECTOR/_GEM_WELL_CHEMISTRY_DETECTOR/DETECT_COUNT_CHEMISTRY/fork0/chnk0-u65e35553ad/_errors

Log message:
FASTQ header mismatch detected at line 4 of input files

For FASTQ files, the fourth line is the quality score, so why would it matter if these are mismatched? For each of the 4 input samples the fourth line is:

GGGGGIII
#<<GAGIGGI.GGGIIIIGIGIGGGI
AGGGGIIAGGG.GGGGG.AGGGIIGIGIIGIIIGIGIGGGGGGIGGGGG..AG..GGGIGGGGIGIGGGGGIAGGAGAGGGIGGIIGAGG<AGGGGGG

Is there something obvious I'm misunderstanding?

Many thanks,

Chris

CellRanger • 1.5k views

ADD COMMENT • link updated 3.3 years ago by GenoMax 152k • written 3.3 years ago by Chris ▴ 10

What do the fastq headers look like for your sample files? Do you know what version of cellranger was used with the published dataset? You may need to specify the chemistry version if the new pipeline is unable to determine it.

ADD REPLY • link 3.3 years ago by GenoMax 152k

Thank you for the fast reply GenoMax!

The fastq headers are as follows:

@SRR10027173.1 1/1

@SRR10027174.1 1/1

@SRR10027175.1 1/2
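
One possible reading of the error is that the read names differ between the three files, since each file comes from a different SRR run. That Cell Ranger 6.1.2 performs this kind of comparison is an assumption on my part, not something confirmed in this thread. A minimal sketch of such a check, with hypothetical helper names:

```python
def read_id(header):
    """Read name from a FASTQ header: the text after '@' up to the first space."""
    return header.strip().lstrip("@").split()[0]


def headers_match(*headers):
    """True if all headers carry the same read name."""
    return len({read_id(h) for h in headers}) == 1


print(headers_match(
    "@SRR10027173.1 1/1",
    "@SRR10027174.1 1/1",
    "@SRR10027175.1 1/2",
))  # False: the three read names differ
```

If this is the cause, headers sharing one read name across I1, R1 and R2 would pass the same check.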

The paper doesn't specify which version they used, but given that it came out in 2019, it would have been a version prior to version 4. So maybe I should focus on why my pipeline isn't recovering a similar number of genes to theirs, rather than on getting version 6.1 working.

Adding --chemistry SC3Pv2 to the version 6.1 command does not fix the error. I have also added it to the version 3.1 command and will update if this improves the gene count.

Many thanks,

Chris

ADD REPLY • link 3.3 years ago by Chris ▴ 10

Unfortunately, specifying the chemistry as above with CellRanger v3.1.0 gave exactly the same output as not specifying it, so the number of genes is still below what the paper reports.

Many thanks,

Chris

ADD REPLY • link 3.3 years ago by Chris ▴ 10

Looking at the GEO entry for this accession I see the following:

Sample demultiplexing, alignment and gene counts were performed using the CellRanger v2.0.0 pipeline using default settings.
Secondary analysis was performed using the Seurat2 package
Genome_build: mm10

Are you using the correct genome build? They also seem to have used an old cellranger pipeline.

Not sure if this applies in your case: Difference between number of cells predicted by two different versions of cellranger

ADD REPLY • link 3.3 years ago by GenoMax 152k