Hi, I'm quite new to scRNA-seq and I got the error below while trying to run cellranger (version 7.2.0) on public human brain cortex data.
[error] Pipestance failed. Error log at:
FL/SC_RNA_COUNTER_CS/SC_MULTI_CORE/MULTI_CHEMISTRY_DETECTOR/DETECT_COUNT_CHEMISTRY/fork0/chnk0-u731a5891c0/_errors
Log message:
An extremely low rate of correct barcodes was observed for all the candidate chemistry choices for the input: Sample FL in "path/to/FL/fastq/files". Please check your input data.
- 0.1% for chemistry SC3Pv3
- 0.1% for chemistry SC3Pv3HT
- 0.0% for chemistry SC5P-PE
- 0.0% for chemistry SC3Pv2
- 0.0% for chemistry SC3Pv3LT
Waiting 6 seconds for UI to do final refresh.
Pipestance failed. Use --noexit option to keep UI running after failure.
The public data I used is PRJNA491456 (I included only the paired-end data in my analysis). I'm sure it is all single-cell RNA-seq data, and fairly sure it is all 10x data (the metadata says the runs were sequenced on an Illumina HiSeq 4000). I've searched for posts related to my problem, but in most of them the cause was a different library preparation method, and this data is not multiome data either (https://kb.10xgenomics.com/hc/en-us/articles/17959105349389-How-can-I-analyze-only-my-Multiome-Gene-Expression-library-with-Cell-Ranger-on-10x-Genomics-Cloud-Analysis-). When I ran cellranger on another public Illumina 10x 3' dataset it worked well, so I think this problem must be related to library prep, and finding the appropriate chemistry (if this data is not from 10x) would solve it... Any help will be appreciated!
+) The input data format is as below:
+) It seems each run comes from an individual sample (every run has a different sample code), but when runs originated from the same brain region I grouped them together myself and used them as one input for cellranger.
Hello, first of all thank you so much for your help! But I have a few more questions:
I think the relevant part is the text before 'TopHat (version 2.0.14)...'. As far as I know, the Illumina 10x 3' library prep method also has cell barcodes in Read 2 and the TSO and poly(A) sequence in Read 1, so which part made you think this was non-10x data?
Thanks!
With 10x technology, Read 1 consists of the cell barcode + UMI (26 or 28 bp) and Read 2 is the RNA read. In this case they clearly state that the cell barcodes and UMIs are in Read 2. You will need to look at the structure of Read 2 in this data to work out where the cell barcodes and UMIs are. The methods section of the publication associated with the data should have the relevant details.
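A quick way to see whether either read file has the 10x barcode-read layout is to tally read lengths. The following is only a rough sketch: the function names and the 90% threshold are my own choices, and it assumes standard 4-line FASTQ records. Note that a longer Read 1 does not by itself rule out 10x, since some runs sequence Read 1 past the barcode + UMI.

```python
import gzip
import sys
from collections import Counter
from itertools import islice


def open_fastq(path):
    """Open a FASTQ file as text, handling .gz compression by file extension."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_length_counts(path, max_reads=10000):
    """Tally sequence lengths over the first `max_reads` records of a FASTQ file."""
    counts = Counter()
    with open_fastq(path) as handle:
        # Each record is 4 lines; the sequence is the 2nd line of every record.
        for seq in islice(handle, 1, max_reads * 4, 4):
            counts[len(seq.strip())] += 1
    return counts


def looks_like_10x_barcode_read(counts, min_fraction=0.9):
    """Return True if most sampled reads are 26 bp or 28 bp long.

    The 0.9 cutoff is an arbitrary illustration, not a 10x specification.
    """
    total = sum(counts.values())
    if total == 0:
        return False
    return (counts[26] + counts[28]) / total >= min_fraction


if __name__ == "__main__":
    # Usage: python check_lengths.py <read1.fastq.gz> <read2.fastq.gz>
    for fastq_path in sys.argv[1:]:
        lengths = read_length_counts(fastq_path)
        print(fastq_path, dict(lengths), looks_like_10x_barcode_read(lengths))
```

If neither file is dominated by 26 bp or 28 bp reads, that is consistent with the barcodes sitting somewhere else, and the methods section is the place to find out where.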
Since they used TopHat, you can use STAR instead. And if you are able to figure out where the barcodes + UMIs are, you could use STARsolo.
That may be tricky at best, especially if different parts of the RNA are sampled (10x mainly does 3'-end for scRNA-seq). There will be batch effects that you would likely not be able to address.
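Once the barcode and UMI positions are known, a STARsolo call could be put together roughly as sketched here. The helper `build_starsolo_command` is hypothetical, and the positions are parameters you would have to fill in from the paper; I am not assuming anything about this dataset's layout. STARsolo expects the cDNA read first and the barcode read second in `--readFilesIn`, and `--soloCBwhitelist None` runs without a barcode whitelist.

```python
import shlex


def build_starsolo_command(genome_dir, cdna_fastq, barcode_fastq,
                           cb_start, cb_len, umi_start, umi_len,
                           whitelist="None"):
    """Assemble a STAR argument list for a simple barcode + UMI layout.

    Positions are 1-based, as STARsolo expects. All values are supplied by
    the caller; nothing here is specific to any particular dataset.
    """
    cmd = [
        "STAR",
        "--genomeDir", genome_dir,
        # cDNA read first, barcode-containing read second.
        "--readFilesIn", cdna_fastq, barcode_fastq,
        "--soloType", "CB_UMI_Simple",
        "--soloCBstart", str(cb_start),
        "--soloCBlen", str(cb_len),
        "--soloUMIstart", str(umi_start),
        "--soloUMIlen", str(umi_len),
        "--soloCBwhitelist", whitelist,
        # 0 disables the check that the barcode read length equals CB + UMI.
        "--soloBarcodeReadLength", "0",
    ]
    if cdna_fastq.endswith(".gz"):
        # Let STAR decompress gzipped input on the fly.
        cmd += ["--readFilesCommand", "zcat"]
    return cmd


if __name__ == "__main__":
    # Placeholder paths and the standard 10x v3 layout (16 bp CB + 12 bp UMI),
    # shown only as an example of the parameters, not as this dataset's layout.
    example = build_starsolo_command(
        "star_index", "cdna.fastq.gz", "barcode.fastq.gz",
        cb_start=1, cb_len=16, umi_start=17, umi_len=12,
    )
    print(shlex.join(example))
```

Printing the command rather than running it lets you check it against the STAR manual for your installed version before starting a long alignment.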
Oh, I see, I was confused. Thank you for the clarification.
I checked the methods section of the paper and it seems this data was also generated by 3'-end sequencing, so maybe I can first try the downstream analysis after merging the two datasets, and if that fails I can try the same thing without merging... This dataset is kind of annoying.
Anyway, thank you so much for your help! Your answers really helped me a lot!