Hello,
I'm new to this area. Please help me with this. I'd like to generate test fastq files that contain only a few hundred subset of cells from paired-end scRNA seq fastq files downloaded from https://www.10xgenomics.com/resources/datasets/whole-blood-rbc-lysis-for-pbmcs-neutrophils-granulocytes-3-3-1-standard. Here are the steps I have tried:
First I generated a whitelist of barcodes using umi_tools
for R1 with the following command:
umi_tools whitelist --stdin WB_Lysis_Granulocytes_3p_Introns_8kCells_S1_L001_R1_001.fastq.gz \
--extract-method=string \
--bc-pattern=CCCCCCCCCCCCCCCCNNNNNNNNNN \
--log2stderr > whitelist.txt
Then, I used a subset of these barcodes to extract the R1_subset.fastq and R2_subset.fastq files using the following commands:
umi_tools extract --bc-pattern=CCCCCCCCCCCCCCCCNNNNNNNNNN \
--stdin WB_Lysis_Granulocytes_3p_Introns_8kCells_S1_L001_R1_001.fastq.gz \
--stdout WB_Lysis_Granulocytes_3p_Introns_8kCells_S1_L001_R1_001_extracted.fastq.gz \
--read2-in WB_Lysis_Granulocytes_3p_Introns_8kCells_S1_L001_R2_001.fastq.gz \
--read2-out=WB_Lysis_Granulocytes_3p_Introns_8kCells_S1_L001_R2_001_extracted.fastq.gz \
--whitelist=whitelist1_subset.txt
where whitelist1_subset.txt
is given like
AAAGAACTCGGCATCG
AAAGGATTCGCAGATT
AAAGGTATCCAAGCAT
AACAAAGAGAACAAGG
AACAAGATCGAATCCA
However, when I tried using cell ranger on the extracted fastq files, I encountered the following error message:
Log message:
The read lengths are incompatible with all the chemistries for Sample WB_Lysis_Granulocytes_3p_Introns_8kCells in "/mnt/ufs18/home-238/suzhe/Data/scrnaseqworkshop/WB_Lysis_Granulocytes_3p_Introns_8kCells_fastqs/subset".
- read1 median length = 2
- read2 median length = 90
- index1 median length = 0
The minimum read length for different chemistries are:
SFRP - read1: 26, read2: 30, index1: 0
SC5P-R2 - read1: 26, read2: 25, index1: 0
SC5P-PE - read1: 81, read2: 25, index1: 0
SC3Pv1 - read1: 25, read2: 10, index1: 14
SC3Pv2 - read1: 26, read2: 25, index1: 0
SC3Pv3 - read1: 26, read2: 25, index1: 0
SC3Pv3LT - read1: 26, read2: 25, index1: 0
SC3Pv3HT - read1: 26, read2: 25, index1: 0
We expect that at least 50% of the reads exceed the minimum length.
Is there anyone who can help resolve this issue? The original fastq files work with cell ranger.
Thank you very much!
Why don't you use
seqtk
to generate test fastq files for a subset of cells and then runcellranger
in it? You can do something like this-For the reference check here- https://bioinf.shenwei.me/seqkit/usage/#sample
Thank you for your reply. Actually I tried seqtk previously and it gave me an error message when using cellranger count on the generated fastq files:
Not sure how to fix this. It seems that cellranger count also requires at least 10000 reads, which makes the test experiment slow. What is the reason for that?
In addition, is getting a subset of reads the same as getting a subset of cells (barcodes)?
Here is the
log message
fromcellranger
when your reads is less than 10000-Yes, this is the log message if the reads is less than 10000. Do you know why cell ranger requires at least 10000 reads?
As stated in the log file- if you use less than 10000 reads, there will not be enough reads to detect the specific chemistry type and version.
I am just confused why 10000 reads provide sufficient information for the chemistry type and version but 9999 reads do not. What is the reason to set it to 10000?
You could ask to 10X genomics https://www.10xgenomics.com/contact
Sure. Thank you for your replies.