Hi all,
I have some 10x v3 single-cell RNA-seq FASTQ files that I am trying to map to the human genome using the STAR aligner. However, I am getting the following error and hope that some of you can help:
EXITING because of FATAL ERROR in input read file: the total length of barcode sequence is 150 not equal to expected 28
I have checked the Read 1 FASTQ file, and the reads are the full 150 bp. For instance, one of the reads is:
"Read ID=@A00551:244:HFHKLDSX2:1:1101:1488:1063 1 N 0 Sequence=GAGGCAAGTGGCAGATCGTTTCAACATTGTTCCTGCGCAACACAGAATAGAGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
As per STAR, the solution hint provided is the following:
"SOLUTION: make sure that the barcode read is the last file in --readFilesIn , and check that it has the correct formatting If UMI+CB length is not equal to the barcode read length, specify barcode read length with --soloBarcodeReadLength"
My question is: (a) do I need to trim the R1 reads to 28 bp before alignment, or (b) should I just set the --soloBarcodeReadLength option in STAR to 150?
In CellRanger, the length of R1 is not a problem, as there are options to specify trimming, or the tool simply ignores the rest of the read.
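The "expected 28" in the error is the 16 bp cell barcode plus the 12 bp UMI of 10x v3 chemistry, so either option (a) or (b) makes the length check pass. A toy Python sketch of that check, based on my reading of the STAR manual (by default the barcode read must equal CB + UMI length; 0 skips the check). It is an illustration, not STAR's code, and the function names are hypothetical.

```python
from typing import Optional

# 10x Chromium v3 read 1 layout: 16 bp cell barcode followed by a 12 bp UMI.
CB_LEN = 16
UMI_LEN = 12


def expected_barcode_length(cb_len: int = CB_LEN, umi_len: int = UMI_LEN) -> int:
    """Barcode-read length implied by the CB + UMI lengths (28 for v3)."""
    return cb_len + umi_len


def barcode_read_ok(r1_seq: str, barcode_read_length: Optional[int] = None) -> bool:
    """Illustrative length check, NOT STAR's source code.

    barcode_read_length loosely mirrors --soloBarcodeReadLength:
      None -> stands in for STAR's default: R1 must equal CB + UMI length,
      0    -> no length check at all,
      N    -> R1 must be exactly N bp long.
    """
    if barcode_read_length is None:
        barcode_read_length = expected_barcode_length()
    if barcode_read_length == 0:
        return True
    return len(r1_seq) == barcode_read_length


r1 = "A" * 150  # stand-in for a full-length 150 bp read 1
print(expected_barcode_length())  # 28
print(barcode_read_ok(r1))        # False: the "150 not equal to expected 28" case
print(barcode_read_ok(r1, 150))   # True: option (b)
print(barcode_read_ok(r1[:28]))   # True: option (a), R1 hard-trimmed to 28 bp
```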
Any help is appreciated.
Thank you.
What command are you running? You probably didn't set the barcode position and length arguments.
Hi rpolicastro
I am running the following:
Thank you.
See: STARsolo config for 10x Chromium v1, v2, v3
A poster there indicated that setting the proper length did not seem to work, so you can simply hard-trim read 1 to the correct length.
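Hard-trimming R1 to 28 bp (16 bp barcode + 12 bp UMI for v3) only requires cutting the sequence and quality lines of each record together. A minimal Python sketch, assuming standard 4-line FASTQ records; the function names and file paths are hypothetical, and a dedicated trimming tool will usually be faster on full-size files.

```python
import gzip
from typing import Iterable, Iterator


def trim_fastq_lines(lines: Iterable[str], keep: int = 28) -> Iterator[str]:
    """Hard-trim each 4-line FASTQ record to its first `keep` bases.

    Sequence and quality lines are cut together so they stay equal in length;
    header and '+' lines are passed through unchanged.
    """
    it = iter(lines)
    for header in it:
        try:
            seq, plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError("truncated FASTQ record after " + header.strip()) from None
        yield header
        yield seq.rstrip("\r\n")[:keep] + "\n"
        yield plus
        yield qual.rstrip("\r\n")[:keep] + "\n"


def trim_fastq_file(src: str, dst: str, keep: int = 28) -> None:
    """Trim a gzipped R1 FASTQ (hypothetical paths) and write a gzipped copy."""
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        fout.writelines(trim_fastq_lines(fin, keep))


# One made-up 40 bp record, trimmed to 28 bp.
record = ["@read1\n", "ACGT" * 10 + "\n", "+\n", "F" * 40 + "\n"]
for line in trim_fastq_lines(record):
    print(line, end="")
```

On real data this would be called as, e.g., trim_fastq_file("sample_R1.fastq.gz", "sample_R1.trimmed.fastq.gz") with hypothetical file names; R2 is left untouched so read pairing is preserved.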
Hi GenoMax
Thanks! I tried setting the --soloBarcodeReadLength option in STAR to 150, and there was no problem then. Mapping completed successfully.
The final log output is as follows and indicates a good percentage of uniquely mapped reads.
Moreover, the Solo output summary is:
which seems comparable to a CellRanger output summary for a sample that was mapped earlier. The only significant difference I see is in Q30 Bases in Barcode, which is low in the STARsolo run compared to CellRanger (~97.5%).
Can you please suggest a trimming tool that is appropriate in case I want to hard-trim R1 reads?
Thank you.
Hmm. You obviously don't have 150 bp barcodes, but if you did get good alignments then I suppose STAR behaved like cellranger in ignoring the rest of the read. reformat.sh from the BBMap suite will work. Use the forcetrimright=NN option to trim the reads to the length you want (NN is a 0-based position, so forcetrimright=27 keeps the first 28 bases).
Thanks for your suggestion. Yes, it is true that the barcodes for v3 are not 150 bp, and I think this is why the 'Q30 Bases in Barcode' statistic is low in the STARsolo run. FastQC on R1 showed that the quality was poor after 28 bp. Also, I am not sure how this will impact the counts matrix and downstream analysis in Seurat.
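The explanation above is easy to sanity-check: if quality drops after base 28, any "barcode Q30" statistic computed over the full 150 bp read is pulled down by bases that are not barcode at all. A small Python sketch with a synthetic quality string, assuming Phred+33 encoding; it does not claim to reproduce how STARsolo or CellRanger actually compute their summary metric.

```python
from typing import Optional


def q30_fraction(qual: str, first_n: Optional[int] = None, offset: int = 33) -> float:
    """Fraction of bases with Phred quality >= 30 in a FASTQ quality string.

    Assumes Phred+33 encoding (Illumina 1.8+). If first_n is given, only
    the first `first_n` bases are counted.
    """
    if first_n is not None:
        qual = qual[:first_n]
    if not qual:
        return 0.0
    high = sum(1 for c in qual if ord(c) - offset >= 30)
    return high / len(qual)


# Synthetic 150 bp read 1 quality string: Q37 ('F') for the first 28 bases,
# Q11 (',') afterwards. Made-up values, not taken from the data in this thread.
qual = "F" * 28 + "," * 122
print(round(q30_fraction(qual), 3))    # 0.187 when all 150 bases count
print(q30_fraction(qual, first_n=28))  # 1.0 when only CB + UMI count
```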
I think the problem might have just been that they forgot to specify the cell barcode start position argument, --soloCBstart. You theoretically (and practically, in my experience) shouldn't have to trim the R1 read.
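The reason trimming shouldn't be needed is that STARsolo's CB_UMI_Simple layout cuts the barcode and UMI out of read 1 by start position and length; for v3 the STARsolo documentation gives --soloCBstart 1, --soloCBlen 16, --soloUMIstart 17 and --soloUMIlen 12. The Python below is a hypothetical illustration of that slicing, not STAR's implementation; the class and function names are made up.

```python
from typing import NamedTuple, Tuple


class SoloLayout(NamedTuple):
    """Barcode positions, 1-based like STARsolo's --soloCBstart/--soloUMIstart.

    Defaults are the 10x v3 values: CB at bases 1-16, UMI at bases 17-28.
    """
    cb_start: int = 1
    cb_len: int = 16
    umi_start: int = 17
    umi_len: int = 12


def split_barcode_read(r1_seq: str, layout: SoloLayout = SoloLayout()) -> Tuple[str, str]:
    """Cut CB and UMI out of read 1 by position; bases past the UMI are ignored."""
    cb0 = layout.cb_start - 1    # convert 1-based start to 0-based index
    umi0 = layout.umi_start - 1
    cb = r1_seq[cb0 : cb0 + layout.cb_len]
    umi = r1_seq[umi0 : umi0 + layout.umi_len]
    return cb, umi


# First 52 bases of the 150 bp read 1 quoted in the question.
r1 = "GAGGCAAGTGGCAGATCGTTTCAACATTGTTCCTGCGCAACACAGAATAGAG"
print(split_barcode_read(r1))  # ('GAGGCAAGTGGCAGAT', 'CGTTTCAACATT')
```

The read-length check from the original error message is governed separately, by --soloBarcodeReadLength.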