Question

STARsolo: reads are not mapped to the transcriptome

12 months ago

gdfsnkfns • 0

Hello everyone,

I am analyzing my 10x scRNA-seq data with STARsolo.

The mapping appeared to run without problems, and the output files, including the run summary (Summary.csv), were created. However, some of the values in the results look wrong to me.

The run summary is as follows:

Number of Reads 352455678
Reads With Valid Barcodes   0.903825
Sequencing Saturation   0.755087
Q30 Bases in CB+UMI 0.964867
Q30 Bases in RNA read   0.914494
Reads Mapped to Genome: Unique+Multiple 0.922609
Reads Mapped to Genome: Unique  0.646798
Reads Mapped to Transcriptome: Unique+Multipe Genes   0.0555096
Reads Mapped to Transcriptome: Unique Genes 0.0438785
Estimated Number of Cells   3764
Reads in Cells Mapped to Unique Genes   13472579
Fraction of Reads in Cells  0.896591
Mean Reads per Cell 3579
Median Reads per Cell   2896
UMIs in Cells   3325376
Mean UMI per Cell   883
Median UMI per Cell 713
Mean Genes per Cell 493
Median Genes per Cell   423
Total Genes Detected    18869

The values of "Reads Mapped to Transcriptome: Unique+Multipe Genes" (0.0555) and "Reads Mapped to Transcriptome: Unique Genes" (0.0439) look far too low, given that 0.647 of reads map uniquely to the genome. (When my colleague analyzed the same data with CellRanger, there were no problems with the sequencing or alignment quality.)
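One way to put a number on this mismatch is to compare the gene-level unique mapping fraction with the genome-level one. The sketch below is illustrative only. It assumes Summary.csv is a two-column, comma-separated file using the metric names shown in the report above; the function names and the 0.5 threshold are hypothetical choices, not STARsolo defaults.

```python
import csv
import io

# Metric names as they appear in the report quoted in this thread.
GENOME_KEY = "Reads Mapped to Genome: Unique"
GENE_KEY = "Reads Mapped to Transcriptome: Unique Genes"


def parse_summary(text):
    """Parse 'metric,value' lines into a dict of floats; skip anything else."""
    metrics = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 2:
            continue
        name, value = row[0].strip(), row[1].strip()
        try:
            metrics[name] = float(value)
        except ValueError:
            continue
    return metrics


def gene_to_genome_ratio(metrics):
    """Fraction of uniquely genome-mapped reads that were also assigned to a unique gene."""
    return metrics[GENE_KEY] / metrics[GENOME_KEY]


def looks_like_strand_problem(metrics, min_ratio=0.5):
    """Heuristic flag only: the 0.5 cutoff is an arbitrary illustrative value."""
    return gene_to_genome_ratio(metrics) < min_ratio


# Values copied from the first run summary in this thread.
example = (
    "Reads Mapped to Genome: Unique,0.646798\n"
    "Reads Mapped to Transcriptome: Unique Genes,0.0438785\n"
)
metrics = parse_summary(example)
print(round(gene_to_genome_ratio(metrics), 3))
print(looks_like_strand_problem(metrics))
```

A low ratio is only a hint. It can point to a strandedness setting that does not match the library, but an annotation that does not match the genome index could produce a similar picture.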

The command I ran was:

STAR --runThreadN 16 --genomeDir reference  --soloCBwhitelist 737K-august-2016.txt --outFileNamePrefix patient1_ --readFilesCommand gzcat --soloBarcodeReadLength 1  --clip5pNbases 39 0 --soloType CB_UMI_Simple   --soloCBstart 1   --soloCBlen 16   --soloUMIstart 17   --soloUMIlen 10 --readFilesIn Fastq/patient1_GEX/patient1_GEX_S4_L003_R2_001.fastq.gz Fastq/patient1_GEX/patient1_GEX_S4_L003_R1_001.fastq.gz

If anyone knows of any causes or solutions, I would appreciate it if you could enlighten me.

Best regards,

STARsolo • 1.5k views

ADD COMMENT • link updated 11 months ago by GenoMax 150k • written 12 months ago by gdfsnkfns • 0

Just checking that you used the correct whitelist. The list you used seems to be from an old 10x/Cell Ranger release.

ADD REPLY • link 12 months ago by GenoMax 150k

Thank you very much for your kind advice.

I re-checked the whitelist based on your comment, and it appears to be correct.

My 10x analysis used the Single Cell 5' v1 and v2 chemistry, and according to the 10x website (https://kb.10xgenomics.com/hc/en-us/articles/115004506263-What-is-a-barcode-whitelist), the whitelist 737K-august-2016.txt corresponds to this assay.

Note: when I tried another whitelist, "3M-5pgex-jan-2023.txt" for Single Cell 5' v3, the command did not work and the following error message appeared:

EXITING because of FATAL ERROR in input CB whitelist file: Fastq/3M-5pgex-jan-2023.txt.gz the total length of barcode sequence is 31 not equal to expected 26
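The error above reports a barcode length mismatch for the whitelist file. A minimal sketch for inspecting a whitelist before passing it to STAR follows. It assumes one barcode per line; the function names are hypothetical, and the default expected length of 16 is taken from --soloCBlen 16 in the command. Whether the compressed .gz path or the barcode layout caused this particular error is not established in this thread.

```python
import gzip
from collections import Counter


def whitelist_length_counts(path):
    """Count barcode lengths in a whitelist file (plain text or gzip-compressed)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as handle:
        for line in handle:
            barcode = line.strip()
            if barcode:  # ignore empty lines
                counts[len(barcode)] += 1
    return counts


def whitelist_matches(path, cb_len=16):
    """True if the file is non-empty and every barcode has length cb_len."""
    counts = whitelist_length_counts(path)
    return bool(counts) and set(counts) == {cb_len}
```

For example, calling whitelist_length_counts on the downloaded file shows at a glance whether all barcodes share the length given to --soloCBlen.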

If you know of a source for an updated whitelist, please let me know.

Sincerely,

ADD REPLY • link 12 months ago by gdfsnkfns • 0

GenoMax · Accepted Answer · 2024-04-09

12 months ago

dsull ★ 7.3k

Why do you have --clip5pNbases? Are you doing the 5' protocol where an adapter exists in R1? Also, are you sure your strandedness setting is correct? Maybe try it with --soloStrand Reverse.
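To test the strandedness suggestion, the same run can be repeated with different --soloStrand values and the resulting summaries compared. The sketch below only builds and prints the command lines; it does not run STAR. The helper name, the output prefix, and the FASTQ file names are hypothetical placeholders, and the remaining options are copied from the command in the question.

```python
import shlex

# Options copied from the command posted in the question.
BASE_ARGS = [
    "STAR", "--runThreadN", "16", "--genomeDir", "reference",
    "--soloType", "CB_UMI_Simple",
    "--soloCBwhitelist", "737K-august-2016.txt",
    "--soloCBstart", "1", "--soloCBlen", "16",
    "--soloUMIstart", "17", "--soloUMIlen", "10",
    "--soloBarcodeReadLength", "1",
    "--readFilesCommand", "gzcat",
]

VALID_STRANDS = ("Unstranded", "Forward", "Reverse")


def star_command(strand, cdna_fastq, barcode_fastq, prefix):
    """Return the STAR argument list for one --soloStrand setting."""
    if strand not in VALID_STRANDS:
        raise ValueError(f"unknown --soloStrand value: {strand}")
    return BASE_ARGS + [
        "--soloStrand", strand,
        # A separate prefix per setting keeps the outputs from overwriting each other.
        "--outFileNamePrefix", f"{prefix}{strand}_",
        # As in the question: the cDNA read first, then the barcode+UMI read.
        "--readFilesIn", cdna_fastq, barcode_fastq,
    ]


# Placeholder file names; substitute the real R2 (cDNA) and R1 (barcode) files.
for strand in ("Forward", "Reverse"):
    cmd = star_command(strand, "sample_R2.fastq.gz", "sample_R1.fastq.gz", "strandtest_")
    print(shlex.join(cmd))
```

Running both variants on a subsample of reads is usually enough to see which setting gives the higher "Reads Mapped to Transcriptome: Unique Genes" value.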

ADD COMMENT • link 12 months ago by dsull ★ 7.3k

Thank you very much for your kind advice.

Our assay uses the 10x chemistry "Single Cell 5' R2-only", so I think adding the --clip5pNbases option should not be a problem.

I reran the command without --clip5pNbases, but the results did not change.

If you have any ideas on this issue, I would appreciate it if you could let me know. Thank you so much!

ADD REPLY • link 12 months ago by gdfsnkfns • 0

Did you try --soloStrand Reverse?

ADD REPLY • link 12 months ago by dsull ★ 7.3k

Thank you very much for your prompt response, and I'm sorry for missing such an important suggestion.

I reviewed the script again and added the option --soloStrand Reverse as you suggested.

This time, my reads appear to be mapped properly to the transcriptome.

The command I ran and the run summary are below.

(If anything in the command needs to be corrected, I would appreciate it if you could point it out.)

STAR --runThreadN 16 --genomeDir reference  --soloCBwhitelist Fastq/737K-august-2016.txt --outFileNamePrefix patient2_ --readFilesCommand gzcat --soloBarcodeReadLength 1 --soloType CB_UMI_Simple   --soloCBstart 1   --soloCBlen 16   --soloUMIstart 17   --soloUMIlen 10 --soloStrand Reverse --outSAMtype BAM SortedByCoordinate --outSAMattributes CR UR CY UY CB UB --readFilesIn Fastq/patient1_GEX/patient1_GEX_S3_L003_R2_001.fastq.gz Fastq/patient1_GEX/patient1_GEX_S3_L003_R1_001.fastq.gz

Number of Reads 342455698
Reads With Valid Barcodes 0.903792
Sequencing Saturation 0.865151
Q30 Bases in CB+UMI 0.964867
Q30 Bases in RNA read 0.914494
Reads Mapped to Genome: Unique+Multiple 0.924652
Reads Mapped to Genome: Unique 0.749886
Reads Mapped to Transcriptome: Unique+Multipe Genes 0.641215
Reads Mapped to Transcriptome: Unique Genes 0.572322
Estimated Number of Cells 3919
Reads in Cells Mapped to Unique Genes 185200186
Fraction of Reads in Cells 0.944924
Mean Reads per Cell 47257
Median Reads per Cell 39985
UMIs in Cells 24624872
Mean UMI per Cell 6283
Median UMI per Cell 5284
Mean Genes per Cell 1932
Median Genes per Cell 1776
Total Genes Detected 23696

Your suggestion really helped me. Thank you so much!

ADD REPLY • link updated 11 months ago by GenoMax 150k • written 11 months ago by gdfsnkfns • 0

You can go ahead and accept @dsull's answer (I moved the comment to an answer) to provide closure to this thread (green checkmark).

ADD REPLY • link 11 months ago by GenoMax 150k