Hey everyone,
I don't know how to approach this; I've been stuck on it for a few days. To make sure everything is working, I'm currently trying to run STARsolo/SoloTE on a data set provided with the SoloTE publication. I know how to work with various data sets; for example, the project I'm working on now uses single-cell RNA-Seq data from human placental cells, and it works flawlessly with STARsolo. But I am having trouble with the data set that the publication provides, on a sample run like this one. I've never worked with paired cDNA RNA-Seq runs before, so I'm not sure which parameters this run needs to work correctly. I'm prefetching the SRA run and using fasterq-dump to write it into two separate files. I'm not familiar with sequences that look like this:
+SRR9713162.492268 492268 length=150
AAFFFJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ----<7AF-<F-7<FFF--<F-----7A-AFFJFF<7-7---)7)7-7--7-<-7AA----AF7-7A7FF<FFFJFFAF<F----)))))7)7--<F--
@SRR9713162.492269 492269 length=150
TCAGCAACAGGACGTATTTCTTAGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTCATTCTAAGACTTTAAGTTCTCTGGCATGAGTTTATCTGCAATCATAAACTAAAAAATAACCCAAACACACCCCACCAAACCCAACCGTAC
+SRR9713162.492269 492269 length=150
-AFFFJJJJJFJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ--7<7-7------F7---<<-7---)-)-7---7--<-7-7-7--7-<<7----7------7----7----7)--)--)7---<--)7----
@SRR9713162.492270 492270 length=150
ACCTTTAAGGCTCTTAACCATATCCGTTTTTTTTTTCTTTTTTTTTTTTTTTTTTTCCTAGAGGAAAACCCGGTAATGATGTCGGGGTTGAGGGATAGGAGGAGGATGGGGGATAGGTGTATGAACATGAGGGTGTTTTCTCGTGTGAAT
+SRR9713162.492270 492270 length=150
How might I apply STARsolo to two lanes of cDNA files as described in the paper? If anyone can help, please let me know. Thank you so much.
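For anyone unfamiliar with the format: the excerpt in the question is FASTQ. Each record spans four lines: an @ header, the bases, a + separator (which may repeat the header), and one quality character per base, Phred+33 encoded (so 'J' means Q41 and '-' means Q12). Below is a minimal Python sketch of reading such records; the function names are illustrative only and the toy record is made up, not taken from SRR9713162.

```python
import io
from dataclasses import dataclass
from typing import Iterator, List, TextIO


@dataclass
class FastqRecord:
    name: str      # header line without the leading '@'
    sequence: str  # the bases (A/C/G/T/N)
    quality: str   # one ASCII quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield FASTQ records from an open text stream, four lines at a time."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode Phred+33 characters into integer scores, e.g. 'J' -> 41."""
    return [ord(ch) - offset for ch in quality]


if __name__ == "__main__":
    # Toy record for illustration only.
    toy = io.StringIO("@read1\nACGTTTTT\n+\nJJJF--7)\n")
    for rec in read_fastq(toy):
        print(rec.name, rec.sequence, phred_scores(rec.quality))
```

The same function works on a file opened with open("SRR9713162_1.fastq") (a hypothetical file name, following fasterq-dump's usual _1/_2 naming).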
Looks like this experiment is using Chromium Single Cell 3' v2 Reagent Kits. As you can see from the sequence you posted, read 1 contains 26 usable bp (a 16 bp cell barcode followed by a 10 bp UMI), which are followed by a poly-T stretch. The rest of read 1 is useless (it is not going to be used by cellranger, and perhaps not by STARsolo either) even though it is sequenced to 150 cycles. So read 2 is going to be your RNA read.
Hmm, I wouldn't necessarily call it useless. ;) The stuff after the poly-T stretch often contains biological sequence that you can use for paired-end mapping together with your read 2. I'd just call it 'unnecessary' (for most purposes).
Fair enough. I amended my comment.
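To make the read 1 layout discussed above concrete, here is a small Python sketch that cuts a v2 read 1 into cell barcode (bases 1-16), UMI (bases 17-26) and the remainder, and measures the poly-T run at the start of that remainder. The function names are hypothetical, and the demo input is the first 34 bases of read SRR9713162.492269 from the question.

```python
from dataclasses import dataclass

# Chromium Single Cell 3' v2 read 1 layout: bases 1-16 are the cell barcode,
# bases 17-26 the UMI; a poly-T stretch usually follows.
CB_LEN = 16
UMI_LEN = 10


@dataclass
class Read1Parts:
    cell_barcode: str
    umi: str
    remainder: str  # poly-T and whatever was sequenced after it


def split_v2_read1(seq: str) -> Read1Parts:
    """Cut a v2 read 1 into cell barcode, UMI and the rest of the read."""
    if len(seq) < CB_LEN + UMI_LEN:
        raise ValueError(f"read 1 has {len(seq)} bp; need at least {CB_LEN + UMI_LEN}")
    return Read1Parts(
        cell_barcode=seq[:CB_LEN],
        umi=seq[CB_LEN:CB_LEN + UMI_LEN],
        remainder=seq[CB_LEN + UMI_LEN:],
    )


def leading_t_run(seq: str) -> int:
    """Number of consecutive T's at the start of seq (the poly-T stretch)."""
    return len(seq) - len(seq.lstrip("T"))


if __name__ == "__main__":
    # First 34 bases of read SRR9713162.492269 from the question.
    parts = split_v2_read1("TCAGCAACAGGACGTATTTCTTAGTTTTTTTTTT")
    print(parts.cell_barcode, parts.umi, leading_t_run(parts.remainder))
```

Note that in this particular read the UMI itself ends in TT, so the poly-T run in the full read starts slightly before base 27; STARsolo and cellranger simply take fixed positions and do not look for the poly-T.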
Would it be more appropriate to use cellranger count as an alternative to STARsolo for quantifying gene expression in this paired-end single-cell cDNA RNA-Seq data? I'm just not sure whether STARsolo can be run on two paired cDNA files. I tried cellranger, and it seemed to run perfectly fine without errors.
STARsolo will work just fine on your data. You'll probably need to set --soloBarcodeReadLength 0 so that STARsolo doesn't get confused by your R1 being 150 bp.
I understand. Thank you!
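Putting the advice in this thread together, a STARsolo run for this data might look roughly like the sketch below, which just builds the argument list in Python (pass it to subprocess.run(cmd, check=True) to actually run it). This is not the SoloTE authors' command. Assumptions: fasterq-dump's _2 file is the cDNA read and _1 the barcode read; the index path is hypothetical; 737K-august-2016.txt is the v2 whitelist name as shipped with Cell Ranger; and the --outSAMattributes list is a guess at what a tag-reading tool like SoloTE needs, so check its README. Multiple lanes go in as comma-separated lists, cDNA files first.

```python
import shlex
from typing import List, Sequence


def starsolo_v2_command(
    genome_dir: str,
    cdna_reads: Sequence[str],     # read 2 FASTQ files, one per lane
    barcode_reads: Sequence[str],  # read 1 FASTQ files, same lane order
    whitelist: str,
    threads: int = 8,
) -> List[str]:
    """Build a STARsolo argument list for 10x 3' v2 data with a 150 bp read 1."""
    if not cdna_reads or len(cdna_reads) != len(barcode_reads):
        raise ValueError("need one read 1 file for every read 2 file")
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        # STARsolo expects the cDNA read first and the barcode read second;
        # several lanes are passed as comma-separated lists.
        "--readFilesIn", ",".join(cdna_reads), ",".join(barcode_reads),
        "--soloType", "CB_UMI_Simple",
        "--soloCBwhitelist", whitelist,
        # v2 layout: 16 bp cell barcode, then 10 bp UMI, at the start of read 1.
        "--soloCBstart", "1", "--soloCBlen", "16",
        "--soloUMIstart", "17", "--soloUMIlen", "10",
        # Read 1 is 150 bp rather than 26 bp, so disable the length check.
        "--soloBarcodeReadLength", "0",
        "--outSAMtype", "BAM", "SortedByCoordinate",
        # Assumption: downstream tools such as SoloTE need the CB/UB tags.
        "--outSAMattributes", "NH", "HI", "nM", "AS", "CR", "UR", "CB", "UB", "GX", "GN",
    ]


if __name__ == "__main__":
    # Hypothetical paths; adjust to your own index, FASTQs and whitelist.
    cmd = starsolo_v2_command(
        genome_dir="star_index",
        cdna_reads=["SRR9713162_2.fastq"],
        barcode_reads=["SRR9713162_1.fastq"],
        whitelist="737K-august-2016.txt",
    )
    print(shlex.join(cmd))
```

The CB/UMI start and length values shown are the 10x v2 layout; they are spelled out here mainly so the read structure is explicit.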