Question

Stuck on how to run STARsolo on paired cDNA RNA-Seq FASTQ files

11 months ago

Nicholas ▴ 10

Hey everyone,

I don't know how to approach this; I've been stuck on it for a few days. To make sure everything is working, I'm currently trying to run STARsolo/SoloTE on a data set that the SoloTE publication provided. I know how to work with other data sets; for example, the project I'm working on now uses human placental single-cell RNA-Seq data, and it works flawlessly with STARsolo. But I am having trouble with the data set that the publication provides, on a sample run like the one here. I've never worked on paired cDNA RNA-Seq sample runs before, so I'm not sure which parameters this sample run needs in order to run correctly. I'm prefetching the SRA run and fasterq-dumping it into two different files. I'm not familiar with sequences that look like this:

+SRR9713162.492268 492268 length=150
AAFFFJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ----<7AF-<F-7<FFF--<F-----7A-AFFJFF<7-7---)7)7-7--7-<-7AA----AF7-7A7FF<FFFJFFAF<F----)))))7)7--<F--
@SRR9713162.492269 492269 length=150
TCAGCAACAGGACGTATTTCTTAGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTCATTCTAAGACTTTAAGTTCTCTGGCATGAGTTTATCTGCAATCATAAACTAAAAAATAACCCAAACACACCCCACCAAACCCAACCGTAC
+SRR9713162.492269 492269 length=150
-AFFFJJJJJFJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ--7<7-7------F7---<<-7---)-)-7---7--<-7-7-7--7-<<7----7------7----7----7)--)--)7---<--)7----
@SRR9713162.492270 492270 length=150
ACCTTTAAGGCTCTTAACCATATCCGTTTTTTTTTTCTTTTTTTTTTTTTTTTTTTCCTAGAGGAAAACCCGGTAATGATGTCGGGGTTGAGGGATAGGAGGAGGATGGGGGATAGGTGTATGAACATGAGGGTGTTTTCTCGTGTGAAT
+SRR9713162.492270 492270 length=150

How might I apply STARsolo to two lanes of cDNA files as described in the paper? Tell me if anyone can help, please. Thank you so much.

STARsolo FASTQ • 1.0k views

ADD COMMENT • link 11 months ago by Nicholas ▴ 10

Looks like this experiment is using Chromium Single Cell 3' v2 Reagent Kits. As you can see from the sequence you posted, read 1 contains 26 bp of usable sequence (16 bp cell barcode + 10 bp UMI), followed by a poly-T stretch. The rest of read 1 is ~~useless~~ (not going to be used by cellranger, and perhaps not by STARsolo) even though it is sequenced to 150 cycles. So read 2 is going to be your RNA read.
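To make the read 1 layout concrete, here is a minimal Python sketch that slices the start of one of the posted reads into cell barcode, UMI, and remainder. It assumes the standard 3' v2 layout (16 bp cell barcode followed by a 10 bp UMI); the function name `split_barcode_read` is made up for illustration and is not part of any tool.

```python
# Assumed 10x Chromium 3' v2 layout: 16 bp cell barcode, then 10 bp UMI.
CB_LEN = 16
UMI_LEN = 10


def split_barcode_read(seq, cb_len=CB_LEN, umi_len=UMI_LEN):
    """Split a read 1 sequence into (cell_barcode, umi, remainder)."""
    if len(seq) < cb_len + umi_len:
        raise ValueError("read is shorter than cell barcode + UMI")
    cell_barcode = seq[:cb_len]
    umi = seq[cb_len:cb_len + umi_len]
    remainder = seq[cb_len + umi_len:]
    return cell_barcode, umi, remainder


# Start of read SRR9713162.492269 from the question (truncated for brevity).
read1_prefix = "TCAGCAACAGGACGTATTTCTTAGTTTTTTTTTTTTTT"

cb, umi, rest = split_barcode_read(read1_prefix)
print("cell barcode:", cb)
print("UMI:         ", umi)
print("remainder:   ", rest)
```

Everything after the first 26 bases of that read begins with the poly-T stretch, which is why only the leading 26 bp are used for barcode and UMI assignment.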

ADD REPLY • link 11 months ago by GenoMax 151k

Hmm, I wouldn't necessarily call it useless. ;) The stuff after the polyT stretch often contains biological sequences that you can do paired-end mapping with your read 2. I'd just call it 'unnecessary' (for most purposes).

ADD REPLY • link 11 months ago by dsull ★ 7.5k

Fair enough. I amended my comment.

ADD REPLY • link 11 months ago by GenoMax 151k

Would it be more appropriate to use cellranger count as an alternative to STARsolo for quantifying gene expression in the paired-end single-cell cDNA RNA-Seq data? I'm just not sure if STARsolo could be run on two paired cDNA files. I used cellranger, and it seemed to run perfectly fine without exceptions.

ADD REPLY • link 11 months ago by Nicholas ▴ 10

STARsolo will work just fine on your data. You'll probably need to set --soloBarcodeReadLength 0 so that STARsolo doesn't get confused that your R1 is 150 bp.
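
As a rough sketch of what such a run could look like, the Python snippet below only assembles and prints a STAR command line; it does not execute anything. All paths are placeholders, and it assumes that fasterq-dump wrote the barcode read to `SRR9713162_1.fastq` and the cDNA read to `SRR9713162_2.fastq`, which should be checked against the actual files. The barcode geometry (16 bp cell barcode at position 1, 10 bp UMI at position 17) is the 3' v2 layout discussed above, and the whitelist is assumed to be the 10x v2 barcode list. The helper name `build_starsolo_command` is hypothetical.

```python
import shlex


def build_starsolo_command(genome_dir, cdna_fastq, barcode_fastq,
                           whitelist, threads=8):
    """Assemble a STARsolo argument list for 10x 3' v2 style data."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        # STARsolo expects the cDNA read first, then the barcode read.
        "--readFilesIn", cdna_fastq, barcode_fastq,
        "--soloType", "CB_UMI_Simple",
        "--soloCBwhitelist", whitelist,
        # Assumed v2 geometry: 16 bp cell barcode, then 10 bp UMI.
        "--soloCBstart", "1", "--soloCBlen", "16",
        "--soloUMIstart", "17", "--soloUMIlen", "10",
        # 0 disables the barcode read length check (read 1 is 150 bp here).
        "--soloBarcodeReadLength", "0",
    ]


cmd = build_starsolo_command(
    genome_dir="/path/to/star_index",      # placeholder
    cdna_fastq="SRR9713162_2.fastq",       # assumed cDNA read
    barcode_fastq="SRR9713162_1.fastq",    # assumed barcode + UMI read
    whitelist="/path/to/v2_whitelist.txt", # placeholder
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell once the placeholder paths point at a real genome index and whitelist.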