24 months ago
firestar
I'm trying to run the nf-core rnaseq pipeline on smart-seq3 data. The pipeline requires paired-end reads to be specified in a samplesheet like this:
sample,fastq_1,fastq_2,strandedness
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,forward
CONTROL_REP2,AEG588A2_S2_L002_R1_001.fastq.gz,AEG588A2_S2_L002_R2_001.fastq.gz,forward
....
But my data has 4 fastq files per sample:
3.4G Nov 1 18:42 F13153_2001_S1_I1_001.fastq.gz
3.2G Nov 1 18:44 F13153_2001_S1_I2_001.fastq.gz
12G Nov 1 18:51 F13153_2001_S1_R1_001.fastq.gz
19G Nov 1 19:02 F13153_2001_S1_R2_001.fastq.gz
Anyone know how to specify this?
I'm not sure if you're the same person or not, but we're having an almost identical conversation on the nf-core Slack at the moment [ link / join ].
I don't know how the smart-seq3 library prep works, but you need to figure out where the cell identifiers are in the sequence so that the FastQ files can be split to one file per cell. Then you can run with the nf-core/rnaseq pipeline.
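To make the splitting step concrete, here is a minimal Python sketch. It rests on an assumption I have not verified for smart-seq3: that the cell identifier is the I1 sequence joined to the I2 sequence, and that all four files list reads in the same order. The function names (`read_fastq`, `split_by_cell`, `write_samplesheet`) and the `unstranded` default are made up for illustration, so check them against your library prep before relying on any of this.

```python
import csv
import gzip
from itertools import islice
from pathlib import Path


def read_fastq(path):
    """Yield (header, sequence, plus, quality) tuples from a gzipped FastQ file."""
    with gzip.open(path, "rt") as handle:
        while True:
            record = [line.rstrip("\n") for line in islice(handle, 4)]
            if len(record) < 4:  # end of file (or a truncated final record)
                break
            yield tuple(record)


def split_by_cell(r1, r2, i1, i2, outdir):
    """Write one R1/R2 pair per cell; return {barcode: (r1_path, r2_path)}.

    ASSUMPTION: the cell barcode is the I1 sequence followed by the I2
    sequence, and the four input files are in the same read order.
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    handles = {}  # barcode -> (open R1 handle, open R2 handle)
    paths = {}    # barcode -> (R1 path, R2 path)
    try:
        streams = zip(read_fastq(r1), read_fastq(r2), read_fastq(i1), read_fastq(i2))
        for rec1, rec2, idx1, idx2 in streams:
            barcode = idx1[1] + idx2[1]
            if barcode not in handles:
                p1 = outdir / f"{barcode}_R1.fastq.gz"
                p2 = outdir / f"{barcode}_R2.fastq.gz"
                handles[barcode] = (gzip.open(p1, "wt"), gzip.open(p2, "wt"))
                paths[barcode] = (p1, p2)
            out1, out2 = handles[barcode]
            out1.write("\n".join(rec1) + "\n")
            out2.write("\n".join(rec2) + "\n")
    finally:
        for out1, out2 in handles.values():
            out1.close()
            out2.close()
    return paths


def write_samplesheet(paths, csv_path, strandedness="unstranded"):
    """Write a samplesheet with one row per cell, sorted by barcode.

    ASSUMPTION: "unstranded" is only a placeholder default; set the value
    that matches your protocol.
    """
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sample", "fastq_1", "fastq_2", "strandedness"])
        for barcode in sorted(paths):
            p1, p2 = paths[barcode]
            writer.writerow([barcode, str(p1), str(p2), strandedness])
```

You would call `split_by_cell(...)` with the four files for one sample and then pass the returned dictionary to `write_samplesheet(...)`. Note that this keeps two output files open per observed barcode and treats every sequencing error as a new cell, so for files of this size you would want to restrict it to a list of expected barcodes or use a dedicated demultiplexing tool.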
It's possible that the scrnaseq pipeline can handle this kind of data without splitting it first, but I don't know. We can continue this on Slack.
A colleague has referred me to the zUMIs pipeline, which handles all scRNA-seq data from 10X to smart-seq3.