2.1 years ago
dk0319
I am interested in analyzing some scRNA-seq data from GEO, and I found that the samples have multiple runs (4 in some cases). Does anyone have experience with this, and is there a recommended standard procedure for handling multiple runs?
For ChIP-seq or bulk RNA-seq I would just merge the files, but I am unsure whether this is normal for single-cell data.
Please check whether the four runs are named with the pattern L001/L002/L003/L004. If so, the sample was sequenced across 4 lanes of the sequencer, and you can merge the FASTQ files before running your downstream analysis.
This is how the metadata is presented on SRA Explorer. Just to confirm, I should be able to merge these into one?
"SRX15246019: GSM4274678: BMET1-Tumor; Homo sapiens; RNA-Seq 4 ILLUMINA (Illumina HiSeq 4000) runs: 130.2M spots, 12G bases, 5.3Gb downloads"
It says it's RNA-seq.
Yes, but the data is scRNA-seq with R1, R2 and L1 fastq files
You are correct, the runs are labeled L001-L004. My question now: since each of these runs consists of R1, R2 and L1 fastq files, am I correct in assuming that I need to merge all the R1s, all the R2s and all the L1s separately, to produce 3 final fastq files?
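A minimal sketch of the per-read-type merge, in Python. It assumes Illumina-style file names such as `SAMPLE_L001_R1_001.fastq.gz`, with index reads named `I1`; files downloaded from SRA are often named differently, so the pattern would need adjusting. The function names and the `_merged_` output naming are hypothetical, chosen only for illustration. It relies on the fact that gzip files can be concatenated byte-for-byte into a valid gzip file.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Assumed naming: <sample>_L<lane>_<read>_001.fastq.gz, e.g. BMET1_S1_L001_R1_001.fastq.gz
LANE_FILE = re.compile(
    r"^(?P<sample>.+)_L(?P<lane>\d{3})_(?P<read>[RI]\d)_001\.fastq\.gz$"
)

def group_by_read(fastq_dir):
    """Map (sample, read type) to that read type's lane files, sorted by lane."""
    groups = defaultdict(list)
    for path in Path(fastq_dir).iterdir():
        m = LANE_FILE.match(path.name)
        if m:
            groups[(m["sample"], m["read"])].append((int(m["lane"]), path))
    return {key: [p for _, p in sorted(files)] for key, files in groups.items()}

def merge_lanes(fastq_dir, out_dir):
    """Concatenate lanes into one file per (sample, read type)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    merged = {}
    for (sample, read), paths in sorted(group_by_read(fastq_dir).items()):
        target = out_dir / f"{sample}_merged_{read}.fastq.gz"
        with open(target, "wb") as dst:
            # Same lane order for every read type, so R1/R2/index records stay paired.
            for src in paths:
                with open(src, "rb") as fh:
                    shutil.copyfileobj(fh, dst)
        merged[(sample, read)] = target
    return merged
```

Calling `merge_lanes("fastq", "merged")` on a directory holding four lanes of one sample would write one merged file per read type. The important detail is that every read type is concatenated in the same lane order; otherwise the reads in the merged files no longer line up.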
Do you mean the samples come from different batches? In that case, you will want to account for batch effects. You can find a nice explanation of how to account for batch effects here.