Question

Single Cell Pipeline for SmartSeq

24 months ago

hpumut • 0

Dear Community,

I want to align SmartSeq platform-based single-cell data to the human genome using Alevin and/or STARsolo and then create a Seurat object. My data is paired-end. Each FASTQ file is around 4-10 GB in size.

R1 and R2 pairs look like this:

R1:

@A00814:396:HYJJ7DMXX:2:1101:1976:1000 1:N:0:TTATAACC+NCGATATC
NTCTCTGTATCAGCATATTAGCAATAACATATTTTTAAATGAAGGTATGTA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00814:396:HYJJ7DMXX:2:1101:4146:1000 1:N:0:TTATAACC+NCGATATC
NGCATCTTTATGGTGTTCTCTGTATTTCCTGAATTTGAATGTTGGCCTGCC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00814:396:HYJJ7DMXX:2:1101:5376:1000 1:N:0:TTATAACC+NCGATATC
NGGGGGGAGAGCGCGGCGACGGGTCTCGCTCCCTCGGCCCCGGGATTCGGC
+
FFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFF

R2:

@A00814:396:HYJJ7DMXX:2:1101:1976:1000 2:N:0:TTATAACC+NCGATATC
GGTGCACATGAAGGCTATGTTTGCACTGTATTATGGTTTAAGTGTATAATA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00814:396:HYJJ7DMXX:2:1101:4146:1000 2:N:0:TTATAACC+NCGATATC
AAACACTCTGCAGGATATTATCCAGGAGAACTTCCCCAACCTAGAAAGGCA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF,
@A00814:396:HYJJ7DMXX:2:1101:5376:1000 2:N:0:TTATAACC+NCGATATC
TACAGCCCCCCCGGCAGCAGCACTCGCCGAATCCCGGGGCCGAGGGAGCGA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

Does this data have barcodes? If so, where are they?

TTATAACC+NCGATATC is present in all headers in the same FASTQ file. Different FASTQ files have different values at the end of the header. If TTATAACC+NCGATATC is the cell barcode, can I consider the whole FASTQ file to belong to a single cell? Or does this FASTQ file have sequence information for many cells?

I am looking forward to your assistance.

Thank you

scRNA-seq SmartSeq single-cell • 1.2k views

Yes, this data is indexed. The indexes are located in the header, so your guess is right: for this sample they are TTATAACC+NCGATATC. These are Illumina indexes, and they may be for the sample. Each cell may have its own cell barcode, but cell barcodes will generally not be in the FASTQ header.
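To confirm that a FASTQ file really carries a single index pair, you can tally the last colon-separated field of the header lines. The following is a minimal sketch, not a vetted tool: the function name `count_header_indexes` and the `max_reads` sampling limit are made up for illustration, and it assumes standard four-line FASTQ records with Illumina-style headers like the ones posted above.

```python
import gzip
from collections import Counter
from itertools import islice


def count_header_indexes(fastq_path, max_reads=100_000):
    """Tally the index field (text after the last ':') of FASTQ header lines.

    Assumes four-line FASTQ records and Illumina-style headers such as
    '@A00814:...:1000 1:N:0:TTATAACC+NCGATATC'. Only the first `max_reads`
    records are inspected (hypothetical default, chosen to keep it quick).
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        # Header lines are lines 0, 4, 8, ... of the file.
        for header in islice(handle, 0, max_reads * 4, 4):
            header = header.rstrip("\n")
            if not header:
                continue
            # The part after the first space is '<read>:<filtered>:<control>:<index>'.
            comment = header.split(" ", 1)[-1]
            counts[comment.rsplit(":", 1)[-1]] += 1
    return counts
```

Calling `count_header_indexes("sample_R1.fastq")` (file name is a placeholder) returns a `Counter`; a single key would mean every sampled read carries the same index pair.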

ADD REPLY • link updated 24 months ago by Ram 45k • written 24 months ago by GenoMax 153k

Thank you for your prompt reply. Thank you for the information that TTATAACC+NCGATATC is the index for the sample. Then, how can I obtain the cell barcode information? I do not have any extra files except R1 and R2 pairs.

ADD REPLY • link 24 months ago by hpumut • 0

score 0 · Answer 1 · 2023-08-25

24 months ago

ATpoint 89k

Smart-seq is plate-based: each well in the plate (= 1 cell) corresponds to one FASTQ file pair, and there are no cellular barcodes or UMIs. Hence, you don't need any single-cell-specific software. You can use salmon or STAR.
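As a rough sketch of what "one FASTQ pair per cell" looks like in practice, the snippet below groups file names into R1/R2 pairs and builds one `salmon quant` command per cell. Everything about the naming is an assumption: it expects an `_R1`/`_R2` tag in the file name, and the helper names, the index directory, and the output directory are placeholders. It only builds the command lists; it does not run salmon.

```python
import re
from pathlib import Path


def pair_fastqs(filenames):
    """Group FASTQ names into {cell_id: (R1, R2)}, assuming an _R1/_R2 tag."""
    pattern = re.compile(r"^(?P<cell>.+?)_R(?P<read>[12])(?P<rest>[._].*)?$")
    by_cell = {}
    for name in sorted(filenames):
        match = pattern.match(Path(name).name)
        if match is None:
            continue  # name does not follow the assumed convention
        by_cell.setdefault(match["cell"], {})[match["read"]] = name
    # Keep only cells for which both mates were found.
    return {
        cell: (reads["1"], reads["2"])
        for cell, reads in by_cell.items()
        if set(reads) == {"1", "2"}
    }


def salmon_commands(pairs, index_dir, out_dir):
    """Build one 'salmon quant' argument list per cell (paired-end, auto library type)."""
    commands = []
    for cell, (r1, r2) in sorted(pairs.items()):
        commands.append([
            "salmon", "quant",
            "-i", str(index_dir),   # pre-built salmon index (placeholder path)
            "-l", "A",              # let salmon infer the library type
            "-1", str(r1),
            "-2", str(r2),
            "-o", str(Path(out_dir) / cell),
        ])
    return commands
```

Each command list can be passed to `subprocess.run(cmd, check=True)`. The per-cell quantifications would then be combined into a single gene-by-cell count matrix before building a Seurat object.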

Thank you for your prompt reply. Each FASTQ file is around 5 GB (uncompressed). Each FASTQ file belongs to one cell type, right? I am asking because the file size seemed quite large for a single cell. I have 25 pairs of R1 and R2 files. Does that mean I have sequencing information for only 25 cells?

ADD REPLY • link 24 months ago by hpumut • 0