Single Cell Pipeline for SmartSeq
16 months ago
hpumut • 0

Dear Community,

I want to align single-cell data from the SmartSeq platform to the human genome using Alevin and/or STARsolo, and then create a Seurat object. My data is paired-end. Each FASTQ file is around 4-10 GB in size.

R1 and R2 pairs look like this:

R1:

@A00814:396:HYJJ7DMXX:2:1101:1976:1000 1:N:0:TTATAACC+NCGATATC
NTCTCTGTATCAGCATATTAGCAATAACATATTTTTAAATGAAGGTATGTA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00814:396:HYJJ7DMXX:2:1101:4146:1000 1:N:0:TTATAACC+NCGATATC
NGCATCTTTATGGTGTTCTCTGTATTTCCTGAATTTGAATGTTGGCCTGCC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00814:396:HYJJ7DMXX:2:1101:5376:1000 1:N:0:TTATAACC+NCGATATC
NGGGGGGAGAGCGCGGCGACGGGTCTCGCTCCCTCGGCCCCGGGATTCGGC
+
FFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFF

R2:

@A00814:396:HYJJ7DMXX:2:1101:1976:1000 2:N:0:TTATAACC+NCGATATC
GGTGCACATGAAGGCTATGTTTGCACTGTATTATGGTTTAAGTGTATAATA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00814:396:HYJJ7DMXX:2:1101:4146:1000 2:N:0:TTATAACC+NCGATATC
AAACACTCTGCAGGATATTATCCAGGAGAACTTCCCCAACCTAGAAAGGCA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF,
@A00814:396:HYJJ7DMXX:2:1101:5376:1000 2:N:0:TTATAACC+NCGATATC
TACAGCCCCCCCGGCAGCAGCACTCGCCGAATCCCGGGGCCGAGGGAGCGA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

Does this data have barcodes? If so, where are they?

TTATAACC+NCGATATC is present in all headers in the same FASTQ file. Different FASTQ files have different values at the end of the header. If TTATAACC+NCGATATC is the cell barcode, can I consider the whole FASTQ file to belong to a single cell? Or does this FASTQ file have sequence information for many cells?

I am looking forward to your assistance.

Thank you

scRNA-seq SmartSeq single-cell • 884 views

Yes, this data is indexed, and the indexes are located in the header. Your guess is right: for this sample they are TTATAACC+NCGATATC. These are Illumina indexes, and they may identify the sample. Each cell may have its own cell barcode, but cell barcodes will generally not be in the FASTQ header.
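To check that a FASTQ file really carries a single index pair, the headers can be tallied. Below is a minimal Python sketch, assuming the headers follow the layout shown in the question (read name, a space, then a comment whose last colon-separated field is the index). The function names and the `max_reads` cutoff are made up for illustration, and a header without a space would raise an error.

```python
import gzip
from collections import Counter
from itertools import islice


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def index_from_header(header):
    """Return the index field of a header such as
    '@A00814:...:1000 1:N:0:TTATAACC+NCGATATC'.

    Assumption: the index is the last colon-separated field of the
    comment that follows the first space.
    """
    comment = header.rstrip("\n").split(" ", 1)[1]
    return comment.rsplit(":", 1)[1]


def count_indexes(path, max_reads=100_000):
    """Tally index sequences over the first max_reads records of a file."""
    counts = Counter()
    with open_fastq(path) as handle:
        # A FASTQ record is 4 lines; the header is the first line of each.
        for header in islice(handle, 0, max_reads * 4, 4):
            counts[index_from_header(header)] += 1
    return counts


# Hypothetical usage (the file name is a placeholder):
# print(count_indexes("cell_01_R1.fastq.gz").most_common(5))
```

If one index pair accounts for essentially all sampled reads, the file has already been demultiplexed to one sample. A few near-identical variants can show up because of sequencing errors in the index read.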


Thank you for your prompt reply and for clarifying that TTATAACC+NCGATATC is the sample index. How can I obtain the cell barcode information, then? I do not have any files other than the R1 and R2 pairs.

16 months ago
ATpoint 86k

Smartseq is plate-based: each well in the plate (= 1 cell) corresponds to one FASTQ file pair, and there are no cellular barcodes or UMIs. Hence, you don't need any single-cell-specific software. You can use salmon or STAR.
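In practice this means running the quantifier once per file pair. Here is a rough Python sketch that pairs R1/R2 files and builds one `salmon quant` command per cell. The file naming (`<cell>_R1.fastq.gz` / `<cell>_R2.fastq.gz`), the index directory `salmon_index` and the output directory `quant` are assumptions for illustration, not something stated in the thread. The salmon index has to be built beforehand, and `-l A` asks salmon to detect the library type automatically.

```python
import re
from pathlib import Path

# Assumed naming: <cell>_R1.fastq.gz / <cell>_R2.fastq.gz,
# optionally with an "_001" suffix before the extension.
MATE_PATTERN = re.compile(
    r"^(?P<cell>.+)_R(?P<mate>[12])(?:_001)?\.(?:fastq|fq)(?:\.gz)?$"
)


def pair_fastqs(filenames):
    """Group FASTQ names into {cell_id: (R1, R2)}; one pair = one cell."""
    mates = {}
    for name in sorted(filenames):
        match = MATE_PATTERN.match(Path(name).name)
        if match is None:
            continue  # skip files that do not follow the assumed naming
        mates.setdefault(match["cell"], {})[match["mate"]] = name
    pairs = {}
    for cell, found in mates.items():
        if set(found) != {"1", "2"}:
            raise ValueError(f"incomplete R1/R2 pair for {cell}")
        pairs[cell] = (found["1"], found["2"])
    return pairs


def salmon_command(cell, r1, r2, index_dir, out_root):
    """Build a per-cell `salmon quant` command as an argument list."""
    return [
        "salmon", "quant",
        "-i", str(index_dir),            # prebuilt salmon index
        "-l", "A",                       # automatic library type detection
        "-1", r1, "-2", r2,              # paired-end reads of this cell
        "-o", str(Path(out_root) / cell),
    ]


# Hypothetical file names, only to show the pairing:
names = ["cellA_R1.fastq.gz", "cellA_R2.fastq.gz",
         "cellB_R1.fastq.gz", "cellB_R2.fastq.gz"]
for cell, (r1, r2) in pair_fastqs(names).items():
    print(" ".join(salmon_command(cell, r1, r2, "salmon_index", "quant")))
```

Each command writes its results to a separate per-cell directory, so the outputs can later be combined into one gene-by-cell matrix. The argument lists can be passed to `subprocess.run` as they are.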


Thank you for your prompt reply. Each FASTQ file is around 5 GB (uncompressed). Each FASTQ file belongs to one cell, right? I am asking because the file size seems quite large for a single cell. I have 25 pairs of R1 and R2. Does that mean I have sequencing information for only 25 cells?
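To turn the file size into a read count, a back-of-the-envelope estimate is enough. This is a small Python sketch, assuming an uncompressed file with Unix line endings in which every record is about as long as the first one shown in the question. The function names are made up for illustration.

```python
def record_bytes(header, sequence):
    """Bytes taken by one 4-line FASTQ record with '\\n' line endings."""
    # header line, sequence line, '+' separator line, and a quality line
    # of the same length as the sequence; each line ends with one newline.
    return (len(header) + 1) + (len(sequence) + 1) + 2 + (len(sequence) + 1)


def estimate_reads(file_size_bytes, header, sequence):
    """Approximate number of reads in an uncompressed FASTQ file."""
    return file_size_bytes // record_bytes(header, sequence)


# First R1 record from the question:
header = "@A00814:396:HYJJ7DMXX:2:1101:1976:1000 1:N:0:TTATAACC+NCGATATC"
sequence = "NTCTCTGTATCAGCATATTAGCAATAACATATTTTTAAATGAAGGTATGTA"

# 5 GB taken as 5 * 10**9 bytes (an assumption about how the size was reported).
print(record_bytes(header, sequence))
print(estimate_reads(5 * 10**9, header, sequence))
```

The result is the approximate number of reads in one file, which can be compared against the sequencing depth planned for each well.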
