Dear Community,
I want to align SmartSeq platform-based single-cell data to the human genome using Alevin and/or StarSolo and then create a Seurat object. My data is paired-end. Each FASTQ file is around 4-10 GB in size.
R1 and R2 pairs look like this:
R1:
@A00814:396:HYJJ7DMXX:2:1101:1976:1000 1:N:0:TTATAACC+NCGATATC
NTCTCTGTATCAGCATATTAGCAATAACATATTTTTAAATGAAGGTATGTA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00814:396:HYJJ7DMXX:2:1101:4146:1000 1:N:0:TTATAACC+NCGATATC
NGCATCTTTATGGTGTTCTCTGTATTTCCTGAATTTGAATGTTGGCCTGCC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00814:396:HYJJ7DMXX:2:1101:5376:1000 1:N:0:TTATAACC+NCGATATC
NGGGGGGAGAGCGCGGCGACGGGTCTCGCTCCCTCGGCCCCGGGATTCGGC
+
FFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFF
R2:
@A00814:396:HYJJ7DMXX:2:1101:1976:1000 2:N:0:TTATAACC+NCGATATC
GGTGCACATGAAGGCTATGTTTGCACTGTATTATGGTTTAAGTGTATAATA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00814:396:HYJJ7DMXX:2:1101:4146:1000 2:N:0:TTATAACC+NCGATATC
AAACACTCTGCAGGATATTATCCAGGAGAACTTCCCCAACCTAGAAAGGCA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF,
@A00814:396:HYJJ7DMXX:2:1101:5376:1000 2:N:0:TTATAACC+NCGATATC
TACAGCCCCCCCGGCAGCAGCACTCGCCGAATCCCGGGGCCGAGGGAGCGA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
Does this data have barcodes? If so, where are they?
TTATAACC+NCGATATC is present in every header within the same FASTQ file, and different FASTQ files have different values at the end of the header. If TTATAACC+NCGATATC is the cell barcode, can I treat the whole FASTQ file as belonging to a single cell? Or does this FASTQ file contain sequence information for many cells?
I am looking forward to your assistance.
Thank you
Yes, this data is indexed. The indexes are located in the header, so your guess is right. For this sample they are TTATAACC+NCGATATC. These are Illumina indexes, and they may identify the sample. Each cell may have its own cell barcode, and those are generally not in the FASTQ header.
Thank you for your prompt reply, and for the information that TTATAACC+NCGATATC is the index for the sample. How, then, can I obtain the cell barcode information? I do not have any extra files besides the R1 and R2 pairs.
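To confirm what the headers of a given file actually contain, one option is to tally the index field over the first reads of that file. The following is only a minimal Python sketch, assuming Illumina-style headers like the ones shown above, where the index is the last colon-separated field after the space. The function names (header_index, count_indexes, open_fastq) and the file name in the usage comment are made up for illustration and do not come from Alevin, STARsolo, or Seurat.

```python
import gzip
from collections import Counter


def header_index(header):
    """Return the index field of an Illumina-style FASTQ header, or None.

    Assumed layout: "@<read id> <read>:<filtered>:<control>:<index>".
    """
    parts = header.rstrip("\n").split(" ", 1)
    if len(parts) < 2 or ":" not in parts[1]:
        return None  # header has no comment part in the assumed layout
    return parts[1].rsplit(":", 1)[-1]


def count_indexes(lines, max_reads=100_000):
    """Tally the index values found in the first max_reads FASTQ records.

    `lines` is any iterable of FASTQ lines (four lines per record).
    """
    counts = Counter()
    for line_no, line in enumerate(lines):
        if line_no // 4 >= max_reads:
            break
        if line_no % 4 == 0:  # every fourth line is a header
            counts[header_index(line)] += 1
    return counts


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


# Usage (hypothetical file name):
# with open_fastq("sample_R1.fastq.gz") as handle:
#     print(count_indexes(handle).most_common(5))
```

If a file reports a single index pair, perhaps with a few variants containing N from low-quality index reads, then all reads in that file carry the same Illumina index. This only summarizes the header field; it does not by itself say whether that index corresponds to one cell or to a sample containing many cells.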