Hi Biostars,
I've downloaded some scRNA-Seq data from the GSA, which I am hoping to analyze. This is 10x V2 chemistry sequence data.
However, the format is different from what I am familiar with. First, the reads come in 2 fastq files ("f1" and "r2"):
CRR034505_f1.fastq.gz
CRR034505_r2.fastq.gz
More importantly, both mates have equal read lengths:
zcat CRR034505_f1.fastq.gz | head -n4
@ST-E00126:655:HL5FTCCXY:5:1101:7638:1151 1:N:0:NAAGTGCT
NAGTAACCAAGACACGTATTGCGCATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAACTAAAAAGGGGTCCCAGAATTTCAGCAGTTCTCTGATTTTTATATTTTATTCCTCTTCCTATCCAATCCCTGCCTTTTGCTTCAAGGTG
+
#AAFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFAA--7<<--AFFJ7-)-7)--<AAF7<<--7-----77---AA7AFJF7F<A----7A-<-AA<F<A7-A-77)--<F-<<--7A--<-77
zcat CRR034505_r2.fastq.gz | head -n4
@ST-E00126:655:HL5FTCCXY:5:1101:7638:1151 2:N:0:NAAGTGCT
NCAAAGAAAAAGACACATTTGGGAAGAAAAGCAGGAAAAACGTTAAAGAAAATGTACTTACCACCTGGACTCAAAAGGCAGGGATTGGATAGGAAGAGGAATAAAATATAAAAATCAGAGAACTGCTGAAATTATGTGACCACTTTTTAG
+
#AAFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJAJJJJJJJJJJJJJJJJJJJJJJJJFJFJJJJJJJJJJJJJJAJJJJFJJFAJFJJJJJJJJJJJJJJJJJJF-A7JAAF---7<<AFJFF<AJ--F-)-7<<FJAFA
I have never seen an scRNA-Seq data set that looks like this, though I have not done much work with 10x V2 chemistry in the past.
Is this normal? Has anybody encountered scRNA-Seq data like this before?
Thanks!
Dave
Fantastic, thank you!
Just make sure you compare the R1 reads, after trimming off the extra bases, against the 10x whitelist to confirm that they really are cell barcodes. For v2, R1 is the 16 bp cell barcode + 10 bp UMI: https://divingintogeneticsandgenomics.com/post/understand-10x-scrnaseq-and-scatac-fastqs/ v3 has a 12 bp UMI, if I recall.
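In case it helps, here is a rough Python sketch of that check. It assumes the v2 layout described above (first 16 bases of the f1 read are the cell barcode, next 10 the UMI, everything after that ignored) and a plain-text whitelist with one barcode per line. The function names are made up for illustration, and this is only a sanity check, not what Cell Ranger does internally. For what it's worth, the example f1 read in the question runs into poly-T right after base 26, which fits that layout.

```python
import gzip
from itertools import islice

CB_LEN = 16   # assumed 10x v2 cell barcode length
UMI_LEN = 10  # assumed 10x v2 UMI length (v3 would be 12)


def split_r1(seq, cb_len=CB_LEN, umi_len=UMI_LEN):
    """Return (cell_barcode, umi) from the start of an R1 read; extra bases are dropped."""
    return seq[:cb_len], seq[cb_len:cb_len + umi_len]


def read_sequences(fastq_gz, max_reads=100_000):
    """Yield up to max_reads sequence lines from a gzipped FASTQ file."""
    with gzip.open(fastq_gz, "rt") as handle:
        # The sequence is the 2nd line of every 4-line FASTQ record.
        for line in islice(handle, 1, max_reads * 4, 4):
            yield line.strip()


def load_whitelist(path):
    """Load a one-barcode-per-line text whitelist into a set."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def whitelist_fraction(seqs, whitelist):
    """Fraction of reads whose first 16 bases exactly match a whitelist barcode."""
    total = hits = 0
    for seq in seqs:
        barcode, _umi = split_r1(seq)
        total += 1
        hits += barcode in whitelist
    return hits / total if total else 0.0
```

You would call it along the lines of `whitelist_fraction(read_sequences("CRR034505_f1.fastq.gz"), load_whitelist("737K-august-2016.txt"))`, where the whitelist path is a placeholder for wherever your copy of the v2 whitelist lives. If f1 really is the barcode read, most reads should match exactly; a fraction near zero would suggest the layout assumption is wrong.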