Hi everybody, I have a question and hope you can help. Thank you.
I have a set of scRNA-seq data (10x). It includes two reads. Read 1 contains the UMI and cell barcode, as follows:
@SRR7646180.1 1 length=26
GTCGTAAAGATATACGGCACAACTCT
+SRR7646180.1 1 length=26
CDDDDIIIIIHIIHIIIIIIIIIIII
@SRR7646180.2 2 length=26
GATCGTAGTTGCCTCTCAAAGAACGT
+SRR7646180.2 2 length=26
DDDDDIIIIIIHIIIIIIIIIIIIII
........
Read 2 contains the sequence, which is 98 bp long, as follows:
@SRR7646180.1 1 length=98
CTAGGAAACTGGATATTCACATGTAGAAGACTGAAACTAGATGCTTATCTCTCACCACATTAAGAAAATCAAAATGGATT
+SRR7646180.1 1 length=98
CDDABIIHHHIIHIIHIIIIHIHIIIIIIIHHIIH?FHHIIIIIIIHIHHEHIIIIIIIIIIIIIIIIICHIIIIHIHII
@SRR7646180.2 2 length=98
AAGCAGTGGTATCAACGCAGAGTACATGGGGGTTCACTCCCACTTCATCCTGGCTGAAAGCAGTGCTGTGCTTTGAAATG
+SRR7646180.2 2 length=98
DDDDDIIGHIIIIIIIIHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHIIIIIIIHIIIHIIIIHIIIIIIIIGH
Now I have a cell barcode list, as follows:
0 AAACCTGAGCCACTAT
1 AAACCTGAGTCTCCTC
2 AAACCTGCACAACTGT
3 AAACCTGTCGAGCCCA
4 AAACCTGTCTCCGGTT
5 AAACGGGAGAAGATTC
6 AAACGGGAGTGACTCT
7 AAACGGGCAAGGGTCA
8 AAACGGGCATGTAAGA
9 AAACGGGGTCAAAGAT
These cell barcodes come from a subset of read 1 and represent a subset of the cells. How can I extract the reads that carry these barcodes from the FASTQ files?
Thank you in advance!
May I ask why you want a custom approach rather than simply running CellRanger or any other specialized software for 10x single-cell data, such as STARsolo, Salmon/Alevin or Kallisto/Bustools? What is your final goal? Barcodes can be noisy and contain sequencing errors, so naive approaches will likely be suboptimal here; the aforementioned software takes care of this.
Thank you. I have an scRNA-seq dataset in which three cell lines were artificially mixed. I just want the FASTQ of one of them. I now have the cell barcodes for that cell line, and I want a FASTQ file that contains only the reads from it. How can I do this? Thank you!
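For what it's worth, a naive exact-match filter could look something like the Python sketch below. It assumes that the first 16 bases of read 1 are the cell barcode (which matches the 16 bp barcodes in the list above and the 26 bp read 1), that the two FASTQ files are uncompressed and list the reads in the same order, and that the barcode is the last whitespace-separated field on each line of the list. The function names and file paths are made up for illustration. As noted in the reply above, this ignores sequencing errors in the barcode, so reads whose barcode differs by even one base are dropped.

```python
from itertools import islice


def load_barcodes(lines):
    """Parse lines like '0 AAACCTGAGCCACTAT' into a set of barcode strings."""
    return {line.split()[-1] for line in lines if line.strip()}


def fastq_records(lines):
    """Yield FASTQ records as 4-item lists: header, sequence, '+', quality."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield [line.rstrip("\n") for line in record]


def filter_pairs(r1_lines, r2_lines, barcodes, barcode_len=16):
    """Yield (read1, read2) pairs whose read 1 starts with a listed barcode.

    Assumption: the barcode occupies the first `barcode_len` bases of read 1
    and both inputs contain the same reads in the same order.
    """
    pairs = zip(fastq_records(r1_lines), fastq_records(r2_lines))
    for rec1, rec2 in pairs:
        if rec1[1][:barcode_len] in barcodes:
            yield rec1, rec2


def write_filtered(r1_path, r2_path, barcode_path, out1_path, out2_path):
    """Write the matching read pairs to two new FASTQ files (plain text)."""
    with open(barcode_path) as handle:
        barcodes = load_barcodes(handle)
    with open(r1_path) as r1, open(r2_path) as r2, \
            open(out1_path, "w") as out1, open(out2_path, "w") as out2:
        for rec1, rec2 in filter_pairs(r1, r2, barcodes):
            out1.write("\n".join(rec1) + "\n")
            out2.write("\n".join(rec2) + "\n")
```

You would call it as `write_filtered("SRR7646180_1.fastq", "SRR7646180_2.fastq", "barcodes.txt", "subset_1.fastq", "subset_2.fastq")`, where all five file names are placeholders. For gzipped input you would need to swap `open` for `gzip.open(path, "rt")`. If the barcode list came out of a pipeline that already corrected barcodes, filtering the aligned BAM by its corrected barcode tag is probably more robust than matching raw reads like this.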