Question

How can I extract reads into a new FASTQ file according to their cell barcodes?


4.1 years ago

ruiyan_hou • 0

Hi everybody, I have a question and hope you can help. Thank you.

I have a set of scRNA-seq data (10x). It consists of two reads. Read 1 contains the UMI and the cell barcode, as follows:

@SRR7646180.1 1 length=26
GTCGTAAAGATATACGGCACAACTCT
+SRR7646180.1 1 length=26
CDDDDIIIIIHIIHIIIIIIIIIIII
@SRR7646180.2 2 length=26
GATCGTAGTTGCCTCTCAAAGAACGT
+SRR7646180.2 2 length=26
DDDDDIIIIIIHIIIIIIIIIIIIII
........

Read 2 contains the sequence, which is 98 bp long, as follows:

@SRR7646180.1 1 length=98
CTAGGAAACTGGATATTCACATGTAGAAGACTGAAACTAGATGCTTATCTCTCACCACATTAAGAAAATCAAAATGGATT
+SRR7646180.1 1 length=98
CDDABIIHHHIIHIIHIIIIHIHIIIIIIIHHIIH?FHHIIIIIIIHIHHEHIIIIIIIIIIIIIIIIICHIIIIHIHII
@SRR7646180.2 2 length=98
AAGCAGTGGTATCAACGCAGAGTACATGGGGGTTCACTCCCACTTCATCCTGGCTGAAAGCAGTGCTGTGCTTTGAAATG
+SRR7646180.2 2 length=98
DDDDDIIGHIIIIIIIIHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHIIIIIIIHIIIHIIIIHIIIIIIIIGH

Now I have a list of cell barcodes, as follows:

0       AAACCTGAGCCACTAT
1       AAACCTGAGTCTCCTC
2       AAACCTGCACAACTGT
3       AAACCTGTCGAGCCCA
4       AAACCTGTCTCCGGTT
5       AAACGGGAGAAGATTC
6       AAACGGGAGTGACTCT
7       AAACGGGCAAGGGTCA
8       AAACGGGCATGTAAGA
9       AAACGGGGTCAAAGAT

These cell barcodes come from a subset of read 1 and represent some of the cells. How can I extract the reads carrying these barcodes from the FASTQ files?

Thank you in advance!

RNA-Seq • 2.4k views

ADD COMMENT • link updated 2.7 years ago by katze99 • 0 • written 4.1 years ago by ruiyan_hou • 0


May I ask why you want a custom approach rather than simply running CellRanger or any other specialized software for single-cell 10x data, such as STARsolo, Salmon/Alevin or Kallisto/Bustools? What is your final goal? Barcodes can be noisy and contain sequencing errors, so naive approaches will likely be suboptimal here. The aforementioned software takes care of this.

ADD REPLY • link 4.1 years ago by ATpoint 86k


Thank you. I have an scRNA-seq dataset that contains three artificially mixed cell lines. I only want the FASTQ reads of one of them. I now have the cell barcodes of that cell line and want a FASTQ file that contains only those cells. How can I do that? Thank you!

ADD REPLY • link 4.1 years ago by ruiyan_hou • 0

score 0 · Answer 1 · 2022-04-10

Hi, you can use the --whitelist argument of UMI-tools.

umi_tools extract --bc-pattern=CCCCCCCCCCCCCCCCNNNNNNNNNN --stdin R1.fastq.gz --stdout R1_extracted.fastq.gz --read2-in R2.fastq.gz --read2-out=R2_extracted.fastq.gz --whitelist=whitelist.txt
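To make the pattern in the command above concrete, here is a minimal Python sketch of how a C/N pattern maps onto read 1: each C position is a cell barcode base and each N position is a UMI base, so 16 C's followed by 10 N's covers the 26 bp read 1 shown in the question. The function name `split_by_pattern` is hypothetical and this is only an illustration of the idea, not the UMI-tools implementation.

```python
def split_by_pattern(sequence, pattern="CCCCCCCCCCCCCCCCNNNNNNNNNN"):
    """Split a read into (cell_barcode, umi) following a C/N pattern.

    Illustrative helper only; not part of UMI-tools.
    """
    if len(sequence) < len(pattern):
        raise ValueError("read is shorter than the barcode pattern")
    cell_barcode = []
    umi = []
    # Walk the read and the pattern in parallel, base by base.
    for base, symbol in zip(sequence, pattern):
        if symbol == "C":
            cell_barcode.append(base)
        elif symbol == "N":
            umi.append(base)
        else:
            raise ValueError(f"unsupported pattern symbol: {symbol!r}")
    return "".join(cell_barcode), "".join(umi)


# First read 1 record from the question (26 bases).
barcode, umi = split_by_pattern("GTCGTAAAGATATACGGCACAACTCT")
print(barcode, umi)
```

The 16-base cell barcode obtained this way is the part that is compared against the whitelist.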

Remember that your whitelist.txt file must contain your barcodes of interest in tab-separated format, without any extra numbers or characters.
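The barcode list in the question has an index column in front of each barcode, so it would need cleaning before use. A minimal Python sketch of that step follows, assuming a file with one barcode per line is acceptable as a whitelist (please check the UMI-tools documentation for the exact layout it expects). The helper names `barcodes_from_list` and `write_whitelist` are hypothetical.

```python
def barcodes_from_list(lines):
    """Return the last whitespace-separated field of each non-empty line.

    Assumes the barcode is the last column, as in the list from the question.
    """
    barcodes = []
    for line in lines:
        fields = line.split()
        if fields:
            barcodes.append(fields[-1])
    return barcodes


def write_whitelist(barcodes, path):
    """Write one barcode per line, with no index column."""
    with open(path, "w") as handle:
        for barcode in barcodes:
            handle.write(barcode + "\n")


# First three entries of the barcode list from the question.
example = """0       AAACCTGAGCCACTAT
1       AAACCTGAGTCTCCTC
2       AAACCTGCACAACTGT"""

cleaned = barcodes_from_list(example.splitlines())
write_whitelist(cleaned, "whitelist.txt")
```

For the real data, the lines would be read from the barcode list file instead of the inline example.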