5.5 years ago
sambunga094
Hi,
I am new to bioinformatics and I have the 10x Mouse 1k single-cell brain dataset. How do I move the barcode (16 bp) + UMI (10 bp) to the header of R2? Also, is this the right way to format a FASTQ file?
PS: I'm just learning how to process single-cell data. Please let me know if there are any tools available.
Thank you so much!!
Why are you trying to format a FASTQ file? For standard 10x workflow, you should not have to do that.
Thanks for your reply. I want to process it without using Cell Ranger.
I strongly recommend against the use of non-standard tools or even custom scripts for such a non-trivial task as UMI deduplication and quantification of single-cell data. To my knowledge all tested tools work directly on FASTQ files, such as Cell Ranger, alevin or the recent kallisto/bustools. Do yourself a favor and use them. Especially if you are new: single-cell data are not trivial to analyze and (no offense) if you are already stuck at file header manipulation, things will get very tricky downstream. I suggest you look at alevin to do the low-level processing. https://salmon.readthedocs.io/en/latest/alevin.html
Why?
The reason I ask is that I frequently see people invent a complex protocol to solve a problem that already has a relatively simple solution.
Cell Ranger is not a "simple solution" in the sense that it requires large amounts of RAM, large amounts of temporary disk, and takes a very long time to process standard datasets. For example, in benchmarks we performed recently (https://www.biorxiv.org/content/10.1101/673285v2) we found that on the 10x hgmm10k_v3 dataset Cell Ranger required 28 GB of RAM, 1.3 TB of disk, and took 21.5 hours to run. In comparison, kallisto | bustools required 11 GB of RAM, 15 GB of disk, and 27 minutes. The differences have real implications in terms of cost (e.g. if one is processing on AWS). Furthermore, the speed of kallisto | bustools makes it possible to rerun analyses (e.g. with updated transcriptomes), thus making a workflow that is reproducible in practice and not just in theory.
Depends on how you define the Kolmogorov complexity of the solution.
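For anyone who still wants to see what the header manipulation in the original question looks like, here is a minimal Python sketch. It is only for learning the file layout, not a replacement for the tools recommended above. Assumptions that are not from the thread: the barcode and UMI are the first 16 + 10 bases of R1 (the lengths given in the question; check the chemistry of your dataset, since the UMI length differs between 10x versions), R1 and R2 list the same reads in the same order, and the `_BARCODE_UMI` read-name suffix is just one possible convention. The function names are hypothetical. UMI-tools has an `extract` command that does this kind of header tagging in a tested way.

```python
import gzip
from itertools import islice, zip_longest


def open_fastq(path, mode="rt"):
    """Open a FASTQ file as text, using gzip if the name ends in .gz."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)


def read_records(handle):
    """Yield (header, sequence, plus, quality) from a 4-lines-per-record FASTQ."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(record)


def tag_r2_with_barcode_umi(r1_path, r2_path, out_path, bc_len=16, umi_len=10):
    """Copy R2 to out_path, appending the R1 barcode and UMI to each read name.

    bc_len and umi_len are assumptions taken from the question; adjust them
    to the chemistry of your data. Returns the number of records written.
    """
    written = 0
    with open_fastq(r1_path) as r1, open_fastq(r2_path) as r2, \
            open_fastq(out_path, "wt") as out:
        for rec1, rec2 in zip_longest(read_records(r1), read_records(r2)):
            if rec1 is None or rec2 is None:
                raise ValueError("R1 and R2 have different numbers of records")
            header1, seq1 = rec1[0], rec1[1]
            header2, seq2, plus2, qual2 = rec2

            # Compare the read names (text before the first space) to make
            # sure the two files are still in sync. Assumes Illumina-style
            # headers where both mates share the same name.
            name1 = header1.partition(" ")[0]
            name2, _, comment2 = header2.partition(" ")
            if name1 != name2:
                raise ValueError(f"read name mismatch: {name1} vs {name2}")
            if len(seq1) < bc_len + umi_len:
                raise ValueError(f"R1 too short for barcode + UMI: {name1}")

            barcode = seq1[:bc_len]
            umi = seq1[bc_len:bc_len + umi_len]

            # Put the tag on the read name itself, before any comment field.
            new_header = f"{name2}_{barcode}_{umi}"
            if comment2:
                new_header += " " + comment2
            out.write(f"{new_header}\n{seq2}\n{plus2}\n{qual2}\n")
            written += 1
    return written
```

Usage would be something like `tag_r2_with_barcode_umi("sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample_R2.tagged.fastq.gz")` (file names are placeholders). Note that this does no barcode whitelisting or error correction and no UMI deduplication, which is exactly the hard part that alevin, kallisto | bustools and Cell Ranger handle for you.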