Hello everyone,
I have a sample of scRNA-seq data (A. thaliana) generated by 10x Genomics. The data consists of R1 (cell barcodes and UMIs) and R2 (the actual reads) in FASTQ format. The sample has around 7000 cells.
I ran the data through STARsolo to map the reads to the genome. The results are a BAM file (mapped reads) and count matrix files.
I need to run the mapped reads through an algorithm. However, if I run it on the BAM file as produced, the data will be treated as bulk RNA-seq. So, ideally, I want to split the BAM file into a separate file for each cell. Splitting the FASTQ files and running each one separately through STAR would be OK too (though I am not sure how efficient that would be).
I am working in a Bash terminal environment.
Is there a method to do so? Any suggestions?
I am an undergraduate student and completely new to this. Any help would be appreciated. Thank you!!
This is one case where you may need to use cellranger to create the BAM, unless STARsolo does this too. The cell barcode will be encoded in the alignments as a tag. You will need to use pysam or a similar program to parse the BAM tags to create files for individual cells. It is unclear what the utility of this application is, though; otherwise people would have written tools to do this already.
Thank you for your reply. Yes, STARsolo does that too. My struggle is with splitting the BAM file into individual cells. There are some tools, but I am not sure which one to use, how to use it, or whether they even do the job I need. Any clarification on how to use pysam or another tool would be appreciated.
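One way to picture the tag parsing suggested above, as a minimal sketch rather than a tested pipeline: the Python below avoids pysam and instead works on SAM text (for example, what samtools view -h prints), grouping alignment lines by their CB:Z: tag. It assumes the BAM was written with a CB tag on each read; the function names and the demo records are made up for illustration.

```python
from collections import defaultdict


def cell_barcode(sam_line, tag="CB"):
    """Return the value of a string tag (default CB:Z:) on one SAM alignment line, or None."""
    fields = sam_line.rstrip("\n").split("\t")
    prefix = tag + ":Z:"
    for field in fields[11:]:  # optional tags start at the 12th column
        if field.startswith(prefix):
            return field[len(prefix):]
    return None


def group_by_barcode(sam_lines):
    """Split SAM text into (header lines, {barcode: alignment lines}).

    Reads without the tag are skipped.
    """
    header = []
    cells = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):  # SAM header lines start with '@'
            header.append(line)
            continue
        barcode = cell_barcode(line)
        if barcode is not None:
            cells[barcode].append(line)
    return header, dict(cells)


# Tiny made-up SAM records, for illustration only.
demo = [
    "@SQ\tSN:Chr1\tLN:1000\n",
    "read1\t0\tChr1\t100\t255\t4M\t*\t0\t0\tACGT\tFFFF\tNH:i:1\tCB:Z:AAACCC\n",
    "read2\t0\tChr1\t200\t255\t4M\t*\t0\t0\tACGT\tFFFF\tNH:i:1\tCB:Z:TTTGGG\n",
    "read3\t0\tChr1\t300\t255\t4M\t*\t0\t0\tACGT\tFFFF\tNH:i:1\tCB:Z:AAACCC\n",
]
header, cells = group_by_barcode(demo)
for barcode, reads in cells.items():
    print(barcode, len(reads))
```

For a real run, the same functions could be fed the lines of samtools view -h output, with the header plus each cell's reads written to a per-barcode SAM file. Holding every read in memory like this is only reasonable for small inputs; with thousands of cells, writing reads out as they are seen would be the safer design.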
10x Genomics provides Cell Ranger to easily process the data. Why are you not using it?
Does Cell Ranger have a tool to split the BAM file into individual cells? That's the particular step that I'm struggling with.
I think STARsolo does the same job as Cell Ranger for mapping the reads (producing the BAM file, which I need) and making the count matrix (which I don't need for now). The reason I used STAR is that I'm working in the Bash terminal for other processing and would like to keep everything in the same place.
The later analysis that I want to do is not available in Cell Ranger. Its goal is to identify and classify RNA modifications in each cell.
You can run CellRanger from the command line so this shouldn't be a reason not to use it.
ATpoint, thank you for your reply! What I am trying to do is run scRNA-seq data through an algorithm that normally takes bulk RNA-seq data mapped to a reference genome, in order to detect RNA modifications and classify them. The goal is to get the scRNA-seq data to run successfully through this algorithm while keeping the single-cell resolution. I could successfully run it on the BAM file of mapped scRNA-seq reads; however, that is useless, since the results report RNA modification locations as if the data were bulk. So I am trying to find a way to separate the data from individual cells. The idea I have in mind is to split the BAM file and run the algorithm in a loop over the resulting files. I hope that makes sense; any help or guidance would be greatly appreciated. Thank you so much once more!
There will be very few reads mapped per individual cell. So while you may be able to run the tool you are referring to, the results may not be valid or accurate. Tools make certain assumptions, and if the data does not meet them, you will need to take that into account.
Did you have any luck in splitting the BAM file based on the 10x cell barcode? I would like to subset a BAM file to include only 5 specific cell barcodes and am not sure how to do it. Thanks
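Hey, if you already have the barcodes, you could use samtools (a recent version that has the -d tag filter; input.bam below stands for your own BAM file):
samtools view -h -b -d CB:TAAGAGATCCTATGTT input.bam > TAAGAGATCCTATGTT.bam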
Hopefully this is useful. It works well with STARsolo BAM files; I don't know how Cell Ranger handles its barcodes.
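For the case of keeping only a handful of barcodes in one BAM, a hedged sketch: recent samtools versions have a -D TAG:FILE option for samtools view that reads the wanted tag values from a file, one per line. The Python below only builds the barcode list text and the command line; it does not run samtools. The helper names, barcodes, and file names are hypothetical. As far as I know, Cell Ranger BAMs carry a suffix such as -1 on the CB value, which the suffix argument is meant to cover.

```python
import shlex


def barcode_list_text(barcodes, suffix=""):
    """Return one barcode per line; `suffix` is appended to each (e.g. "-1")."""
    return "".join(bc + suffix + "\n" for bc in barcodes)


def samtools_subset_command(bam, barcode_file, out_bam, tag="CB"):
    """Build (not run) a samtools call keeping reads whose tag value is listed in barcode_file."""
    return ["samtools", "view", "-h", "-b",
            "-D", f"{tag}:{barcode_file}",
            "-o", out_bam, bam]


# Hypothetical barcodes and file names, for illustration only.
wanted = ["TAAGAGATCCTATGTT", "AAACCTGAGAAACCAT"]
print(barcode_list_text(wanted), end="")
print(shlex.join(samtools_subset_command("input.bam", "barcodes.txt", "subset.bam")))
```

Saving the printed barcode list as barcodes.txt and running the printed command in Bash would give a single BAM containing only reads from those cells. Checking a few CB values in the BAM first (with samtools view) is a cheap way to confirm the exact barcode format.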