I am wondering if someone could recommend any tools (other than cellranger) to align and generate a count matrix from Chromium Fixed RNA Profiling (FRP) data. Link to Method
What is the purpose of doing so? I tried to do that using the BAM file generated by CellRanger. One important detail if you are working with raw data: each probe barcode is actually represented by 8 sequences, not just 1. For example, the second barcode in your count is actually a "variant" of BC001. https://www.10xgenomics.com/support/single-cell-gene-expression-flex/documentation/steps/probe-sets/chromium-frp-probe-set-files#probe_seq_file
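To illustrate the point about variants, here is a minimal Python sketch that sums per-sequence counts into per-barcode-ID counts. The sequences in `VARIANT_TO_BARCODE` are made-up placeholders, not the real 10x probe barcode sequences, and the function name is hypothetical; the real table would have to be filled in from the probe barcode file linked above.

```python
from collections import Counter

# Hypothetical variant table: placeholder sequences, NOT the real 10x ones.
# Each probe barcode ID maps from several variant sequences.
VARIANT_TO_BARCODE = {
    "AAAACCCC": "BC001",
    "CCCCAAAA": "BC001",
    "GGGGTTTT": "BC002",
    "TTTTGGGG": "BC002",
}


def collapse_variant_counts(seq_counts, variant_to_barcode):
    """Sum per-sequence read counts into per-barcode-ID counts."""
    collapsed = Counter()
    for seq, n in seq_counts.items():
        # Sequences missing from the table are kept under "unassigned".
        barcode_id = variant_to_barcode.get(seq, "unassigned")
        collapsed[barcode_id] += n
    return dict(collapsed)


observed = {"AAAACCCC": 120, "CCCCAAAA": 95, "GGGGTTTT": 80, "ACGTACGT": 3}
print(collapse_variant_counts(observed, VARIANT_TO_BARCODE))
# {'BC001': 215, 'BC002': 80, 'unassigned': 3}
```

With a table like this, two sequences that look like separate barcodes in a raw count end up under the same BC ID.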
As long as you have demultiplexed sample fastq files with UMI etc., `alevin-fry` would likely work?

Multiple samples are pooled together and each sample has a specific probe barcode ID (BC001 through BC008). What would you recommend as a tool to demultiplex the sample fastq files? We can easily identify which cell originated from which sample once we have a count matrix based on the probe barcode. I wonder if `alevin-fry` will work without demultiplexing these data.

You may want to demultiplex the data using `cellranger mkfastq` and then go from there. You will need access to the original data folder for this to work. Otherwise you will need to demultiplex the data using additional steps.

This part (`cellranger mkfastq`) is already done. I have a pair of fastq files with 8 pooled samples in it.

Perhaps it was not done right? It looks like if you use the right index codes in the samplesheet, you should have fully demultiplexed samples at the end of `cellranger mkfastq`: https://kb.10xgenomics.com/hc/en-us/articles/4403017520653-Where-can-I-find-the-Dual-Index-Kit-TS-Set-A-sample-index-sequences-

That part is done correctly. I was going through GitHub and found an issue requesting a function in `alevin-fry` to process FRP data. Not sure if it is out there yet. Maybe ATpoint can chime in here.

I have not processed FRP data first hand, but I would think that if that part was done correctly then you should have separate sample files at this point, not a mix of 8 samples.

For FRP data, Cell Ranger recommends using the `cellranger multi` function, which takes your fastq files and certain parameters as input and outputs the results for the individual samples in the pool.

I was only referring to the `mkfastq` part to demultiplex the samples (step 1 in the process). Going by this line in the link above:

The relevant set of indexes is available here: https://cdn.10xgenomics.com/raw/upload/v1655155124/support/in-line%20documents/Dual_Index_Kit_TS_Set_A.csv

If the `TS` codes in this file had been used in the samplesheet, then I would have thought that you would get each sample as a separate file.

We had used the index SI-TS-A5 but different probe barcodes BC001 to BC008 (see here). Therefore there is only a pair of fastq files at the `cellranger mkfastq` step, and count matrices for 8 samples after the `cellranger multi` step.

I see. So this is an unsupported extension of the protocol. Based on ATpoint's comment in the GitHub issue you know where the "Probe BC" is, so you will need to look for 4 possible combinations to demultiplex the data further.
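
A rough Python sketch of that kind of further demultiplexing, assuming the probe barcode sits at a fixed offset in Read 2. The offset `bc_start`, the function names, and the sequences in the example table are all hypothetical; the real position and variant sequences would have to come from the GitHub issue and the 10x probe barcode file.

```python
import io
from collections import defaultdict


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def demultiplex(r1_handle, r2_handle, variant_to_sample, bc_start, bc_len=8):
    """Bin read pairs by the probe barcode found at bc_start in Read 2."""
    bins = defaultdict(list)
    for rec1, rec2 in zip(read_fastq(r1_handle), read_fastq(r2_handle)):
        barcode = rec2[1][bc_start:bc_start + bc_len]
        # Every variant of a barcode maps to the same sample label.
        sample = variant_to_sample.get(barcode, "unassigned")
        bins[sample].append((rec1, rec2))
    return bins


# Tiny in-memory example with placeholder sequences (not real barcodes).
r1 = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nTTTT\n+\nIIII\n")
r2 = io.StringIO("@r1\nGGAAAACCCC\n+\nIIIIIIIIII\n@r2\nGGTTTTGGGG\n+\nIIIIIIIIII\n")
table = {"AAAACCCC": "BC001", "TTTTGGGG": "BC002"}
bins = demultiplex(r1, r2, table, bc_start=2)
print({sample: len(pairs) for sample, pairs in bins.items()})
```

For real data the handles could come from `gzip.open(path, "rt")`, and the pairs would be written to per-sample files rather than held in memory.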
This functionality is probably not implemented in `alevin`, otherwise @Rob would have marked the issue done.

Looks like 10x has example datasets on their web site. I will take a look to see what these BC sequences look like in reality. It should be possible to bin them using `seal.sh` from BBMap.

Edit: From one of the test datasets it is possible to see these BC barcodes in the Read 2 file.
Extracting the 8 bp section led to these counts (this data is supposed to have 4 `bc` barcodes, and the top 16 counts fit that).

The trick now would be to identify the 4 indexes that contribute to one `bc` code; from there, binning/demultiplexing the reads should be possible. These files can then be input into `alevin-fry`.
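
The counting step described above could look something like this in Python. It is only a sketch: the offset `bc_start` of the 8 bp section within Read 2 is an assumption left as a parameter, the function name is hypothetical, and the example sequences are placeholders rather than real barcodes.

```python
from collections import Counter


def count_barcode_candidates(read2_seqs, bc_start, bc_len=8, top_n=16):
    """Count the bc_len-mer at bc_start in each Read 2 sequence."""
    counts = Counter(
        seq[bc_start:bc_start + bc_len]
        for seq in read2_seqs
        if len(seq) >= bc_start + bc_len  # skip reads that are too short
    )
    # With 4 barcodes and 4 variants each, the top 16 should stand out.
    return counts.most_common(top_n)


seqs = ["GGAAAACCCC"] * 3 + ["GGTTTTGGGG"] * 2 + ["GG"]
print(count_barcode_candidates(seqs, bc_start=2))
# [('AAAACCCC', 3), ('TTTTGGGG', 2)]
```

The most frequent sequences from a count like this are the candidates to group into `bc` codes before binning the reads.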