Hello folks,
I am looking for a reliable thought process to determine the 10x Chromium chemistry when the publication does not mention it (3 prime vs 5 prime, or v1/v2/v3). 3p vs 5p can be detected by forward vs reverse mapping, but I have seen cases where the forward and reverse mapped read numbers are similar (rather than 10% vs 90%, they are 45% vs 55%). I guess the versions are the easiest part, because v1/v2/v3 of 3p have distinct cell barcode and UMI lengths.
I know cellranger samples a couple hundred thousand reads and tries multiple configurations. I found the following repo, which does this quite reliably (I tip my hat to the owner):
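To make the strand check concrete, here is a minimal Python sketch of how I would turn forward/reverse counts into a call. The function name `classify_strand` and the 0.7 cutoff are my own assumptions, not something taken from cellranger or any other tool. It also assumes the counts come from the cDNA read (R2) assigned to genes in the forward and reverse orientation, where a forward majority points to 3p and a reverse majority to 5p.

```python
def classify_strand(forward_reads: int, reverse_reads: int,
                    min_fraction: float = 0.7) -> str:
    """Guess 3p vs 5p from gene-assigned read counts per orientation.

    min_fraction is an arbitrary, hypothetical cutoff: anything between
    (1 - min_fraction) and min_fraction is reported as ambiguous instead
    of being forced into a call.
    """
    total = forward_reads + reverse_reads
    if total == 0:
        raise ValueError("no gene-assigned reads were counted")
    forward_fraction = forward_reads / total
    if forward_fraction >= min_fraction:
        return "3p"
    if forward_fraction <= 1 - min_fraction:
        return "5p"
    return "ambiguous"
```

A 45% vs 55% split like the one I describe falls into the "ambiguous" bucket, which is the case where I would look for other evidence before trusting either call.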
https://github.com/cellgeni/reprocess_public_10x/blob/main/scripts/starsolo_10x_auto.sh
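As I understand the general idea (this is my own simplified sketch, not the code from that script or from cellranger), you take a sample of R1 reads, cut out the barcode prefix, and see which candidate whitelist it matches best. The function names and the `barcode_length` default below are assumptions for illustration. In practice the candidates would be the whitelists shipped with Cell Ranger, such as 737K-august-2016.txt and 3M-february-2018.txt.

```python
def whitelist_match_rate(r1_sequences, whitelist, barcode_length=16):
    """Fraction of sampled R1 reads whose barcode prefix is in the whitelist."""
    if not r1_sequences:
        raise ValueError("no reads were sampled")
    hits = sum(1 for seq in r1_sequences if seq[:barcode_length] in whitelist)
    return hits / len(r1_sequences)


def best_whitelist(r1_sequences, whitelists, barcode_length=16):
    """Return (best_name, rates) over a dict of {name: set_of_barcodes}."""
    rates = {
        name: whitelist_match_rate(r1_sequences, barcodes, barcode_length)
        for name, barcodes in whitelists.items()
    }
    best_name = max(rates, key=rates.get)
    return best_name, rates
```

This only exact-matches barcodes, so sequencing errors lower every rate a little, but the correct whitelist should still stand out clearly. It cannot separate chemistries that share a whitelist, so the strand check is still needed on top.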
For reference, this is the dataset: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE139495
and this is the sample: SRX7065207
My question is quite vague, but I would like to understand what fellow bioinformaticians think and what their thought process is. Thank you very much,
T.
Thank you for your fast response. How about 3p-v2 vs 5p-v1 or 5p-v2? As far as I know, the 5p chemistry also uses 16/10 barcode/UMI lengths.
I agree. I also found that bam2fastq sometimes splits the output into many small FASTQs, which doubles the storage. I found that STARsolo can take a BAM as input. All you have to do is set the BAM tags for the cell barcode and UMI sequences and qualities, and it works like a charm. (Sorry, this is selfish on my part: we are trying to generate spliced/unspliced counts, so we must use STARsolo/alevin to reprocess the FASTQ/BAM.)
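For anyone who wants to try the BAM route, this is roughly how I put the command together, written as a small Python helper that only builds the argument list. The helper name and all paths are placeholders I made up. The options are the ones I know from the STAR manual for BAM input, and I am assuming a Cell Ranger style BAM with CR/UR (sequence) and CY/UY (quality) tags and a 16/10 barcode/UMI layout, so please check them against your STAR version.

```python
def starsolo_bam_command(bam_path, genome_dir, whitelist_path,
                         strand="Forward", out_prefix="starsolo_out/"):
    """Build (but do not run) a STARsolo command that reads a 10x BAM."""
    return [
        "STAR",
        "--genomeDir", genome_dir,
        "--readFilesIn", bam_path,
        # Treat the input as single-end SAM/BAM records.
        "--readFilesType", "SAM", "SE",
        # Drop secondary alignments so each read is processed once.
        "--readFilesCommand", "samtools", "view", "-F", "0x100",
        "--soloType", "CB_UMI_Simple",
        # Raw cell barcode / UMI sequences and their qualities from BAM tags.
        "--soloInputSAMattrBarcodeSeq", "CR", "UR",
        "--soloInputSAMattrBarcodeQual", "CY", "UY",
        "--soloCBwhitelist", whitelist_path,
        # Assumed 16 bp barcode + 10 bp UMI; adjust for other chemistries.
        "--soloCBlen", "16",
        "--soloUMIstart", "17",
        "--soloUMIlen", "10",
        # Forward for 3p, Reverse for 5p.
        "--soloStrand", strand,
        # Velocyto adds the spliced/unspliced/ambiguous count matrices.
        "--soloFeatures", "Gene", "Velocyto",
        "--outFileNamePrefix", out_prefix,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the placeholder paths are replaced with real ones.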
Aren't these kits dual-indexed? This one here is single-index. You see this in the bam as well. R1/R2/I1 only.
Yes, you are quite right. Thank you very much! I had totally ignored the indexes.