I am a beginner in bioinformatics. While analyzing scRNA-seq data, I got a bit puzzled running STARsolo on paired-end 3' 10X reads. At first I passed the reads as R2 then R1, and it worked, with somewhat okay (though not the desired) results. However, in a STARsolo discussion thread at https://github.com/alexdobin/STAR/issues/768 , I found the STARsolo author confirming that paired-end reads should be passed as R1 then R2, but that applies to 5' 10X reads.
The code I have run so far for paired-end 3' 10X reads (v3.1) is below:
ml star 2.7.9a
# Read order as run: R2 first, then R1 (R1 holds the barcode + UMI)
STAR --runThreadN 72 --genomeDir $REF --readFilesIn $R2 $R1 --runDirPerm All_RWX \
--readFilesCommand zcat $SORTEDBAM --soloType CB_UMI_Simple --soloCBwhitelist /test/cell_barcode_whitelist/3M-february-2018.txt \
--soloBarcodeReadLength 0 --soloUMIlen 12 --soloStrand Forward --soloUMIdedup 1MM_CR \
--soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts --soloUMIfiltering MultiGeneUMI_CR \
--soloCellFilter EmptyDrops_CR --clipAdapterType CellRanger4 --outFilterScoreMin 30 \
--soloFeatures Gene GeneFull Velocyto \
--soloOutFileNames output/ genes.tsv barcodes.tsv matrix.mtx
exit
I am assuming that I passed the reads in the correct order, since the R1 reads contain the barcode + UMI and the R2 reads contain only the sequenced reads. However, I am not certain, even after reading the STARsolo page. Please suggest what the read order should be for paired-end 3' 10X v3.1 data. Thanks.
That is correct.
See if this helps: STARsolo config for 10x Chromium v1, v2, v3 for specifying
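For reference, here is a minimal sketch of how the read order and the barcode layout could be spelled out explicitly. The helper name solo_read_args is hypothetical. The values assume the usual 10x 3' v3 layout on the barcode read (16 bp cell barcode starting at base 1, then a 12 bp UMI starting at base 17); please check them against your own kit before relying on them.

```shell
# Hypothetical helper: print the STARsolo arguments that fix read order and barcode geometry.
# $1 = file with the sequenced reads (R2 for 3' 10x), $2 = file with barcode + UMI (R1).
# The geometry values are an assumption for 10x 3' v3 / v3.1 chemistry.
solo_read_args() {
    printf '%s\n' "--readFilesIn $1 $2 --soloType CB_UMI_Simple --soloCBstart 1 --soloCBlen 16 --soloUMIstart 17 --soloUMIlen 12"
}

# Example (not executed here): STAR ... $(solo_read_args "$R2" "$R1") ...
```

The output is meant to be spliced into the existing STAR command in place of the current --readFilesIn and --soloUMIlen options; all other options stay as they are.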
Thank you. I have gone through the link. I found that I already have --soloCBwhitelist [whitelist] dir/3M-february-2018.txt on the 2nd line and --soloUMIlen [UMI length] 12 on the 3rd line of the code. I did not set --soloCBlen [CB length] or --soloUMIstart [UMI start], since I already have --clipAdapterType CellRanger4 in the code.
I need to know the order of the reads, i.e. whether to use --readFilesIn R1.fastq.gz R2.fastq.gz or --readFilesIn R2.fastq.gz R1.fastq.gz when mapping paired-end 3' 10X v3.1 data. Thanks.
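One quick way to see which file carries the barcode + UMI is to look at the read lengths. This is only a sketch: the function name first_read_len is hypothetical, and it assumes gzipped FASTQ files with one sequence per four-line record. For 3' 10X v3 data, the barcode read is expected to be short (about 28 bases if it holds a 16 bp barcode plus a 12 bp UMI), while the other read is longer.

```shell
# Hypothetical helper: print the length of the first sequence in a gzipped FASTQ file.
# Line 2 of a FASTQ record is the sequence itself.
first_read_len() {
    gzip -dc "$1" | awk 'NR == 2 { print length($0); exit }'
}

# Example (not executed here):
#   first_read_len "$R1"   # barcode + UMI read, expected to be the short one
#   first_read_len "$R2"   # sequenced read, expected to be the long one
```

The file that reports the short length is the one that goes second in --readFilesIn for this kind of data, which matches the R2 then R1 order used in the command above.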
Why not directly use 10X's own tool, Cell Ranger, for mapping and UMI counting?
Thanks for the suggestion. The analysis has already been done with Cell Ranger 6.1. However, our group is trying to run the downstream analysis with different mapping and counting methods to resolve (if we can) some issues such as multimapping.
Moving this to a comment since it is not a direct answer to the original question.