STARsolo mapping for paired-end 3' 10X reads
2.6 years ago
tvibhaps ▴ 10

I am a beginner in bioinformatics. While working on scRNA-seq analysis, I got a bit puzzled about running STARsolo on paired-end 3' 10X reads. First, I passed the reads in the order R2 then R1; the run worked and gave acceptable, though not ideal, results. However, in a STARsolo discussion thread at https://github.com/alexdobin/STAR/issues/768 , I found the STAR author confirming that paired-end reads should be passed as R1 then R2, but that was for 5' 10X reads.

The code I have run so far for paired-end 3' 10X reads (v3.1) is below:

ml star 2.7.9a
STAR --runThreadN 72 --genomeDir $REF --readFilesIn $R2  $R1 --runDirPerm All_RWX \
     --readFilesCommand zcat $SORTEDBAM --soloType CB_UMI_Simple --soloCBwhitelist /test/cell_barcode_whitelist/3M-february-2018.txt \
     --soloBarcodeReadLength 0 --soloUMIlen 12 --soloStrand Forward --soloUMIdedup 1MM_CR \
     --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts --soloUMIfiltering MultiGeneUMI_CR \
     --soloCellFilter EmptyDrops_CR --clipAdapterType CellRanger4 --outFilterScoreMin 30 \
     --soloFeatures Gene GeneFull Velocyto \
     --soloOutFileNames output/ genes.tsv barcodes.tsv matrix.mtx
exit

I assume I passed the reads in the correct order, since the R1 reads contain the barcode+UMI and the R2 reads contain only the sequenced cDNA. However, I am not certain, even after reading the STARsolo page. Please suggest what the read order should be for paired-end 3' 10X v3.1 data. Thanks

mapping scRNA STARsolo • 2.2k views

I input the reads in correct way since R1 reads has barcode+UMI and R2 has only sequenced reads

That is correct.

See if this helps: STARsolo config for 10x Chromium v1, v2, v3 for specifying

    --soloCBwhitelist [whitelist] \
    --soloCBlen [CB length] \
    --soloUMIstart [UMI start] \
    --soloUMIlen [UMI length]
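
For 10x Chromium 3' v3/v3.1, the placeholders above resolve to a 16-nt cell barcode followed directly by a 12-nt UMI. The following is a minimal sketch, assuming the barcode starts at the first base of the barcode read; the variable names and the whitelist path are placeholders, and the values should be checked against the STARsolo documentation for your chemistry.

```shell
# 10x 3' v3 / v3.1 barcode-read geometry (1-based positions).
CB_START=1                          # cell barcode starts at base 1
CB_LEN=16                           # 16-nt cell barcode
UMI_LEN=12                          # 12-nt UMI (v2 chemistry uses 10)
UMI_START=$((CB_START + CB_LEN))    # UMI follows the barcode directly

# Placeholder path; point this at your own copy of the v3 whitelist.
WHITELIST="3M-february-2018.txt"

# Assemble the geometry flags as one string so they can be inspected.
SOLO_GEOMETRY="--soloType CB_UMI_Simple --soloCBwhitelist $WHITELIST --soloCBstart $CB_START --soloCBlen $CB_LEN --soloUMIstart $UMI_START --soloUMIlen $UMI_LEN"

echo "$SOLO_GEOMETRY"
```

As far as I know, STAR's defaults already match the barcode start, barcode length, and UMI start used here, so --soloUMIlen 12 is the only geometry value that differs from the default (10).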

Thank you. I have gone through the link. I had already set --soloCBwhitelist dir/3M-february-2018.txt on the 2nd line and --soloUMIlen 12 on the 3rd line of my command. I did not set --soloCBlen or --soloUMIstart because I had already included --clipAdapterType CellRanger4.

I need to know the order of the reads, i.e. whether to use --readFilesIn R1.fastq.gz R2.fastq.gz or --readFilesIn R2.fastq.gz R1.fastq.gz when mapping paired-end 3' 10X v3.1 data. Thanks


Why not directly use 10X's own tool, Cell Ranger, for mapping and UMI counting?


Thanks for the suggestion. The analysis has already been done with Cell Ranger 6.1. However, our group is trying different mapping and counting methods for the downstream analysis, to see whether we can resolve some issues such as multimapping.


Moving this to a comment since it is not a direct answer to the original question.

2.6 years ago
GenoMax 147k

From STARsolo link:

--readFilesIn option, the 1st file has to be cDNA read, and the 2nd file has to be the barcode (cell+UMI) read,

So pass the R2 file first, followed by R1.
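
The ordering rule can be checked with a dry run before launching the real job. Here is a minimal sketch, assuming hypothetical FASTQ names, index path, and whitelist path (none of these come from the thread); it only prints the command so the read order can be inspected.

```shell
# Hypothetical inputs; replace with your own paths.
REF="/path/to/star_index"
WHITELIST="3M-february-2018.txt"
R1="sample_R1.fastq.gz"   # barcode read (cell barcode + UMI)
R2="sample_R2.fastq.gz"   # cDNA read

# STARsolo expects the cDNA read first and the barcode read second.
# --soloBarcodeReadLength 0 skips the barcode-read length check, which
# matters when R1 was sequenced longer than barcode + UMI (28 nt for v3).
STAR_CMD="STAR --genomeDir $REF --readFilesIn $R2 $R1 --readFilesCommand zcat --soloType CB_UMI_Simple --soloCBwhitelist $WHITELIST --soloUMIlen 12 --soloBarcodeReadLength 0"

# Dry run: print the command instead of executing it.
echo "$STAR_CMD"
```

Once the printed command shows the R2 file ahead of the R1 file, it can be pasted into the shell and run as is, with any further options (threads, output prefix, cell filtering) added as needed.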


Got it. Thanks!!
