Hi,
I am trying to use STARsolo on scRNA-seq data for A. thaliana (GSM4423536) to produce the count matrix. However, every time I run STARsolo, it detects only one gene, and the features.tsv file is thus empty. As a beginner, I am quite lost. I tried different datasets for A. thaliana, but the same issue repeats. I am not sure what I am doing wrong or misunderstanding. The published results, generated with 10x Cell Ranger, detect many genes and their IDs.
One more question: will this affect the next steps, such as clustering? My sense is that it will, because only one gene is detected, whereas clustering cells by expression requires multiple genes.
Here is the command I ran:
STAR \
--runThreadN 4 \
--genomeDir reference_genome/STAR_annotated-index/ \
--readFilesIn FASTQ_data/SRR13040580_2.fastq FASTQ_data/SRR13040580_1.fastq \
--outFileNamePrefix STARsolo_results/ \
--outReadsUnmapped Fastx \
--outSAMattributes NH HI NM MD CB UB sM sS sQ \
--outFilterMultimapNmax 1 \
--outFilterMatchNmin 30 \
--outFilterMismatchNmax 4 \
--alignIntronMax 1 \
--alignSJDBoverhangMin 999 \
--soloType CB_UMI_Simple \
--soloCellFilter EmptyDrops_CR \
--soloCBwhitelist CB_whitelist/3M-february-2018.txt \
--outSAMtype BAM SortedByCoordinate
Seems like your reference data/index is missing information STARsolo expects.
Again, I'd highly, highly recommend reading OSCA and/or speaking to a local expert. You appear to be flying blind here, so to speak, and it will only lead you to frustration and wasted effort.
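One way to check whether the annotation is the problem is to look at what the GTF actually contains. By default, STAR's genome generation reads `exon` records and links them to genes through the `gene_id` and `transcript_id` attributes, so a GTF with few or no such records would leave STARsolo with almost no genes to count. The following is a minimal sketch of that check, not a definitive diagnostic; the function name `summarize_gtf` and the toy records are made up for illustration.

```python
import re
from collections import Counter

# Matches the gene_id attribute in the 9th GTF column.
GENE_ID = re.compile(r'gene_id "([^"]+)"')

def summarize_gtf(lines):
    """Count feature types (column 3) and collect gene_ids seen on exon records."""
    feature_counts = Counter()
    exon_gene_ids = set()
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed GTF record
        feature_counts[cols[2]] += 1
        if cols[2] == "exon":
            match = GENE_ID.search(cols[8])
            if match:
                exon_gene_ids.add(match.group(1))
    return feature_counts, exon_gene_ids

# Toy records (hypothetical, not taken from the real annotation).
toy_gtf = [
    "# toy header\n",
    '1\ttoy\tgene\t100\t500\t.\t+\t.\tgene_id "GENE_A";\n',
    '1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "GENE_A"; transcript_id "GENE_A.1";\n',
    '1\ttoy\texon\t300\t500\t.\t+\t.\tgene_id "GENE_A"; transcript_id "GENE_A.1";\n',
    '2\ttoy\texon\t50\t90\t.\t-\t.\tgene_id "GENE_B"; transcript_id "GENE_B.1";\n',
]

feature_counts, exon_gene_ids = summarize_gtf(toy_gtf)
print(dict(feature_counts))
print(len(exon_gene_ids), "genes with exon records")
```

On a real annotation, the same function can be given an open file handle instead of the toy list. If the number of genes with exon records is close to zero, the index was most likely built from an annotation STAR could not use.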
I downloaded the genome FASTA file and the annotation file from Ensembl. Is there something I should do to solve the missing information issue?
Regarding OSCA, I have started reading it, but I also want to apply what I learn as I go, so I am trying to figure this step out in order to have a matrix to work on.
Thank you!
Hi!
I was able to fix this issue. And yes, the annotation file needed to be filtered. I wrote the solution below.
Thank you:)
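For readers wondering what "filtering the annotation" can look like, here is a minimal sketch under a stated assumption: that the goal is to keep only `exon` records carrying both a `gene_id` and a `transcript_id` attribute, which are the records STAR reads by default when building an index. This is not necessarily the filter used in this thread; the function name `keep_usable_exons` and the toy records are hypothetical.

```python
import re

# Attributes STAR uses by default to link exons to transcripts and genes.
REQUIRED_ATTRS = (
    re.compile(r'gene_id "[^"]+"'),
    re.compile(r'transcript_id "[^"]+"'),
)

def keep_usable_exons(lines):
    """Yield comment lines plus exon records that have gene_id and transcript_id."""
    for line in lines:
        if line.startswith("#"):
            yield line  # preserve header comments
            continue
        cols = line.rstrip("\n").split("\t")
        if (
            len(cols) == 9
            and cols[2] == "exon"
            and all(pattern.search(cols[8]) for pattern in REQUIRED_ATTRS)
        ):
            yield line

# Toy records (hypothetical): one usable exon, one exon missing transcript_id,
# and one gene-level record.
toy_gtf = [
    "# toy header\n",
    '1\ttoy\tgene\t100\t500\t.\t+\t.\tgene_id "GENE_A";\n',
    '1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "GENE_A"; transcript_id "GENE_A.1";\n',
    '1\ttoy\texon\t300\t500\t.\t+\t.\tgene_id "GENE_A";\n',
]

filtered = list(keep_usable_exons(toy_gtf))
print("".join(filtered), end="")
```

After writing the filtered records to a new GTF, the STAR index would need to be rebuilt with `--runMode genomeGenerate` and `--sjdbGTFfile` pointing at that file before rerunning STARsolo, since the gene information is stored in the index.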
Please do not post screenshots of things unless necessary. Copying and pasting data should do just fine. Use the 101010 button to format your data as code, which will maintain formatting.