Hi,
I am analyzing a single-cell RNA-seq dataset from Chlamydomonas reinhardtii.
I downloaded the GFF3 and FASTA files from the JGI Data Portal and converted the GFF3 to GTF in order to build the STAR index. Everything runs fine, but when I create the Seurat object, the features are locus names instead of gene names. This is very inconvenient because I need to cluster based on a specific gene's expression, and I cannot do that without the gene names.
I tried using Seurat.utils::RenameGenesSeurat, but it won't work because the locus-to-gene file I was able to obtain has fewer rows than there are locus names in the object (meaning that there is not a gene name for each locus, I guess).
Has anybody worked with C. reinhardtii single-cell data before who could help me get gene names instead of locus names in the final Seurat object?
Thank you in advance
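The row-count mismatch described above (fewer locus-to-gene rows than features) can usually be worked around by falling back to the locus name wherever no gene name exists, so that every feature still gets exactly one unique label. The following is a minimal sketch of that idea in Python, not the Seurat.utils implementation; the function name `build_feature_names` and the example identifiers are hypothetical, and the mapping is assumed to be loaded into a plain dictionary beforehand.

```python
def build_feature_names(loci, locus_to_gene):
    """Return one unique label per locus, in the same order as `loci`.

    `loci` is the full list of feature IDs in the object; `locus_to_gene`
    may cover only a subset of them (hypothetical inputs for illustration).
    """
    used = set()
    names = []
    for locus in loci:
        # Fall back to the locus ID when the gene name is missing or empty.
        base = locus_to_gene.get(locus) or locus
        # Append a numeric suffix if the label is already taken, since
        # feature names must be unique.
        candidate = base
        suffix = 0
        while candidate in used:
            suffix += 1
            candidate = f"{base}.{suffix}"
        used.add(candidate)
        names.append(candidate)
    return names


# Made-up identifiers: two loci share a gene name, one has no name at all.
example_loci = ["LOCUS_0001", "LOCUS_0002", "LOCUS_0003"]
example_map = {"LOCUS_0001": "GENE_A", "LOCUS_0003": "GENE_A"}
example_names = build_feature_names(example_loci, example_map)
```

The resulting list has the same length and order as the original features, so it could be written out and used as the replacement vector when renaming; whether a given renaming helper accepts it directly is something to check against that helper's documentation.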
It is always safer to get sequence/annotations from standard sources like NCBI/Ensembl so no manipulations are required to make the data work. https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/002/595/GCF_000002595.2_Chlamydomonas_reinhardtii_v5.5/ has the genome and GTF available, if you feel inclined to try.
You should also use STARsolo if this is single-cell data.
Thank you for your answer! Unfortunately, NCBI doesn't have the latest version of the genome and annotations for this organism (v6.1), so I am not able to use it. Also, I tried using STARsolo, but the libraries for this experiment were prepared with BD Rhapsody protocols, which have a very unusual barcode structure, so I have had to do the preprocessing through their web app, which requires a STAR index as input.
That is important to know. Perhaps the issue is with the way you converted the GFF3 to GTF. What software did you use?
In general, the AGAT toolkit (LINK) is very useful for these conversions. If you did not use it, you may want to give it a try.
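Before re-running the conversion, it may help to check which attributes the current GTF actually carries, since a conversion that drops the name attribute would explain locus names showing up as features. Below is a rough Python sketch that counts attribute keys on records of a chosen feature type. The function name `count_attribute_keys` is hypothetical, and `gene_name` is only an assumed key; a JGI-derived file may store gene names under a different attribute, or not at all.

```python
import re
from collections import Counter

# Matches GTF-style attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def count_attribute_keys(gtf_lines, feature="gene"):
    """Count how many records of type `feature` carry each attribute key."""
    counts = Counter()
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF record has 9 tab-separated columns; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != feature:
            continue
        counts["records"] += 1
        # Count each key at most once per record.
        for key in {key for key, _value in ATTR_RE.findall(fields[8])}:
            counts[key] += 1
    return counts


# Two made-up records: only the first one has a gene_name attribute.
example_gtf = [
    '# illustrative header line',
    'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "LOCUS_0001"; gene_name "GENE_A";',
    'chr1\tsrc\tgene\t200\t300\t.\t-\t.\tgene_id "LOCUS_0002";',
]
example_counts = count_attribute_keys(example_gtf)
```

On a real file the lines could come from `open("annotation.gtf")` (the path is a placeholder). If the file has no `gene` rows, passing `feature="exon"` or `feature="transcript"` would be the thing to try. A `gene_name` count far below the `records` count would suggest the names were never in the annotation for those loci, rather than lost in conversion.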