I am trying to process 10x scRNA-seq data from a Homo sapiens cancer cell line. I have a set of files already processed by Cell Ranger, including the filtered matrix in .h5 format. However, my collaborators also want me to align reads to some EBV genes, e.g., EBNA2 and LMP1, so that we can then perform differential analysis.
Ideally, I would customize part of the Cell Ranger pipeline so that I end up with a new filtered .h5 matrix containing both Homo sapiens genes and EBV genes, and I would not need to change my downstream analysis script. But I have no idea how to do this. May I have your suggestions? Thank you very much.
Hi Dave, thank you very much for your response. I just found a GFF3 annotation file at https://github.com/SannaAb/EBV_RNASeqPipeline/tree/main/DB, but not a GTF file. Can I use it directly, or should I convert it first? Also, some papers mention that NCBI hosts the EBV gtf file and fastq file, but NCBI just shows an error when I try to access them. Do you have any idea about that? Thank you very much.
I believe Cell Ranger only accepts GTF format, so you would need to convert your GFF to GTF. Many tools can do this, but I like to use gffread.
Unfortunately, I don't have any insights into errors that might be occurring with NCBI.
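For the GFF-to-GTF conversion mentioned above, a minimal sketch with gffread could look like the following. The file names ebv.gff3 and ebv.gtf and the helper name gff_to_gtf are placeholders chosen for illustration, and the block assumes gffread is installed and on your PATH.

```shell
# Hypothetical helper: convert a GFF3 annotation to GTF with gffread.
# Usage: gff_to_gtf <input.gff3> <output.gtf>
gff_to_gtf() {
  local gff=$1 gtf=$2
  # -T writes GTF instead of the default GFF3; -o names the output file.
  gffread "$gff" -T -o "$gtf"
}

# Run only when gffread and the (placeholder) input file are present.
if command -v gffread >/dev/null 2>&1 && [ -f ebv.gff3 ]; then
  gff_to_gtf ebv.gff3 ebv.gtf
  # Quick sanity check: print how many exon rows the converted GTF contains.
  awk -F'\t' '$3 == "exon"' ebv.gtf | grep -c ''
fi
```

It is worth looking at the converted file before building a reference, since, as far as I know, Cell Ranger counts reads against the exon rows of the GTF.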
Both GFF and GTF files are available for EBV at NCBI here --> https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/002/402/265/GCF_002402265.1_ASM240226v1/
If you choose to get the GTF from here, be sure to also download the genome file that matches the annotation.
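As a rough sketch of fetching the GTF together with its matching genome, the snippet below builds both URLs from the directory linked above. The file names are an assumption based on the usual NCBI naming pattern (<assembly>_genomic.fna.gz and <assembly>_genomic.gtf.gz), so check the directory listing before relying on them. The DOWNLOAD switch is a made-up variable for this example so that nothing is fetched by accident.

```shell
# Directory given in the post above.
base="https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/002/402/265/GCF_002402265.1_ASM240226v1"

# The assembly prefix is the last path component of the directory.
prefix="${base##*/}"

# Assumed file names (NCBI's usual pattern); verify against the listing.
fasta_url="${base}/${prefix}_genomic.fna.gz"
gtf_url="${base}/${prefix}_genomic.gtf.gz"

echo "FASTA: ${fasta_url}"
echo "GTF:   ${gtf_url}"

# Download and decompress only when explicitly requested: DOWNLOAD=1
if [ "${DOWNLOAD:-0}" = 1 ]; then
  curl -fsSL -O "$fasta_url" -O "$gtf_url"
  gunzip "${prefix}_genomic.fna.gz" "${prefix}_genomic.gtf.gz"
fi
```

Taking both files from the same assembly directory keeps the sequence names in the GTF consistent with the FASTA headers.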
Hi GenoMax, I really appreciate your help. I also see that there is a .fna.gz file, which is in FASTA format, so I am a little confused: according to this tutorial, I need three files: a FASTA file, a GTF file, and a genome file. You also mentioned that I need to download the compatible genome file, so is there another genome file at NCBI that is not the .fna.gz file? May I have your suggestions? Thank you very much!
There is no third file. I think you are referring to the --genome option of the command below. It is just the name you choose for the output folder.
cellranger mkref
The .fna.gz file is the FASTA-format genome file for EBV that corresponds to the GTF.
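To tie this back to the original question, one common way to get human and EBV genes into the same matrix is to append the EBV FASTA and GTF to the human ones and build a single custom reference. The sketch below assumes uncompressed inputs with placeholder names (human.fa, human.gtf, ebv.fna, ebv.gtf); the helper combine_reference and the output name GRCh38_EBV are made up for this example. Only the --genome, --fasta and --genes options of cellranger mkref are used.

```shell
# Hypothetical helper: append the EBV genome and annotation to the human ones.
# Usage: combine_reference <human.fa> <ebv.fa> <human.gtf> <ebv.gtf> <out_prefix>
combine_reference() {
  local human_fa=$1 ebv_fa=$2 human_gtf=$3 ebv_gtf=$4 out=$5
  # Concatenate the sequences; the EBV contig becomes an extra "chromosome".
  cat "$human_fa" "$ebv_fa" > "${out}.fa"
  # Append the EBV annotation, dropping its '#' header lines so they do not
  # end up in the middle of the combined GTF.
  { cat "$human_gtf"; grep -v '^#' "$ebv_gtf"; } > "${out}.gtf"
}

# Run only when Cell Ranger and the (placeholder) inputs are present.
if command -v cellranger >/dev/null 2>&1 \
   && [ -f human.fa ] && [ -f human.gtf ] && [ -f ebv.fna ] && [ -f ebv.gtf ]; then
  combine_reference human.fa ebv.fna human.gtf ebv.gtf human_ebv
  # --genome is only the name of the output folder, not a third input file.
  cellranger mkref --genome=GRCh38_EBV --fasta=human_ebv.fa --genes=human_ebv.gtf
fi
```

Rerunning cellranger count with --transcriptome pointing at the new folder should then produce a filtered .h5 matrix that includes the EBV genes, so the downstream script can stay the same.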
Got it! I will try then, and thank you very much GenoMax!