Hi,
I'm working on a newly sequenced animal whose genome has not been published yet. All we have is its genome in FASTA format (the sequences of multiple contigs) and a gene annotation file in GFF3/GTF format (transcript, exon, CDS, etc.).
I tried ArchR, Signac, SnapATAC, and scATAC-pro for downstream analysis of the data generated by 10x cellranger-atac. But these tools are very unfriendly to new species, and it is almost impossible to create the objects each tool requires from custom FASTA and GFF files.
Does anyone know of a published application of 10x scATAC-seq to a non-model organism? And can anyone recommend downstream analysis tools that support a custom species?
Thanks a lot!
For example, in the Signac tutorial:
counts <- Read10X_h5(filename = "../vignette_data/atac_v1_pbmc_10k_filtered_peak_bc_matrix.h5")

metadata <- read.csv(
  file = "../vignette_data/atac_v1_pbmc_10k_singlecell.csv",
  header = TRUE,
  row.names = 1
)

chrom_assay <- CreateChromatinAssay(
  counts = counts,
  sep = c(":", "-"),
  genome = 'hg19',
  fragments = '../vignette_data/atac_v1_pbmc_10k_fragments.tsv.gz',
  min.cells = 10,
  min.features = 200,
  annotation = NULL
)

pbmc <- CreateSeuratObject(
  counts = chrom_assay,
  assay = "peaks",
  meta.data = metadata
)
When I call CreateChromatinAssay(), I cannot use 'hg19' or any other built-in genome name for the 'genome =' argument. So I built a Seqinfo object containing basic information about the genome used:
sca.contig.size<-read.csv(file = '.../fasta/sca.contig.size.csv', header = F)
Above, I used the genome.fa.fai file from the cellranger reference I created earlier.
Sca <- Seqinfo(
  seqnames = sca.contig.size$V1,
  seqlengths = sca.contig.size$V2,
  isCircular = rep(FALSE, nrow(sca.contig.size)),
  genome = "Sca"
)
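As a side note, the .fai index can probably be read directly without converting it to CSV first, since it is a tab-separated file whose first two columns are the sequence name and its length. A minimal sketch, assuming GenomeInfoDb is installed; `seqinfo_from_fai` is a hypothetical helper name and the toy index values are made up:

```r
library(GenomeInfoDb)

# Hypothetical helper (not part of any package): build a Seqinfo object
# from a samtools-style .fai index. Column 1 = sequence name, column 2 = length.
seqinfo_from_fai <- function(fai_path, genome_name) {
  fai <- read.delim(fai_path, header = FALSE, stringsAsFactors = FALSE)
  Seqinfo(
    seqnames   = as.character(fai$V1),
    seqlengths = as.integer(fai$V2),
    isCircular = rep(FALSE, nrow(fai)),  # assumption: no circular contigs
    genome     = genome_name
  )
}

# Self-contained demo with a two-contig toy index (made-up values).
demo_fai <- tempfile(fileext = ".fai")
writeLines(
  c("contig1\t1000\t9\t60\t61",
    "contig2\t500\t1035\t60\t61"),
  demo_fai
)
demo_seqinfo <- seqinfo_from_fai(demo_fai, "Sca")
```

With a real reference, `fai_path` would point at the genome.fa.fai file inside the cellranger-atac reference directory, and the resulting object could be passed to the 'genome =' argument of CreateChromatinAssay().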
Next I need a 'GRanges' object (or a set of 'GRanges') containing the annotations for the genome used.
This is where I am stuck. I have a gff3 file, but I don't know how to convert it to a 'GRanges' object. The 'GRanges' format is quite complicated, so unlike the Seqinfo object I cannot build it by hand and have to rely on another tool to create it. So I tried:
if (!require(AcidGenomes)) remotes::install_github("acidgenomics/AcidGenomes")
makeGRangesFromGFF(file = ".../s.ca.primary.gff3", level = c("genes"))
Disappointingly, the documentation says:
Currently makeGRangesFromGFF() supports genomes from these sources:
Ensembl (GTF, GFF3).
GENCODE (GTF, GFF3).
RefSeq (GTF, GFF3).
FlyBase (GTF).
WormBase (GTF).
This goes back to the 'must use a published classic model' issue.
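One route that should not depend on a curated source is rtracklayer::import(), which parses generic GFF3/GTF files into a 'GRanges' object. The sketch below is only an illustration under stated assumptions: the toy GFF3 content is made up, and the extra metadata columns (gene_name, gene_biotype) are my assumption about what Signac's downstream functions look for, so check them against the Signac documentation before relying on them.

```r
library(rtracklayer)    # Bioconductor; provides import() for GFF3/GTF files
library(GenomicRanges)

# Toy GFF3 with made-up coordinates so the sketch runs on its own.
gff_lines <- c(
  "##gff-version 3",
  "contig1\tdemo\tgene\t100\t900\t.\t+\t.\tID=gene1;Name=geneA",
  "contig1\tdemo\tmRNA\t100\t900\t.\t+\t.\tID=tx1;Parent=gene1",
  "contig1\tdemo\texon\t100\t400\t.\t+\t.\tID=exon1;Parent=tx1",
  "contig1\tdemo\texon\t600\t900\t.\t+\t.\tID=exon2;Parent=tx1"
)
gff_path <- tempfile(fileext = ".gff3")
writeLines(gff_lines, gff_path)

# import() returns a GRanges; GFF3 attributes (ID, Name, Parent, ...)
# become metadata columns.
annot <- import(gff_path, format = "gff3")

# Keep gene-level records only.
genes <- annot[annot$type == "gene"]

# Assumption: downstream tools expect columns named like this.
genes$gene_name <- genes$Name
genes$gene_biotype <- "protein_coding"  # placeholder, not taken from the file
```

For a real annotation, `gff_path` would be the path to the custom gff3 file, and the attribute names (here ID and Name) depend on how that file was generated. The sequence names in the resulting object also need to match the contig names in the fragments file and the Seqinfo object.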
Hi @Chilly, I'm the developer of the AcidGenomes package. Happy to help add support for the s.ca.primary.gff3 file you mentioned. Can you provide a copy of the file?
Best, Mike