Question

cellxgene DNA Methylation data

0

5 days ago

1769mkc ★ 1.2k

This data is from cellxgene: the DNA Methylation Atlas of the Mouse Brain at Single-Cell Resolution.

These are the steps I have followed:

library(reticulate)
sc <- import("scanpy")  # assumed: `sc` is scanpy loaded through reticulate
h5ad_file <- "h5ad_file/5327c540-58b7-4dd1-8af1-d112ed939b4b.h5ad"
adata <- sc$read_h5ad(h5ad_file)
head(adata)

 adata
AnnData object with n_obs × n_vars = 103982 × 39042
obs: 'AllcPath', 'CCC_Rate', 'CG_Rate', 'CG_RateAdj', 'CH_Rate', 'CH_RateAdj', 'FinalReads', 'InputReads', 'MappedReads', 'Region', 'index_name', 'uid', 'BamFilteringRate', 'MappingRate', 'Pos96', 'Plate', 'Col96', 'Row96', 'Col384', 'Row384', 'FACS_Date', 'Slice', 'BICCN_class_label', 'BICCN_subclass_label', 'BICCN_cluster_label', 'L1CellClass', 'class_umap_1', 'Order', 'RegionName', 'MajorRegion', 'SubRegion', 'DetailRegion', 'PotentialOverlap (MMB)', 'Anterior (CCF coords)', 'Posterior (CCF coords)', 'SubRegionColor', 'Replicate', 'BICCN_ontology_term_id', 'disease_ontology_term_id', 'assay_ontology_term_id', 'cell_type_ontology_term_id', 'tissue_ontology_term_id', 'development_stage_ontology_term_id', 'self_reported_ethnicity_ontology_term_id', 'sex_ontology_term_id', 'is_primary_data', 'organism_ontology_term_id', 'donor_id', 'suspension_type', 'tissue_type', 'cell_type', 'assay', 'disease', 'organism', 'sex', 'tissue', 'self_reported_ethnicity', 'development_stage', 'observation_joinid'
var: 'Unnamed: 0', 'feature_is_filtered', 'feature_name', 'feature_reference', 'feature_biotype', 'feature_length', 'feature_type'
uns: 'MajorRegion_colors', 'Region_colors', 'citation', 'schema_reference', 'schema_version', 'title'
obsm: 'X_tsne', 'X_umap'

When I inspect the cell-level metadata of this object, I get this:

head(adata$obs)
      AllcPath
 10E_M_0     /gale/raidix/rdx-4/mapping/10E/CEMBA190625-10E-1-CEMBA190625-10E-2-A1/allc_CEMBA190625-10E-1-CEMBA190625-10E-2-A1_ad001.tsv.gz
10E_M_1     /gale/raidix/rdx-4/mapping/10E/CEMBA190625-10E-1-CEMBA190625-10E-2-A1/allc_CEMBA190625-10E-1-CEMBA190625-10E-2-A1_ad002.tsv.gz
10E_M_10  /gale/raidix/rdx-4/mapping/10E/CEMBA190625-10E-1-CEMBA190625-10E-2-A10/allc_CEMBA190625-10E-1-CEMBA190625-10E-2-A10_ad004.tsv.gz
10E_M_101 /gale/raidix/rdx-4/mapping/10E/CEMBA190625-10E-1-CEMBA190625-10E-2-B10/allc_CEMBA190625-10E-1-CEMBA190625-10E-2-B10_ad002.tsv.gz
10E_M_102 /gale/raidix/rdx-4/mapping/10E/CEMBA190625-10E-1-CEMBA190625-10E-2-B10/allc_CEMBA190625-10E-1-CEMBA190625-10E-2-B10_ad004.tsv.gz
10E_M_103 /gale/raidix/rdx-4/mapping/10E/CEMBA190625-10E-1-CEMBA190625-10E-2-B10/allc_CEMBA190625-10E-1-CEMBA190625-10E-2-B10_ad006.tsv.gz
             CCC_Rate   CG_Rate CG_RateAdj    CH_Rate CH_RateAdj FinalReads
10E_M_0   0.008198210 0.8226325  0.8211664 0.04163979 0.03371801    1626504
10E_M_1   0.006018933 0.7430346  0.7414785 0.02412729 0.01821801    2009998
10E_M_10  0.006569452 0.7501719  0.7485198 0.02766457 0.02123461    1383636
10E_M_101 0.006352796 0.7608976  0.7593689 0.02654676 0.02032307    2474670
10E_M_102 0.005408991 0.7529803  0.7516369 0.01949651 0.01416413    2430290
10E_M_103 0.005817363 0.7346639  0.7331113 0.02153866 0.01581329    2949180
          InputReads MappedReads Region index_name
10E_M_0      4407752     2892347    10E      ad001
10E_M_1      5524084     3657352    10E      ad002
10E_M_10     3455260     2172987    10E      ad004
10E_M_101    7245482     4778768    10E      ad002
10E_M_102    7004754     4609570    10E      ad004
10E_M_103    8645474     5564327    10E      ad006
                                              uid BamFilteringRate MappingRate
10E_M_0    CEMBA190625-10E-1-CEMBA190625-10E-2-A1        0.5623475   0.6561955
10E_M_1    CEMBA190625-10E-1-CEMBA190625-10E-2-A1        0.5495774   0.6620739
10E_M_10  CEMBA190625-10E-1-CEMBA190625-10E-2-A10        0.6367438   0.6288925
10E_M_101 CEMBA190625-10E-1-CEMBA190625-10E-2-B10        0.5178469   0.6595514
10E_M_102 CEMBA190625-10E-1-CEMBA190625-10E-2-B10        0.5272271   0.6580631
10E_M_103 CEMBA190625-10E-1-CEMBA190625-10E-2-B10        0.5300156   0.6436116
          Pos96             Plate Col96 Row96 Col384 Row384 FACS_Date Slice
10E_M_0      A1 CEMBA190625-10E-1     0     0      0      0    190625    10
10E_M_1      A1 CEMBA190625-10E-1     0     0      0      1    190625    10
10E_M_10    A10 CEMBA190625-10E-1     9     0     19      0    190625    10
10E_M_101   B10 CEMBA190625-10E-1     9     1     18      3    190625    10
10E_M_102   B10 CEMBA190625-10E-1     9     1     19      2    190625    10
10E_M_103   B10 CEMBA190625-10E-1     9     1     19      3    190625    10
          BICCN_class_label BICCN_subclass_label BICCN_cluster_label
10E_M_0                 Inh              MGE-Sst        MGE-Sst Rxra
10E_M_1                 Exc                  CA3           CA3 Cadm2
10E_M_10                Exc                  CA3           CA3 Cadm2
10E_M_101               Exc                  CA3           CA3 Cadm2
10E_M_102               Exc                  CA1           CA1 Chrm3
10E_M_103               Exc                  CA1           CA1 Chrm3
          L1CellClass class_umap_1 Order RegionName MajorRegion SubRegion
10E_M_0           Inh     8.687794    41       CA-3         HPF     CA1-3
10E_M_1       Exc-HPF    14.093295    41       CA-3         HPF     CA1-3
10E_M_10      Exc-HPF    13.630747    41       CA-3         HPF     CA1-3
10E_M_101     Exc-HPF    12.042387    41       CA-3         HPF     CA1-3
10E_M_102     Exc-HPF     6.567603    41       CA-3         HPF     CA1-3
10E_M_103     Exc-HPF     5.560691    41       CA-3         HPF     CA1-3
                      DetailRegion PotentialOverlap (MMB) Anterior (CCF coords)
10E_M_0   CA1, CA2, CA3, SUB, ProS               PA, HATA                  7500
10E_M_1   CA1, CA2, CA3, SUB, ProS               PA, HATA                  7500
10E_M_10  CA1, CA2, CA3, SUB, ProS               PA, HATA                  7500
10E_M_101 CA1, CA2, CA3, SUB, ProS               PA, HATA                  7500
10E_M_102 CA1, CA2, CA3, SUB, ProS               PA, HATA                  7500
10E_M_103 CA1, CA2, CA3, SUB, ProS               PA, HATA                  7500
          Posterior (CCF coords) SubRegionColor  Replicate
10E_M_0                     8100        #d62728 10E-190625
10E_M_1                     8100        #d62728 10E-190625
10E_M_10                    8100        #d62728 10E-190625
10E_M_101                   8100        #d62728 10E-190625
10E_M_102                   8100        #d62728 10E-190625
10E_M_103                   8100        #d62728 10E-190625
          BICCN_ontology_term_id disease_ontology_term_id
10E_M_0              ILX:0770152             PATO:0000461
10E_M_1              ILX:0770097             PATO:0000461
10E_M_10             ILX:0770097             PATO:0000461
10E_M_101            ILX:0770097             PATO:0000461
10E_M_102            ILX:0770097             PATO:0000461
10E_M_103            ILX:0770097             PATO:0000461
          assay_ontology_term_id cell_type_ontology_term_id
10E_M_0              EFO:0030027                 CL:0000617
10E_M_1              EFO:0030027                 CL:0000679
10E_M_10             EFO:0030027                 CL:0000679
10E_M_101            EFO:0030027                 CL:0000679
10E_M_102            EFO:0030027                 CL:0000679
10E_M_103            EFO:0030027                 CL:0000679
          tissue_ontology_term_id development_stage_ontology_term_id
10E_M_0            UBERON:0003876                     MmusDv:0000154
10E_M_1            UBERON:0003876                     MmusDv:0000154
10E_M_10           UBERON:0003876                     MmusDv:0000154
10E_M_101          UBERON:0003876                     MmusDv:0000154
10E_M_102          UBERON:0003876                     MmusDv:0000154
10E_M_103          UBERON:0003876                     MmusDv:0000154
          self_reported_ethnicity_ontology_term_id sex_ontology_term_id
10E_M_0                                         na         PATO:0000384
10E_M_1                                         na         PATO:0000384
10E_M_10                                        na         PATO:0000384
10E_M_101                                       na         PATO:0000384
10E_M_102                                       na         PATO:0000384
10E_M_103                                       na         PATO:0000384
          is_primary_data organism_ontology_term_id donor_id suspension_type
10E_M_0              TRUE           NCBITaxon:10090   pooled         nucleus
10E_M_1              TRUE           NCBITaxon:10090   pooled         nucleus
10E_M_10             TRUE           NCBITaxon:10090   pooled         nucleus
10E_M_101            TRUE           NCBITaxon:10090   pooled         nucleus
10E_M_102            TRUE           NCBITaxon:10090   pooled         nucleus
10E_M_103            TRUE           NCBITaxon:10090   pooled         nucleus
          tissue_type            cell_type     assay disease     organism  sex
10E_M_0        tissue     GABAergic neuron snmC-Seq2  normal Mus musculus male
10E_M_1        tissue glutamatergic neuron snmC-Seq2  normal Mus musculus male
10E_M_10       tissue glutamatergic neuron snmC-Seq2  normal Mus musculus male
10E_M_101      tissue glutamatergic neuron snmC-Seq2  normal Mus musculus male
10E_M_102      tissue glutamatergic neuron snmC-Seq2  normal Mus musculus male
10E_M_103      tissue glutamatergic neuron snmC-Seq2  normal Mus musculus male
                     tissue self_reported_ethnicity development_stage
10E_M_0   hippocampal field                      na  8-week-old stage
10E_M_1   hippocampal field                      na  8-week-old stage
10E_M_10  hippocampal field                      na  8-week-old stage
10E_M_101 hippocampal field                      na  8-week-old stage
10E_M_102 hippocampal field                      na  8-week-old stage
10E_M_103 hippocampal field                      na  8-week-old stage
          observation_joinid
10E_M_0           501c0ti%K@
10E_M_1           zuj|4iS7FH
10E_M_10          @v_7`Vi);H
10E_M_101         OH(jj&{LD0
10E_M_102         t5S{HgAGlE
10E_M_103         k^C~Cc+*Hg

Based on the data frame above, I can see that the data is already processed and annotated. Since this is
DNA methylation data, I would like to know how I can use this object to visualize or compare clusters in Seurat.

Any resources in this context would be really helpful.

cellxgene • 751 views

ADD COMMENT • link updated 1 day ago by LChart 4.7k • written 5 days ago by 1769mkc ★ 1.2k

score 2 · Answer 1 · 2024-12-31

2

3 days ago

LChart 4.7k

There are two issues here: (1) understanding the data and (2) understanding the data format.

As regards (1): this is heavily processed DNA methylation data. If you look at the .var you will almost certainly see that the features are genes and reflect a summary of the methylation state of each gene (typically an average over the promoter and gene body), possibly split by mC and hmC if the technology makes a difference. You can tell just by the shape of the data you do not have granular CpG information (n_obs × n_vars = 103982 × 39042).
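
One way to check what the features are is to tally a descriptive column of the .var table. The following is only a minimal sketch: it assumes the .var table has been written out as CSV, and the three rows and their identifiers are hypothetical placeholders, not values from this dataset. The helper name tally_column is also made up for illustration.

```python
import csv
import io
from collections import Counter


def tally_column(handle, column):
    """Count how often each value occurs in one column of a CSV table."""
    reader = csv.DictReader(handle)
    return Counter(row[column] for row in reader)


# Hypothetical three-row stand-in for an exported .var table.
example_var = io.StringIO(
    "feature_id,feature_name,feature_biotype,feature_type\n"
    "id_1,GeneA,gene,protein_coding\n"
    "id_2,GeneB,gene,protein_coding\n"
    "id_3,GeneC,gene,lncRNA\n"
)

type_counts = tally_column(example_var, "feature_type")
print(type_counts)
```

If every row describes a gene and the row count matches n_vars, the matrix holds one summary value per gene per cell rather than per-CpG calls.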

As regards (2): To visualize this specifically in Seurat, you have to port the data from an AnnData python object to an R Seurat object. There are tools for doing this kind of thing, but I have had mixed success. I tend to do something like the following:


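# Python: assumes `adata` is the AnnData object already loaded with read_h5ad
import scipy.io
import scipy.sparse

obs_df = adata.obs.copy()
obs_df['UMAP1'] = adata.obsm['X_umap'][:, 0]
obs_df['UMAP2'] = adata.obsm['X_umap'][:, 1]

var_df = adata.var.copy()

obs_df.to_csv('barcodes.csv')
var_df.to_csv('features.csv')
# Transpose to features x cells (Seurat's orientation); write in sparse coordinate format
scipy.io.mmwrite('values.mtx', scipy.sparse.csr_matrix(adata.X.T))

# R: rebuild a Seurat object from the three exported files
library(Seurat)
obs_df <- read.csv('barcodes.csv', row.names=1)
var_df <- read.csv('features.csv', row.names=1)
mat <- as(Matrix::readMM('values.mtx'), 'CsparseMatrix')
dimnames(mat) <- list(rownames(var_df), rownames(obs_df))  # readMM drops names
obj <- CreateSeuratObject(mat, meta.data=obs_df)
obj[['RNA']]@meta.features <- var_df  # Seurat v4 Assay slot; v5 assays store this differently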

and do NOT follow the standard Seurat vignettes. You are starting with non-count values (gene averages), so things like cell normalization should not be performed.
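
A quick sanity check for whether a matrix holds counts is to test that its values are non-negative whole numbers. This is a rough sketch with a made-up helper name (looks_like_counts) and hypothetical example values; it is not part of Seurat or scanpy.

```python
def looks_like_counts(values, tol=1e-9):
    """Return True only if every value is a non-negative whole number."""
    return all(v >= 0 and abs(v - round(v)) <= tol for v in values)


# Hypothetical values: read counts versus per-gene methylation averages.
hypothetical_counts = [0, 3, 1, 12, 0]
hypothetical_averages = [0.82, 0.74, 0.75, 0.76]

counts_check = looks_like_counts(hypothetical_counts)
averages_check = looks_like_counts(hypothetical_averages)
print(counts_check, averages_check)
```

If a sample of the matrix values fails this check, count-based steps such as library-size normalization do not apply.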

ADD COMMENT • link 3 days ago by LChart 4.7k

0

Thank you for the insight. Regarding the Seurat steps in your code, I have a doubt about these lines to begin with:

obs_df.to_csv('barcodes.csv')
var_df.to_csv('features.csv')
sp.io.mmwrite('values.mtx', adata.X.T)

All I have to start with is the .h5ad file, so where do I get these inputs, since my only input is the .h5ad in this case?

ADD REPLY • link 2 days ago by 1769mkc ★ 1.2k

1

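The code you posted has adata <- sc$read_h5ad(h5ad_file) so... just do that. Those three lines do not read anything: they write barcodes.csv, features.csv and values.mtx from the adata object you have already loaded.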

ADD REPLY • link 1 day ago by LChart 4.7k

0

"do NOT follow the standard Seurat vignettes. You are starting with non-count values (gene averages)": coming to this, what should I do if I want to find differential methylation? Can I use the FindMarkers() function on these values?

ADD REPLY • link 1 day ago by 1769mkc ★ 1.2k

0

It should work fine. The default is a Wilcoxon rank-sum test, so you'll be testing whether the ranks of average methylation tend to be higher in one population than in another. Just check the dataset documentation to confirm that the features are average per-cell methylation values along the feature/gene.
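
To make the rank comparison concrete, here is a small pure-Python sketch of the rank-sum (Mann-Whitney U) statistic on hypothetical per-cell methylation averages for one gene. It is not Seurat's implementation and it computes only the statistic, not a p-value; the function names and values are made up for illustration.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """U for group_a: number of (a, b) pairs with a > b, ties counting 0.5."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    rank_sum_a = sum(ranks[:n_a])
    return rank_sum_a - n_a * (n_a + 1) / 2


# Hypothetical per-cell methylation averages for one gene in two clusters.
cluster_1 = [0.82, 0.74, 0.75, 0.76]
cluster_2 = [0.41, 0.52, 0.48]

u_stat = mann_whitney_u(cluster_1, cluster_2)
print(u_stat)
```

Here every value in cluster_1 exceeds every value in cluster_2, so U reaches its maximum of len(cluster_1) * len(cluster_2). Because only ranks are used, the test does not require the values to be counts.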

ADD REPLY • link 1 day ago by LChart 4.7k