Hello,
I am analyzing some ATAC samples and I would like to add the motif information to my objects. So far I have imported my fragment files and clustered the cells.
I would like to add the filtered_tf_bc_matrix information so I can do differential expression based on the motifs, but I am having trouble figuring out how to do that.
I tried to import the motif file as shown below:
import csv
import gzip
import os

import pandas as pd
import scipy.io

# tf-bc matrix
matrix_dir = "filtered_tf_bc_matrix"
mat = scipy.io.mmread(os.path.join(matrix_dir, "matrix.mtx.gz"))
# read motif IDs (first column) and motif names (second column) in one pass
motifs_path = os.path.join(matrix_dir, "motifs.tsv")
with open(motifs_path, newline="") as f:
    motif_rows = list(csv.reader(f, delimiter="\t"))
motif_ids = [row[0] for row in motif_rows]
motif_names = [row[1] for row in motif_rows]
# read cell barcodes
barcodes_path = os.path.join(matrix_dir, "barcodes.tsv.gz")
with gzip.open(barcodes_path, mode="rt", newline="") as f:
    barcodes = [row[0] for row in csv.reader(f, delimiter="\t")]
# transform table to pandas dataframe and label rows and columns
matrix = pd.DataFrame.sparse.from_spmatrix(mat)
matrix.columns = barcodes
matrix.insert(loc=0, column="motif_names", value=motif_names)
matrix.insert(loc=0, column="motif_ids", value=motif_ids)
# display matrix
print(matrix)
# save the table as a CSV (note the CSV will be a very large file)
matrix.to_csv("mex_matrix.csv", index=False)
I then tried to create the AnnData object with adatamotif = ad.AnnData(matrix),
but the resulting object does not look correct.
Is there another way I can add the motif information to my AnnData or AnnDataSet object?
The AnnDataSet object I have looks like this:
AnnDataSet object with n_obs x n_vars = 20028 x 526765 backed at 'All_samples.h5ads'
contains 7 AnnData objects with keys: '1_fragments.tsv.gz', '2_fragments.tsv.gz', '3_fragments.tsv.gz', '4_fragments.tsv.gz'
obs: 'sample', 'leiden'
var: 'count', 'selected'
uns: 'reference_sequences', 'AnnDataSet', 'spectral_eigenvalue'
obsm: 'X_umap', 'X_spectral'
obsp: 'distances'
The AnnData objects I had look like this:
[AnnData object with n_obs x n_vars = 8057 x 0 backed at '1_fragments.tsv.gz.h5ad'
obs: 'n_fragment', 'frac_dup', 'frac_mito'
uns: 'reference_sequences'
obsm: 'fragment_paired',
AnnData object with n_obs x n_vars = 3804 x 0 backed at '2_fragments.tsv.gz.h5ad'
obs: 'n_fragment', 'frac_dup', 'frac_mito'
uns: 'reference_sequences'
obsm: 'fragment_paired',
AnnData object with n_obs x n_vars = 811 x 0 backed at '3_fragments.tsv.gz.h5ad'
obs: 'n_fragment', 'frac_dup', 'frac_mito'
uns: 'reference_sequences'
obsm: 'fragment_paired',
AnnData object with n_obs x n_vars = 2368 x 0 backed at '4_fragments.tsv.gz.h5ad'
obs: 'n_fragment', 'frac_dup', 'frac_mito'
uns: 'reference_sequences'
obsm: 'fragment_paired']
Thank you