Question

Loading a pre-filtered loom file into pySCENIC

2.3 years ago

Jacob • 0

Hi,

I am trying to run pySCENIC, specifically the SCENIC step, on a dataset that has already been filtered and corrected for batch effects. I tried loading the dataset directly into the first pySCENIC step, pyscenic grn, but it reports:

pyscenic.cli.pyscenic - ERROR - Unknown file format "/gpfs/home/jig16/pySCENIC/pancreas.integrated.combined.loom1"

I am guessing that is because the attribute names are different: for example, there is no column attribute named "nGene" or "nUMI", although they may exist under other names. Is there a way to rename the column/row attributes so that the file is in the right format? I am not sure whether that is the right approach, or whether that is even what "Unknown file format" means. If anyone has other ideas on how to get this done, that would be greatly appreciated!

I am using pySCENIC from this tutorial: https://github.com/aertslab/SCENICprotocol/tree/master/notebooks/PBMC10k_SCENIC-protocol-CLI.ipynb

Thank you.

scRNA-seq pySCENIC loom • 3.1k views

ADD COMMENT • link updated 20 months ago by Ram 44k • written 2.3 years ago by Jacob • 0

Is it a typo, or does the input file have a loom1 extension?

ADD REPLY • link 2.3 years ago by Arup Ghosh 3.2k

The file has a loom1 extension. Is that the problem? The loom file can be read by sc.read_loom.

ADD REPLY • link 2.3 years ago by Jacob • 0

The file extension is the issue: pySCENIC picks the loader from the file name and accepts names ending in .loom or .h5ad (or .csv/.tsv for text matrices), so .loom1 is reported as an unknown format.

https://github.com/aertslab/pySCENIC/blob/master/src/pyscenic/cli/pyscenic.py#L278

ADD REPLY • link 2.3 years ago by Arup Ghosh 3.2k
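
The extension check described in the reply above can be sketched as follows. This is only an illustration, not pySCENIC's own code: the helper name `with_accepted_suffix` and the `ACCEPTED_SUFFIXES` set are hypothetical, and the set lists just the two binary formats named in the reply. The helper copies the file rather than renaming it, so the original stays untouched.

```python
import shutil
from pathlib import Path

# Assumption: the two binary formats named in the reply; not read from pySCENIC.
ACCEPTED_SUFFIXES = {".loom", ".h5ad"}


def with_accepted_suffix(path, suffix=".loom"):
    """Return a path whose suffix is accepted, copying the file if needed."""
    src = Path(path)
    if src.suffix in ACCEPTED_SUFFIXES:
        return src  # already fine, nothing to do
    dst = src.with_suffix(suffix)  # e.g. data.loom1 -> data.loom
    if dst.exists():
        raise FileExistsError(f"refusing to overwrite {dst}")
    shutil.copy2(src, dst)  # copy keeps the original file in place
    return dst
```

Note that this only changes the file name. It does not repair a file whose contents are not valid loom/HDF5.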

Thank you, Arup

I loaded in a new loom file with the correct extension, and this is the output I get:

2022-08-15 11:12:20,606 - pyscenic.cli.pyscenic - INFO - Loading expression matrix.
Traceback (most recent call last):
  File ".conda/envs/noel_pyscenic2/bin/pyscenic", line 8, in <module>
    sys.exit(main())
  File "/gpfs/home/jig16/.conda/envs/noel_pyscenic2/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 485, in main
    args.func(args)
  File "/gpfs/home/jig16/.conda/envs/noel_pyscenic2/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 49, in find_adjacencies_command
    args.gene_attribute)
  File "/gpfs/home/jig16/.conda/envs/noel_pyscenic2/lib/python3.7/site-packages/pyscenic/cli/utils.py", line 116, in load_exp_matrix
    return load_exp_matrix_as_loom(fname, return_sparse, attribute_name_cell_id, attribute_name_gene)
  File "/gpfs/home/jig16/.conda/envs/noel_pyscenic2/lib/python3.7/site-packages/pyscenic/cli/utils.py", line 69, in load_exp_matrix_as_loom
    with lp.connect(fname,mode='r',validate=False) as ds:
  File "/gpfs/home/jig16/.conda/envs/noel_pyscenic2/lib/python3.7/site-packages/loompy/loompy.py", line 1389, in connect
    return LoomConnection(filename, mode, validate=validate)
  File "/gpfs/home/jig16/.conda/envs/noel_pyscenic2/lib/python3.7/site-packages/loompy/loompy.py", line 86, in __init__
    if "matrix" in self._file:
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/gpfs/home/jig16/.conda/envs/noel_pyscenic2/lib/python3.7/site-packages/h5py/_hl/group.py", line 439, in __contains__
    return self._e(name) in self.id
  File "h5py/h5g.pyx", line 462, in h5py.h5g.GroupID.__contains__
  File "h5py/h5g.pyx", line 463, in h5py.h5g.GroupID.__contains__
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5g.pyx", line 532, in h5py.h5g._path_valid
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5l.pyx", line 212, in h5py.h5l.LinkProxy.exists
RuntimeError: Unable to get link info (bad local heap signature)

EmptyDataError                            Traceback (most recent call last)
<ipython-input-9-ec53d29ef886> in <module>
      2
      3 get_ipython().system(" pyscenic grn {f_loom_path_scenic} {f_tfs} -o '/gpfs/home/jig16/pySCENIC/juliaAdj1.csv' --num_workers 4")
----> 4 adjacencies = pd.read_csv('/gpfs/home/jig16/pySCENIC/juliaAdj1.csv', index_col=False, sep='\t')
      5 adjacencies.head()

~/.conda/envs/noel_pyscenic2/lib/python3.7/site-packages/pandas/io/parsers.py in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    608     kwds.update(kwds_defaults)
    609
--> 610     return _read(filepath_or_buffer, kwds)
    611
    612

~/.conda/envs/noel_pyscenic2/lib/python3.7/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    460
    461     # Create the parser.
--> 462     parser = TextFileReader(filepath_or_buffer, **kwds)
    463
    464     if chunksize or iterator:

~/.conda/envs/noel_pyscenic2/lib/python3.7/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
    817             self.options["has_index_names"] = kwds["has_index_names"]
    818
--> 819         self._engine = self._make_engine(self.engine)
    820
    821     def close(self):

~/.conda/envs/noel_pyscenic2/lib/python3.7/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
   1048             )
   1049         # error: Too many arguments for "ParserBase"
-> 1050         return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
   1051
   1052     def _failover_to_python(self):

~/.conda/envs/noel_pyscenic2/lib/python3.7/site-packages/pandas/io/parsers.py in __init__(self, src, kwds)
   1896
   1897         try:
-> 1898             self._reader = parsers.TextReader(self.handles.handle, kwds)
   1899         except Exception:
   1900             self.handles.close()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

EmptyDataError: No columns to parse from file

I still think that it's the result of this loom file not having the correct column attribute names. Do you know the easiest way to change the column names?

ADD REPLY • link 2.3 years ago by Jacob • 0
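
The EmptyDataError in the post above is most likely a downstream symptom: the pyscenic grn call had already failed, so the adjacencies file had no content by the time pd.read_csv tried to parse it. A minimal sketch of guarding against that is to run the step through `subprocess` and check both the exit code and the output file before reading it. The helper name `run_and_check` is hypothetical; the command in the trailing comment reuses the arguments from the notebook cell shown in the traceback.

```python
import subprocess
from pathlib import Path


def run_and_check(cmd, output_path):
    """Run a command; fail loudly if it errors or leaves no usable output."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"command failed with exit code {result.returncode}:\n{result.stderr}"
        )
    out = Path(output_path)
    if not out.is_file() or out.stat().st_size == 0:
        raise RuntimeError(f"{out} is missing or empty")
    return out


# Intended use in the notebook (variable names taken from the traceback above):
# adj_path = "/gpfs/home/jig16/pySCENIC/juliaAdj1.csv"
# run_and_check(["pyscenic", "grn", f_loom_path_scenic, f_tfs,
#                "-o", adj_path, "--num_workers", "4"], adj_path)
```

With this in place the notebook stops at the real error (the failed grn step) instead of at the later read_csv call.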

Validate the exported .loom file using loompy. The LoomValidator class (through its validate() method) reports details about incorrect file attributes.

https://linnarssonlab.org/loompy/fullapi/loom_validator.html

ADD REPLY • link 2.3 years ago by Arup Ghosh 3.2k
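
The validation step suggested above might look like the sketch below. It assumes loompy is installed and relies on `LoomValidator.validate(path)` returning a boolean and filling the validator's `errors` and `warnings` lists. The wrapper `validate_loom` and its optional `validator` argument are hypothetical conveniences added here so the validator can be swapped out.

```python
def validate_loom(path, validator=None):
    """Validate a loom file and return (ok, errors, warnings)."""
    if validator is None:
        # Imported lazily so the helper can be defined without loompy present.
        import loompy

        validator = loompy.LoomValidator()
    ok = validator.validate(path)
    return ok, list(validator.errors), list(validator.warnings)


# Intended use (the path is a placeholder):
# ok, errors, warnings = validate_loom("my_dataset.loom")
# for message in errors:
#     print("ERROR:", message)
```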

You are amazing Arup!

I am missing a lot of things :/ see the output:

ERROR: Row attribute 'Gene' dtype object is not allowed
ERROR: Column attribute 'CellID' dtype object is not allowed
ERROR: Column attribute 'integrated_snn_res.0.1' dtype object is not allowed
ERROR: Column attribute 'integrated_snn_res.0.2' dtype object is not allowed
ERROR: Column attribute 'integrated_snn_res.0.3' dtype object is not allowed
ERROR: Column attribute 'integrated_snn_res.0.4' dtype object is not allowed
ERROR: Column attribute 'integrated_snn_res.0.5' dtype object is not allowed
ERROR: Column attribute 'integrated_snn_res.0.6' dtype object is not allowed
ERROR: Column attribute 'integrated_snn_res.0.7' dtype object is not allowed
ERROR: Column attribute 'integrated_snn_res.0.8' dtype object is not allowed
ERROR: Column attribute 'integrated_snn_res.0.9' dtype object is not allowed
ERROR: Column attribute 'integrated_snn_res.1' dtype object is not allowed
ERROR: Column attribute 'orig.ident' dtype object is not allowed
ERROR: Column attribute 'seurat_clusters' dtype object is not allowed
ERROR: Column attribute 'tech' dtype object is not allowed
ERROR: For help, see http://linnarssonlab.org/loompy/format/
ERROR: Column attribute 'ClusterID' is missing
ERROR: Row attribute 'Accession' is missing
ERROR: For help, see http://linnarssonlab.org/loompy/conventions/
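
The validator output above falls into two groups, each closed by its own "For help, see ..." line: the dtype errors point to the format page, and the missing-attribute errors point to the conventions page. A rough sketch for splitting a pasted report along those lines follows; the function name `group_validator_errors` is hypothetical, and the parsing assumes only the "ERROR:" prefix and help-link wording visible in the output.

```python
import re
from collections import defaultdict


def group_validator_errors(text):
    """Group 'ERROR: ...' messages under the help URL that follows them."""
    # Split on the prefix so it works for one-per-line and run-together text.
    messages = [m.strip() for m in text.split("ERROR:") if m.strip()]
    groups = defaultdict(list)
    pending = []
    for message in messages:
        match = re.match(r"For help, see (\S+)", message)
        if match:
            # A help line closes the block of messages collected so far.
            groups[match.group(1)].extend(pending)
            pending = []
        else:
            pending.append(message)
    if pending:
        groups["(no help link)"].extend(pending)
    return dict(groups)
```

Seen this way, only 'ClusterID' and 'Accession' are reported as missing names. The remaining messages concern the dtype of attributes that do exist.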

Still, the question remains: how can I change the column attribute names in the current loom file? For example, it says ClusterID is missing. Perhaps that information is stored under another name, such as seurat_clusters, in which case I would need to rename the seurat_clusters column attribute to ClusterID. I can work with my PI to determine exactly which attribute corresponds to which, but could you provide the framework for changing a name, if you know it?

Thank you!

ADD REPLY • link 2.3 years ago by Jacob • 0
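
One possible framework for the renaming asked about above is to expose an existing column attribute under the expected name. The sketch relies only on loompy's dictionary-style access to column attributes (`ds.ca[name]`) on a connection opened for writing. The helper name `copy_col_attr` is hypothetical, and the idea that seurat_clusters corresponds to ClusterID is the poster's guess, not something confirmed here. The helper copies rather than renames, so the original attribute is kept.

```python
def copy_col_attr(ds, old_name, new_name):
    """Make an existing column attribute available under a second name."""
    if old_name not in ds.ca:
        raise KeyError(f"column attribute {old_name!r} not found")
    if new_name in ds.ca:
        raise ValueError(f"column attribute {new_name!r} already exists")
    ds.ca[new_name] = ds.ca[old_name]


# Intended use with loompy (assumes loompy is installed; path is a placeholder):
# import loompy
# with loompy.connect("my_dataset.loom", mode="r+") as ds:
#     copy_col_attr(ds, "seurat_clusters", "ClusterID")
```

This addresses attribute names only. It does not change the dtype errors reported by the validator, and it cannot fix a file that fails to open at all.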

I found the following tutorial to export the Seurat object as a loom file.

https://bookdown.org/ytliu13207/SingleCellMultiOmicsDataAnalysis/scenic.html#tranfer-into-loom-files