14 months ago
Emily
I was trying to do a differential expression (DEG) analysis with PyDESeq2, the Python implementation of DESeq2, but it keeps raising an InvalidIndexError, even though my index shouldn't be a problem for the package.
Here is my code:
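import pandas as pd
from pydeseq2.dds import DeseqDataSet

# Genes are rows in the CSV; transpose so samples are rows and genes are columns
counts = pd.read_csv('filename.csv')
counts = counts.set_index('GeneID')
counts = counts.T
# One condition label per sample (4 x Ctr, 4 x KO, 4 x C2), in sample order
metadata = pd.DataFrame(zip(counts.index, ['Ctr','Ctr','Ctr','Ctr', 'KO','KO','KO','KO', 'C2','C2','C2','C2']), columns=['Sample', 'Condition'])
metadata = metadata.set_index('Sample')
dds = DeseqDataSet(counts=counts, metadata=metadata, design_factors="Condition")
dds.deseq2()  # this is the step where I get the InvalidIndexError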
I did get these warnings when running the DeseqDataSet command:
/Users/anaconda3/lib/python3.11/site-packages/anndata/_core/anndata.py:1900: UserWarning: Variable names are not unique. To make them unique, call `.var_names_make_unique`.
utils.warn_names_duplicates("var")
/Users/anaconda3/lib/python3.11/site-packages/pydeseq2/dds.py:257: UserWarning: Some factor levels in the design contain underscores ('_').
They will be converted to hyphens ('-').
self.obsm["design_matrix"] = build_design_matrix(
I'm wondering whether that warning is what prevents dds.deseq2() from running. Do I need to make the variable names unique, or can I ignore that message?
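One way to check whether the warning matters is to look for repeated gene IDs among the columns of the transposed matrix. The sketch below uses a small made-up table (the gene IDs and counts are hypothetical) and also shows that pandas raises `InvalidIndexError` when asked to look up labels in a non-unique index. It seems likely that something similar happens inside `dds.deseq2()`, but which PyDESeq2 call actually triggers it is an assumption here.

```python
import pandas as pd
from pandas.errors import InvalidIndexError

# Hypothetical counts table in the same layout as filename.csv:
# one row per gene, one column per sample, with "G2" listed twice.
raw = pd.DataFrame(
    {
        "GeneID": ["G1", "G2", "G2", "G3"],
        "S1": [10, 5, 0, 7],
        "S2": [12, 3, 1, 9],
    }
)
counts = raw.set_index("GeneID").T  # samples x genes, as in the question

# Gene IDs that occur more than once among the columns
dup_genes = counts.columns[counts.columns.duplicated()].unique()
print("Duplicated gene IDs:", list(dup_genes))

# Looking up labels in a non-unique index fails in pandas
try:
    counts.columns.get_indexer(["G1", "G3"])
    lookup_failed = False
except InvalidIndexError as err:
    lookup_failed = True
    print("InvalidIndexError:", err)
```

If `dup_genes` is non-empty for the real data, the warning is worth acting on rather than ignoring.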
Try to make the variable names unique and see if it fixes the issue; I would have done just that before posting. Also, you need to post the full error message (the complete traceback), which will help people assist you.
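To follow that suggestion literally, AnnData's `.var_names_make_unique()` renames repeats by appending a numeric suffix. The sketch below is a rough pandas-only stand-in that could be applied to the counts table before it goes into `DeseqDataSet`; the helper name `make_unique` and the `-1`, `-2` suffix scheme are illustrative choices modelled loosely on AnnData's behaviour, not part of PyDESeq2. Note that this only silences the symptom: each renamed copy is then tested as a separate gene.

```python
import pandas as pd


def make_unique(labels: pd.Index) -> pd.Index:
    """Rename repeats: the 2nd, 3rd, ... occurrence of a label gets -1, -2, ...

    Rough, hand-written stand-in for AnnData's var_names_make_unique();
    it does not guard against a generated name (e.g. "G2-1") already
    existing elsewhere in the index.
    """
    seen = {}
    out = []
    for label in labels:
        n = seen.get(label, 0)
        out.append(label if n == 0 else f"{label}-{n}")
        seen[label] = n + 1
    return pd.Index(out, name=labels.name)


# Toy samples x genes matrix with "G2" appearing twice (hypothetical data)
counts = pd.DataFrame(
    [[10, 5, 0], [12, 3, 1]],
    index=["S1", "S2"],
    columns=pd.Index(["G1", "G2", "G2"], name="GeneID"),
)
counts.columns = make_unique(counts.columns)
print(list(counts.columns))  # ['G1', 'G2', 'G2-1']
```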
I figured it out: the duplicates were causing the problem, so I removed the lower-expressed copies.
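The thread doesn't say exactly how the low-expressed copies were chosen. One plausible reading, sketched here, is to keep, for each `GeneID`, the row with the largest total count across samples. The column name `GeneID` comes from the question; the toy data and the helper column `_total` are made up for illustration.

```python
import pandas as pd

# Hypothetical genes x samples table with two copies of "G2"
raw = pd.DataFrame(
    {
        "GeneID": ["G1", "G2", "G2", "G3"],
        "S1": [10, 5, 0, 7],
        "S2": [12, 3, 1, 9],
    }
)

sample_cols = raw.columns.drop("GeneID")
# Total count per row, used to rank duplicate copies of the same gene
totals = raw[sample_cols].sum(axis=1)

dedup = (
    raw.assign(_total=totals)
    .sort_values("_total", ascending=False)  # highest-expressed copy first
    .drop_duplicates(subset="GeneID", keep="first")  # keep that copy only
    .drop(columns="_total")
    .set_index("GeneID")
    .sort_index()
)
counts = dedup.T  # samples x genes, now with unique gene IDs
print(counts)
```

Summing the copies instead (`raw.groupby("GeneID").sum()`) is another option, but only makes sense if the copies are genuinely separate measurements of the same gene.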
It's concerning that your matrix has duplicates at all; you should figure out why that is.
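A first step in finding the cause is to pull out every copy of each repeated ID and see whether the copies are fully identical. This is a sketch on hypothetical data, and the interpretations in the comments are common possibilities rather than a diagnosis: identical rows often suggest a table that was concatenated or merged twice, while differing rows suggest the ID column is not a one-to-one gene identifier (for example gene symbols, or IDs with version suffixes stripped).

```python
import pandas as pd

# Hypothetical genes x samples table: "G2" has two different copies,
# "G4" has two identical copies.
raw = pd.DataFrame(
    {
        "GeneID": ["G1", "G2", "G2", "G3", "G4", "G4"],
        "S1": [10, 5, 0, 7, 4, 4],
        "S2": [12, 3, 1, 9, 6, 6],
    }
)

# All rows whose GeneID occurs more than once (keep=False marks every copy)
dups = raw[raw["GeneID"].duplicated(keep=False)].sort_values("GeneID")

# True where a row is identical to another row in every column
identical = dups.duplicated(keep=False)
print(dups.assign(identical_copy=identical))
```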