I am working with RNA-seq data and have output data frames from DESeq2 with the following columns:
shRNA ID | baseMean | log2FoldChange | lfcSE | stat | pvalue | padj
For downstream analyses, I replaced the shRNA ID column with the gene that each shRNA targets. Because of the library prep we used, most genes are represented by more than one shRNA.
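To make the duplicate problem concrete, here is a minimal sketch of the ID replacement and a check for genes that end up appearing more than once. It uses plain Python with each row as a dict keyed by the DESeq2 column names, standing in for the data frame. The lookup SHRNA_TO_GENE, the shRNA IDs, the helper names replace_ids_with_genes and duplicated_genes, and all values are hypothetical; the real mapping would come from the library's annotation.

```python
from collections import Counter

# Hypothetical shRNA-to-gene lookup; the real one would come from the
# shRNA library's annotation, which is not described here.
SHRNA_TO_GENE = {
    "sh_0001": "TP53",
    "sh_0002": "TP53",
    "sh_0004": "MYC",
    "sh_0005": "MYC",
    "sh_0006": "GAPDH",
}


def replace_ids_with_genes(rows, mapping):
    """Swap the 'shRNA ID' column for the targeted gene.

    Each row is a dict keyed by DESeq2 column names. The original shRNA ID
    is kept under 'shRNA' so shRNA-level detail is not lost.
    """
    out = []
    for row in rows:
        new_row = dict(row)
        shrna = new_row.pop("shRNA ID")
        new_row["gene"] = mapping[shrna]
        new_row["shRNA"] = shrna
        out.append(new_row)
    return out


def duplicated_genes(rows):
    """Return {gene: number of rows} for genes that appear more than once."""
    counts = Counter(row["gene"] for row in rows)
    return {gene: n for gene, n in counts.items() if n > 1}


# Made-up example rows (only one DESeq2 value column shown for brevity).
results = [
    {"shRNA ID": "sh_0001", "log2FoldChange": -2.1},
    {"shRNA ID": "sh_0004", "log2FoldChange": 1.8},
    {"shRNA ID": "sh_0002", "log2FoldChange": -1.5},
    {"shRNA ID": "sh_0006", "log2FoldChange": 0.2},
    {"shRNA ID": "sh_0005", "log2FoldChange": 0.9},
]
by_gene = replace_ids_with_genes(results, SHRNA_TO_GENE)
print(duplicated_genes(by_gene))  # {'TP53': 2, 'MYC': 2}
```

Every gene listed by duplicated_genes is one for which PANTHER would see more than one id/value pair.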
I tried uploading some of these results to PantherDB for the Statistical Enrichment Test. However, because of the way my data were collected (more than one shRNA for most genes), most genes have more than one value, so I get the following error message:
There are duplicate IDs in the file. For duplicates, the first id/value pair in the file will be used.
Analyses other than those displayed for the statistical enrichment will encounter the same issue.
How should I handle this? It seems important for the test to account for genes where more than one shRNA shows a robust log2 fold change. I also know, from the information PantherDB supplies about column names, that I cannot edit the 'gene name'.
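Since PANTHER only uses one id/value pair per gene, one option is to collapse the shRNAs into a single gene-level value before uploading, rather than letting the first row win. The sketch below is one possible way to do that, not something PANTHER prescribes: it assumes the median log2FoldChange across a gene's shRNAs as the gene-level value and also counts how many shRNAs pass a padj cutoff (0.05 here, an arbitrary choice). The function name collapse_by_median, the 'gene' key, and the example values are hypothetical.

```python
from statistics import median


def collapse_by_median(rows, padj_cutoff=0.05):
    """Collapse shRNA-level rows to one row per gene.

    Assumption: the gene-level value is the median log2FoldChange across
    that gene's shRNAs. 'n_sig' counts shRNAs with padj below padj_cutoff;
    a padj of None (or NaN, which compares False) counts as not significant.
    """
    grouped = {}
    for row in rows:
        grouped.setdefault(row["gene"], []).append(row)

    collapsed = []
    for gene, members in grouped.items():
        lfcs = [r["log2FoldChange"] for r in members]
        n_sig = sum(
            1 for r in members
            if r["padj"] is not None and r["padj"] < padj_cutoff
        )
        collapsed.append({
            "gene": gene,
            "log2FoldChange": median(lfcs),
            "n_shRNA": len(members),
            "n_sig": n_sig,
        })
    return collapsed


# Made-up example rows; padj None stands in for an NA from DESeq2.
rows = [
    {"gene": "TP53", "log2FoldChange": -2.1, "padj": 0.001},
    {"gene": "MYC", "log2FoldChange": 1.8, "padj": 0.01},
    {"gene": "TP53", "log2FoldChange": -1.5, "padj": 0.02},
    {"gene": "MYC", "log2FoldChange": 0.1, "padj": 0.9},
    {"gene": "GAPDH", "log2FoldChange": 0.2, "padj": None},
]
for g in collapse_by_median(rows):
    print(g["gene"], g["log2FoldChange"], g["n_shRNA"], g["n_sig"])
```

The median damps a single outlier shRNA, so a gene only gets a large value when most of its shRNAs agree; n_sig could additionally be used to filter genes before upload.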
As an aside: the data frame is sorted by absolute log2 fold change, from largest to smallest. This means that, for each gene, the shRNA with the largest change is encountered first and is presumably the one that gets used.
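With rows sorted by absolute log2 fold change, a "first id/value pair wins" rule picks the largest-magnitude shRNA for each gene. The sketch below performs that collapse explicitly before upload, so the file no longer contains duplicate IDs and the choice of shRNA is not left to PANTHER. It assumes the same hypothetical row format (dicts with 'gene', 'shRNA' and 'log2FoldChange' keys); the function name keep_max_abs_lfc and the example values are made up.

```python
def keep_max_abs_lfc(rows):
    """Keep, for each gene, the row with the largest |log2FoldChange|.

    Sorting by absolute log2FoldChange (largest first) and then keeping the
    first row seen per gene reproduces what a 'first id/value pair wins'
    rule would pick from a file sorted this way. sorted() is stable, so
    ties keep their original order.
    """
    ordered = sorted(rows, key=lambda r: abs(r["log2FoldChange"]), reverse=True)
    seen = set()
    kept = []
    for row in ordered:
        if row["gene"] not in seen:
            seen.add(row["gene"])
            kept.append(row)
    return kept


# Made-up example rows, deliberately not pre-sorted.
rows = [
    {"gene": "TP53", "shRNA": "sh_0002", "log2FoldChange": -1.5},
    {"gene": "MYC", "shRNA": "sh_0005", "log2FoldChange": 0.9},
    {"gene": "TP53", "shRNA": "sh_0001", "log2FoldChange": -2.1},
    {"gene": "MYC", "shRNA": "sh_0004", "log2FoldChange": 1.8},
]
for row in keep_max_abs_lfc(rows):
    print(row["gene"], row["shRNA"], row["log2FoldChange"])
# TP53 sh_0001 -2.1
# MYC sh_0004 1.8
```

Keeping only the top shRNA matches the current behavior, but it treats a gene with one strong shRNA the same as a gene where several shRNAs agree, which is the concern raised about accounting for more than one robust shRNA.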