scRNA-seq: depositing count matrix without providing raw data
2.4 years ago
Michael ▴ 270

I am very much in favor of always providing raw data for any NGS analysis.

However, for one particular project, I was asked to upload only the count matrices of an scRNA-seq experiment and not to provide the raw data.

As far as I know, both NCBI GEO and EMBL-EBI ArrayExpress require raw data. Do you know of a trusted repository where one can upload only the count matrices?

ArrayExpress GEO scRNA-seq • 1.5k views
2.4 years ago
ATpoint 85k

An option I have seen multiple times, e.g., by the Tabula Muris consortium, is to provide the data via FigShare, but I do not know their terms or fees (if they charge).

ADD COMMENT

We have used FigShare to deposit code and data for reproducible analyses, for example https://www.nature.com/articles/s41597-022-01236-2

FigShare is essentially free, provides a permanent DOI and guarantees more than 10 years of working life. It is a good choice, I think, for general data types that don't fit into specialist repositories such as GEO or ENA.

ADD REPLY

"FigShare is essentially free"

Looking at the FigShare site, it appears that only 20 GB of space is available for free. Would more than that require some kind of subscription?

ADD REPLY

The fact that our article was published in Scientific Data entitled us to 100GB for free in FigShare. Datasets larger than that can be published with a one-time payment (e.g., 1TB=USD2500).

OP is uploading count data, not raw data, so even the 20GB limit is very unlikely to be reached.
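
As a rough back-of-the-envelope check on this point, here is a minimal sketch that estimates the uncompressed size of a sparse count matrix. The function name and all the numbers (30,000 genes, 100,000 cells, 5% non-zero entries, 4-byte values and indices) are illustrative assumptions, not figures from OP's project.

```python
def sparse_matrix_bytes(n_genes, n_cells, nonzero_fraction,
                        bytes_per_value=4, bytes_per_index=4):
    """Rough uncompressed size of a count matrix in compressed sparse column layout."""
    # Number of non-zero counts (hypothetical sparsity, rounded to an integer).
    nnz = round(n_genes * n_cells * nonzero_fraction)
    # One value and one row index per non-zero entry,
    # plus one column pointer per cell (and one extra to close the last column).
    return nnz * (bytes_per_value + bytes_per_index) + (n_cells + 1) * bytes_per_index


# Assumed example: 30,000 genes x 100,000 cells with 5% non-zero entries.
size = sparse_matrix_bytes(n_genes=30_000, n_cells=100_000, nonzero_fraction=0.05)
print(f"{size / 1e9:.2f} GB")
```

Under these assumptions the matrix comes to roughly a gigabyte before compression, so it would take a far larger or denser dataset to approach 20GB.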

ADD REPLY

Good to know. I still think your comment below is important. Withholding access to data that has been published needs to be discouraged.

ADD REPLY

One issue is that availability via third-party sites may not be (more or less) perpetual, as it is with NCBI/ArrayExpress.

ADD REPLY
2.4 years ago
Gordon Smyth ★ 7.7k

NCBI GEO does allow you to upload the count matrices without the raw sequence data, for example:

but you have to have a good reason for doing so, and the only good reason as a rule is ethical privacy issues with human samples. You can't withhold the raw data just because you or the principal investigator don't feel like sharing it.

If privacy issues prevent you from making the raw data public, then the alternative and most trusted option is to upload the raw data to a controlled access repository like dbGaP or EGA (https://ega-archive.org) so that access can be restricted to serious researchers who can uphold the privacy restrictions in a secure manner. Even if you do use the GEO option with count matrices only, many journals will still expect you to provide a similar sort of data access application process for the raw data. In the two GEO examples given above, we provided contact details for the data access committee in the published article.

Any form of controlled access requires heavy involvement from your principal investigator and your institution. It is not something that you can set up by yourself if you are not the principal investigator.
