I sequenced 64 samples and I accidentally erased all of my raw data. The sequencing company was able to recover only 35 of the 64. However, I do have all of the trimmed data (after Trimmomatic). Will it be a problem to publish a paper using these data?
I sequenced 64 samples and I accidentally erased all of my raw data. The sequencing company is able to recover only 35 out of 64.
Since there is no way to recover the missing data files, this can be a problem: no one (including you) can reproduce your results. You should have plenty of each library left over. Bite the bullet, get the missing samples resequenced, and repeat the analysis.
Edit: If the data/analysis is critical, you may have to resequence all of the samples. Otherwise:
But I do have all the trimmed data
I missed that part. I agree with @Carlo's comment that it is reasonable to submit the trimmed data. It would be best to state that the data are trimmed and that the original raw data are no longer available because of an error.
It is certainly annoying, and I mostly agree with what genomax said. However, in practice, I have yet to see a dataset for which starting from the raw reads rather than the trimmed reads changed much in the analysis. If we assume that the trimming was done reasonably (do you have the Trimmomatic reports to confirm that?), the benefit of resequencing the ~30 missing samples is very small compared with the cost in time and money, plus the potential batch effect. I would rather invest the resources in confirming the results with an orthogonal approach.
These are just some thoughts; I don't know what I would do if I were in your place. And I never want to be.
Thank you guys for the responses. I don’t think I will resequence the missing data (also my PI won’t pay for it for sure).
I did very little trimming: I just removed the adapters and trimmed a little towards the 3’ end. Only ~200 reads were discarded from each trimmed file, so the trimmed data are almost the same as the raw data.
I also ran FastQC before and after trimming and kept all the reports (very informative). Hopefully those will be helpful as well.
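Because the pre-trimming FastQC reports survive, the raw read count for each lost file can still be documented next to the trimmed count. The sketch below is a minimal illustration, not a tool anyone in this thread used. It assumes each report's `fastqc_data.txt` (inside the FastQC output zip) contains the usual tab-separated `Total Sequences` line in its Basic Statistics module, and that the trimmed files are standard 4-line FASTQ, optionally gzipped. The function names and example data are hypothetical.

```python
import gzip
import io


def total_sequences_from_fastqc(fastqc_data_text: str) -> int:
    """Return the 'Total Sequences' value from a FastQC fastqc_data.txt (Basic Statistics)."""
    for line in fastqc_data_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2 and fields[0] == "Total Sequences":
            return int(fields[1])
    raise ValueError("no 'Total Sequences' line found in FastQC data")


def count_fastq_reads(handle) -> int:
    """Count records in an open FASTQ text handle, assuming exactly 4 lines per record."""
    n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{n_lines} lines is not a multiple of 4; file may be truncated")
    return n_lines // 4


def open_fastq(path: str):
    """Open a FASTQ file as text, using gzip when the name ends in .gz."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


# Hypothetical in-memory stand-ins for real files:
# a raw-data report listing 5 reads and a trimmed FASTQ with 4 records.
fastqc_text = (
    "##FastQC\t0.11.9\n"
    ">>Basic Statistics\tpass\n"
    "#Measure\tValue\n"
    "Total Sequences\t5\n"
    ">>END_MODULE\n"
)
trimmed_fastq = io.StringIO("@read\nACGT\n+\nIIII\n" * 4)

raw_n = total_sequences_from_fastqc(fastqc_text)
trimmed_n = count_fastq_reads(trimmed_fastq)
print(f"raw={raw_n} trimmed={trimmed_n} discarded={raw_n - trimmed_n}")
# prints "raw=5 trimmed=4 discarded=1"
```

For a real file, something like `with open_fastq("sample_R1.trimmed.fastq.gz") as fh: count_fastq_reads(fh)` (hypothetical filename) replaces the in-memory handle. A per-sample table of raw vs trimmed counts built this way could go into the supplementary material alongside the note that the raw files were lost.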
In this project RNA-seq is just the first step to get some candidate genes for the subsequent functional study. Maybe I can do a qPCR on those genes to confirm the expression levels.
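If qPCR is used to confirm the candidates, one common way to summarize it is the 2^-ΔΔCt (Livak) method: ΔCt = Ct(target) − Ct(reference gene) in each condition, ΔΔCt = ΔCt(treated) − ΔCt(control), and fold change = 2^-ΔΔCt. The sketch below assumes roughly 100% amplification efficiency for both the target and reference primers and averages technical-replicate Ct values first; the function name and all Ct values are hypothetical.

```python
import math
from statistics import mean


def ddct_fold_change(ct_target_treated, ct_ref_treated, ct_target_control, ct_ref_control):
    """Relative expression (treated vs control) by the 2^-ddCt method.

    Each argument is a list of technical-replicate Ct values, averaged first.
    Assumes ~100% amplification efficiency for target and reference primers.
    """
    d_ct_treated = mean(ct_target_treated) - mean(ct_ref_treated)
    d_ct_control = mean(ct_target_control) - mean(ct_ref_control)
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: the target crosses threshold 2 cycles earlier in the
# treated sample while the reference gene is unchanged, i.e. ~4-fold up.
fc = ddct_fold_change([22.0, 22.0], [18.0, 18.0], [24.0, 24.0], [18.0, 18.0])
log2_fc = math.log2(fc)  # same scale as an RNA-seq log2 fold change
print(fc, log2_fc)  # prints "4.0 2.0"
```

Because log2(2^-ΔΔCt) = −ΔΔCt, the qPCR result can be placed directly next to the RNA-seq log2 fold change for each candidate gene. This sketch ignores biological replicates and efficiency correction, which a real validation would need to handle.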
(Sadly) even sharing the trimmed data means sharing much more information than a great many of the studies out there do...