Hello all,
We had a problem with our storage and, for an unknown reason, lost data. I used photorec to recover it; however, as far as I understand, photorec does not recover complete files. For example, I could recover some BAM files and convert them to SAM, but each one is just a fragment, a couple of MB of data per file.
I noticed that there are also huge binary files, several GB each (about the size I would expect for the real BAM files), and if I read them directly I sometimes see read IDs in the middle of the binary data. However, when I try to read them with samtools view, it fails. It complained about a missing EOF (I found a script that appends an EOF block, and samtools no longer complains about that) and also about "fail to read the header". Is there a way to insert a header into these possible BAM files, or is there another option to recover them?
PS: we don't have a backup =(
Even if you managed to recover readable files, would you trust their content?
I don't trust the currently readable files at all...
Go to a professional data recovery specialist.
I would add my voice to Kevin Blighe's: don't lose any more time and just go to a professional data recovery service.
I still wouldn't trust the recovered data. Most likely either part of the data was overwritten or the storage (disk?) failed and bits have been corrupted. The only situation in which I would trust a recovered file is when only the link to the inode (on Linux) has been removed (i.e. the file was accidentally deleted and nothing had time to overwrite it yet).
You mean that the mere act of trying to recover the data may have damaged it further? This happened to me a few years ago. I realised the serious nature of the failure and did not try to do anything myself.
That's not a good fix. samtools is warning you about a legitimate error, and you're just bypassing its error checking, not addressing the underlying corruption. Don't use that script.
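To illustrate why appending an EOF block doesn't repair anything, here is a minimal Python sketch that checks the two things samtools complained about. It relies on the BGZF layout from the SAM/BAM specification: a BAM file is a series of gzip-style blocks, the first of which should decompress to data starting with `BAM\1`, and the file should end with a fixed 28-byte empty block. The function names are made up for this example, and the sketch assumes the `BC` subfield comes first in the gzip extra field, which is how htslib-written files normally look.

```python
import struct
import zlib

BGZF_MAGIC = b"\x1f\x8b\x08\x04"
# The 28-byte empty BGZF block that marks end-of-file in the BAM specification.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600" "4243" "0200" "1b00" "0300"
    "00000000" "00000000"
)


def first_block_payload(data):
    """Decompress the first BGZF block in `data`; return None if it is not valid."""
    if len(data) < 18 or data[:4] != BGZF_MAGIC:
        return None
    xlen = struct.unpack_from("<H", data, 10)[0]
    # Assumption: the BC subfield is the first extra subfield.
    if xlen < 6 or data[12:14] != b"BC":
        return None
    bsize = struct.unpack_from("<H", data, 16)[0]  # total block size minus 1
    cdata = data[12 + xlen : bsize + 1 - 8]        # strip header and CRC32/ISIZE trailer
    try:
        return zlib.decompress(cdata, -15)         # raw deflate stream
    except zlib.error:
        return None


def diagnose(path):
    """Report whether a file looks like a BAM at its start and at its end."""
    with open(path, "rb") as fh:
        head = fh.read(65536)                      # a BGZF block is at most 64 KiB
        fh.seek(0, 2)
        size = fh.tell()
        fh.seek(max(0, size - len(BGZF_EOF)))
        tail = fh.read(len(BGZF_EOF))
    payload = first_block_payload(head)
    return {
        "starts_with_bgzf_block": payload is not None,
        "bam_magic_present": payload is not None and payload.startswith(b"BAM\x01"),
        "ends_with_eof_marker": tail == BGZF_EOF,
    }
```

A file that had an EOF block glued on would report `ends_with_eof_marker` as true while the other two checks stay false, which matches the "fail to read the header" error: the start of the file is still not a BAM header.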
I just want to check the content of these binary files! Or should I just consider them lost?
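If the goal is only to see what is inside, one option is to scan a recovered file for BGZF blocks that are still intact, since each block is compressed independently and carries its own CRC32. The sketch below is a rough illustration, not a recovery tool: `scan_bgzf_blocks` and `try_block` are names invented for this example, and it assumes the 6-byte extra field that htslib normally writes. Work on a copy of the recovered file, never on the damaged storage itself.

```python
import struct
import zlib

BGZF_MAGIC = b"\x1f\x8b\x08\x04"


def try_block(data, pos):
    """Return (block_size, payload) if a valid BGZF block starts at `pos`, else None."""
    if pos + 18 > len(data):
        return None
    xlen = struct.unpack_from("<H", data, pos + 10)[0]
    # Assumption: a single BC subfield, so the extra field is exactly 6 bytes.
    if xlen != 6 or data[pos + 12 : pos + 14] != b"BC":
        return None
    bsize = struct.unpack_from("<H", data, pos + 16)[0]  # total block size minus 1
    end = pos + bsize + 1
    if bsize < 25 or end > len(data):
        return None
    crc, isize = struct.unpack_from("<II", data, end - 8)
    try:
        payload = zlib.decompress(data[pos + 18 : end - 8], -15)
    except zlib.error:
        return None
    # Reject blocks whose content does not match the stored length and checksum.
    if len(payload) != isize or zlib.crc32(payload) != crc:
        return None
    return bsize + 1, payload


def scan_bgzf_blocks(data):
    """Yield (offset, payload) for every block that decompresses and passes its CRC."""
    pos = data.find(BGZF_MAGIC)
    while pos != -1:
        block = try_block(data, pos)
        if block is None:
            pos = data.find(BGZF_MAGIC, pos + 1)      # false hit, keep searching
        else:
            size, payload = block
            yield pos, payload
            pos = data.find(BGZF_MAGIC, pos + size)   # jump past the valid block
```

Counting the valid blocks and looking at the gaps between their offsets gives a rough idea of how much of a file survived. Even so, BAM records can span block boundaries and the header with the reference names may be gone, so intact blocks alone don't make the alignments trustworthy.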
Unless you understand the cause/source of the corruption, without a backup there's not much you can do.
At some point, it could be cheaper to just re-sequence the samples (if they are still available).
Thank you for all the tips!