Forum: Recovering BAM files after unexplained deletion from storage
5.4 years ago

Hello all,

We had a problem with our storage and, for an unknown reason, we lost data. I used PhotoRec to recover what I could, but as I understand it, PhotoRec does not always recover complete files. For example, I recovered some BAM files and converted them to SAM, but each one is only a fragment, a couple of MB of data per file.

I also noticed some huge binary files, several GB each (about the size I would expect for a real BAM file). If I read them directly, I sometimes see read IDs in the middle of the binary data. However, samtools view cannot read them. It first complained about a missing EOF (I found a script that appends an EOF block, and now samtools no longer complains about that), and it also reports "fail to read the header". Is there a way to insert a header into these possible BAM files, or is there another way to recover them?

PS: we don't have a backup =(

photorec bam sequencing • 2.8k views

Even if you managed to recover readable files, would you trust their content?


I don't trust the currently readable files at all...


Go to a professional data recovery specialist.


I would add my voice to Kevin Blighe's; don't lose any more time and just go to a professional data recovery service.


I still wouldn't trust the recovered data. Most likely either part of the data was overwritten or the storage (disk?) failed and bits have been corrupted. The only situation in which I would trust a recovered file is when only the link to the inode (on Linux) has been removed (i.e. the file was accidentally deleted and nothing had time to overwrite it yet).


You mean that the mere act of trying to recover the data may have damaged it further? This happened to me a few years ago. I realised the serious nature of the failure and did not try to do anything myself.


however, I found a good script to insert an EOF end and now samtools doesn't complain about this

That's not a good fix. samtools is warning you about a legitimate error, and you're just bypassing its error checking rather than addressing the underlying corruption. Don't use that script.
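
For context, the marker in question is a fixed 28-byte empty BGZF block defined in the SAM/BAM specification. Below is a small sketch (the function name `has_bgzf_eof` is made up for this illustration) that only reports whether a file ends with that block. Its presence says nothing about whether the blocks before it are intact, which is why appending it by hand does not repair anything.

```python
import os

# The 28-byte empty BGZF block that a complete BAM file ends with
# (values as listed in the SAM/BAM specification).
BGZF_EOF = (
    b"\x1f\x8b\x08\x04\x00\x00\x00\x00\x00\xff\x06\x00"
    b"BC\x02\x00\x1b\x00\x03\x00"
    b"\x00\x00\x00\x00\x00\x00\x00\x00"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as fh:
        # Jump to the last 28 bytes and compare them with the marker.
        fh.seek(-len(BGZF_EOF), os.SEEK_END)
        return fh.read() == BGZF_EOF
```

A missing marker usually means the file was truncated, so the warning is a symptom of the damage rather than the damage itself.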


I just want to check the content of these binary files! Or should I just consider them lost?


Unless you understand the cause/source of the corruption, there's not much you can do without a backup.


At some point, it could be cheaper to just re-sequence the samples (if they are still available).


Thank you for all the tips!

5.4 years ago
jkbonfield ★ 1.3k

Firstly, don't mount those drives read-write any more. Mount them read-only from now on, or you'll reduce the chance of any data recovery, whether by yourself or by professionals.

BAM has specific signatures that data recovery tools are unlikely to spot, but you could perhaps extract them yourself from raw disk images (assuming it's not some complex RAID stripe). If you get someone in to recover your data, make sure you explain the nature of the BAM format to them (a series of small concatenated gzip files), as it may help. Spotting a whole series of gzip signatures in the raw disk images is the BAM equivalent of what your photo-recovery tool is attempting to do with images. So it's possible, but very complex and bespoke.
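
To make the "series of gzip signatures" idea concrete, here is a rough Python sketch that scans a chunk of raw bytes for BGZF block headers. It assumes the standard 18-byte header that htslib writes (gzip magic `1f 8b 08 04`, followed by a single `BC` extra subfield holding the block size). `find_bgzf_blocks` is a name made up for this illustration, and a real disk image would need to be read in overlapping chunks or memory-mapped rather than loaded whole.

```python
import struct

# Fixed leading bytes of a BGZF block: gzip magic (1f 8b), CM=8 (deflate),
# FLG=4 (an extra field is present).
GZIP_MAGIC = b"\x1f\x8b\x08\x04"
HEADER_LEN = 18  # assumed: standard header with exactly one 'BC' subfield


def find_bgzf_blocks(buf):
    """Yield (offset, block_size) for each plausible BGZF header in `buf`."""
    pos = buf.find(GZIP_MAGIC)
    while pos != -1:
        header = buf[pos:pos + HEADER_LEN]
        if len(header) == HEADER_LEN:
            # Bytes 10..17: XLEN, SI1, SI2, SLEN, BSIZE (all little-endian).
            xlen, si1, si2, slen, bsize = struct.unpack("<HBBHH", header[10:18])
            if xlen == 6 and si1 == ord("B") and si2 == ord("C") and slen == 2:
                # BSIZE stores the total block length minus one.
                yield pos, bsize + 1
        pos = buf.find(GZIP_MAGIC, pos + 1)
```

Runs of hits where each offset plus its block size lands exactly on the next hit are the stretches most likely to be contiguous BAM data; isolated hits are more likely coincidences or stray fragments.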

Also use a modern samtools/htslib, as these will check for CRC errors in BAM (older ones didn't, and neither do some of the other BAM readers out there). If it's a recent tool and it isn't complaining about CRC, then the data is probably correct. However, frankly, if you've only recovered a few MB from each file that is expected to be GB in size, then you've basically got nothing of value left. You need to weigh up the value of the data vs the cost of professional recovery services.
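
As an illustration of what such a CRC check involves: each BGZF block ends with the CRC32 and the length of its uncompressed data, so a block can be tested on its own. This is a minimal sketch, assuming the standard 18-byte header and a complete block that has already been cut out of the file. `check_bgzf_block` is a hypothetical helper, not part of samtools or htslib.

```python
import struct
import zlib

HEADER_LEN = 18   # assumed: standard BGZF header with one 'BC' subfield
TRAILER_LEN = 8   # CRC32 (4 bytes) + uncompressed size (4 bytes)


def check_bgzf_block(block):
    """Return True if a complete BGZF block inflates and matches its trailer."""
    if len(block) < HEADER_LEN + TRAILER_LEN:
        return False
    payload = block[HEADER_LEN:-TRAILER_LEN]
    expected_crc, expected_size = struct.unpack("<II", block[-TRAILER_LEN:])
    try:
        # wbits=-15 selects a raw deflate stream (no zlib or gzip wrapper).
        data = zlib.decompress(payload, wbits=-15)
    except zlib.error:
        return False
    return len(data) == expected_size and zlib.crc32(data) == expected_crc
```

A block that passes was very likely recovered bit for bit; a block that fails is corrupted somewhere, although the check cannot say where.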


+1, very aptly said.

To my surprise, PhotoRec does understand BAM/SAM, although the specifications are very old: https://www.cgsecurity.org/wiki/File_Formats_Recovered_By_PhotoRec

