A question about Shredding files using Python

I'm running on Windows Vista...

I've just started using BleachBit and so far I'm very impressed. There were two requirements I had when searching for such a tool: the first was the ability to securely shred files and the other was the ability to add my own folders, files and registry entries to the built-in ones. CleanerML is excellent for defining my own cleaners and is far superior to what I'd previously been using in CyberScrub, which I stopped using some time ago due to my dissatisfaction with several product updates.

I've read the wiki article you've quoted in your source code which claims that on modern media, files only need to be overwritten once to be shredded.

My question is therefore probably more to do with Python.
How can you be sure that simply writing blanks to the file stream and performing a flush actually overwrites the same disk clusters that the original file contents reside on?
I was expecting a much more low-level implementation dealing with the MFT as I would expect the Windows swap file and Windows disk caching to do their own thing and to possibly allocate new space for the new data being written and to free the original clusters.

For freeing-up space and clearing out old files BleachBit looks very good, but I have my doubts at the moment about its ability to securely erase file content.
I'm happy to be persuaded to the contrary though.

Also, how does BleachBit deal with securely removing details of a file's original name ?

What you ask is a complicated problem, so the answer is complicated too. The most important thing to remember is that "secure" is not black or white (just like driving a car is never completely "safe"): there are various levels of security or privacy, each with its own cost, and each person should choose the most appropriate level for their specific circumstances.

To answer your first question, first there is the behaviour of the particular file system. On Windows, NTFS (and in Linux, ext3 in the most popular default mode) writes files to the same place, so assuming the application (OpenOffice.org, MS Word, GIMP, etc) hasn't moved it since it was first created and later resaved, secure wiping should work as expected. When wiping files or the disk, the swap file is not significant because disk cache is always in physical RAM and not swapped to disk.

That said, there are limitations. For example, say before BleachBit (or any file cleaner app) sees a file, the application (OpenOffice.org, GIMP, Microsoft Word, etc) cuts it from 3MB to 1MB because the user deleted much of the content. When BleachBit (or any other file wiping app) wipes it later as 1MB, it will have no way of knowing where the other 2MB was previously stored on the disk. Another thing that can happen: when the application wants to save a new version of a document, it may write it to a new file, delete the old version, and rename the new version to the old name. This leaves a sort of shadow copy which is not visible to the OS or to the cleaner. These two problems are generally overcome by wiping free disk space (at the cost of a long, long wait).
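To make the idea of wiping free disk space concrete, here is a minimal sketch of the general technique: fill the free space with one big file of zeros, sync it, then delete it. This is not BleachBit's actual code. The function name, the temporary file name, and the max_bytes cap (added so the sketch can be tried without filling a real disk) are all assumptions for illustration.

```python
import errno
import os


def fill_free_space(directory, max_bytes=None, chunk_size=1024 * 1024):
    """Write zeros to a temporary file until the disk is full (or until
    max_bytes is reached), then delete the file. Returns bytes written."""
    # Hypothetical file name, chosen only for this sketch.
    path = os.path.join(directory, "wipe_free_space.tmp")
    zeros = b"\x00" * chunk_size
    written = 0
    try:
        # buffering=0 gives an unbuffered file, so a full disk shows up
        # as an error on write() rather than later on close().
        with open(path, "wb", buffering=0) as f:
            while max_bytes is None or written < max_bytes:
                n = chunk_size if max_bytes is None else min(chunk_size, max_bytes - written)
                try:
                    written += f.write(zeros[:n])
                except OSError as e:
                    if e.errno == errno.ENOSPC:
                        break  # disk is full: the free space has been covered
                    raise
            os.fsync(f.fileno())  # ask the OS to push its cached data to the disk
    finally:
        if os.path.exists(path):
            os.remove(path)
    return written
```

Called with max_bytes=None this really does fill the disk, which is why it takes so long and why other programs may fail with "disk full" errors while it runs.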

Another limitation is slack space: all file systems (NTFS, ext3, reiserfs, etc) store files in blocks of a fixed size, each a tiny chunk like 4096 bytes. If a file's size isn't a multiple of the block size, applications (including BleachBit) won't see the remainder of the last block, but a small amount of data from a previous file may remain there on the physical hard drive. When shredding individual files, BleachBit writes extra data to clean the slack space, but it doesn't clean slack space when wiping free space on the whole hard drive (this can be very tricky).
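The slack space arithmetic can be sketched in a few lines. The function name is made up for illustration, and the 4096-byte default is just the example figure from above; the real block size depends on how the volume was formatted.

```python
def slack_bytes(file_size, block_size=4096):
    """Return how many bytes of the file's last block lie beyond the
    end of the file (the slack space)."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    remainder = file_size % block_size
    # A file that ends exactly on a block boundary has no slack.
    return 0 if remainder == 0 else block_size - remainder
```

For example, a 10000-byte file on 4096-byte blocks occupies three blocks (12288 bytes), so slack_bytes(10000) is 2288. A wiper that wants to cover the slack would have to write that many extra bytes past the end of the file.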

BleachBit tries to wipe traces of old file names after wiping free disk space: when the disk is full, BleachBit creates many empty files until the MFT is full too.

You can read more about these problems and others (like HPA, or host protected area) under the tag secure delete.

Even in the case of a hypothetical perfect cleaning tool, there are other issues like backups and network data: Google, your ISP, and probably government agencies know what you do online. That said, BleachBit (like most apps of its kind) provides some privacy from casual observers (family members, most employers, many strangers), but won't protect you against highly-resourced, highly-motivated organizations. To protect against those cases, you would start by wiping the whole hard drive (including operating system and HPA) using a program like DBAN or by physically destroying the hard drive (using a mechanical shredder or degausser). Then, you would want to take some measures online, with on-disk backups, against hardware loggers, by blacking out your windows, etc.

Even if you did all that, a highly-motivated person would probably drug you or commit violence to get information from you.

---
Andrew, lead developer

Thanks for such a comprehensive reply, Andrew. However, I don't think my question is as complicated as you've taken it to be :) Your reply goes into great detail, similar to other discussions that I've read from you elsewhere, about what constitutes secure data deletion. That wasn't really my question.

My question is whether Python's implementation of f = open('fn', 'wb'), followed by multiple f.write() calls of 4k of blanks and then f.flush(), as done in BleachBit, is guaranteed to actually overwrite anything of the original file data.
I believe you've answered that in part, in that, as you say, NTFS should write to the existing disk locations - although it has to be said that that's provided the file isn't compressed, EFS encrypted or a sparse file. For completeness, I also need to investigate what FAT32 would do with the exact same file operations.

Am I correct in assuming that after opening a file in binary write mode a Python f.write() operation will move the file pointer so that a subsequent write will be at the next location in the file?
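On the Python side of this question, a small sketch may help. Each f.write() does advance the file position, so consecutive writes land one after another. One thing worth knowing is that mode 'wb' truncates the file to zero length on open, so the sketch below uses 'r+b', which opens the existing file for writing without truncating it. This is an illustration only, not BleachBit's actual code, and the function name is made up. Python itself makes no promise about which disk clusters the file system uses; that part is up to the file system.

```python
import os


def overwrite_in_place(path, chunk_size=4096):
    """Overwrite an existing file with zeros without changing its length.
    Returns the final file position, which should equal the file size."""
    size = os.path.getsize(path)
    zeros = b"\x00" * chunk_size
    # 'r+b' keeps the existing contents and length; 'wb' would truncate first.
    with open(path, "r+b") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(zeros[:n])   # each write advances the file position by n
            remaining -= n
        position = f.tell()      # now at the end of the original data
        f.flush()                # empty Python's own buffer into the OS
        os.fsync(f.fileno())     # ask the OS to write its cache to the disk
    return position
```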

If so, for me that just leaves the issue that most wiping utilities will do multiple passes and consequently they open the file with the FILE_FLAG_WRITE_THROUGH flag set to write directly to the disk. Otherwise, only the last pattern is likely to actually get written to disk, all others being cached. BleachBit doesn't appear to set that flag, but if the Wiki article is to be believed, only one write of the entire file is necessary, so it doesn't matter if BleachBit's writing gets flushed after all the writes have taken place.
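For what it is worth, a multi-pass overwrite in plain Python could approximate the effect by calling f.flush() and os.fsync() after every pass, so that each pattern is handed to the disk before the next one starts. This is only a sketch under that assumption: os.fsync() is not the same thing as opening the file with FILE_FLAG_WRITE_THROUGH, the drive's own cache may still hold data, and the function name and default patterns are invented for illustration.

```python
import os


def multi_pass_overwrite(path, patterns=(b"\xff", b"\x00"), chunk_size=4096):
    """Overwrite a file once per pattern, syncing to disk after each pass.
    Returns the number of passes performed."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:         # 'r+b' does not truncate the file
        for pattern in patterns:
            f.seek(0)                    # start every pass at the beginning
            chunk = pattern * chunk_size
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(chunk[:n])
                remaining -= n
            f.flush()                    # Python buffer -> OS
            os.fsync(f.fileno())         # OS cache -> disk, before the next pass
    return len(patterns)
```

Without the per-pass sync, the earlier patterns could simply be replaced in the cache and never reach the disk, which is the concern raised above.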

I'm currently converting Mark Russinovich's secure delete utility to Delphi so as to add a GUI front-end and shell context menu deletion options for selected files and folders.

It might be useful (not just to me) if BleachBit could call a DLL or shell out to a command-line executable for each file it wants to wipe, rather like when chess programs use a different chess engine to the built-in one. That way, users can use BleachBit to decide what to delete, but use their favourite wiping program (whether that's BCWipe, Mark R's SDelete, Eraser or my own, etc.) to actually do the wipe. Is that something you'd consider implementing?
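To illustrate the shell-out idea, here is a rough sketch of what such a hook might look like in Python. Nothing like this is confirmed to exist in BleachBit; the function name and the convention of appending the file path as the last argument are assumptions, and the actual command line would be whatever the user's preferred wiping tool expects.

```python
import subprocess


def wipe_with_external_tool(path, command):
    """Run a user-configured external wiper on one file.

    command is a list holding the executable and its options; the file
    path is appended as the final argument. Returns True if the tool
    exited with status 0."""
    result = subprocess.run(
        list(command) + [path],
        capture_output=True,  # keep the tool's output off the console
        text=True,
    )
    return result.returncode == 0
```

Passing the command as a list rather than a single string avoids quoting problems with file names that contain spaces.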

When all's said and done I'm definitely warming to BleachBit for both my deletion requirements: fine control over what to delete and reasonably secure wiping of file contents.

Short of completely destroying the hard drive, I doubt that there is any complete way to ensure data destruction. However, the amount of effort needed to recover data from a computer after using such programs is rather great and requires some resources. It is highly unlikely that someone is going to bother to do so on a personal computer in hopes of finding useful personal information. Now, if you have data of high importance on your computer, such as multiple customers' Social Security and credit card numbers, I would be worried. If so, your best bet is to destroy the hard drive. The company I work for uses http://datakillers.com/ to do hard drive shredding. It might be a bit of overkill, but it's kind of fun to watch, and it's easier than trying to destroy 35 hard drives by hand. However, as pointed out above, while it may be possible to restore such files through a variety of sources, it's not something most people would begin to know how to do, and it's not very likely that anyone capable would bother to try.