What are staple and unstaple?

staple is a program that inseparably binds together the data in a file using a cryptographic mechanism known as an all-or-nothing transform. In its most basic form (when executed as staple 0), the transformation is keyless: no key is required to reverse it, but all of the data is. Thus, running unstaple on the complete .staple file yields the original file, but running it on any proper subset of the .staple file yields nothing.
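
To make the all-or-nothing property concrete, here is a minimal Python sketch of a keyless transform in the spirit of Rivest's package transform. It is not staple's actual code or file format: SHA-256 stands in for the block cipher, and the names package and unpackage are hypothetical. The data is masked with a keystream derived from a random key, and the key itself is stored XORed with a hash of the entire masked body, so the key can be recovered only from a complete copy.

```python
import hashlib
import os

BLOCK = 32  # bytes per block; matches the SHA-256 digest size


def _xor(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


def _pad(key: bytes, index: int) -> bytes:
    """Keystream block for position `index` (stand-in for a block cipher)."""
    return hashlib.sha256(key + index.to_bytes(8, "big")).digest()


def _mask(data: bytes, key: bytes) -> bytes:
    """XOR every block of `data` with its keystream block."""
    return b"".join(
        _xor(data[off:off + BLOCK], _pad(key, i))
        for i, off in enumerate(range(0, len(data), BLOCK))
    )


def package(message: bytes) -> bytes:
    """Mask the message under a fresh random key, then append the key
    XORed with a hash of the whole masked body."""
    key = os.urandom(BLOCK)
    body = _mask(message, key)
    tail = _xor(key, hashlib.sha256(body).digest())
    return body + tail


def unpackage(packed: bytes) -> bytes:
    """Recover the key from the complete package, then unmask the body."""
    body, tail = packed[:-BLOCK], packed[-BLOCK:]
    key = _xor(tail, hashlib.sha256(body).digest())
    return _mask(body, key)
```

In this sketch, unpackage(package(data)) returns data, while changing or dropping any byte of the package changes the recovered key, so every block decodes to noise.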

staple can also be asked to do something slightly strange. As part of the all-or-nothing transform, the data is encrypted under a random key, and staple can be instructed to throw away part of that key. (The only argument staple takes is the number of key bytes to throw away; currently only 0, 2, and 4 are accepted.) The content can then be deciphered only by someone who recovers the missing key bytes; in that case, unstaple can perform a brute-force key search to discover the key and then reverse the transform. I discuss this feature in more detail in the FAQ section.
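
The brute-force search can be sketched in Python as follows. This is an illustration under assumptions, not unstaple's implementation: how unstaple recognizes the correct key is not described here, so the sketch assumes that a SHA-256 check value of the full key is available and that the discarded bytes are the trailing ones. The names check_value and recover_key are hypothetical. With n key bytes discarded, the search tries up to 256^n candidates: 65,536 for staple 2 and about 4.3 billion for staple 4.

```python
import hashlib
import itertools
import os
from typing import Optional


def check_value(key: bytes) -> bytes:
    """Assumed integrity tag that lets a searcher recognize the right key."""
    return hashlib.sha256(b"key-check" + key).digest()


def recover_key(known: bytes, missing: int, check: bytes) -> Optional[bytes]:
    """Try every possible value of the `missing` trailing key bytes and
    return the first candidate whose check value matches, else None."""
    for tail in itertools.product(range(256), repeat=missing):
        candidate = known + bytes(tail)
        if check_value(candidate) == check:
            return candidate
    return None


if __name__ == "__main__":
    key = os.urandom(16)
    # Discard the last 2 bytes, as "staple 2" would, then search for them.
    recovered = recover_key(key[:-2], 2, check_value(key))
    print(recovered == key)
```

Each additional discarded byte multiplies the worst-case work by 256, which is why the step from 2 to 4 bytes turns a near-instant search into a substantial one.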

Downloads

staple, version 0.2 and unstaple, version 0.2

staple requires OpenSSL to compile and has only been tested on Linux, but should work on most Unix platforms. I'd also love to provide a Windows binary if anyone is willing to contribute one.

staple is released under a new BSD-style license.

Author

Barath Raghavan

FAQ

staple and unstaple are simple tools with interesting consequences, some of which are discussed below.

Basics

What is the point of an All-or-nothing transform?

A basic use is to strengthen encryption. Since an all-or-nothing transform requires all of the data to be available before any decoding can occur, using it as a preprocessing step before encryption means that an attack on the encryption scheme must succeed against every block in a file before it reveals any block of the file.


How can I staple multiple files together?

Easy: just place them all in an archive file, using a tool such as tar, zip, or rar, and then run staple on the resulting archive.


Why are staple and unstaple distributed separately?

Since unstaple contains a mechanism to brute-force the key, it may constitute a circumvention device (under the DMCA) and therefore is not officially part of the staple package.


Why does staple read from standard input and write to standard output, whereas unstaple reads from a file and writes to standard output?

staple operates on stdin and stdout to make it as simple and flexible as possible. unstaple, however, needs to operate directly on a file rather than accept data via stdin, both because reversing the all-or-nothing package transform requires multiple passes over the data and because of the potential copyright implications of unstaple.


Copyright implications

(Caveat: I am not a lawyer and it is unlikely that the scenario below will actually hold up in court.)

Why would I ever want to throw away part of the key?

This might be best explained with an example. Suppose Alice creates some content A and Bob creates some content B. Alice owns the copyright to A and Bob to B. Alice runs staple 2 on A to protect the content and stores the output in A.staple, which she gives to Bob. Alice can run unstaple legally since she owns the content. Since Bob does not own the copyright to A, however, if he were to run unstaple on A.staple and have unstaple perform a brute-force key search, he would be circumventing a copyright protection mechanism and possibly be in violation of the Digital Millennium Copyright Act (DMCA).

It gets more interesting if Bob wants to distribute Alice's content, A. Since he doesn't own the copyright, he probably cannot distribute it legally. Suppose Bob creates an archive file containing both A and B, the latter of which he owns the copyright to. He then runs staple 2 or staple 4 on the archive and publicly distributes the result. Bob's friend, Charlie, who doesn't care about copyrights, runs unstaple, brute-forces the key, and recovers both A and B. Alice, however, is stuck. If she wants to prove that Bob is violating the DMCA, she must violate the DMCA herself, since the only way for her to verify the contents of the stapled archive is to brute-force the key (thereby circumventing Bob's copyright protection mechanism) and recover its entire contents, which include Bob's copyrighted file, B.

It has been suggested that this scenario occurs if Alice is a content producer/owner, Bob is a content piracy group, and Charlie is a user unconcerned about copyright infringement.

(Of course, if Alice got a warrant, then she could probably force Bob to reveal the contents of the archive.)


How do I force someone to decode the entire .staple file?

While a .staple file can be decoded only by someone who has the entire file, nothing fundamental about an all-or-nothing transform prevents a party who does hold the entire file from decoding only selected parts of it.

Considering the above example again, suppose Bob wants to force Alice to possess the file B before she can check whether A is part of the archive. To do so, Bob can nest the archive as follows: first he picks a random value r and, treating a hash function as a random oracle, hashes the file B together with r to produce a key k. He then encrypts Alice's file A under the key k. Finally, Bob staples B, r, and the encrypted version of A. To verify that the stapled archive contains A, Alice must now decode B and r in their entirety, regenerate the key, and decrypt.

This can be performed using the following commands (supposing the two files are 'A' and 'B'):

First generate the random value r (reading from /dev/urandom avoids stalling on systems where /dev/random blocks):
head -c 1024 /dev/urandom > r

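Then generate the key by hashing B and r together. AES-128 takes a 128-bit key, so keep only the first 32 hex digits of the 160-bit SHA-1 digest:
cat B r | sha1sum | cut -d ' ' -f 1 | cut -c 1-32 > key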

Then encrypt the file A under the key:
openssl enc -aes-128-cbc -in A -out A.enc -e -K "$(cat key)" -iv "$(md5sum r | cut -d ' ' -f 1)"

Finally, tar and staple the files:
tar c A.enc B r | staple 2 > stuff.tar.staple
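
The key and IV that the commands above pass to openssl can also be reproduced in Python, which makes it easy to see that the key depends on every byte of B and r. This is only a sketch: derive_key_iv is a hypothetical helper, and it assumes the key is the first 32 hex digits of the SHA-1 digest, since AES-128 takes a 128-bit key.

```python
import hashlib
from typing import Tuple


def derive_key_iv(b_data: bytes, r: bytes) -> Tuple[str, str]:
    """Return (key_hex, iv_hex) as hex strings suitable for openssl's
    -K and -iv options."""
    # Same input as `cat B r | sha1sum`, truncated to 128 bits (assumption).
    key_hex = hashlib.sha1(b_data + r).hexdigest()[:32]
    # Same input as `md5sum r`; MD5 yields exactly 128 bits for the IV.
    iv_hex = hashlib.md5(r).hexdigest()
    return key_hex, iv_hex


if __name__ == "__main__":
    with open("B", "rb") as fb, open("r", "rb") as fr:
        key_hex, iv_hex = derive_key_iv(fb.read(), fr.read())
    print(key_hex, iv_hex)
```

Anyone checking the archive has to feed the complete B and r into this derivation; a single changed or missing byte in either file yields an unrelated key, and A.enc will not decrypt.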