Shakespeare.txt.jpg

Warning! This is old. It was last updated in June 2013 and may be obsolete, outdated, unsafe or just embarrassing. Treat with caution.

A JPEG compression experiment

JPEG image compression is lossy. Every time you edit and save a picture, some of the original content is lost. But it's difficult to see that with the naked eye, so I compressed Shakespeare instead.

A book with text that starts '!O Romep+ Rpldo  wiepffnre arr!riov Romep@'.

“O Romep+ Rpldo wiepffnre arr!riov Romep@
Dgoy thz gatggr `me tefusf sgx n`me!”

That's the balcony scene from Romeo and Juliet, compressed at ‘maximum’ quality in Photoshop: I loaded the text as a RAW image, saved it as a JPEG, then wrote the compressed result back out as plain text.

Even on ‘maximum’ quality, almost all the characters are replaced by their neighbours in the alphabet. On an image, that would be a minuscule change in colour, undetectable to the eye: but rearranged into a different form, even ‘maximum’ quality is enough to render the text a significant challenge to decipher.
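If you'd like to see the neighbour effect without Photoshop, here is a rough sketch of the lossy step at the heart of JPEG: transform a block of values with a discrete cosine transform, round the coefficients, and transform back. It is a toy under stated assumptions, not what Photoshop does: real JPEG works on 8×8 two-dimensional blocks with per-frequency quantisation tables and entropy coding, while this treats the text as rows of 8 bytes and uses a single made-up `step` parameter in place of a quality setting.

```python
import math

BLOCK = 8  # real JPEG uses 8x8 pixel blocks; this toy uses 8 bytes in a row


def _scale(k, n):
    # Scaling that makes the transform orthonormal (so it is exactly invertible).
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)


def dct(block):
    """Orthonormal 1-D DCT-II of a list of numbers."""
    n = len(block)
    return [
        _scale(k, n)
        * sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(block))
        for k in range(n)
    ]


def idct(coeffs):
    """Inverse of dct()."""
    n = len(coeffs)
    return [
        sum(
            _scale(k, n) * c * math.cos(math.pi * (i + 0.5) * k / n)
            for k, c in enumerate(coeffs)
        )
        for i in range(n)
    ]


def lossy_round_trip(text, step):
    """Treat ASCII text as pixel values, quantise its DCT, and decode it again.

    `step` is a hypothetical knob for this sketch: larger means coarser
    rounding of the coefficients, and so more damage to the text.
    """
    data = list(text.encode("ascii"))
    original_length = len(data)
    data += [32] * (-len(data) % BLOCK)  # pad the last block with spaces

    out = []
    for start in range(0, len(data), BLOCK):
        coeffs = dct(data[start:start + BLOCK])
        quantised = [round(c / step) * step for c in coeffs]  # the lossy part
        for value in idct(quantised):
            out.append(min(255, max(0, round(value))))  # clamp to one byte

    return bytes(out[:original_length]).decode("latin-1")


if __name__ == "__main__":
    line = "O Romeo, Romeo! wherefore art thou Romeo?"
    for step in (1, 4, 16):
        print(step, repr(lossy_round_trip(line, step)))
```

With `step` set to 1, every character in this toy comes back at most one code point away from where it started, which is the same kind of damage as the ‘maximum’ quality text above. Larger steps push characters further from home.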

So I tried it at various qualities, all the way down to Photoshop's ‘minimum’. Then, for the heck of it, I got them all bound as books.

Six books piled up. The spine of the top reads 'The Tragedy of Romeo and Juliet'; the rest are jumbled strings of characters.

At higher qualities, the text still maintains the character of a play, but the words grow increasingly incomprehensible.

As the quality degrades, many characters are converted into ASCII control codes. For publishing, I rendered these as spaces (save for vertical tab and carriage return, which became new lines). Worse, a lot of new lines are corrupted into regular characters, reducing the play to a string of nonsense.
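The clean-up rule for control codes can be sketched in a few lines. This is my reading of the rule rather than the script actually used for the books: the function name is made up, and it assumes that surviving line feeds stay as new lines and that DEL (127) counts as a control code.

```python
def render_for_print(text):
    """Make decompressed text printable, following the rule described above."""
    out = []
    for ch in text:
        if ch in "\v\r":
            out.append("\n")  # vertical tab and carriage return become new lines
        elif ch == "\n":
            out.append(ch)    # assumption: existing new lines are kept
        elif ord(ch) < 32 or ord(ch) == 127:
            out.append(" ")   # every other control code becomes a space
        else:
            out.append(ch)    # ordinary characters pass through untouched
    return "".join(out)


if __name__ == "__main__":
    print(repr(render_for_print("Rom\x07eo\rJul\x1biet\v")))
```

Note that a tab is itself a control code, so under this rule it becomes a single space.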

A jumble of text.

But the strange thing is this: on the front of each book is the JPEG image it was derived from. And, for all but the lowest quality, they appear utterly identical to the naked eye.

Book covers.

We're sensitive to data loss in text form: we can only consume a few dozen bytes per second, so any error is obvious. Conversely, we're almost blind to it in images, so losing quality doesn't bother us all that much.

Should it?

Update

Hello unexpected influx of readers! A few notes for you:
