This neat trick was published on wincustomize.com:
- Create a text file in Notepad (or another plain-text editor; do not use WordPad, Word or any other word processor).
- Type this sentence exactly, without the quotes: “this app can break”.
- Save the file, exit the text editor, and open the file in Notepad (by double-clicking it, or via File→Open).
- Notice that the text has transformed into “桴獩愠灰挠湡戠敲歡” (nonsensical Chinese text).
Why did it do that? Michael Kaplan has the full explanation, but in short it is because Notepad takes a stab at auto-detecting what character encoding the file was saved in, and fails horribly. The same happens all the time on the Web, which is why browsers have implemented various ways of guessing what the author meant. It often works well, but sometimes it fails: perhaps not as completely as in the Notepad example above, but enough to make pages difficult or impossible to read.
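The misreading can be reproduced outside Notepad. The following is a minimal Python sketch, assuming the wrongly guessed encoding is UTF-16 little-endian (an assumption, but one consistent with the characters shown above). The variable names are purely illustrative.

```python
# The 18 ASCII bytes that the text editor wrote to disk.
raw = "this app can break".encode("ascii")

# Reading them as intended: one byte per character.
as_ascii = raw.decode("ascii")

# Misreading them as UTF-16 little-endian (assumed here): each pair of
# bytes is combined into a single 16-bit code unit, so 18 bytes become
# 9 characters, all of which happen to fall in the CJK ideograph range.
as_utf16 = raw.decode("utf-16-le")

print(as_ascii)
print(as_utf16)
print(len(raw), "bytes ->", len(as_utf16), "characters")
```

The second line printed is the same string of ideographs as in the Notepad example. The bytes on disk never changed, only the assumption about how to read them.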
The only solution to the problem is for Web authors to make sure they declare the character encoding of the documents, scripts and style sheets they create. The easiest way to do this is to make the server software add the encoding to the HTTP Content-Type header as a charset parameter; Apache, for instance, can do this with the AddDefaultCharset configuration directive. If you cannot control the server, you can also declare it with a <meta> tag for HTML, an encoding declaration
for XML, or a @charset at-rule
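
To illustrate why an explicit declaration helps, here is a small Python sketch of a client that reads the charset parameter from a Content-Type header instead of guessing. The helper name `charset_from_content_type` and the sample header and body are made up for this example. Real browsers follow a more elaborate procedure.

```python
from email.message import Message


def charset_from_content_type(header_value, default=None):
    """Return the charset parameter of a Content-Type value, or default."""
    # Hypothetical helper: the standard library's header parser does the work.
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset(default)


# Sample response body: the word "café" encoded as UTF-8 (illustrative data).
body = "café".encode("utf-8")

# With a declaration, the client knows exactly how to decode the bytes.
declared = charset_from_content_type("text/html; charset=utf-8")
print(declared, "->", body.decode(declared))

# Without one, the client has to guess. A wrong guess such as Latin-1
# does not raise an error; it silently produces garbled text.
missing = charset_from_content_type("text/html")
print(missing, "->", body.decode("latin-1"))
```

The wrong guess here is milder than in the Notepad case: only the accented letter is damaged. That is typical of what happens on real pages.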