I have two text files. I'm providing download links rather than a pastebin so that their contents are preserved exactly:
Both of these text files consist only of spaces, carriage returns, newlines, and the letter X, and they should be ASCII encoded. The only difference between the two files is that the second has its leading and trailing blank lines removed, along with some of the leading and trailing spaces on each line.
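The claim that the files contain only those four byte values can be checked mechanically. Below is a minimal sketch in Python; the helper name `unexpected_bytes` and the in-memory `sample` are illustrative assumptions, not the actual file contents.

```python
# Byte values the files are supposed to contain: space, CR, LF, and "X".
ALLOWED = frozenset(b" \r\nX")


def unexpected_bytes(data: bytes) -> set[int]:
    """Return every byte value in `data` that is not in ALLOWED."""
    return set(data) - ALLOWED


# Hypothetical stand-in for a file; the real contents are not reproduced here.
sample = b"   X X\r\n  XXX\r\n"

# An empty set means only the four expected byte values occur.
print(unexpected_bytes(sample))
```

To run this against a real file, read it in binary mode first, for example with `pathlib.Path("second.txt").read_bytes()`, where `second.txt` is a placeholder name.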
The first file is not causing any problems. For some reason, my text editors are detecting the second file as UTF-8:
- Notepad, when opened by double-clicking the text file, displays corrupt text:
- Notepad, when using File → Open, works fine as long as I explicitly choose "ANSI":
- Notepad++, while displaying the file fine, believes it is encoded as "UTF-8 (No BOM)":
In Notepad++, even if I select "convert to ANSI" and save the file, the saved file is byte-for-byte identical to the original, and both editors still detect it as UTF-8!
Neither editor has any issue with the first file; both correctly recognize it as ASCII (or ANSI).
I looked at the second text file in a hex editor. Indeed, it does not start with a BOM. The first few bytes of the file are 20 20 20 20 20 20 20 20, as they should be, since it starts with spaces:
My question is: why do both Notepad and Notepad++ detect the second file as UTF-8 even though it has no BOM, and what is different about the second file compared to the first that causes this? I can't figure out what's going on.
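The hex-editor observations above (no BOM, file begins with spaces, content is plain ASCII) can also be reproduced with a short script. This is only a sketch in Python: the function name `inspect_bytes` and the sample data are assumptions made for illustration, and it does not attempt to imitate whatever detection logic Notepad or Notepad++ actually use.

```python
import codecs


def inspect_bytes(data: bytes, prefix_len: int = 8) -> dict:
    """Summarize what a hex editor would show about the start and content of a file."""
    return {
        # The UTF-8 BOM is the three bytes EF BB BF at the very start.
        "has_utf8_bom": data.startswith(codecs.BOM_UTF8),
        # The UTF-16 BOMs are FF FE (little endian) and FE FF (big endian).
        "has_utf16_bom": data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)),
        # True when every byte is below 0x80.
        "is_ascii": data.isascii(),
        # The first few bytes as space-separated hex, like a hex editor row.
        "first_bytes": data[:prefix_len].hex(" "),
    }


# Hypothetical stand-in that, like the second file, starts with eight spaces.
sample = b" " * 8 + b"X\r\n"
print(inspect_bytes(sample))
```

For a real file, pass the result of reading it in binary mode (for example `open("second.txt", "rb").read()`, with `second.txt` as a placeholder name). Running it on both files would confirm whether they differ in any of these respects.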