Here's an example of a link I found on YouTube in the comments section of a video.
This is the way it shows up in the comment.
If I highlight this link, copy it to the clipboard (Ctrl+C), open a new browser tab and paste it (Ctrl+V) into the address bar, this is how it shows up.
It looks the same, right? But if I hit Enter I get an error.
404 – Page Not Found
The page you were looking for could not be found on the GNU web
If you followed a link that turned out to be broken, and the page with
the broken link mentions an explicit address to which to report bugs,
please use that address.
The URL also changes to the following:
http://www.gnu.org/distros/free-distros.h%C2%ADtml%EF%BB%BF
If I remove %C2%ADtml%EF%BB%BF, type in tml to get back the address http://www.gnu.org/distros/free-distros.html and then hit Enter, well, now it works and the page loads.
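The two percent-encoded runs are UTF-8 bytes: %C2%AD is U+00AD (SOFT HYPHEN) and %EF%BB%BF is U+FEFF (ZERO WIDTH NO-BREAK SPACE, also used as a byte order mark). A minimal Python sketch, assuming the broken address was http://www.gnu.org/distros/free-distros.h%C2%ADtml%EF%BB%BF (the working URL with %C2%ADtml%EF%BB%BF in place of tml), decodes it and drops the invisible characters. Removing every Unicode "format" character (category Cf) is only an illustrative cleanup, not something browsers or YouTube are known to do.

```python
from urllib.parse import quote, unquote
import unicodedata

# Assumed broken address, reconstructed from the edit described above.
broken = "http://www.gnu.org/distros/free-distros.h%C2%ADtml%EF%BB%BF"

# unquote() percent-decodes using UTF-8 by default.
decoded = unquote(broken)

# Report every invisible format character (Unicode category "Cf")
# together with its percent-encoded UTF-8 form.
for ch in decoded:
    if unicodedata.category(ch) == "Cf":
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {quote(ch)}")


def strip_format_chars(text: str) -> str:
    """Drop invisible Unicode format characters (category Cf)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


print(strip_format_chars(decoded))
```

Run as a script, it lists U+00AD and U+FEFF with their encodings (%C2%AD and %EF%BB%BF) and then prints the working GNU address.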
This seemed very strange, so I tried pasting the same text from the clipboard into a plain text editor (Notepad), and this is what I got.
How was the dash between h and tml introduced? This is why I was getting the 404 error. But the URL appears correctly when pasted to the address bar. Is this some kind of hidden character perhaps?
Also, if I go back to YouTube and highlight the link, I can see that there is a bump on the last three letters. The highlighting is taller around "tml". You can see that in the screen capture below.
Why is this happening? What's going on? Could it be that Google is somehow intentionally salting the link?
If I paste into Notepad++ (version 6.3), I get the following.
If I paste into the address bar of Google Chrome, there appears to be some kind of hidden character at the end of the URL. See the screen capture below.
That's not a white space. It's something else… something alien! Something from planet X?
Note: the vertical line at the end is not what I mean; that's just the blinking text cursor.
Here is the HTML as seen with Firefox's element inspection tool.
Why is there a square within the opening wbr tag?
The "square" appears to be the soft hyphen character entity. Here is the actual source code of this particular line.
The soft hyphen is the &shy; you see here. HTML tags, such as those used for bold text, are not selectable: when you highlight text on a web page in a browser, you are not selecting the HTML tags. Nothing within <> is shown.
So the soft hyphen appears to be the root cause of the copy-and-paste issue: it is not displayed on the web page, but it is selected when you highlight the text.
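The difference between tags and character entities can be mimicked with Python's standard html.parser: tags are reported separately and can be discarded, while character references such as &shy; are decoded into the text itself. The markup below is a hypothetical stand-in for the YouTube line (not its exact source), and collecting the text data is only a rough model of what a browser puts on the clipboard.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect only text content; tags are ignored, entities are decoded."""

    def __init__(self):
        # convert_charrefs=True turns &shy; into the actual U+00AD character.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


# Hypothetical markup: a <wbr> tag followed by a soft hyphen entity.
snippet = '<a href="#">http://www.gnu.org/distros/free-distros.h<wbr>&shy;tml</a>'

parser = VisibleText()
parser.feed(snippet)
parser.close()
text = "".join(parser.parts)

# The <a> and <wbr> tags are gone, but the soft hyphen survives as '\xad'.
print(repr(text))
print("\u00ad" in text)
```

The tags leave no trace in the extracted text, while the entity becomes a real, invisible character between h and tml.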
This is what it looks like when I paste the URL into Microsoft Word 2010 and view hidden characters.
Moving the text cursor from .ht|ml requires pressing the arrow key three times. You can tell from the image above why: it's because of this hidden character. With the cursor in front of that strange-looking character, pressing Alt+X shows 0068. With the cursor behind that character, in front of the letter t, Alt+X reveals nothing at all. The 0068 is just the Unicode code point for the lowercase letter h (Alt+X converts the character to the left of the cursor).
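Word's Alt+X reports code points as four hex digits, so 0068 means U+0068. A similar listing in Python makes every hidden character visible with its name. The sample text is hypothetical: it assumes the pasted tail contains the soft hyphen after h and the U+FEFF that showed up percent-encoded (%EF%BB%BF) in the address bar.

```python
import unicodedata

# Hypothetical sample: ".html" with a soft hyphen after "h" and a
# zero width no-break space at the end (assumed from %C2%AD and %EF%BB%BF).
pasted = ".h\u00adtml\ufeff"


def code_points(text):
    """Return (4-digit hex code point, Unicode name) for each character."""
    return [(f"{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


for hexcode, name in code_points(pasted):
    print(f"U+{hexcode}  {name}")
```

The soft hyphen gets its own entry (U+00AD SOFT HYPHEN) between U+0068 (h) and U+0074 (t), even though it takes up no visible space on the page.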