Update: Sometimes I DO overthink a problem and its solution, which is odd because SSL is also one of my (supposedly!) strong points. Skip to the comments below for something that Andrew Nacin pointed out. 😀
——————————–
Part of my professional life is to think about topics like data leakage. That’s when you do something and, without realizing it, you transmit information that you hadn’t intended to.
For example, my company may have an internal web page with this URL:
And on that page is a link to a NY Times DealBook blog posting as a reference. One of the readers in my company clicks on that link without hesitation. Why wouldn’t they click? That’s what the link is there for.
When DealBook processes their web access logs, they’ll see a URL as the HTTP referer (I’m spelling it correctly after this) that the company or person who clicked that link may not want them to see.
How to prevent sensitive referrers from being sent from your WordPress blog?
- Install and configure YOURLS (svn revision 703). Get that working with a short domain; it’s easy to do.
- Install and activate my short Force Javascript Redirection YOURLS plugin. The useful bit is only one line.
- Install my WordPress Convert Links to Yourls plugin but don’t activate it yet.
- Modify two lines in that WordPress plugin for your configuration. Sorry, I’m not up to making an options page (yet).
- Activate that WordPress plugin.
And poof! The Tin Foil Hat is in place. Any links in your post content or comment text will be routed through your very own link shortener, and the remote site will only see the short link as the referrer.
Read on to see how it works.
How does the YOURLS part work again?
When you visit a web page, your web browser transmits the URL of the page you came from as the HTTP referrer. Browsers are programmed so that if the site you visit redirects you via a 301 or 302 HTTP status code, that same referrer gets transmitted again to the new web server.
That’s why when you use YOURLS and have a short link on your web page, the ultimate destination still sees the original HTTP referrer URL.
That link shortener uses a 301 status code to forward your browser to the correct long link. It only redirects via JavaScript if the headers were not successfully sent. That 301 code means the target web server sees where you came from.
But if the JavaScript method is used, then the status code is 200. The web browser goes to the short URL, the JavaScript sends you to the target URL, and the short URL is now the referrer that the destination sees.
That’s what my YOURLS plugin does. It uses a filter and tells YOURLS “Don’t figure it out, just use the JavaScript method for redirection”.
That will keep the destination web server from seeing the original referrer link. All they’ll see is the short link and who cares if they see that?
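The difference between the two methods can be sketched in a few lines of PHP. This is only an illustration, not the code from YOURLS or from the plugin, and the function name `sketch_js_redirect_page()` is made up for the example. Instead of answering with a 301 and a Location header, the short link answers with an ordinary 200 page whose script does the navigation, so the browser treats the short URL as the page it came from.

```php
<?php
// Hypothetical helper: builds the body of a 200 response that redirects via JavaScript.
function sketch_js_redirect_page(string $target): string
{
    // Encode the URL as a JavaScript string literal. The HEX flags keep
    // characters such as < and & from breaking out of the <script> element.
    $js_url = json_encode(
        $target,
        JSON_UNESCAPED_SLASHES | JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT
    );
    // Escape the same URL for use in an HTML attribute (fallback link).
    $html_url = htmlspecialchars($target, ENT_QUOTES, 'UTF-8');

    return "<!DOCTYPE html>\n"
        . "<html><head><title>Redirecting</title></head><body>\n"
        . "<script>window.location.replace({$js_url});</script>\n"
        . "<noscript><a href=\"{$html_url}\">Continue</a></noscript>\n"
        . "</body></html>\n";
}

// Usage: send the page with the default 200 status instead of a Location header.
// echo sketch_js_redirect_page('http://example.com/some/long/url');
```

The fallback link inside `<noscript>` is an addition for readers without JavaScript; a click on it also starts from the short URL.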
To install my YOURLS plugin, extract the zip file to the YOURLS user/plugins directory and activate it in the YOURLS admin page.
Okay, so what about that WordPress plugin?
I’m thinking of two areas in WordPress that could contain links and that’s the post content and the comment text. Both of those are easy to filter in WordPress and I’ve created a plugin that does the following:
- Finds all the links in the post and comments and pops them into an array.
- Sends each one of those links to YOURLS to get the short link.
- Substitutes each of the original links with the corresponding short link.
- Returns the modified output as the post or comment.
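Those four steps can be sketched roughly as follows. This is a simplified illustration under my own assumptions, not the plugin's actual source: the function name, the regex, and the `$shorten` callable (standing in for the request to YOURLS) are all hypothetical.

```php
<?php
// Hypothetical sketch of the filter body. $shorten is any callable that takes a
// long URL and returns a short URL, or an empty string on failure.
function sketch_convert_links(string $html, callable $shorten): string
{
    // One preg_match_all() call both finds the links and fills $matches:
    // $matches[0] = full href attributes, $matches[1] = quote used, $matches[2] = URLs.
    $pattern = '~href=(["\'])(https?://[^"\']+)\1~i';
    if (!preg_match_all($pattern, $html, $matches)) {
        return $html; // no links (or a regex error): leave the text untouched
    }

    $replacements = [];
    foreach ($matches[2] as $i => $long_url) {
        $short_url = $shorten($long_url);
        if (!is_string($short_url) || $short_url === '') {
            continue; // shortener failed: keep the original link
        }
        $quote = $matches[1][$i];
        $replacements[$matches[0][$i]] = 'href=' . $quote . $short_url . $quote;
    }

    // strtr() swaps each original attribute for its short version in one pass.
    return $replacements ? strtr($html, $replacements) : $html;
}

// In WordPress the function would be attached to the content and comment filters,
// with a wrapper that supplies the real shortener, for example:
// add_filter('the_content', 'my_wrapper_function');
// add_filter('comment_text', 'my_wrapper_function');
```

Only the `href` attributes are rewritten, so the visible link text stays the same even when it happens to be the URL itself.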
If you don’t have it already, you will need to add the PHP cURL extension to your web server.
This plugin doesn’t modify the links in the database; it just processes them via a filter. That way the HTML sent to the reader’s web browser contains the short links.
When someone clicks any of the short links, they are sent to the YOURLS web server, which then uses JavaScript to redirect them. Again, the target web server sees that short link as the referrer and not the original post.
How do you configure the WordPress plugin?
Install the Convert Links to YOURLS plugin and modify the two variables around line 51 of convert-links-yourls.php:
$mh_api_url = 'http://yyy.yy/yourls-api.php';
$mh_signature = 'XXXXXXXXXX';
Update those two with the information from your own YOURLS configuration.
If you don’t update them, or something is wrong with your YOURLS setup, then you won’t get the URLs substituted and the original URLs will remain intact. You won’t get an error message either, so make sure your settings are correct.
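For reference, here is a rough sketch of what a request to the YOURLS API with those two settings can look like. It is not the plugin's code. The function names are hypothetical, and the parameter names (`signature`, `action=shorturl`, `format=json`, `url`) and the `shorturl` field in the reply are my understanding of the YOURLS API, so check them against your own installation. Any failure quietly returns an empty string, which matches the "no error message" behavior described above.

```php
<?php
// Hypothetical helper: builds the GET request for the YOURLS API.
function sketch_yourls_request_url(string $api_url, string $signature, string $long_url): string
{
    return $api_url . '?' . http_build_query([
        'signature' => $signature,  // assumed parameter name
        'action'    => 'shorturl',  // assumed action name
        'format'    => 'json',
        'url'       => $long_url,
    ]);
}

// Hypothetical helper: pulls the short URL out of the JSON reply, '' on any problem.
function sketch_yourls_parse_response($body): string
{
    $data = is_string($body) ? json_decode($body, true) : null;
    if (is_array($data) && isset($data['shorturl']) && is_string($data['shorturl'])) {
        return $data['shorturl'];
    }
    return '';
}

// Hypothetical helper: performs the request with the PHP cURL extension.
function sketch_yourls_shorten(string $api_url, string $signature, string $long_url): string
{
    $ch = curl_init(sketch_yourls_request_url($api_url, $signature, $long_url));
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_TIMEOUT        => 5,    // do not hang the page on a slow shortener
    ]);
    $body = curl_exec($ch); // false on failure
    return sketch_yourls_parse_response($body);
}

// Usage with the placeholder settings from the plugin:
// echo sketch_yourls_shorten('http://yyy.yy/yourls-api.php', 'XXXXXXXXXX', 'http://example.com/page');
```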
So what’s the catch?
I don’t store the result of the URL shortening, so each time the post/comment field is to be displayed, each URL gets sent to the shortener again. That’s not very efficient and results in your link shortener getting a request every time the link is filtered.
For a small WordPress site, that’s not a problem. Small installations don’t generate enough traffic to make your web server break a sweat.
But for web sites that create their own Slashdot effect (and you know who you are), it would compound the number of hits on your server. That would be bad during a self-made DoS. Using a caching plugin will probably reduce those requests, but I haven’t really checked to see.
The first time links are shortened this adds a small but noticeable lag while the URLs are being processed. Once the links are in the YOURLS database then the web pages zip like before.
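The plugin does not do this, but one way to cut down on the repeat requests would be to remember each answer. A minimal sketch, with a made-up function name and a cache that only lives for the current request:

```php
<?php
// Hypothetical wrapper: remembers short URLs so each long URL is only sent
// to the shortener once per request. Failures (empty strings) are not cached.
function sketch_memoize_shortener(callable $shorten): callable
{
    $cache = [];
    return function (string $long_url) use (&$cache, $shorten): string {
        if (!array_key_exists($long_url, $cache)) {
            $short_url = (string) $shorten($long_url);
            if ($short_url === '') {
                return ''; // try again next time instead of caching the failure
            }
            $cache[$long_url] = $short_url;
        }
        return $cache[$long_url];
    };
}

// Usage: wrap whatever function talks to the shortener.
// $cached = sketch_memoize_shortener('my_shortener_function');
// $cached('http://example.com/page'); // first call hits the shortener
// $cached('http://example.com/page'); // second call is answered from the array
```

To keep results between page loads, the array could be swapped for persistent storage such as the WordPress transients functions (`get_transient()` and `set_transient()`); I have not tried that here.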
On the link shortener side, if the URL is not in the database then it gets added and the short URL is returned.
If the URL already exists in the database then that old short link is sent back without creating a duplicate short link.
Or more accurately, that’s what happened up to and including YOURLS 1.5.1-gamma svn revision 703. With revision 704 and up, when the link has already been shortened, the response from the YOURLS server comes back without the short link URL.
Between revision 703 and the current (as of this writing) revision 712, not that many files have been updated, so I’ll report the issue on the wiki.
You can get the 703 revision using these commands:
$ cd yourls-root
$ svn co -r 703 http://yourls.googlecode.com/svn/trunk .
DUDE. Seriously, you worry about this stuff?
No, not particularly.
Data leakage via HTTP referrers really isn’t a problem for me and if I were really concerned I would run all manner of privacy plugins and use Tor. This is really just an exercise and it was an interesting problem.
In figuring out this one solution (there are others) I was able to learn about a cool regex, how to populate that regex result into an array (using the same command) and learn more about the link shortener software that I use.
I not only got to write a small WordPress plugin but also see how Ozh’s sample plugins work. That’s some serious Cool Beans there and I’m having a great time.
Download the code and take a look, it’s all GPL 2. Later on I’ll add the license and readme.txt files. It’s not complicated and I’m always looking to make improvements and get feedback. It’s a great way to learn new things.
But now if you’ll excuse me, I need to go visit the supermarket. They have a sale on Reynolds Wrap and I want to re-line one of my baseball caps.
You can never really know when a good Tin Foil Hat can come in handy.
Andrew Nacin says:
Pages over SSL don’t send referers. Perhaps setting up a forced SSL situation is a bit easier 🙂
March 6, 2012 — 11:10 am
Jan Dembowski says:
Now how did I manage to miss that? *Adjusts Tin Foil Hat*
I need to try that but I’m 99.99% sure you’re correct. 😉
March 6, 2012 — 11:24 am
Jan Dembowski says:
And that is now 100% confirmed! DOH! *HEAD DESK*
THAT IS JUST SO MUCH EASIER!!! But all in all, this was a fun experiment in PHP and Tin Foil Hats. The function preg_match_all() is still my favorite new toy. 😀
The web page I tested from is reachable from both http and https. On that server I put two links to different pages on the same target web server. Both links were to non-SSL pages.
When I went to the non-SSL version of the page, sure enough the HTTP referrer shows up on the target web server.
But when I went to the SSL version of that web page and clicked the links, the HTTP referrer was not forwarded. The target server’s log showed the page request but no referrer.
It is really funny that I missed such a simple and easy solution! Sometimes it just doesn’t pay to overthink a problem.
So there you have it: if you want to hide your referrer from the target web site, just make your website SSL based, enforce that and you’re all set to go.
March 6, 2012 — 12:58 pm