Better SSL with mod_substitute

Renewing my SSL certificates was on my to-do list for months and today I’m at home recuperating from a fever that kept me up all night. Since my web server is now patched, it is a good time to get new SSL certificates. So I contacted StartSSL and did the deed.

WordPress and SSL have always irked me: just putting a certificate on the web server and using the https URL still leaves you with elements that are loaded via http (not SSL), and your browser’s address bar ends up looking like this.

[Image: ssl-conflict]

See that yellow warning triangle over the lock? It irks me. It does. It’s a personality flaw, a blemish, an imperfection. It loudly announces to the world that I’m Doing It All Wrong™. I see that on my site and I hang my head in shame.

OK, it’s not really that big a deal. I could play with WordPress SSL plugins, but part of my background is configuring applications on servers, and Apache2 has a useful module called mod_substitute.

I have two configuration files for my site. One is for the http version and the other is for SSL. It’s like two separate virtual hosts with the same directories.
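A minimal sketch of that layout follows. The document root, certificate paths and module paths are placeholders, not the real ones from this server. On Debian-style installs, `a2enmod substitute` usually takes care of the LoadModule line.

```apache
# Hypothetical paths throughout; adjust to your distribution's layout.
LoadModule substitute_module modules/mod_substitute.so
# On Apache 2.4, AddOutputFilterByType is provided by mod_filter.
LoadModule filter_module modules/mod_filter.so

# Plain http version of the site
<VirtualHost *:80>
    ServerName blog.dembowski.net
    DocumentRoot /var/www/blog
</VirtualHost>

# SSL version, serving the same directory
<VirtualHost *:443>
    ServerName blog.dembowski.net
    DocumentRoot /var/www/blog
    SSLEngine on
    SSLCertificateFile /etc/ssl/certs/blog.example.crt
    SSLCertificateKeyFile /etc/ssl/private/blog.example.key
    # The Substitute rules belong in this vhost only.
</VirtualHost>
```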

After I enabled mod_substitute I added these lines to my SSL config.

<Location />
 AddOutputFilterByType SUBSTITUTE text/html
 # Rules whose pattern contains double quotes are wrapped in single quotes
 Substitute 's|href="http://blog.dembowski.net/|href="https://blog.dembowski.net/|'
 Substitute "s|href='http://blog.dembowski.net/|href='https://blog.dembowski.net/|"
 Substitute "s|src=' http:|src='|"
 Substitute 's|src="http:|src="|'
</Location>
# NOTE: Remove the space before the http above

I’m using the alternate delimiter “|” because I don’t want to escape out the URL slashes.

That’s probably too many lines. The first two Substitute lines replace any of my own URLs that start with http:// with https://. The next two are for any references that load elements using plain “http:”. I don’t substitute those with “https:”; instead I strip the scheme so those URLs start with “//” and inherit the protocol of the page.
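The four rules could probably be collapsed into two by capturing the quote character. This is an untested sketch, not the configuration running on this site. It assumes that mod_substitute's regex back-references (`$1`) are available and that a backslash-escaped double quote is accepted inside a double-quoted directive argument.

```apache
<Location />
    AddOutputFilterByType SUBSTITUTE text/html
    # (href=["']) captures whichever quote style the markup uses; $1 puts it back.
    Substitute "s|(href=[\"'])http://blog\.dembowski\.net/|$1https://blog.dembowski.net/|"
    # Drop the scheme from src attributes so they become protocol-relative.
    Substitute "s|(src=[\"'])http:|$1|"
</Location>
```

The trade-off is readability: two regexes with capture groups are shorter, but the four literal rules are easier to check at a glance.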

Doing that gets this image in my browser’s address bar.

[Image: ssl-conflict-gone]

Green is good. Order is restored.

Why didn’t I use a WordPress HTTPS plugin?

Because I’m lazy and not feeling well. Also using mod_substitute lets me filter the HTML output after WordPress has generated it but before it is sent to the web browser. That gives me more confidence that I’ll get all of the URLs that I want to change.

I’m only using this trick on the SSL version of my site. It’s not a perfect solution and I’m curious to find out what it breaks. I had to disable Jetpack’s Photon option because some of my images were not being sent to that CDN properly, and there may be other things as well.

This is not something for everyone (if you’re on a shared host for example) but if you can load Apache2 modules and restart your web server then this may work for you too.

Update: I tried using (.*) instead of “blog” so the rules would cover my other vhosts as well. Nope, that breaks LOTS. Reverting.
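Why it broke is not recorded here. One plausible culprit is that (.*) is greedy and will match far more than a single host name when a line of HTML contains several URLs. A narrower character class would be the first thing to try. The rule below is a hypothetical, untested sketch and assumes `$1`/`$2` back-references work as in other regex substitutions.

```apache
# Hypothetical: match exactly one host label instead of (.*)
Substitute "s|(href=[\"'])http://([a-z0-9-]+)\.dembowski\.net/|$1https://$2.dembowski.net/|"
```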

Categories: Geek

Jan Dembowski

6 replies

  1. The only thing that is missing, is a rule that will not also change the canonical tags!
    What’s the best way to solve this?

    1. @Toby: What’s the HTML for the canonical tags? The mod_substitute code can be modified to replace anything but in order to do that I’ll need a sample of what you mean.

      Currently I’m using nginx which has a different method for that.

      1. So for http the canonical would point to itself – it should rewrite everything apart from canonical tags.

        I assume it could be achieved by a pretty weird regex within the rule?

        I was also thinking about doing a dummy replace first so that the real replacement is skipped, provided the rules run sequentially.
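        The dummy-replace idea could be sketched as follows. It assumes that Substitute rules are applied in the order they are written, and CANONICAL-HTTP is a made-up placeholder, so treat this as untested.

        ```apache
        <Location />
            AddOutputFilterByType SUBSTITUTE text/html
            # 1. Park the canonical URL behind a placeholder the other rules will not match.
            Substitute 's|<link rel="canonical" href="http://|<link rel="canonical" href="CANONICAL-HTTP://|'
            # 2. The normal rewrite rules run next.
            Substitute 's|href="http://blog.dembowski.net/|href="https://blog.dembowski.net/|'
            # 3. Restore the canonical URL to plain http.
            Substitute 's|href="CANONICAL-HTTP://|href="http://|'
        </Location>
        ```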

        1. @Tobias:

          I assume it could be achieved by a pretty weird regex within the rule?

          Perhaps, but if you set the Site URL and WordPress Address to an https:// URL then I think it would sort itself out.

  2. The question is – in general – how to rewrite all urls apart from the canonical tag.

    Example 1: The canonical should point to HTTP and be indexed by Google instead of HTTPS, while a user should only navigate within HTTPS once an HTTPS URL is opened.

    Example 2: No protocols should be contained in the HTML – all absolute URLs should be rewritten to start with // instead of http:// or https:// – except the canonical tag, as a // canonical reference is pretty useless.

    1. I don’t agree with your examples and using mod_substitute is probably not the best answer for you. 😉

      The canonical URL for this post is this:

      <link rel="canonical" href="https://blog.dembowski.net/2014/better-ssl-with-mod_substitute/" />
      

      You could add mod_substitute rules after the ones above to replace

      <link rel="canonical" href="https://
      

      with

      <link rel="canonical" href="//
      

      And that should work for you. But again, I disagree with your examples.
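      As a sketch, that extra rule would look something like this. It has to come after the existing rules, and the outer single quotes are there so the pattern can contain double quotes.

      ```apache
      <Location />
          # Placed after the existing Substitute rules
          Substitute 's|<link rel="canonical" href="https://|<link rel="canonical" href="//|'
      </Location>
      ```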

      Google has already said that https pages will get a small boost. All web browsers should be able to handle the 301 redirect to a valid https URL and there is no search engine penalty for that 301 redirect. All of my http requests to this site get sent to the https version.
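      In Apache terms, a blanket 301 from http to https can be as small as the vhost below. This is illustrative only, since this site now runs on nginx.

      ```apache
      <VirtualHost *:80>
          ServerName blog.dembowski.net
          # "permanent" makes this a 301; the rest of the path is carried over.
          Redirect permanent / https://blog.dembowski.net/
      </VirtualHost>
      ```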

      Yes, with enough traffic an encrypted site becomes noticeably slower than a clear text version. But by enough traffic I mean thousands of page hits per second. That’s a scalability problem and with a CDN you can solve that one.