9 Useful mod_rewrite Recipes

Jul 18, 12:17 am

[The following excerpt is from Chapter 7 of The Definitive Guide to Apache mod_rewrite by Rich Bowen.]

Throughout the first several chapters of this book, you were exposed to numerous simple rewrite examples. Some of them were actually useful, while others existed purely to introduce concepts. This chapter takes a practical turn, in that the examples offered actually solve real-world problems. While not all of these will be directly applicable to you, working through the examples and their explanations will nonetheless provide you with a better understanding of what’s involved in crafting rewrite solutions to problems that you experience on your own website.

Specifically, we’ll discuss the following topics:

  • Adjusting URLs: mod_rewrite gets used a lot to make ugly URLs less ugly, or perhaps easier to type, remember, and bookmark.
  • Renaming and reorganization: When your website gets redesigned, mod_rewrite can help you make sure that all the existing bookmarks and links to your website keep working, while gradually moving people over to the new structure.

Adjusting URLs

The most common use of mod_rewrite appears to be adjusting URLs from one layout to another. This may be because the URLs on a site have been changed around for some reason, or perhaps it’s because the site administrator believes that the URLs are too ugly and something else would be easier on the eyes.

We’ll start with a simple example of this and then move on to some slightly more complex examples.

Problem: We Want to Rewrite Path Information to a Query String (Example 1)

We have a URL that looks like this: http://example.com/vegetables.php?carrots. We’d like it to look like this instead: http://example.com/vegetables/carrots.

Solution

mod_rewrite:
RewriteEngine On
RewriteRule ^/vegetables/(.*) /vegetables.php?$1 [PT]

Discussion

The effect of this rule will be that anything appearing after /vegetables/ will be put into the query string. This is the simplest possible example of this class of rewrites, and it’s almost always simpler than what you actually wanted to do. But it’s a good starting place to see how you might approach something like this.

As always, if we are in fact using this in an .htaccess file, we need to remove the leading slash on the rule. It would therefore become

mod_rewrite:
RewriteRule ^vegetables/(.*) vegetables.php?$1 [PT]
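
In practice the handling script usually wants a named parameter rather than a bare query string. A minimal sketch of that variation follows, assuming a hypothetical parameter name, type, that vegetables.php would read. The QSA flag appends any query string the client sent instead of discarding it.

```apache
# Per-server context. "type" is an assumed parameter name, not part of the original recipe.
RewriteEngine On
# /vegetables/carrots?color=orange -> /vegetables.php?type=carrots&color=orange
RewriteRule ^/vegetables/(.*) /vegetables.php?type=$1 [PT,QSA]
```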

If you are trying to split off more arguments than just one, you will want to move on to the next example.

Problem: We Want to Rewrite Path Information to a Query String (Example 2)

We have a URL that looks like this: http://example.com/cgi-bin/book.cgi?author=bowen&topic=modrewrite. We’d rather have it look like this: http://example.com/book/bowen/modrewrite.

Solution

mod_rewrite:
RewriteEngine On
RewriteRule ^/book/([^/]*)/([^/]*) \
/cgi-bin/book.cgi?author=$1&topic=$2 [PT]

Discussion

The solution here is an oversimplification on a couple of counts. In particular, it requires that the requested URL look exactly like the URL that is described in the pattern. That is, in this case, it must be /book/ followed by something, followed by a slash, followed by something. If there are more (or fewer) "something"s, then the pattern won’t match. The "something", in this case, is the pattern [^/]*, which means "zero or more not-slash characters."

If you want a more flexible solution than this one, read on or consider using the RewriteMap directive discussed in Chapter 5.
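
If the only adjustment needed is to reject empty segments and tolerate a trailing slash, a slightly stricter pattern may be enough. This is a sketch rather than part of the original recipe: the + requires at least one character per segment, and /?$ anchors the end of the path.

```apache
RewriteEngine On
# Matches /book/bowen/modrewrite and /book/bowen/modrewrite/
# Does not match /book/bowen, /book//modrewrite, or /book/bowen/modrewrite/extra
RewriteRule ^/book/([^/]+)/([^/]+)/?$ \
    /cgi-bin/book.cgi?author=$1&topic=$2 [PT]
```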

Problem: We Want to Rewrite Path Information to a Query String (Example 3)

We have a URL that might have anywhere from zero to four arguments, and we want to rewrite them to query string arguments. That is, we want URLs of the following form:

example:
http://example.com/pets
http://example.com/pets/mammals
http://example.com/pets/mammals/dogs
http://example.com/pets/mammals/dogs/shorthaired
http://example.com/pets/mammals/dogs/shorthaired/dachshunds

to be mapped internally to the following:

example:
http://example.com/pets.php
http://example.com/pets.php?class=mammals
http://example.com/pets.php?class=mammals&family=dogs
http://example.com/pets.php?class=mammals&family=dogs&hair=shorthaired
http://example.com/pets.php?class=mammals&family=dogs&hair=shorthaired&species=dachshunds

In each case, however, we want the URL displayed in the browser to appear as the original, not as the longer target URL.

Solution

We rewrite the URLs with the following rules, and then instruct the PHP file in question to ignore blank arguments:

mod_rewrite:
RewriteEngine On
RewriteRule ^/pets/?([^/]*)/?([^/]*)/?([^/]*)/?([^/]*)/? \
/pets.php?class=$1&family=$2&hair=$3&species=$4 [PT]

Discussion

Although it would be possible, with a series of RewriteCond directives, to deal with each case individually, it’s just not worth the effort. Most well-written code will deal gracefully with blank arguments, and it makes a lot more sense to do this processing in the PHP (or CGI, or whatever) than to try to do it in the RewriteRule, where it is far less efficient.

The rule that we’re using here makes everything optional, starting with the first slash after pets, so any number of arguments will be sufficient. The question mark (?) character makes the slash itself optional, and the character class [^/] ("not slash") is optional by virtue of the *, which, as you recall, means "zero or more."

This is a good technique for handling a site that is hierarchical. As we proceed deeper into the URL structure of the website, each additional argument will be passed to the handling script.
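
To make the optional matching concrete, the comments below trace what the rule above captures for one request. Because every slash in that pattern is optional and the end is not anchored, it also matches an unrelated path such as /petstore, capturing tore as the class. The stricter variant shown after the trace is a sketch under stated assumptions: it relies on non-capturing groups, (?:...), which need the PCRE engine of Apache 2.x, and it ties each slash to its argument.

```apache
# Original rule, request /pets/mammals/dogs:
#   $1 = mammals   $2 = dogs   $3 = (empty)   $4 = (empty)
#   -> /pets.php?class=mammals&family=dogs&hair=&species=
#
# Stricter variant (sketch): each argument is an optional "/value" unit,
# and the pattern is anchored at the end, so /petstore no longer matches.
RewriteEngine On
RewriteRule ^/pets(?:/([^/]*))?(?:/([^/]*))?(?:/([^/]*))?(?:/([^/]*))?/?$ \
    /pets.php?class=$1&family=$2&hair=$3&species=$4 [PT]
```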

Problem: We Have More Than Nine Arguments

Regular expressions (at least in the context of Apache mod_rewrite rules) limit us to $1 through $9. What if we have more than nine arguments?

Solution

Use RewriteMap with a prg: map type. Such a RewriteMap program would need to loop over the entire contents of the string and process it sequentially. A simple example of such a RewriteMap follows.

In the configuration file, we might have the following:

mod_rewrite:
RewriteMap manyargs prg:/bin/splitargs
RewriteEngine On
RewriteRule ^/pets/(.*)$ /pets.php?${manyargs:$1} [PT]

while /bin/splitargs itself would be something like this:

Perl:
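#!/usr/bin/perl
use strict;
use warnings;

# Unbuffered output: mod_rewrite waits for each reply line
$| = 1;

# A prg: map reads one lookup key per line and answers with one line
while (my $line = <STDIN>) {
    chomp $line;
    my $i = 0;
    my @pairs;
    # Split the path on slashes and name the pieces arg1, arg2, ...
    foreach my $arg (split m!/!, $line) {
        $i++;
        push @pairs, "arg$i=$arg";
    }
    print join('&', @pairs), "\n";
}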

This rule set will allow for an arbitrary number of arguments to appear on the URL line and will generate a query string with those arguments named sequentially.
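
As an illustration of the intended data flow (a sketch; the exact output depends on how the map program is written), mod_rewrite hands the captured path to the program as a single line on standard input and expects a single line back on standard output:

```apache
# Request:             /pets/mammals/dogs/shorthaired
# $1 (map lookup key): mammals/dogs/shorthaired
# Expected map reply:  arg1=mammals&arg2=dogs&arg3=shorthaired
# Rewritten to:        /pets.php?arg1=mammals&arg2=dogs&arg3=shorthaired
#
# RewriteMap is only valid in server or virtual host context, not in .htaccess.
RewriteMap manyargs prg:/bin/splitargs
RewriteEngine On
RewriteRule ^/pets/(.*)$ /pets.php?${manyargs:$1} [PT]
```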

Discussion

There’s no way to have a $10, for the simple reason that $10 is indistinguishable from $1 followed by a 0. Thus, it would be unclear whether we are attempting to use the variable $1 or the variable $10 in the context of a rewrite expression. So, if we want more than nine backreferences in our RewriteRule, we’re going to have to bring out the big guns and use a RewriteMap program. Within a RewriteMap, we can do whatever transforms we like without regard to a limited number of arguments.

Renaming and Reorganization

Another common use for mod_rewrite is to change the name of pages, or portions of a website, without having existing links to the site suddenly become invalid. Usually, this can be done with the Redirect directive, but in the case of a sitewide change, Redirect can become cumbersome. Also, there is often a requirement (or at least a desire) that the end user not be made aware of the change, by virtue of seeing the URL change in their browser address bar, and so mod_rewrite must be called on to make the redirection invisible.

The next few recipes show a few variations on this theme, and then we’ll move on to look at rules that deal with having the "right" server name, for some definition of "right." There are a variety of reasons for requiring a particular server name, ranging from cookies to SSL to vanity, and the rules presented here enforce that choice.

Problem: We’ve Switched from ColdFusion to PHP, but We Want All Old URLs to Continue Working

We’ve done a page-for-page migration of our website from CFM files to PHP files. The old URLs should continue to work because people have bookmarked them.

Solution

mod_rewrite:
RewriteEngine On
RewriteRule (.*)\.cfm$ $1.php [PT,L]

Discussion

If we actually want to have the URL change in the browser to the new nomenclature, then we should replace the PT with an R. This will result in a round-trip back to the browser, and the user will see in their address bar (if they notice) the new URL and have an opportunity to bookmark the new address in preference to the old one.

$1 will contain the entire URI, including any directory path information. Only the final file extension will be changed. Any query string information that had been passed to the original URL will also be retained, which is behavior that we would not get if we had used a simple Redirect or RedirectMatch directive.
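
The redirect variant mentioned above might look like the following sketch. It assumes per-server or virtual host context, where $1 starts with a slash. A bare R sends a 302 (temporary) redirect; R=301 marks the move as permanent, which is usually what a finished migration wants.

```apache
RewriteEngine On
# /products/list.cfm?id=7 -> 301 redirect to /products/list.php?id=7
RewriteRule ^(.*)\.cfm$ $1.php [R=301,L]
```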

Problem: We’re Looking in More Than One Place for a File

In the course of rearranging our website, we split our image files between two directories. It seemed like a good idea at the time, but there are two different places that files could have ended up. We want to look in both places when a file is requested.

Solution

mod_rewrite:
<Directory /usr/local/apache/htdocs/images>
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule (.*) /pictures/$1 [R,L]
</Directory>

Discussion

Frequently, when people ask questions of this nature in various support forums, such as mailing lists, IRC, or newsgroups, the answer that they receive is "You really should fix your directory structure instead." That is, of course, a good answer. However, as we all know, there are situations where it’s not quite that easy.

The -f test checks to see if the file is in the requested location. If it’s not, then we move on to the RewriteRule, and the image is requested from the other location instead. This process could be continued with another rewrite rule set in that other directory, too, if the problem was bigger than just two alternate locations.
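
One refinement, offered as a sketch, is to redirect only when the file really exists in the second location, so that a request for a file that is in neither place gets an ordinary 404 instead of a pointless redirect. It assumes the document root is /usr/local/apache/htdocs and that /pictures lives directly beneath it.

```apache
<Directory /usr/local/apache/htdocs/images>
RewriteEngine On
# Not present under /images ...
RewriteCond %{REQUEST_FILENAME} !-f
# ... but present under /pictures ($1 is the path captured by the RewriteRule below)
RewriteCond %{DOCUMENT_ROOT}/pictures/$1 -f
RewriteRule (.*) /pictures/$1 [R,L]
</Directory>
```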

The long-term solution is to figure out which image references are pointing to the wrong place and fix them.

Problem: Some of Our Content Is on Another Server

In a slightly more involved version of the previous example, we’ve split our content between two web servers. For example, perhaps we’ve put images on one server and the rest of our content on another, in an effort to balance the load between two servers. This, too, seemed like a good idea, but some people have bookmarks and aren’t ending up on the right server.

Solution

mod_rewrite:
# Rewrite failing requests to the other server:
RewriteEngine On
RewriteCond %{REQUEST_URI} !-U
RewriteRule (.*) http://other.example.com$1 [R]

Discussion

The -U test is rather time-consuming. It checks ahead of time whether the requested URI will result in a Not Found response. This includes traversing Aliases, UserDirs, and other URL mapping functions. While this takes longer than simply doing a -f check, it is also far more robust, as a -f check will only look for files within the document directory.
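
For comparison, the cheaper file-based check that the -U test replaces could be sketched as follows. It assumes per-server context, where the URL has not yet been mapped to a file, so the path is built from DOCUMENT_ROOT by hand. It will not see content reached through Alias, UserDir, or similar mappings.

```apache
RewriteEngine On
# Neither a file nor a directory under the document root
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-d
RewriteRule (.*) http://other.example.com$1 [R,L]
```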

Problem: We Require a Canonical Hostname

Although there are several possible hostnames that can be used to reach our website, we want to require that everyone use one in particular.

Solution

mod_rewrite:
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule (.*) http://www.example.com$1 [R,L]

Discussion

It is extremely common for people to copy this recipe incorrectly and end up with exactly the opposite of what they want. It is therefore very important to understand what this recipe is actually saying.

The RewriteCond says, "If the requested host is NOT www.example.com . . ." The RewriteRule then says, ". . . then redirect the request to www.example.com."

So, if we are modifying this for use on our server, we’ll replace www.example.com in both the RewriteCond and the RewriteRule with the hostname that we want to force everyone to use. In the RewriteCond, the dots are escaped because it is a regular expression. In the RewriteRule, they are not, because it is a literal target string.

The [NC] on the RewriteCond allows people to use an uppercase (or mixed-case) version of the hostname. If you wish to be even more restrictive than that, you can remove the [NC] and be just as controlling as you want.
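
Two small refinements are common, shown here as a sketch rather than as part of the recipe: skipping requests that carry no Host header at all (as some HTTP/1.0 clients do), and using R=301 so that browsers and search engines treat the canonical name as permanent.

```apache
RewriteEngine On
# Host header is present ...
RewriteCond %{HTTP_HOST} !^$
# ... and is NOT the canonical name (note the "!")
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule (.*) http://www.example.com$1 [R=301,L]
```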

Problem: We’re Viewing the Wrong SSL Host

As you may be aware, you’re able to have only one SSL host per IP address. This causes a problem if you have multiple name-based hosts on that same IP address. Someone going to https://other.example.com/ may end up getting the content from https://www.example.com/ and be rather confused.

We want to make sure that if someone asks for an HTTPS connection to the wrong hostname, they get sent back to the HTTP version of that hostname.

Solution

mod_rewrite:
<VirtualHost 10.7.14.9:443>
...
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule (.*) http://%{HTTP_HOST}$1 [R,L]
...
</VirtualHost>

Discussion

Because the rewrite happens after the SSL connection has already been negotiated, the user will probably still receive the browser warning about getting the wrong SSL certificate. This will likely be a little confusing. However, we won’t end up with them at the wrong SSL host, as they’ll get redirected as soon as the SSL handshake is done. Note that this rule set goes inside the SSL virtual host. Putting it elsewhere will likely cause it to be ignored.



The Definitive Guide to Apache mod_rewrite

The preceding excerpt is from Chapter 7 of The Definitive Guide to Apache mod_rewrite by Rich Bowen.

Organizing websites is highly dynamic and often chaotic. Thus, it is crucial that host web servers manipulate URLs in order to cope with temporarily or permanently relocated resources, prevent attacks by automated worms, and control resource access.

The Apache mod_rewrite module has long inspired fits of joy because it offers an unparalleled toolset for manipulating URLs. The Definitive Guide to Apache mod_rewrite guides you through configuration and use of the module for a variety of purposes, including basic and conditional rewrites, access control, virtual host maintenance, and proxies.

This book was authored by Rich Bowen, noted Apache expert and Apache Software Foundation member, and draws on his years of experience administering, and regular speaking and writing about, the Apache server.



    1. this is one of the best tutorials online that I have seen about mod_rewrite. I learned a lot of good stuff. Thank you for posting this.



    1. Ok, in order to eliminate some network latency I was wondering how can you serve a page with mod_rewrite without having the server send back a 301 or 302 and the browser re-issue the request to the new location. This is probably the expected behaviour when not using the R flag but in my tests the server always sends back the new location telling the browser to fetch that one instead.



    1. I figured it out, using the [PT,L] flags.


    1. Mee says:

      What would you suggest in lieu of the cookie flag for those of us using apache 1.3.37?



    1. Finally! Finally someone looked at this problem with not knowing how many parameters you can have in the row. Though I discovered that I have been to this solution very near and found your page searching for another topic. That is, parsing parameters goes well, but there is this big problem with all requests for css, js, gif, jpg you-name-it files.

      Little example:
      real request is www.example.com/jon/doe
      rewrite works great and result is www.example.com/index.php?p1=john&p2=doe
      but.. in index.php resulting html is request for css file css/style.css, but in mod_rewrite log it shows up as a request for jon/css/style.css

      I really will lose my mind with this one..



    1. Very useful, I use one to adjust URL or change permanently a link with the flag 301 redirection



    1. How to keep OLD hyperlinks properly ? When URL rewrite occurs then OLD relative hyperlinks are rewritten.

      Pls help

      Bikram
      webseos@gmail.com



    1. I have a site on Business Models: http://www.provenmodels.com. I have applied the above mentioned technology in my site



