
I have written a web application that I run on a dedicated server. Instances of this application are available at different domains, and each domain has its own copy of the application files so that it can be customized as necessary.

I'm running Apache/2.2.16 under Debian Squeeze.

I do all of the configuration under a VirtualHost directive and do not use .htaccess files.

To simplify the Apache configuration, I want to maintain a single Directory directive like this:

<Directory "/srv/www/*/public/">
  RewriteEngine on
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_FILENAME} !-d
  RewriteCond %{REQUEST_URI} !=/favicon.ico
  RewriteCond %{REQUEST_URI} !=/robots.txt
  RewriteRule ^(.+)$ /index.php?q=$1 [L,QSA]
</Directory>

However, the RewriteRule produces the wrong result: with the wildcard Directory path, mod_rewrite fails to strip the per-directory prefix. Here is the output of the rewrite log:

[rid#b9832078/initial] (3) [perdir /srv/www/*/public/] applying pattern '^(.+)$' to uri '/srv/www/domain1/public/login'
[rid#b9832078/initial] (4) [perdir /srv/www/*/public/] RewriteCond: input='/srv/www/domain1/public/login' pattern='!-f' => matched
[rid#b9832078/initial] (4) [perdir /srv/www/*/public/] RewriteCond: input='/srv/www/domain1/public/login' pattern='!-d' => matched
[rid#b9832078/initial] (4) [perdir /srv/www/*/public/] RewriteCond: input='/login' pattern='!=/favicon.ico' => matched
[rid#b9832078/initial] (4) [perdir /srv/www/*/public/] RewriteCond: input='/login' pattern='!=/robots.txt' => matched
[rid#b9832078/initial] (2) [perdir /srv/www/*/public/] rewrite '/srv/www/domain1/public/login' -> '/index.php?q=/srv/www/domain1/public/login'
[rid#b9832078/initial] (3) split uri=/index.php?q=/srv/www/domain1/public/login -> uri=/index.php, args=q=/srv/www/domain1/public/login
[rid#b9832078/initial] (1) [perdir /srv/www/*/public/] internal redirect with /index.php [INTERNAL REDIRECT]
[rid#b9847440/initial/redir#1] (3) [perdir /srv/www/*/public/] applying pattern '^(.+)$' to uri '/srv/www/domain1/public/index.php'
[rid#b9847440/initial/redir#1] (4) [perdir /srv/www/*/public/] RewriteCond: input='/srv/www/domain1/public/index.php' pattern='!-f' => not-matched
[rid#b9847440/initial/redir#1] (1) [perdir /srv/www/*/public/] pass through /srv/www/domain1/public/index.php

The problem is that the 'uri' the RewriteRule pattern is applied to is the filesystem path rather than the URL path, which results in an incorrect query string: q=/srv/www/domain1/public/login
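For anyone reproducing this: on Apache 2.2, traces like the one above come from mod_rewrite's own log, which is configured in server or virtual host context (not inside a Directory section). A minimal sketch follows; the log file path is an assumed example, and these two directives no longer exist in Apache 2.4, which uses LogLevel with rewrite trace levels instead.

```apache
# Apache 2.2 only: mod_rewrite's per-request trace log.
# The path below is an assumed example location.
RewriteLog "/var/log/apache2/rewrite.log"
# Level 4 or higher is needed to see the "(4) RewriteCond" lines;
# high levels are slow, so lower this again after debugging.
RewriteLogLevel 4
```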

Explicitly specifying the Directory path, like this:

<Directory "/srv/www/domain1/public/">
  RewriteEngine on
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_FILENAME} !-d
  RewriteCond %{REQUEST_URI} !=/favicon.ico
  RewriteCond %{REQUEST_URI} !=/robots.txt
  RewriteRule ^(.+)$ /index.php?q=$1 [L,QSA]
</Directory>

Works just fine. Here is the rewrite log showing the correct behavior; the difference is the additional first line (strip per-dir prefix), which provides the correct input to the rest of the rewrite and results in the correct query string, q=login:

[rid#b9868048/initial] (3) [perdir /srv/www/domain1/public/] strip per-dir prefix: /srv/www/domain1/public/login -> login
[rid#b9868048/initial] (3) [perdir /srv/www/domain1/public/] applying pattern '^(.+)$' to uri 'login'
[rid#b9868048/initial] (4) [perdir /srv/www/domain1/public/] RewriteCond: input='/srv/www/domain1/public/login' pattern='!-f' => matched
[rid#b9868048/initial] (4) [perdir /srv/www/domain1/public/] RewriteCond: input='/srv/www/domain1/public/login' pattern='!-d' => matched
[rid#b9868048/initial] (4) [perdir /srv/www/domain1/public/] RewriteCond: input='/login' pattern='!=/favicon.ico' => matched
[rid#b9868048/initial] (4) [perdir /srv/www/domain1/public/] RewriteCond: input='/login' pattern='!=/robots.txt' => matched
[rid#b9868048/initial] (2) [perdir /srv/www/domain1/public/] rewrite 'login' -> '/index.php?q=login'
[rid#b9868048/initial] (3) split uri=/index.php?q=login -> uri=/index.php, args=q=login
[rid#b9868048/initial] (1) [perdir /srv/www/domain1/public/] internal redirect with /index.php [INTERNAL REDIRECT]
[rid#b987d5f8/initial/redir#1] (3) [perdir /srv/www/domain1/public/] strip per-dir prefix: /srv/www/domain1/public/index.php -> index.php
[rid#b987d5f8/initial/redir#1] (3) [perdir /srv/www/domain1/public/] applying pattern '^(.+)$' to uri 'index.php'
[rid#b987d5f8/initial/redir#1] (4) [perdir /srv/www/domain1/public/] RewriteCond: input='/srv/www/domain1/public/index.php' pattern='!-f' => not-matched
[rid#b987d5f8/initial/redir#1] (1) [perdir /srv/www/domain1/public/] pass through /srv/www/domain1/public/index.php

I suspect I'm running into a bug in Apache, but if that isn't the case, what am I doing wrong?

While I appreciate suggestions for a different workable approach, I would prefer to accept an answer that solves the problem within the approach I've taken (e.g. without .htaccess files), unless it can be shown that this approach cannot work.

So is there something that has to change to the RewriteCond/Rules when used within a wildcard Directory?

Side note for the curious: for further simplification I use a single VirtualHost with VirtualDocumentRoot. However, this is unrelated, as the issue can be reproduced using DocumentRoot and testing under a single domain.
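For context, a single VirtualHost of that kind might look roughly like the sketch below. It assumes mod_vhost_alias is enabled and that each site's directory under /srv/www is named after the full requested host name (%0); the actual layout may differ.

```apache
<VirtualHost *:80>
  # With UseCanonicalName Off, the name interpolated below comes from
  # the request's Host header rather than from a fixed ServerName.
  UseCanonicalName Off
  # %0 expands to the whole requested server name, e.g. a request for
  # http://example.com/ is served from /srv/www/example.com/public
  VirtualDocumentRoot /srv/www/%0/public

  # One wildcard section shared by every site's public directory.
  <Directory "/srv/www/*/public/">
    RewriteEngine on
    # (rewrite conditions and rule from the Directory example above)
  </Directory>
</VirtualHost>
```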

EDIT

Ok, I've revisited this based on regilero's answer, and here is what happens. Moving the rewrite directives, as is, out of the Directory section causes a slight initial problem: the query string changes from "login" to "/login". This is fixed by changing the RewriteRule to RewriteRule ^/(.+)$ /index.php?q=$1 [L,QSA], which also resolves the "inexplicably fails" issue from my earlier comment.

Following that, all static files fail to load; here is the rewrite log showing the problem:

[rid#b7bc7fa0/initial] (2) init rewrite engine with requested uri /login
[rid#b7bc7fa0/initial] (3) applying pattern '^/(.+)$' to uri '/login'
[rid#b7bc7fa0/initial] (4) RewriteCond: input='/login' pattern='!-f' => matched
[rid#b7bc7fa0/initial] (4) RewriteCond: input='/login' pattern='!-d' => matched
[rid#b7bc7fa0/initial] (4) RewriteCond: input='/login' pattern='!=/favicon.ico' => matched
[rid#b7bc7fa0/initial] (4) RewriteCond: input='/login' pattern='!=/robots.txt' => matched
[rid#b7bc7fa0/initial] (2) rewrite '/login' -> '/index.php?q=login'
[rid#b7bc7fa0/initial] (3) split uri=/index.php?q=login -> uri=/index.php, args=q=login
[rid#b7bc7fa0/initial] (2) local path result: /index.php
[rid#b7bc7fa0/initial] (2) prefixed with document_root to /srv/www/domain1/public/index.php
[rid#b7bc7fa0/initial] (1) go-ahead with /srv/www/domain1/public/index.php [OK]
[rid#b7be6b80/initial] (2) init rewrite engine with requested uri /static/css/common.css
[rid#b7be6b80/initial] (3) applying pattern '^/(.+)$' to uri '/static/css/common.css'
[rid#b7be6b80/initial] (4) RewriteCond: input='/static/css/common.css' pattern='!-f' => matched
[rid#b7be6b80/initial] (4) RewriteCond: input='/static/css/common.css' pattern='!-d' => matched
[rid#b7be6b80/initial] (4) RewriteCond: input='/static/css/common.css' pattern='!=/favicon.ico' => matched
[rid#b7be6b80/initial] (4) RewriteCond: input='/static/css/common.css' pattern='!=/robots.txt' => matched
[rid#b7be6b80/initial] (2) rewrite '/static/css/common.css' -> '/index.php?q=static/css/common.css'
[rid#b7be6b80/initial] (3) split uri=/index.php?q=static/css/common.css -> uri=/index.php, args=q=static/css/common.css
[rid#b7be6b80/initial] (2) local path result: /index.php
[rid#b7be6b80/initial] (2) prefixed with document_root to /srv/www/domain1/public/index.php
[rid#b7be6b80/initial] (1) go-ahead with /srv/www/domain1/public/index.php [OK]

But as I said in my comment on regilero's answer, this is solved by prefixing the TestString of the RewriteCond directives with %{DOCUMENT_ROOT}. However, %{DOCUMENT_ROOT} does not work when using VirtualDocumentRoot.

It does not seem right to me that the %{DOCUMENT_ROOT} prefix should be necessary.

EDIT

From the mod_rewrite documentation for REQUEST_FILENAME:

The full local filesystem path to the file or script matching the request, if this has already been determined by the server at the time REQUEST_FILENAME is referenced. Otherwise, such as when used in virtual host context, the same value as REQUEST_URI.

This explains the need for the DOCUMENT_ROOT prefix.

I've updated the rewrite rules to this:

RewriteCond %{REQUEST_URI} !=/favicon.ico
RewriteCond %{REQUEST_URI} !=/robots.txt
RewriteCond %{REQUEST_URI} !^/static/
RewriteRule ^/(.+)$ /index.php?q=$1 [PT,L,QSA]

This works. (Note: the PT flag is necessary to avoid prematurely translating the URL path to a filesystem path when using VirtualDocumentRoot.) The main change in behavior is that a RewriteCond will now be necessary for every entry point into the application, similar to the /static line.
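If more such paths appear, one way to avoid a separate RewriteCond for each is a single negated regex with alternation. A sketch, where /uploads/ is a hypothetical second directory that should be served directly:

```apache
# Skip index.php routing for any of the listed top-level directories.
# "uploads" is a hypothetical example; list the real directories here.
RewriteCond %{REQUEST_URI} !^/(static|uploads)/
RewriteCond %{REQUEST_URI} !=/favicon.ico
RewriteCond %{REQUEST_URI} !=/robots.txt
RewriteRule ^/(.+)$ /index.php?q=$1 [PT,L,QSA]
```

The trade-off is the same as before: nothing is checked on disk, so every directly served path has to be named explicitly.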

EDIT

Here is my final incarnation of Rewrite directives in the VirtualHost outside of any Directory directives:

RewriteEngine on
RewriteCond %{REQUEST_URI} !^/static/
RewriteCond %{REQUEST_URI} !=/favicon.ico
RewriteCond %{REQUEST_URI} !=/robots.txt
RewriteRule ^/(.+)$ /index.php?q=$1 [NS,PT,L,QSA]
RewriteRule ^/$ /index.php [NS,PT,L,QSA]

I've added the NS flag to avoid an extra evaluation on internal sub-requests, and added the second RewriteRule directive instead of relying on mod_dir and DirectoryIndex. My application expects no q= parameter for the root URL; if it were updated to accept an empty q= parameter there, a single RewriteRule ^/(.*)$ /index.php?q=$1 [NS,PT,L,QSA] would be sufficient. I may do that in the future.
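The single-rule variant described above would look like this; it assumes the application has been changed to treat an empty q= parameter as the root URL:

```apache
RewriteEngine on
RewriteCond %{REQUEST_URI} !^/static/
RewriteCond %{REQUEST_URI} !=/favicon.ico
RewriteCond %{REQUEST_URI} !=/robots.txt
# ^/(.*)$ also matches "/", so the root URL becomes /index.php?q=
# (an empty q parameter) instead of needing a separate ^/$ rule.
RewriteRule ^/(.*)$ /index.php?q=$1 [NS,PT,L,QSA]
```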

chris

1 Answer


Very nice and detailed question.

You have quite certainly hit a bug, or at least an undocumented rewriteRule domain. The documentation states that:

  • The rewrite engine may be used in .htaccess files and in <Directory> sections, with some additional complexity.
  • To enable the rewrite engine in this context, you need to set "RewriteEngine On" and "Options FollowSymLinks" must be enabled. If your administrator has disabled override of FollowSymLinks for a user's directory, then you cannot use the rewrite engine. This restriction is required for security reasons.
  • When using the rewrite engine in .htaccess files the per-directory prefix (which always is the same for a specific directory) is automatically removed for the RewriteRule pattern matching and automatically added after any relative (not starting with a slash or protocol name) substitution encounters the end of a rule set. See the RewriteBase directive for more information regarding what prefix will be added back to relative substitutions.

So there is no mention of the fact that a <Directory> section with wildcards won't be able to strip the per-directory prefix. Playing with RewriteBase won't help you either: it is used to rebuild the final URL, not to alter the per-directory prefix stripping.

But, as you can see at the start, there is the "with some additional complexity" phrase. Per-directory rewriting by mod_rewrite is slower and more complex than RewriteRules applied outside any directory context; the documentation states this as well, mainly because of the per-directory prefix stripping. This means you can also write your RewriteRules outside the <Directory> section, directly in your VirtualHost. In that case:

  • it will be faster
  • it will not be hit by this bug
  • it may have side effects if some non-existent files in other directories shouldn't be mapped to your index.php?q=$1 rule, but I'm quite sure this is not a problem in your case.

So simply write (without the wildcard directory):

RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !=/favicon.ico
RewriteCond %{REQUEST_URI} !=/robots.txt
RewriteRule ^(.+)$ /index.php?q=$1 [L,QSA]

And it should work; let me know if this leads to new problems.

Edit:

Ok, I forgot that REQUEST_FILENAME is not yet completely defined in VirtualHost context. This is documented and 'normal': when the condition is applied, the URL has not yet been mapped to the real filesystem path, which is why you must add the document root. So in fact your final solution should be:

RewriteEngine on
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} !-f
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !=/favicon.ico
RewriteCond %{REQUEST_URI} !=/robots.txt
RewriteRule ^/(.+)$ /index.php?q=$1 [L,QSA]

I tried a second solution that avoids DOCUMENT_ROOT by using a look-ahead evaluation of REQUEST_FILENAME (%{LA-U:REQUEST_FILENAME} contains the final path, which for non-existent files is actually the full path to index.php). However, the only way I got it working was by adding a second rule and an [OR] condition, which is less simple, so the first solution is certainly better (KISS):

  RewriteCond %{LA-U:REQUEST_FILENAME} !-f [OR]
  RewriteCond %{LA-U:REQUEST_FILENAME} !/index.php
  RewriteCond %{LA-U:REQUEST_FILENAME} !-d
  RewriteCond %{REQUEST_URI} !=/favicon.ico
  RewriteCond %{REQUEST_URI} !=/robots.txt
  RewriteRule ^/(.+)$ /index.php?q=$1 [L,QSA]

  RewriteCond %{LA-U:REQUEST_FILENAME} /index.php
  RewriteRule ^/(.+)$ /index.php?q=$1 [L,QSA]
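If you really want to keep the single wildcard <Directory> section, another possibility (an untested sketch) is to stop relying on the RewriteRule pattern input, which is the part that loses the prefix strip, and capture the path from %{REQUEST_URI} in a RewriteCond instead, referencing it with %1. This assumes each application is served from the root of its domain, so REQUEST_URI is "/" followed by the path the application expects.

```apache
<Directory "/srv/www/*/public/">
  RewriteEngine on
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_FILENAME} !-d
  RewriteCond %{REQUEST_URI} !=/favicon.ico
  RewriteCond %{REQUEST_URI} !=/robots.txt
  # Capture the URL path without its leading slash; REQUEST_URI is not
  # affected by the missing per-directory prefix strip.
  RewriteCond %{REQUEST_URI} ^/(.+)$
  # The pattern only needs to match (its input is the unstripped
  # filesystem path); %1 is the capture from the last RewriteCond.
  RewriteRule ^ /index.php?q=%1 [L,QSA]
</Directory>
```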
regilero
  • It doesn't, I have previously tried, and it requires prefixing the RewriteCond with %{DOCUMENT_ROOT} - but even then, it inexplicably fails. I'll rerun it shortly with that setup to give the rewrite log output. Also, using %{DOCUMENT_ROOT} does not work in Apache2.2 with VirtualDocumentRoot - Although that has recently been fixed a few days ago for Apache2.3 – chris Jun 15 '11 at 16:21
  • So instead of the VirtualDocumentRoot method you could try mod_macro; the bugs introduced by mod_vhost_alias aren't present with virtual hosts defined via mod_macro (but you'll need one line of configuration per new user). – regilero Jun 15 '11 at 16:42
  • I've updated my question with the results of moving the rewrite directives outside of the Directory directive. – chris Jun 15 '11 at 16:53
  • mod_macro looks interesting - but I'm hesitant to pull in additional 3rd party modules at this stage - I'll come back to it if necessary. – chris Jun 15 '11 at 17:02
  • Accepting the answer as I think your statement of "You have quite certainly hit a bug, or at least an undocumented rewriteRule domain." answers the question as well as your alternate approach meets my need. – chris Jun 15 '11 at 19:00
  • @chris: edited with your solution and with some tests on late binding – regilero Jun 16 '11 at 12:13