
I wish to redirect all URLs with underscores to their dashed equivalent.

E.g. /nederland/amsterdam/car_rental becomes /nederland/amsterdam/car-rental. For this I'm using the technique described here: How to replace underscore to dash with Nginx. So my location block matches on:

location ~ (_) 
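The linked technique is not reproduced verbatim here. As a hedged illustration of the general idea, a regex location like this can replace underscores with `rewrite` directives. The sketch assumes the goal is a single 301 redirect to the fully dashed URL; the exact rules in the linked answer may differ.

```nginx
location ~ (_) {
    # More than one underscore left: replace the first one internally.
    # "last" restarts location matching with the rewritten URI.
    rewrite ^([^_]*)_(.*_.*)$ $1-$2 last;

    # Exactly one underscore left: replace it and send a permanent (301)
    # redirect, so the client ends up on the fully dashed URL.
    rewrite ^([^_]*)_([^_]*)$ $1-$2 permanent;
}
```

For example, /nederland/amsterdam/car_rental_deals would first be rewritten internally to /nederland/amsterdam/car-rental_deals and then redirected to /nederland/amsterdam/car-rental-deals. nginx limits a request to about ten internal rewrite cycles, so a URI with many more underscores than that would end in a 500 error instead.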

But I only want to do this on URLs not in the /admin namespace. To accomplish this I tried combining the regex with a negative lookahead, as described in Regular expression to match a line that doesn't contain a word?. The location now matches with:

(?=^(?!\/admin))(?=([^_]*))

Rubular reports the string /nederland/amsterdam/car_rental to match the regex, while /admin/stats_dashboard is not matched, just as I want it. However when I apply this rule to the nginx config, the site ends up in redirect loops. Is there anything I've overlooked?

UPDATE: I don't actually want to rewrite anything in the /admin namespace. The underscore-to-dash rewrite should only take place on all URLs not in the /admin namespace.

Asked by richard

3 Answers


The Nginx location matching order is such that locations defined using regular expressions are checked in the order of their appearance in the configuration file and the search of regular expressions terminates on the first match.

With this knowledge, if I were in your shoes, I would simply define one location using a regular expression for "admin" above the one for underscores that you got from the Stack Overflow answer you linked to.

location ~ (\badmin\b) {
    # Config to process urls containing "admin"
}
location ~ (_) {
    # Config to process urls containing "_"
}
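To make this concrete, and to avoid the empty admin block that a later comment reports returning 404, here is a hedged sketch of a fuller server block. The upstream name app_backend, its address, the server_name, and the assumption that the site is served by an application server via proxy_pass are all hypothetical; substitute whatever your admin pages actually need.

```nginx
# Hypothetical application server; adjust to your setup.
upstream app_backend {
    server 127.0.0.1:3000;
}

server {
    listen 80;
    server_name example.com;  # hypothetical

    # Regex locations are tried in file order; this one must stay first
    # so that admin URIs never reach the underscore block below.
    location ~ (\badmin\b) {
        # An empty block would only serve static files from "root";
        # pass admin requests to the application instead.
        proxy_pass http://app_backend;
    }

    location ~ (_) {
        # Underscore-to-dash rewrites go here.
    }

    # Everything else.
    location / {
        proxy_pass http://app_backend;
    }
}
```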

Any request with "admin" in its URI (as a whole word, because of the \b anchors) will be processed by the first location block, whether or not it contains an underscore, because that block appears before the one for underscores.

**PS**

As cnst's answer, posted a couple of days after mine, points out, the documentation on location matching order that I linked to also indicates that you can use the ^~ modifier to match the /admin folder and skip the location block for the underscores.

I personally tend not to use this modifier and prefer to group regex-based locations together with explanatory comments, but it is certainly an option.

However, depending on your setup, you will need to be careful: the prefix /admin also matches longer paths that merely start with "/admin" (for example /administrator), so the modifier may capture requests you did not intend and lead to unexpected results.
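One way to sidestep that, sketched under the assumption that the admin area lives at /admin and under /admin/ (and that its index page is /admin/), is to pair an exact match with a ^~ prefix that ends in a slash:

```nginx
# Exact match: only the URI /admin itself; send it to the trailing-slash form.
location = /admin {
    return 301 /admin/;
}

# Prefix match for everything under /admin/. Because of "^~", nginx skips
# the regex locations when this is the longest matching prefix. Paths such
# as /administrator or /admin_stats do not match it and are handled as usual.
location ^~ /admin/ {
    # admin handling goes here
}
```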

As I said, I prefer my regex-based approach, confident that no one will arbitrarily change the order of things in the config file without understanding why that order matters.

Answered by Dayo
  • I agree this is a better approach than trying to use only one expression. – Gustavo Straube Aug 13 '15 at 16:32
  • I was just going to answer this, but you were faster. =) Upvoted. – Rafael Beckel Aug 19 '15 at 04:43
  • String prefix matching (as in the other answer) is faster than regex matching — there's nothing in the question to signal that `/admin` is not a `$uri` prefix string (such an idea was instead volunteered in the earliest answer that has since been deleted), as such, this answer is not optimal. Besides, having to needlessly rely on the order of the `location` directives is a sure way to have obscure Heisenbugs down the line when someone decides to re-arrange the code months or years from now, forgetting the nuance that the order of the location directives does matter. – cnst Aug 19 '15 at 22:59
  • @Dayo in the `/admin` namespace I do not want to process anything. I want to replace underscores with hyphens on all URLs **not** in the `/admin` namespace. With this setup, requests to `/admin/test_underscore` still seem to be processed by the "_" block. – richard Aug 22 '15 at 15:22
  • That should not be the case if the config was applied as was given. – Dayo Aug 22 '15 at 19:12
  • @Dayo The `\badmin\b` location block is empty since I don't process anything there. When I try to access that location, a 404 Not Found is thrown. Is there anything I should declare within the location block? – richard Aug 27 '15 at 14:56
  • Post your config to pastebin or similar and add a link here – Dayo Aug 27 '15 at 17:05
  • @richard Just as you wrote the config to process requests with underscores, you have to do the same for the "admin" ones. You can't just leave it empty. How to do this is, however, a totally different question requiring a whole different set of background information to be answered. In summary, you have now moved on to a totally different subject, and it is not a simple clarification to be dealt with in the comments section. Your follow-up should be just that, a separate question, as it applies regardless of which of the two solutions you use. – Dayo Aug 28 '15 at 07:00
^(?!\/admin\b).*

You just need this simple regex with a negative lookahead. See the demo:

https://regex101.com/r/uF4oY4/16

Your regex will also fail on /nederland/amsterdam/car_rental, because it contains _: the capture ([^_]*) stops at the first underscore, so only the string /nederland/amsterdam/car will be captured.
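A related point about the pattern in the question: lookaheads are zero-width, and (?=([^_]*)) always succeeds because [^_]* may match the empty string. So (?=^(?!\/admin))(?=([^_]*)) matches every URI that does not start with /admin, including ones with no underscore at all, such as the dashed URL you redirect to. A hedged sketch of a location that excludes the admin namespace but still requires an underscore, assuming admin URIs are exactly /admin or start with /admin/:

```nginx
# (?!admin(?:/|$)) rejects /admin and anything under /admin/;
# .*_ then requires at least one underscore somewhere in the URI.
location ~ ^/(?!admin(?:/|$)).*_ {
    # underscore-to-dash rewrites go here
}
```

With this pattern, /nederland/amsterdam/car_rental and /admin_stats enter the block, while /admin/stats_dashboard and /nederland/amsterdam/car-rental do not.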

Or you can use:

rewrite ^(?!\/admin\b)([^_]*)_(.*)$ $1-$2;
Answered by vks
  • This sample code is not efficient, because you'll have to go through `$uri` multiple times to get rid of all the underscores. – cnst Aug 19 '15 at 21:33

You've not explicitly mentioned one way or the other, but it does appear that you likely only have a single /admin namespace, which forms the prefix of $uri and would match a ^/admin.*$ regex; let me provide two non-conflicting configuration options based on such an assumption.


As others suggested, you might want to use a separate location for /admin.

However, unlike the other answer, I would advise you to define it by a prefix string, and use the ^~ modifier to not check the regular expressions after a successful match.

location ^~ /admin {
}

Alternatively, or even additionally for extra peace of mind and a fool-proof approach, you could avoid what appears to be a non-POSIX regular expression from the linked answer (if my reading of re_format(7) on OpenBSD is to be believed). Consider something much simpler instead: a pattern that most people who claim to know regular expressions will understand, that works everywhere, and that is likely more efficient, given that you already know it is the ^/admin.* path that you want to exclude:

location ~ ^/[^a][^d][^m][^i][^n].*_.* {
}

To accomplish your goal, you could use either one of these two solutions, or even both to be more rigid and fool-proof.

Answered by cnst
  • You're right about the single `/admin` namespace. While your solution seems good, it doesn't work on the path `/amsterdam/test`. Apparently because it starts with an "a". – richard Aug 21 '15 at 17:36