0

In my Apache access logs I get a lot of invalid requests coming (probably) from robots.

All of the invalid URLs follow the same pattern and I would like to filter them with a regex.

Here are some samples:

/oaoa/oao/oa.php
/fcfc/fcf/fc.php 
/mcmc/mcm/mc.php 
/rxrx/rxr/rx.php 
/wlwl/wlw/wl.php 
/nini/nin/ni.php 
/gigi/gig/gi.php 
/jojo/joj/jo.php 
/okok/oko/ok.php 

I can see the pattern, but I don't know how to build a (PHP) regex for it. :-( It should match the URLs above, but not paths like these:

/help/one/xy.php
/some/oth/er.php

I hope one of you knows a solution, if it is possible at all.

georg
  • Welcome to SO! I've removed your signature, [please don't sign your posts](http://stackoverflow.com/help/behavior) - we know who you are! ;) – georg Jan 20 '15 at 22:31
  • Nice problem but you should really try and show what you've tried. – HamZa Jan 20 '15 at 23:25

3 Answers

1

If this is your exact input, the following regex should do the trick:

/\/(.)(.)\1\2\/\1\2\1\/\1\2\.php/

https://regex101.com/r/rU2sE6/2

georg
0

For these very specific cases you listed, here is a simple regex that will match them:

/([a-z])([a-z])\1\2/\1\2\1/\1\2\.php

The \1 and \2 are backreferences to the first and second capture groups. The forward slashes need to be escaped if you also use / as the pattern delimiter. The pattern says: match one letter, then another, then the first letter again, then the second again, then a slash, and so on for the remaining two segments.
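
As a rough illustration of the backreference idea, here is a minimal sketch in Python rather than PHP (the same pattern should carry over to preg_match once the slashes are escaped or another delimiter is used). The names BOT_PATH and is_bot_path, the sample list, and the ^ and $ anchors are my own assumptions, not part of the answer. The dot before php is escaped so that it only matches a literal dot.

```python
import re

# Two captured letters, then the repeats seen in the sample URLs.
# ^ and $ are an added assumption: the whole path has to match.
BOT_PATH = re.compile(r"^/([a-z])([a-z])\1\2/\1\2\1/\1\2\.php$")


def is_bot_path(path):
    """Return True if path has the /abab/aba/ab.php shape."""
    return BOT_PATH.match(path) is not None


# Hypothetical sample: two paths from the question's bot list, two normal ones.
samples = [
    "/oaoa/oao/oa.php",
    "/rxrx/rxr/rx.php",
    "/help/one/xy.php",
    "/some/oth/er.php",
]
flagged = [p for p in samples if is_bot_path(p)]
print(flagged)
```

Only the first two sample paths end up in flagged; the other two fail as soon as the third character does not repeat the first.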

Qantas 94 Heavy
Evan OJack
0

Note: Interesting problem, although you should have shown us what you've tried. That is why I'm posting this answer as Community Wiki, so that I don't earn any reputation from it.

So the trick is to capture the characters in a group and then assert that it is present in the next chunk. A bit cryptic I guess but here's the regex:

^                 # Assert begin of line
(?:               # Non-capturing group
   (              # Capturing group 1
      /           # Match a forward slash
      [^/]+       # Match anything not a forward slash one or more times
   )              # End of capturing group 1
   [^/]           # Match anything not a forward slash one time
   (?=\1)         # Assert that what we've matched in group 1 is ahead of us
                  # (ie: a forward slash + the characters - the last character)
)+                # End of non-capturing group, repeat this one or more times
\1\.php           # Match what we've matched in group 1 followed by a dot and "php"
$                 # Assert end of line

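Do not forget to use the m and x modifiers: m makes ^ and $ match at the start and end of every line, and x allows the whitespace and # comments used in the pattern above. Since the pattern contains unescaped forward slashes, pick a different delimiter (such as ~) in PHP.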

Online demo
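
To see the commented pattern run, here is a small sketch in Python, whose re.VERBOSE and re.MULTILINE flags play the role of the x and m modifiers. The names GENERAL, log_paths and matches are made up for illustration, and the input is a hypothetical four-line excerpt built from the question's examples.

```python
import re

# Verbose (x) and multiline (m) flags, as the answer asks for.
GENERAL = re.compile(
    r"""
    ^                 # start of line
    (?:
       (              # group 1:
          /           #   a forward slash
          [^/]+       #   one or more non-slash characters
       )
       [^/]           # one more non-slash character
       (?=\1)         # the next segment must start with what group 1 captured
    )+
    \1\.php           # last segment: group 1 again, then ".php"
    $                 # end of line
    """,
    re.VERBOSE | re.MULTILINE,
)

# Hypothetical log excerpt, one request path per line.
log_paths = """/oaoa/oao/oa.php
/help/one/xy.php
/gigi/gig/gi.php
/some/oth/er.php"""

matches = [m.group(0) for m in GENERAL.finditer(log_paths)]
print(matches)
```

Note that this pattern is somewhat broader than a literal /abab/aba/ab.php check: it only requires each segment to be the previous one minus its last character, so a path such as /abcd/abc/ab.php matches as well.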

Community
HamZa