
From here and here, I was under the impression that bots don't set the Referer header.

But I just discovered otherwise, unless the situations are different. We have this JavaScript call:

<script>aCallToToWebApiEndAndUpdateDom(params)</script> 

On the API end, we create a user session to log views, and along with it we also log the userAgent and urlReferrer. I just found a record with the following:

url referer: the actual page visited
user agent : Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

Am I missing a point or two? Is this normal behavior? And if I want to log only human visits, is it a case of detecting the bots explicitly instead of checking for an empty referer?
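To make the last question concrete, here is a minimal sketch of what "detecting the bots explicitly" could look like. The names `BOT_TOKENS`, `isLikelyBot`, and `shouldLogView` are hypothetical, the token list is illustrative and far from exhaustive, and the User-Agent header can be forged, so this is only a rough filter:

```javascript
// Hypothetical token list: matches common self-identifying crawlers only.
const BOT_TOKENS = /bot|crawler|spider|slurp/i;

// Returns true when the User-Agent string looks like a self-declared bot.
function isLikelyBot(userAgent) {
  // Assumption: a missing or empty User-Agent is treated as non-human.
  if (!userAgent) return true;
  return BOT_TOKENS.test(userAgent);
}

// Decide whether to log a view, given a plain object of lower-cased header names.
function shouldLogView(headers) {
  return !isLikelyBot(headers['user-agent']);
}
```

With the record above, `isLikelyBot` would flag the Googlebot user agent even though the referer is set, which is the case an empty-referer check misses.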

waitforit
  • Normal users can block their referrer too, or inject something malicious into it. You cannot rely on it at all. – emix Mar 29 '18 at 08:08
  • That's good to know, thanks. – waitforit Mar 29 '18 at 08:09
  • Most of the request headers can be set to almost anything by the bot. You can never really rely on any information in the headers unless you correlate it with something server-side :) – scagood Mar 29 '18 at 08:34
  • @scagood How would you correlate the headers server-side? The userAgent, for example. – waitforit Mar 29 '18 at 08:42
  • I mean things like the [Authorization Header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Authorization), as you can actively check that it's valid. – scagood Mar 29 '18 at 08:43
  • I'm not sure I get that. Here I'm talking about all visits (authenticated or not); in short, authentication is not a requirement here. – waitforit Mar 29 '18 at 08:51
  • I'm just saying you cannot trust anything in the request directly. :D – scagood Mar 29 '18 at 08:57
  • I would imagine that bots can do whatever they choose. There might be some convention, but nothing which says it's guaranteed. – ADyson Mar 29 '18 at 09:04
  • A bot has no requirement to adhere to standards or best practices beyond those that allow it to function. Some may be advanced enough to interpret JavaScript. Others may be so simplified that they leave out headers altogether. Any attempt to differentiate client types should be treated as a rough guide only, because the only source of information you have to base your numbers on can be forged. – Gary Ott Mar 29 '18 at 09:09
