138

I'm building an analytic tool and I can currently get the user's IP address, browser and operating system from their user agent.

I'm wondering if there is a possibility to detect the same user without using cookies or local storage? I'm not expecting code examples here; just a simple hint of where to look further.

Forgot to mention that it would need to be cross-browser compatible if it's the same computer/device. Basically I'm after device recognition not really the user.

Konrad Borowski
slash197
  • 6
    Not really - at least not any way that you could rely upon to be accurate. Maybe a hash of all three combined, however if more than one person in a house uses the same browser and OS, it still wouldn't work. Also, most ISP's provide Dynamic IP addresses, meaning they change every-so-often and won't be able to be relied upon for identification purposes either. – Jon Apr 12 '13 at 08:47
  • Why would you not want to use sessions? – Man Vs Code Apr 18 '13 at 20:31
  • @ManVsCode I don't see how sessions can help me here. – slash197 Apr 18 '13 at 20:43
  • 2
    Then you don't know what sessions are. Your use case is exactly what sessions were designed for. Sessions have nothing to do with logging in or authentication. Your web server will tell a client to send a cookie with a session identifier. You identify that client using the session id they send you. – Man Vs Code Apr 18 '13 at 20:50
  • 1
    @ManVsCode yes, you are right. but when the user closes his browser and opens it again or after the session expires he will get a new ID so it will come up as a new user which is not helpful at all. – slash197 Apr 18 '13 at 20:57
  • 1
    You have to set the expiration every time the cookie is written in order to have the browser persist it. If your cookie expires, then the browser deletes it. – Man Vs Code Apr 18 '13 at 20:59
  • 1
    @ManVsCode which part of I can't use cookies did you not understand from the title? – slash197 Apr 18 '13 at 21:01
  • 1
    Why can't you use cookies? Please explain this. If you are going to be rude, then no one will help you. – Man Vs Code Apr 18 '13 at 21:02
  • 1
    @ManVsCode One legitimate issue with using cookies and sessions is that different browsers on the same machine will have different sessions. Slash mentioned that he wants to identify machines rather than browsers. If he has some particularly important reason for this, sessions may not do the job. – user1618143 Apr 18 '13 at 21:12
  • There is no bullet-proof way to do this. You are creating the cookie. You can re-issue the same cookie to the same IP address. A user is not likely to change their IP address, but this can easily be defeated by a technically savvy user. – Man Vs Code Apr 18 '13 at 21:15
  • 1
    @ManVsCode Agreed. The question as framed has no perfect solution. The best solution is probably to drop the "identify specific devices" requirement and use cookies, but there may be some compelling reason not to. Other approaches are inevitably going to be at least somewhat unreliable. (And if there is a good solution, it represents a significant privacy concern.) – user1618143 Apr 18 '13 at 21:31
  • 4
    Cookies would still work ? Why are you avoiding using cookies ? – Baba Apr 18 '13 at 22:20
  • 1
    What I want to know is how Google does it. I delete cookies, change IP, change browser, and my machine is still recognized! @slash197 I understand your request; I have been struggling with this for a long time without finding a solution. You have to force cookies with far-future dates. – Mbarry Apr 19 '13 at 00:20
  • @Mbarry even if all that is deleted the user is accessing a site .. and mostly like session cookies would be re added .... There is no single method of achieving this ... you need to combine so many probabilities – Baba Apr 20 '13 at 12:18
  • 2
    It's really simple and I use it all the time, ask the user to enter a username and a password!!! – Amit Kriplani Apr 23 '13 at 10:03
  • 1
    I'm afraid for what you're asking there's simply no solution that is going to offer consistently accurate results with the constraints you have. As such the only solution is a probabilistic solution as per Baba's answer below. I know his answer seems to not be what you're looking for but you honestly have only two choices - accept an imperfect probabilistic solution, or find a way to reduce your constraints to allow say, cookies. The probabilistic option is what many sites use to offer suggestions to unidentified users but only to offer suggestions, and not to treat as identifying information. – Xefan Apr 23 '13 at 15:13
  • 2
    Here is a minimal javascript solution (non cross-browser in this case): https://github.com/carlo/jquery-browser-fingerprint/ I mention it, because it brought me to the notion that many plugins are installed cross-browser by default, without any choice on the user's part. Sorting those out carefully (which isn't a small task, but still...) could potentially lead to a tangible browser-agnostic property of a larger device-based fingerprint. – hexalys Apr 24 '13 at 23:26
  • A simple question: How do you know if a leaf came from this tree when there are a thousand similar trees? – Alvin K. Apr 25 '13 at 01:52
  • Simple just test for `DNA` you would get the `tree` – Baba Apr 25 '13 at 09:40
  • 1
    My answer: Ask the leaf (which is synonymous to @AmitKriplani above) - here is the [moral](http://cs.txstate.edu/~br02/cs1428/ShortStoryForEngineers.htm) aka importance of thinking simple – Alvin K. May 27 '13 at 09:13

12 Answers

403

Introduction

If I understand you correctly, you need to identify a user for whom you don't have a Unique Identifier, so you want to figure out who they are by matching Random Data. You can't store the user's identity reliably because:

  • Cookies can be deleted
  • IP address can change
  • Browser can change
  • Browser cache may be deleted

A Java Applet or COM object would have been an easy solution using a hash of hardware information, but these days people are so security-aware that it would be difficult to get them to install these kinds of programs on their system. This leaves you stuck with using Cookies and other, similar tools.

Cookies and other, similar tools

You might consider building a Data Profile, then using Probability tests to identify a Probable User. A profile useful for this can be generated by some combination of the following:

  1. IP Address
    • Real IP Address
    • Proxy IP Address (users often use the same proxy repeatedly)
  2. Cookies
  3. Web Bugs (less reliable because bugs get fixed, but still useful)
    • PDF Bug
    • Flash Bug
    • Java Bug
  4. Browsers
  5. HTML5 & Javascript
    • HTML5 LocalStorage
    • HTML5 Geolocation API and Reverse Geocoding
    • Architecture, OS Language, System Time, Screen Resolution, etc.
    • Network Information API
    • Battery Status API

The items I listed are, of course, just a few possible ways a user can be identified uniquely. There are many more.

With this set of Random Data elements to build a Data Profile from, what's next?

The next step is to develop some Fuzzy Logic, or, better yet, an Artificial Neural Network (which can be combined with fuzzy logic). In either case, the idea is to train your system, and then combine its training with Bayesian Inference to increase the accuracy of your results.

Artificial Neural Network

The NeuralMesh library for PHP allows you to generate Artificial Neural Networks. To implement Bayesian Inference, see the three-part series "Implement Bayesian inference using PHP".

At this point, you may be thinking:

Why so much Math and Logic for a seemingly simple task?

Basically, because it is not a simple task. What you are trying to achieve is, in fact, Pure Probability. For example, given the following known users:

User1 = A + B + C + D + G + K
User2 = C + D + I + J + K + F

When you receive the following data:

B + C + E + G + F + K

The question which you are essentially asking is:

What is the probability that the received data (B + C + E + G + F + K) is actually User1 or User2? And which of those two matches is most probable?

In order to effectively answer this question, you need to understand Frequency vs Probability Format and why Joint Probability might be a better approach. The details are too much to get into here (which is why I'm giving you links), but a good example would be a Medical Diagnosis Wizard Application, which uses a combination of symptoms to identify possible diseases.

Think for a moment of the series of data points which comprise your Data Profile (B + C + E + G + F + K in the example above) as Symptoms, and Unknown Users as Diseases. By identifying the disease, you can further identify an appropriate treatment (treat this user as User1).

Obviously, a Disease for which we have identified more than 1 Symptom is easier to identify. In fact, the more Symptoms we can identify, the easier and more accurate our diagnosis is almost certain to be.
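To make the User1/User2 example concrete, here is a minimal sketch that ranks the two known users against the received data. It uses Jaccard similarity (shared data points divided by all distinct data points) as a simple stand-in, not Bayesian inference or joint probability; the `jaccard()` helper is hypothetical and not part of any library.

```php
<?php
// Hypothetical helper: Jaccard similarity between two sets of data points.
function jaccard(array $a, array $b): float
{
    $a = array_unique($a);
    $b = array_unique($b);
    $union = count(array_unique(array_merge($a, $b)));
    if ($union === 0) {
        return 0.0;
    }
    return count(array_intersect($a, $b)) / $union;
}

// Known users and received data, exactly as in the example above.
$known = array(
    'User1' => array('A', 'B', 'C', 'D', 'G', 'K'),
    'User2' => array('C', 'D', 'I', 'J', 'K', 'F'),
);
$received = array('B', 'C', 'E', 'G', 'F', 'K');

$similarity = array();
foreach ($known as $name => $points) {
    $similarity[$name] = jaccard($received, $points);
}
arsort($similarity); // most similar profile first

foreach ($similarity as $name => $value) {
    printf("%s: %.3f\n", $name, $value);
}
// Prints:
// User1: 0.500
// User2: 0.333
```

User1 shares four data points (B, C, G, K) with the received data and User2 only three (C, F, K), so User1 is the more probable match under this measure. A real system would weight each data point rather than treat them all equally.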

Are there any other alternatives?

Of course. As an alternative measure, you might create your own simple scoring algorithm, and base it on exact matches. This is not as efficient as probability, but may be simpler for you to implement.

As an example, consider this simple score chart:

+-------------------------+--------+------------+
|        Property         | Weight | Importance |
+-------------------------+--------+------------+
| Real IP address         |     60 |          5 |
| Used proxy IP address   |     40 |          4 |
| HTTP Cookies            |     80 |          8 |
| Session Cookies         |     80 |          6 |
| 3rd Party Cookies       |     60 |          4 |
| Flash Cookies           |     90 |          7 |
| PDF Bug                 |     20 |          1 |
| Flash Bug               |     20 |          1 |
| Java Bug                |     20 |          1 |
| Frequent Pages          |     40 |          1 |
| Browsers Finger Print   |     35 |          2 |
| Installed Plugins       |     25 |          1 |
| Cached Images           |     40 |          3 |
| URL                     |     60 |          4 |
| System Fonts Detection  |     70 |          4 |
| Localstorage            |     90 |          8 |
| Geolocation             |     70 |          6 |
| AOLTR                   |     70 |          4 |
| Network Information API |     40 |          3 |
| Battery Status API      |     20 |          1 |
+-------------------------+--------+------------+

For each piece of information which you can gather on a given request, award the associated score, then use Importance to resolve conflicts when scores are the same.
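A minimal sketch of that scoring rule, using a few rows of the chart above. The `scoreCandidate()` helper, the sample values and the decision to sum Importance for the tie-break are all assumptions made for illustration; the chart itself does not say how Importance should be aggregated.

```php
<?php
// Subset of the score chart: property => array(weight, importance).
$chart = array(
    'Real IP address' => array(60, 5),
    'HTTP Cookies'    => array(80, 8),
    'Flash Cookies'   => array(90, 7),
    'Localstorage'    => array(90, 8),
    'Geolocation'     => array(70, 6),
);

// Hypothetical helper: add up weight and importance for every property whose
// stored value exactly matches the value seen on the current request.
function scoreCandidate(array $chart, array $stored, array $seen): array
{
    $score = $importance = 0;
    foreach ($chart as $property => $row) {
        if (isset($stored[$property], $seen[$property])
            && $stored[$property] === $seen[$property]) {
            $score += $row[0];
            $importance += $row[1];
        }
    }
    return array('score' => $score, 'importance' => $importance);
}

// Made-up request data and stored profiles.
$seen = array(
    'Real IP address' => '203.0.113.7',
    'HTTP Cookies'    => 'abc',
    'Localstorage'    => 'xyz',
    'Geolocation'     => 'Berlin',
);
$candidates = array(
    'A' => array('Real IP address' => '203.0.113.7', 'Flash Cookies' => 'f1', 'Localstorage' => 'xyz'),
    'B' => array('HTTP Cookies' => 'abc', 'Geolocation' => 'Berlin'),
);

$results = array();
foreach ($candidates as $name => $stored) {
    $results[$name] = scoreCandidate($chart, $stored, $seen);
}

// Highest score first; equal scores are resolved by importance.
uasort($results, function ($a, $b) {
    if ($a['score'] !== $b['score']) {
        return $b['score'] <=> $a['score'];
    }
    return $b['importance'] <=> $a['importance'];
});

foreach ($results as $name => $r) {
    printf("%s: score=%d importance=%d\n", $name, $r['score'], $r['importance']);
}
// Prints:
// B: score=150 importance=14
// A: score=150 importance=13
```

Both candidates score 150, so Importance decides: B (cookie plus geolocation) ranks ahead of A (IP address plus local storage).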

Proof of Concept

For a simple proof of concept, please take a look at Perceptron. Perceptron is an artificial neural network model (abbreviated "RNA" in the code below) that is generally used in pattern recognition applications. There is even an old PHP Class which implements it perfectly, but you would likely need to modify it for your purposes.

Despite being a great tool, Perceptron can still return multiple results (possible matches), so using a Score and Difference comparison is still useful to identify the best of those matches.

Assumptions

  • Store all possible information about each user (IP, cookies, etc.)
  • Where result is an exact match, increase score by 1
  • Where result is not an exact match, decrease score by 1

Expectation

  1. Generate RNA labels
  2. Generate random users emulating a database
  3. Generate a single Unknown user
  4. Generate Unknown user RNA and Values
  5. The system will merge RNA information and teach the Perceptron
  6. After training the Perceptron, the system will have a set of weightings
  7. You can now test the Unknown user's pattern and the Perceptron will produce a result set.
  8. Store all Positive matches
  9. Sort the matches first by Score, then by Difference (as described above)
  10. Output the two closest matches, or, if no matches are found, output empty results

Code for Proof of Concept

$features = array(
    'Real IP address' => .5,
    'Used proxy IP address' => .4,
    'HTTP Cookies' => .9,
    'Session Cookies' => .6,
    '3rd Party Cookies' => .6,
    'Flash Cookies' => .7,
    'PDF Bug' => .2,
    'Flash Bug' => .2,
    'Java Bug' => .2,
    'Frequent Pages' => .3,
    'Browsers Finger Print' => .3,
    'Installed Plugins' => .2,
    'URL' => .5,
    'Cached PNG' => .4,
    'System Fonts Detection' => .6,
    'Localstorage' => .8,
    'Geolocation' => .6,
    'AOLTR' => .4,
    'Network Information API' => .3,
    'Battery Status API' => .2
);

// Get RNA labels
$labels = array();
$n = 1;
foreach ($features as $k => $v) {
    $labels[$k] = "x" . $n;
    $n ++;
}

// Create Users
$users = array();
for($i = 0, $name = "A"; $i < 5; $i ++, $name ++) {
    $users[] = new Profile($name, $features);
}

// Generate Unknown User
$unknown = new Profile("Unknown", $features);

// Generate Unknown RNA
$unknownRNA = array(
    0 => array("o" => 1),
    1 => array("o" => - 1)
);

// Create RNA Values
foreach ($unknown->data as $item => $point) {
    $unknownRNA[0][$labels[$item]] = $point;
    $unknownRNA[1][$labels[$item]] = (- 1 * $point);
}

// Start Perceptron class
$perceptron = new Perceptron();

// Train Results
$trainResult = $perceptron->train($unknownRNA, 1, 1);

// Find matches
$matchs = array(); // Initialise so count() below is safe when nothing matches
foreach ($users as $profile) {
    // Use shorter labels
    $data = array_combine($labels, $profile->data);
    if ($perceptron->testCase($data, $trainResult) == true) {
        $score = $diff = 0;

        // Determine the score and difference
        foreach ($unknown->data as $item => $found) {
            if ($unknown->data[$item] === $profile->data[$item]) {
                if ($profile->data[$item] > 0) {
                    $score += $features[$item];
                } else {
                    $diff += $features[$item];
                }
            }
        }
        // Set score and diff
        $profile->setScore($score, $diff);
        $matchs[] = $profile;
    }
}

// Sort based on score and output (a single match is still a match)
if (count($matchs) > 0) {
    usort($matchs, function ($a, $b) {
        // If score is the same use difference
        if ($a->score == $b->score) {
            // The lower the difference the better
            return $a->diff == $b->diff ? 0 : ($a->diff > $b->diff ? 1 : - 1);
        }
        // The higher the score the better
        return $a->score > $b->score ? - 1 : 1;
    });

    echo "<br />Possible Match ", implode(",", array_slice(array_map(function ($v) {
        return sprintf(" %s (%0.4f|%0.4f) ", $v->name, $v->score,$v->diff);
    }, $matchs), 0, 2));
} else {
    echo "<br />No match Found ";
}

Output:

Possible Match D (0.7416|0.1685),C (0.5393|0.2809)

Print_r of "D":

echo "<pre>";
print_r($matchs[0]);


Profile Object(
    [name] => D
    [data] => Array (
        [Real IP address] => -1
        [Used proxy IP address] => -1
        [HTTP Cookies] => 1
        [Session Cookies] => 1
        [3rd Party Cookies] => 1
        [Flash Cookies] => 1
        [PDF Bug] => 1
        [Flash Bug] => 1
        [Java Bug] => -1
        [Frequent Pages] => 1
        [Browsers Finger Print] => -1
        [Installed Plugins] => 1
        [URL] => -1
        [Cached PNG] => 1
        [System Fonts Detection] => 1
        [Localstorage] => -1
        [Geolocation] => -1
        [AOLTR] => 1
        [Network Information API] => -1
        [Battery Status API] => -1
    )
    [score] => 0.74157303370787
    [diff] => 0.1685393258427
    [base] => 8.9
)

If you set $perceptron->debug = true before training, you will be able to see Input (Sensor & Desired), Initial Weights, Output (Sensor, Sum, Network), Error, Correction and Final Weights.

+----+----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+-----+----+---------+---------+---------+---------+---------+---------+---------+---------+---------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----------+
| o  | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | x11 | x12 | x13 | x14 | x15 | x16 | x17 | x18 | x19 | x20 | Bias | Yin | Y  | deltaW1 | deltaW2 | deltaW3 | deltaW4 | deltaW5 | deltaW6 | deltaW7 | deltaW8 | deltaW9 | deltaW10 | deltaW11 | deltaW12 | deltaW13 | deltaW14 | deltaW15 | deltaW16 | deltaW17 | deltaW18 | deltaW19 | deltaW20 | W1 | W2 | W3 | W4 | W5 | W6 | W7 | W8 | W9 | W10 | W11 | W12 | W13 | W14 | W15 | W16 | W17 | W18 | W19 | W20 | deltaBias |
+----+----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+-----+----+---------+---------+---------+---------+---------+---------+---------+---------+---------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----------+
| 1  | 1  | -1 | -1 | -1 | -1 | -1 | -1 | 1  | 1  | 1   | 1   | 1   | 1   | 1   | -1  | -1  | -1  | -1  | 1   | 1   | 1    | 0   | -1 | 0       | -1      | -1      | -1      | -1      | -1      | -1      | 1       | 1       | 1        | 1        | 1        | 1        | 1        | -1       | -1       | -1       | -1       | 1        | 1        | 0  | -1 | -1 | -1 | -1 | -1 | -1 | 1  | 1  | 1   | 1   | 1   | 1   | 1   | -1  | -1  | -1  | -1  | 1   | 1   | 1         |
| -1 | -1 | 1  | 1  | 1  | 1  | 1  | 1  | -1 | -1 | -1  | -1  | -1  | -1  | -1  | 1   | 1   | 1   | 1   | -1  | -1  | 1    | -19 | -1 | 0       | 0       | 0       | 0       | 0       | 0       | 0       | 0       | 0       | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0  | -1 | -1 | -1 | -1 | -1 | -1 | 1  | 1  | 1   | 1   | 1   | 1   | 1   | -1  | -1  | -1  | -1  | 1   | 1   | 1         |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --  | --  | --  | --  | --  | --  | --  | --  | --  | --  | --  | --   | --  | -- | --      | --      | --      | --      | --      | --      | --      | --      | --      | --       | --       | --       | --       | --       | --       | --       | --       | --       | --       | --       | -- | -- | -- | -- | -- | -- | -- | -- | -- | --  | --  | --  | --  | --  | --  | --  | --  | --  | --  | --  | --        |
| 1  | 1  | -1 | -1 | -1 | -1 | -1 | -1 | 1  | 1  | 1   | 1   | 1   | 1   | 1   | -1  | -1  | -1  | -1  | 1   | 1   | 1    | 19  | 1  | 0       | 0       | 0       | 0       | 0       | 0       | 0       | 0       | 0       | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0  | -1 | -1 | -1 | -1 | -1 | -1 | 1  | 1  | 1   | 1   | 1   | 1   | 1   | -1  | -1  | -1  | -1  | 1   | 1   | 1         |
| -1 | -1 | 1  | 1  | 1  | 1  | 1  | 1  | -1 | -1 | -1  | -1  | -1  | -1  | -1  | 1   | 1   | 1   | 1   | -1  | -1  | 1    | -19 | -1 | 0       | 0       | 0       | 0       | 0       | 0       | 0       | 0       | 0       | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0  | -1 | -1 | -1 | -1 | -1 | -1 | 1  | 1  | 1   | 1   | 1   | 1   | 1   | -1  | -1  | -1  | -1  | 1   | 1   | 1         |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --  | --  | --  | --  | --  | --  | --  | --  | --  | --  | --  | --   | --  | -- | --      | --      | --      | --      | --      | --      | --      | --      | --      | --       | --       | --       | --       | --       | --       | --       | --       | --       | --       | --       | -- | -- | -- | -- | -- | -- | -- | -- | -- | --  | --  | --  | --  | --  | --  | --  | --  | --  | --  | --  | --        |
+----+----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+-----+----+---------+---------+---------+---------+---------+---------+---------+---------+---------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----------+

x1 to x20 represent the features converted by the code.

// Get RNA Labels
$labels = array();
$n = 1;
foreach ( $features as $k => $v ) {
    $labels[$k] = "x" . $n;
    $n ++;
}

Here is an online demo

Class Used:

class Profile {
    public $name, $data = array(), $score, $diff, $base;

    function __construct($name, array $importance) {
        $values = array(-1, 1); // Perceptron values
        $this->name = $name;
        foreach ($importance as $item => $point) {
            // Generate Random true/false for real Items
            $this->data[$item] = $values[mt_rand(0, 1)];
        }
        $this->base = array_sum($importance);
    }

    public function setScore($score, $diff) {
        $this->score = $score / $this->base;
        $this->diff = $diff / $this->base;
    }
}

Modified Perceptron Class

class Perceptron {
    private $w = array();
    private $dw = array();
    public $debug = false;

    private function initialize($colums) {
        // Initialize perceptron vars
        for($i = 1; $i <= $colums; $i ++) {
            // weighting vars
            $this->w[$i] = 0;
            $this->dw[$i] = 0;
        }
    }

    function train($input, $alpha, $teta) {
        $colums = count($input[0]) - 1;
        $weightCache = array_fill(1, $colums, 0);
        $checkpoints = array();
        $keepTrainning = true;

        // Initialize RNA vars
        $this->initialize(count($input[0]) - 1);
        $just_started = true;
        $totalRun = 0;
        $yin = 0;

        // Trains RNA until it gets stable
        while ($keepTrainning == true) {
            // Sweeps each row of the input subject
            foreach ($input as $row_counter => $row_data) {
                // Finds out the number of columns the input has
                $n_columns = count($row_data) - 1;

                // Calculates Yin
                $yin = 0;
                for($i = 1; $i <= $n_columns; $i ++) {
                    $yin += $row_data["x" . $i] * $weightCache[$i];
                }

                // Calculates Real Output
                $Y = ($yin <= 1) ? - 1 : 1;

                // Sweeps columns ...
                $checkpoints[$row_counter] = 0;
                for($i = 1; $i <= $n_columns; $i ++) {
                    /** DELTAS **/
                    // Is it the first row?
                    if ($just_started == true) {
                        $this->dw[$i] = $weightCache[$i];
                        $just_started = false;
                        // Found desired output?
                    } elseif ($Y == $row_data["o"]) {
                        $this->dw[$i] = 0;
                        // Calculates Delta Ws
                    } else {
                        $this->dw[$i] = $row_data["x" . $i] * $row_data["o"];
                    }

                    /** WEIGHTS **/
                    // Calculate Weights
                    $this->w[$i] = $this->dw[$i] + $weightCache[$i];
                    $weightCache[$i] = $this->w[$i];

                    /** CHECK-POINT **/
                    $checkpoints[$row_counter] += $this->w[$i];
                } // END - for

                foreach ($this->w as $index => $w_item) {
                    $debug_w["W" . $index] = $w_item;
                    $debug_dw["deltaW" . $index] = $this->dw[$index];
                }

                // Special for script debugging
                $debug_vars[] = array_merge($row_data, array(
                    "Bias" => 1,
                    "Yin" => $yin,
                    "Y" => $Y
                ), $debug_dw, $debug_w, array(
                    "deltaBias" => 1
                ));
            } // END - foreach

            // Special for script debugging
            $empty_data_row = array();
            for($i = 1; $i <= $n_columns; $i ++) {
                $empty_data_row["x" . $i] = "--";
                $empty_data_row["W" . $i] = "--";
                $empty_data_row["deltaW" . $i] = "--";
            }
            $debug_vars[] = array_merge($empty_data_row, array(
                "o" => "--",
                "Bias" => "--",
                "Yin" => "--",
                "Y" => "--",
                "deltaBias" => "--"
            ));

            // Counts training times
            $totalRun ++;

            // Now checks if the RNA is stable already
            $referer_value = end($checkpoints);
            // if all rows match the desired output ...
            $sum = array_sum($checkpoints);
            $n_rows = count($checkpoints);
            if ($totalRun > 1 && ($sum / $n_rows) == $referer_value) {
                $keepTrainning = false;
            }
        } // END - while

        // Prepares the final result
        $result = array();
        for($i = 1; $i <= $n_columns; $i ++) {
            $result["w" . $i] = $this->w[$i];
        }

        $this->debug($this->print_html_table($debug_vars));

        return $result;
    } // END - train

    function testCase($input, $results) {
        // Sweeps input columns
        $result = 0;
        $i = 1;
        foreach ($input as $column_value) {
            // Calculates test Y
            $result += $results["w" . $i] * $column_value;
            $i ++;
        }
        // Checks which class the test case fits
        return ($result > 0) ? true : false;
    } // END - testCase

    // Returns the html code of a html table base on a hash array
    function print_html_table($array) {
        $html = "";
        $inner_html = "";
        $table_header_composed = false;
        $table_header = array();

        // Builds table contents
        foreach ($array as $array_item) {
            $inner_html .= "<tr>\n";
            foreach ( $array_item as $array_col_label => $array_col ) {
                $inner_html .= "<td>\n";
                $inner_html .= $array_col;
                $inner_html .= "</td>\n";

                if ($table_header_composed == false) {
                    $table_header[] = $array_col_label;
                }
            }
            $table_header_composed = true;
            $inner_html .= "</tr>\n";
        }

        // Builds full table
        $html = "<table border=1>\n";
        $html .= "<tr>\n";
        foreach ($table_header as $table_header_item) {
            $html .= "<td>\n";
            $html .= "<b>" . $table_header_item . "</b>";
            $html .= "</td>\n";
        }
        $html .= "</tr>\n";

        $html .= $inner_html . "</table>";

        return $html;
    } // END - print_html_table

    // Debug function
    function debug($message) {
        if ($this->debug == true) {
            echo "<b>DEBUG:</b> $message";
        }
    } // END - debug
} // END - class

Conclusion

Identifying a user without a Unique Identifier is not a straightforward or simple task. It depends on gathering a sufficient amount of Random Data from the user by a variety of methods.

Even if you choose not to use an Artificial Neural Network, I suggest at least using a Simple Probability Matrix with priorities and likelihoods - and I hope the code and examples provided above give you enough to go on.

Community
Baba
  • @Baba What do you mean by "Using Blobs" to fingerprint a browser? – billmalarky Mar 13 '14 at 19:17
  • @billmalarky [HTML5 blob object using File API](https://developer.mozilla.org/en-US/docs/Web/API/Blob) – Baba Mar 14 '14 at 08:22
  • 1
    @Baba How would one use that to fingerprint a browser? Just check what is currently in it at any given time? – billmalarky Mar 14 '14 at 20:04
  • @Baba great work, I've always tried to have some multi-levels strategy to identify a user, but as you said cache can be cleared, IPs changed, users behind proxies or NAT - *especially those people* -, cookies deleted, etc.. but even with all this much effort it is a matter probability, also if the bad user is using **Tor** browser for example, *most* if not all of detecting strategies mentioned won't work. I liked https://www.browserleaks.com/ but with Tor all came back undefined or unknown – Mi-Creativity Dec 26 '15 at 22:04
  • Just a Note intended only at _"removing some dust"_ from this gem of a publication: List of broken links as of 07.09.17: - `Implement Bayesian inference using PHP`, all the 3 parts. - `Frequency vs Probability` - `Joint Probability` - `Input (Sensor & Desired), Initial Weights, Output (Sensor, Sum, Network), Error, Correction and Final Weights` – Ziezi Sep 07 '17 at 13:39
29

This technique (detecting the same users without cookies, or even without an IP address) is called browser fingerprinting. Basically, you collect as much information about the browser as you can; better results can be achieved with JavaScript, Flash or Java (e.g. installed extensions, fonts, etc.). After that, you can store the results hashed, if you want.
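As a rough sketch of the "store the results hashed" step, assuming the attributes have already been collected on the client and sent to the server. The `fingerprintHash()` helper and the attribute names and values are made up for illustration.

```php
<?php
// Hypothetical helper: reduce collected browser attributes to one stable hash.
function fingerprintHash(array $attributes): string
{
    ksort($attributes); // top-level key order must not change the hash
    return hash('sha256', json_encode($attributes));
}

// Made-up attribute values; nested lists should be sorted by the caller.
$attributes = array(
    'userAgent' => 'Mozilla/5.0 (X11; Linux x86_64)',
    'screen'    => '1920x1080x24',
    'timezone'  => 'Europe/Berlin',
    'fonts'     => array('Arial', 'DejaVu Sans'),
    'plugins'   => array('Shockwave Flash 11.7'),
);

$hash = fingerprintHash($attributes);
echo $hash, "\n"; // 64 hexadecimal characters
```

Any change to an attribute, such as a plugin version bump, produces a completely different hash, so exact-hash lookup is brittle; keeping the individual attributes as well allows fuzzier matching.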

It's not infallible, but:

83.6% of the browsers seen had a unique fingerprint; among those with Flash or Java enabled, 94.2%. This does not include cookies!

More info:

pozs
  • I think it's still the answer. If you need to identify a device, you only need to get that data, e.g. OS, generic extensions (and their versions), installed fonts, etc. – pozs Apr 18 '13 at 20:49
  • This is not going to work well. Every browser supports sessions and cookies. Use the right tool for the job. – Man Vs Code Apr 18 '13 at 20:55
  • 1
    @slash197 what about file cache? i mean using 1px x 1px transparent flash media along with an xml file holding a unique generated id inside (the xml should be created once on the server before it's been downloaded to user local HD) this way even if the user deletes cookies or logout, you can still have a bridge using action script sendAndLoad method. – Mbarry Apr 19 '13 at 00:57
  • The minimum of change will affect the hash result. for example the version of shock wave player. possible solution with locally stored xml cache file with unique key generated + hidden 1px x 1px flash media (action script) on the browser, this way you get rid of cookies, session expiration issue if that was the main issue. you can still have the bridge between your sql database and the key on the user local machine. – Mbarry Apr 19 '13 at 01:05
  • @Mbarry I'm not much of a flash fan but if in the browser there's a flash blocking add-on like I have, that 1x1 pixel flash media would be disabled, am I right? – slash197 Apr 20 '13 at 05:43
  • Maybe stupid question but I am wondering about an hour - Where do I store hashed fingerprint? In my personal database (f.e. MySQL)? – kelly Dec 26 '16 at 17:50
7

The fingerprinting mentioned above works, but can still suffer collisions.

One way is to add a UID to the URL of each interaction with the user.

http://someplace.com/12899823/user/profile

Every link in the site is rewritten to include this modifier. It is similar to the way ASP.NET used to work, passing FORM data between pages.
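A minimal sketch of a URL ID that a user cannot alter undetected, assuming the ID is signed with an HMAC. This is one possible reading of the approach, not a confirmed implementation; the secret, the function names and the `id-signature` token layout are hypothetical.

```php
<?php
// Hypothetical secret; in practice keep it out of source control.
const URL_ID_SECRET = 'change-me';

// Append a truncated HMAC so the ID cannot be altered without detection.
function signId(string $id): string
{
    $mac = substr(hash_hmac('sha256', $id, URL_ID_SECRET), 0, 16);
    return $id . '-' . $mac;
}

// Return the original ID if the signature is valid, otherwise null.
function verifyId(string $token): ?string
{
    $pos = strrpos($token, '-');
    if ($pos === false) {
        return null;
    }
    $id = substr($token, 0, $pos);
    $mac = (string) substr($token, $pos + 1);
    $expected = substr(hash_hmac('sha256', $id, URL_ID_SECRET), 0, 16);
    // hash_equals() compares in constant time to avoid timing leaks.
    return hash_equals($expected, $mac) ? $id : null;
}

$token = signId('12899823');
echo "http://someplace.com/{$token}/user/profile\n";
```

`verifyId()` returns null when either the ID or the signature has been edited, so a tampered link can be rejected or treated as a new visitor.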

Justin Alexander
  • I thought of that but that's the easiest way for a user to modify it – slash197 Apr 21 '13 at 11:58
  • 1
    Not if the ID is a self-referencing hash. That makes it cryptographically secure. – Justin Alexander Apr 21 '13 at 12:39
  • Also, this method is ok when someone's browsing the site but how do you propose handling the case when a returning user comes back after a week and simply types in the websites address, without id? – slash197 Apr 21 '13 at 16:17
  • 1
    @slash197 in that case why dont you tell user to login, that what happens even when user deletes the cookies. – Akash Kava Apr 21 '13 at 19:15
6

Have you looked into Evercookie? It may or may not work across browsers. An extract from their site:

"If a user gets cookied on one browser and switches to another browser, as long as they still have the Local Shared Object cookie, the cookie will reproduce in both browsers."

Alexis Tyler
  • 1,515
  • 6
  • 26
  • 44
  • I wonder if it works with JavaScript disabled. Do you have any experience? – Voitcus May 27 '13 at 11:58
  • It's called evercookie for a reason, it'll work no matter what. It's near impossible for them to remove the cookie. – Alexis Tyler Aug 03 '13 at 07:53
  • It won't work no matter what. From the first line of the description: 'evercookie is a javascript API...'. It will not work if javascript is disabled. – gdw2 Dec 03 '14 at 22:28
  • JS doesn't even have to be disabled. Ghostery and uBlock drop evercookie. – opengrid May 19 '16 at 10:33
4

You could do this with a cached PNG. It would be somewhat unreliable (different browsers behave differently, and it'll fail if the user clears their cache), but it's an option.

1: set up a database that stores a unique user id as a hex string

2: create a genUser.php (or whatever language) file that generates a user id, stores it in the DB and then creates a true color .png out of the values of that hex string (each pixel will be 4 bytes) and return that to the browser. Be sure to set the content-type and cache headers.

3: in the HTML or JS create an image like <img id='user_id' src='genUser.php' />

4: draw that image to a canvas ctx.drawImage(document.getElementById('user_id'), 0, 0);

5: read the bytes of that image out using ctx.getImageData, and convert the integers to a hex string.

6: That is your unique user id, which is now cached on your user's computer.
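The server side of step 2 and the decoding in step 5 can be sketched as follows, in Python rather than PHP. This is a rough illustration with assumptions that are not in the steps above: the alpha byte is fixed at 255 and only the three color bytes of each pixel carry the ID (canvases may alter color values of partially transparent pixels), the image is a single row, and all function names are hypothetical.

```python
import struct
import zlib


def id_to_pixels(hex_id: str) -> bytes:
    """Pack the ID bytes into RGBA pixels: 3 ID bytes per pixel, alpha = 255."""
    raw = bytes.fromhex(hex_id)
    raw += b"\x00" * (-len(raw) % 3)  # pad to a whole number of pixels
    pixels = bytearray()
    for i in range(0, len(raw), 3):
        pixels += raw[i:i + 3] + b"\xff"
    return bytes(pixels)


def pixels_to_id(rgba: bytes, id_bytes: int) -> str:
    """Reverse of id_to_pixels: drop alpha bytes and padding, return the hex string."""
    rgb = bytearray()
    for i in range(0, len(rgba), 4):
        rgb += rgba[i:i + 3]
    return bytes(rgb[:id_bytes]).hex()


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    body = kind + data
    crc = zlib.crc32(body) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + body + struct.pack(">I", crc)


def make_png(rgba: bytes) -> bytes:
    """Encode one row of RGBA pixels as a minimal truecolor-with-alpha PNG."""
    width = len(rgba) // 4
    # 8-bit depth, color type 6 (RGBA), default compression, filter, no interlace
    ihdr = struct.pack(">IIBBBBB", width, 1, 8, 6, 0, 0, 0)
    scanline = b"\x00" + rgba  # filter type 0 (None) for the single row
    return (b"\x89PNG\r\n\x1a\n"
            + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(scanline))
            + _chunk(b"IEND", b""))
```

The response serving `make_png(...)` would still need a `Content-Type: image/png` header and long-lived cache headers, as step 2 says; `pixels_to_id` mirrors what the browser-side code would do with the array returned by `ctx.getImageData`.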

hobberwickey
  • 5,274
  • 1
  • 24
  • 28
  • He wants something that can track the user "across browsers" which won't work here (each browser has its own cache database). – EricLaw Sep 30 '15 at 16:10
  • Where are you seeing that, his question only asks for "Forgot to mention that it would need to be cross-browser compatible", i.e. work in any browser. – hobberwickey Oct 01 '15 at 16:51
  • His question is poorly written. `I'm after device recognition` is the giveaway for what he wants, and he elaborates here: http://stackoverflow.com/questions/15966812/user-recognition-without-cookies-or-local-storage/16171013?noredirect=1#comment23078610_16131552 – EricLaw Oct 03 '15 at 18:23
2

You can do it with ETags, although I am not sure if this is legal, as a bunch of lawsuits were filed.

If you properly warn your users or if you have something like an intranet website it might be ok.
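As a rough sketch of the idea: the server hands out an identifier as the `ETag` of a small cached resource, and the browser sends it back in `If-None-Match` when it revalidates. The handler below is framework-free and hypothetical (the function name, the header dictionaries, and the 16-byte ID length are assumptions for illustration only).

```python
import secrets
from typing import Dict, Tuple


def handle_request(request_headers: Dict[str, str]) -> Tuple[int, Dict[str, str], str]:
    """Return (status, response_headers, visitor_id) for a tracking resource.

    If the browser revalidates with If-None-Match, the quoted ETag value is
    treated as the visitor ID and a 304 is returned so the cache entry is kept.
    Otherwise a fresh ID is issued as the ETag of a 200 response.
    """
    presented = request_headers.get("If-None-Match", "").strip()
    if len(presented) > 2 and presented.startswith('"') and presented.endswith('"'):
        visitor_id = presented[1:-1]
        return 304, {"ETag": presented}, visitor_id

    visitor_id = secrets.token_hex(16)
    response_headers = {
        "ETag": f'"{visitor_id}"',
        # no-cache: the browser may store the response but must revalidate it
        # on each use, which is what makes it send If-None-Match again.
        "Cache-Control": "private, no-cache",
    }
    return 200, response_headers, visitor_id
```

Because each browser keeps its own cache, the ID returned here is per browser, not per device, and it disappears when the cache is cleared.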

Brian McGinity
  • 5,329
  • 5
  • 33
  • 46
  • Etags are not cross browser compatible. – slash197 Apr 21 '13 at 18:58
  • 1
    Etags are part of the HTTP/1.1 spec. All popular browsers support etags, you would pretty much need to write your own browser to not support ETag/If-None-Match headers. – Brian McGinity Apr 22 '13 at 13:18
  • I didn't say it doesn't support it, I said it's not cross-browser compatible. If a tag is saved in Firefox it's not available in Chrome, so the content will be downloaded again since there's no cache. – slash197 Apr 22 '13 at 16:21
  • Now I understand what you were saying. You're right. Each browser has its own cache store, hence different ETags. – Brian McGinity Apr 22 '13 at 18:04
1

You could potentially create a blob to store a device identifier ...

The downside is that the user needs to download the blob (you can force the download), as the browser can't access the file system to save the file directly.

reference:

https://www.inkling.com/read/javascript-definitive-guide-david-flanagan-6th/chapter-22/blobs

DanielDMO
  • 86
  • 1
  • 9
1

Based on what you have said :

Basically I'm after device recognition not really the user

The best way to do it is to send the MAC address, which is the NIC ID.

You can take a look at this post : How can I get the MAC and the IP address of a connected client in PHP?

Jean-François Fabre
  • 126,787
  • 22
  • 103
  • 165
Mehdi Karamosly
  • 4,996
  • 1
  • 25
  • 46
  • Sorry, but the NIC ID is easily spoofable. It's definitely not the best way. – asgs Jul 13 '15 at 18:10
  • @asgs browser fingerprinting would be better maybe, or what would be the best way in your opinion ? – Mehdi Karamosly Jul 13 '15 at 19:48
  • There is no best way, that is the sad part about it. However, that and Browser FingerPrinting in combination with the Probability study that Baba has presented above would be the best in my opinion. – asgs Jul 14 '15 at 04:27
0

An inefficient approach that may still give you the desired results is to poll an API on your side. Have a background process on the client side that sends user data at an interval. You will need a user identifier to send to your API. Once you have that, you can send along any information associated with that unique identifier.

This removes the need for cookies and localstorage.

rexposadas
  • 2,689
  • 3
  • 24
  • 46
0

I can't believe http://browserspy.dk still has not been mentioned here! The site describes many features (in terms of pattern recognition) which could be used to build a classifier.

And of course, for evaluating the features I'd suggest Support Vector Machines, and libsvm in particular.

Valentin Heinitz
  • 6,846
  • 10
  • 49
  • 99
0

Track them during a session or across sessions?

If your site is HTTPS everywhere, you could use the TLS Session ID to track the user's session.

Neil McGuigan
  • 41,314
  • 10
  • 106
  • 137
-2
  1. Create a cross-platform dummy (nsapi) plugin and generate a unique name for the plugin name or version when the user downloads it (e.g. after login).
  2. Provide an installer for the plugin, or install it via policy.

this will require the user to willingly install the identifier.

Once the plugin is installed, the fingerprint of any (plugin-enabled) browser will contain this specific plugin. To return the info to a server, an algorithm to effectively detect the plugin on the client side is needed; otherwise IE and Firefox >= 28 users will need a table of possible valid identifiers.

This requires a relatively high investment in a technology that will likely be shut down by the browser vendors. If you are able to convince your users to install a plugin, there may also be options like installing a local proxy, using a VPN, or patching the network drivers.

Users that do not want to be identified (or their machines) will always find a way to prevent it.

  • Hi welcome to stack overflow. Please note; `this will require the user to willingly install the identifier.` is probably not what the original poster (OP) meant. – Stefan May 24 '14 at 23:42