
What's the best way to describe the algorithmic complexity of collusion detection for a ten-million-player online poker site?

Assume (I don't think these assumptions make much difference so feel free to ignore them, but just to clarify):

  • That the site has 10,000,000 registered users.
  • That these players have played a total of 5 billion hands.
  • That the only information you're given is the "master hand history database" for the site, containing all player hole cards and betting actions for each hand.
  • In other words, you may NOT take shortcuts such as examining IP addresses, looking for unusual rake/profit patterns, and so forth.
  • Assume you are given a function which, when passed a group of exactly N (where N is between 2 and 10) players, returns TRUE if ALL of the players in the group have colluded TOGETHER. If some but not all of the players are colluders, the function returns FALSE. A return value of TRUE is made with (for example) 75% confidence.

Your job is to produce an exhaustive list of every player who's colluded, along with a complete list of the players he's colluded with. I have recently heard this problem described as NP-hard but is this accurate? Sometimes we call things "NP" or "NP-hard" that are merely "hard".

Thanks!

Matthew Flaschen
  • I don't have an answer (yet?), but another question. :) If I call haveColluded("Bob", "Jane", "Mary"), and: 1. Bob colluded with Jane in hand 1. 2. Bob colluded with Mary in hand 2. 3. Jane colluded with Mary in hand 3. (assume those are the only games played) what does it return? – Matthew Flaschen Apr 26 '09 at 11:27
  • In that case, assuming Bob, Jane, and Mary are sitting at the same table, the function returns TRUE. You've identified a 3-player collusion group and not every player in that group needs to be active during the subset of hands you're looking at. Of course, HaveColluded is somewhat "magical" but I felt it was necessary to restrict the problem. Feel free to posit your own definition of HaveColluded here if that simplifies things! :-) –  Apr 26 '09 at 11:31
  • @Coding the Wheel: If anyone else had asked this question, I would have told them to ask you. :) – Bill the Lizard Apr 27 '09 at 02:07
  • Since your post I have been reading up on the topic. As I have never played online poker, or even real poker in a casino, I wouldn't know where to begin with the haveColluded method below, but I would suggest looking into the work of David Sklansky & Mason Malmuth, as their names keep popping up as I look into the topic. – Random Developer Apr 27 '09 at 17:58
  • you should scrub and post the data set somewhere and turn this into a stackoverflow coding challenge. ;) – paxos1977 Jan 07 '10 at 01:05
  • Sklansky and Malmuth are _great_ for a description of strategy but I've never seen them mention collusion detection. – aaronasterling Sep 17 '10 at 03:54

4 Answers


The brute-force approach I see immediately is:

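import java.util.HashSet;
import java.util.Set;

// Every player found in at least one colluding pair.
Set<Player> colluders = new HashSet<>();
for (Player p1 : allPlayers)
{
  for (Player p2 : allPlayers)
  {
    // Skip self-pairs; note each unordered pair is still tested twice, once per order.
    if (!p1.equals(p2) && haveColluded(p1, p2))
    {
      colluders.add(p1);
      colluders.add(p2);
    }
  }
}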

I don't see a point in calling haveColluded with more than 2 arguments, because that could give false negatives, though I suppose it depends on how costly the function is. The above results in O(n^2) calls to haveColluded (n being the number of players). The function itself would seemingly be O(m), where m is the number of games the players played together. Thus, the algorithm seems well under O(n^3).

To show that the problem is NP-hard, you would have to satisfy this definition: "A problem H is NP-hard if and only if there is an NP-complete problem L that is polynomial time Turing-reducible to H [...] In other words, L can be solved in polynomial time by an oracle machine with an oracle for H." (http://en.wikipedia.org/wiki/NP-hard). I have studied NP-complete problems (e.g. 3-SAT, the travelling salesman problem, etc.) and I don't see how you'd prove that. But then again, it does seem suspiciously similar to the clique problem.
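A minimal sketch of the same pairwise scan that visits each unordered pair only once and also records, for each player, who they colluded with (the per-player list the question asks for). Players are plain int ids and haveColluded is passed in as a `BiPredicate`; both are assumptions made for illustration, since the question treats haveColluded as a given black box.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.BiPredicate;

class PairwiseScan {
    // Hypothetical helper: tests every unordered pair {i, j} exactly once
    // and returns, for each colluding player, the set of their partners.
    static Map<Integer, Set<Integer>> findPartners(int playerCount,
                                                   BiPredicate<Integer, Integer> haveColluded) {
        Map<Integer, Set<Integer>> partners = new HashMap<>();
        for (int i = 0; i < playerCount; i++) {
            for (int j = i + 1; j < playerCount; j++) {   // j > i: n(n-1)/2 calls in total
                if (haveColluded.test(i, j)) {
                    partners.computeIfAbsent(i, k -> new TreeSet<>()).add(j);
                    partners.computeIfAbsent(j, k -> new TreeSet<>()).add(i);
                }
            }
        }
        return partners;
    }

    public static void main(String[] args) {
        // Toy oracle (made up for this example): players 0, 1 and 2 collude with one another.
        BiPredicate<Integer, Integer> oracle = (a, b) -> a < 3 && b < 3;
        System.out.println(findPartners(5, oracle));
    }
}
```

Halving the number of calls does not change the bound: for n = 10,000,000 this is still roughly 5 × 10^13 calls to haveColluded.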

Matthew Flaschen
  • Thanks for the informative answer. I also don't see how you'd "prove" that it's NP-hard, but it bears a suspicious resemblance to problems which are NP-hard. Of course, having the "haveColluded" function simplifies things. IRL the problem is (if you ask me) intractable except in cases of obvious collusion (ie, where 6 players log in from the same IP or something like that). –  Apr 26 '09 at 12:44
  • This depends on the properties of the `haveColluded()` function. Perhaps 10 players colluding together can only be detected by calling the function on all 10 of them. If this is the case, the problem is much harder. – Rafał Dowgird Apr 27 '09 at 07:32

Looks like clique detection, which is NP-hard. On the other hand, the clique size is limited here (to 10), so brute force is O(n^10) at worst.

Edit: The key question here is what the properties of the collusion function are. Can 10 players colluding together always be detected by calling the function on two smaller sets of (say) 5 players?
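To put numbers on the brute-force bound, here is a small sketch that counts the distinct groups of each size k from 2 to 10 among the 10,000,000 players given in the question, i.e. the binomial coefficient C(n, k). The class and method names are made up for this example.

```java
import java.math.BigInteger;

class GroupCounts {
    // Number of distinct groups of size k drawn from n players: C(n, k).
    static BigInteger choose(long n, int k) {
        BigInteger result = BigInteger.ONE;
        for (int i = 1; i <= k; i++) {
            // After step i the running value equals C(n - k + i, i), so the division is exact.
            result = result.multiply(BigInteger.valueOf(n - k + i))
                           .divide(BigInteger.valueOf(i));
        }
        return result;
    }

    public static void main(String[] args) {
        long n = 10_000_000L;   // registered players, from the question
        for (int k = 2; k <= 10; k++) {
            System.out.println("k=" + k + ": " + choose(n, k) + " groups");
        }
    }
}
```

For k = 2 this gives 49,999,995,000,000 pairs; for k = 10 the count is on the order of 10^63, which is why the answer hinges on whether large groups can be found from calls on smaller ones.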

Rafał Dowgird
  • I don't believe this is the 'clique detection problem'. He isn't even being asked to detect cliques of a given size. He is being asked whether or not a graph of up to 10 nodes is fully connected. This is a fairly trivial problem. – paxos1977 Jan 07 '10 at 01:15
  • It's definitely the clique problem, as I see it, and instead of "knowing x" his decision is "colluded with x". – Noon Silk Jan 07 '10 at 01:30
  • @ceretullis: No. He is being asked for a complete list of nodes (in a huge graph) that are members of a subgraph that has a property determined by the `haveColluded()` function. This is completely different and much harder than checking a single graph of size 10 for cliqueness. – Rafał Dowgird Jan 07 '10 at 08:18
  • @silky & rafal. If you're able to pre-compute the graph of players that have colluded together, letting vertices represent players and edges represent collusion between players (and edge weights represent collusion confidence levels), then the function determining whether a list of N players have *all* colluded together is a matter of determining whether the vertices in the list form a fully connected graph. – paxos1977 Jan 09 '10 at 06:40
  • @silky & rafal. I'm picturing iterating over the 5 billion hands one at a time, examining each game. If during the hand player X does something statistically unlikely which appears to benefit player Y, then a small value is added to the weight of edge XY (undirected edges). You only have to look at individual games, letting the weights accumulate over time. If some 'likely' threshold is exceeded then X and Y are considered to have colluded. – paxos1977 Jan 09 '10 at 07:00

I would split this into two steps:

  1. Iterate over 5 billion hands of poker examining the play in each hand. Employ some algorithm, let's call it algorithm A, on each hand. As you go you build a collusion graph where vertices represent players and undirected weighted edges represent some confidence of collusion between two players. When algorithm A triggers on suspicion of player X colluding with player Y, some value is added to the weighted edge XY in the collusion graph. As you progress through the hands that have been played the edge weights accumulate over time. When some threshold has been reached, then the edge represents collusion between X and Y.

  2. Then the function determining whether a list of N player vertices have all colluded together only has to verify that the subgraph containing the N vertices is fully connected (meaning every pair of nodes in the subgraph is joined by an edge whose weight exceeds the collusion threshold). That takes one edge lookup per pair, i.e. N(N-1)/2 lookups, so O(N^2) in the group size (at most 45 lookups for N = 10).
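The two steps above could be sketched roughly as follows. "Algorithm A" is not specified in this answer, so the sketch simply accepts a suspicion score from the caller; the class name, the string player names, the score values and the threshold are all assumptions for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CollusionGraph {
    private final Map<String, Double> weights = new HashMap<>();
    private final double threshold;

    CollusionGraph(double threshold) { this.threshold = threshold; }

    // Undirected edge key: order the two names so XY and YX share one entry.
    // Assumes player names never contain '|'.
    private static String key(String x, String y) {
        return x.compareTo(y) < 0 ? x + "|" + y : y + "|" + x;
    }

    // Step 1: called whenever "algorithm A" flags a hand; the score is an assumed input.
    void addSuspicion(String x, String y, double score) {
        weights.merge(key(x, y), score, Double::sum);
    }

    // Step 2: true only if every pair in the group has an edge weight above the threshold.
    boolean haveColluded(List<String> group) {
        for (int i = 0; i < group.size(); i++) {
            for (int j = i + 1; j < group.size(); j++) {
                double w = weights.getOrDefault(key(group.get(i), group.get(j)), 0.0);
                if (w <= threshold) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        CollusionGraph g = new CollusionGraph(1.0);
        g.addSuspicion("Bob", "Jane", 0.6);
        g.addSuspicion("Jane", "Bob", 0.6);   // same edge, weight is now 1.2
        g.addSuspicion("Bob", "Mary", 1.5);
        System.out.println(g.haveColluded(Arrays.asList("Bob", "Jane")));          // true
        System.out.println(g.haveColluded(Arrays.asList("Bob", "Jane", "Mary")));  // false: no Jane-Mary edge
    }
}
```

The expensive part is the single pass over the hands in step 1; once the edge weights exist, each group query only touches the pairs inside that group.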

paxos1977

Under your model, what you describe should be fairly easy. You are given an implicit graph (vertices are players, edges correspond to having played a game together). You want a subgraph of that graph.

If the collusion function were perfectly reliable you just call it on every pair of vertices in the graph, and you get the subgraph.

I would expect that subgraph to be disconnected or very weakly connected; large, well-connected subgraphs will fall out quickly by doing some min-cuts.

Note that we can restrict ourselves to looking at only pairs, because the collusion function should obey (in terms of confidence level) Collude(A,B,C)<Collude(A,B).

Constructing this global collusion function is the part that seems hard.
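As a rough illustration of pulling groups out of the pairwise subgraph, the sketch below uses connected components (via union-find) as a simpler stand-in for the min-cut step mentioned above. It assumes the pairwise tests have already been run and their TRUE results are supplied as an edge list; the class name and int player ids are made up for the example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CollusionGroups {
    // Union-find "find" with path halving.
    static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    // Groups players into connected components of the pairwise collusion graph.
    // Each edge is an int[]{a, b} meaning the pairwise test on a and b returned TRUE.
    static Collection<List<Integer>> groups(int playerCount, List<int[]> edges) {
        int[] parent = new int[playerCount];
        for (int i = 0; i < playerCount; i++) parent[i] = i;
        for (int[] e : edges) parent[find(parent, e[0])] = find(parent, e[1]);

        Map<Integer, List<Integer>> byRoot = new TreeMap<>();
        for (int p = 0; p < playerCount; p++) {
            byRoot.computeIfAbsent(find(parent, p), r -> new ArrayList<>()).add(p);
        }
        byRoot.values().removeIf(g -> g.size() < 2);   // drop players with no colluding partner
        return byRoot.values();
    }

    public static void main(String[] args) {
        List<int[]> edges = new ArrayList<>();
        edges.add(new int[]{0, 1});
        edges.add(new int[]{1, 2});
        edges.add(new int[]{4, 5});
        System.out.println(groups(7, edges));   // prints [[0, 1, 2], [4, 5]]
    }
}
```

A component is only a candidate group: in the example, players 0 and 2 land together without a direct edge between them, so a component that is not a clique would still need splitting (for instance by the min-cuts suggested above).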

Captain Segfault