Find all subsets of a finite metric space for which the sum of distances is less than a given number

Question

I have five elements A, B, C, D and E.

The distance between each of the elements is given by the matrix below:

Distances =
    [0  5   3   8   15;
     5  0   7   5   20;
     3  7   0   12  12;
     8  5   12  0   8;
     7  20  12  8   0]

I want to choose all combinations of elements such that the sum of distances is less than 10.

It can be done recursively by:

First find sets of 2-item eligible combinations.
Then, find sets of 3-item eligible combinations by adding another item to the previously-found eligible 2-item combinations.
Etc.

Doing it by hand for the above example I get the following combinations:

A,  
B,  
C,  
D,  
E,  
A   B,
A   C,
A   D,
B   C,
B   D,
D   E,  
A   B   C

How would I do this systematically in Octave, if the number of elements is large (say 250)?

Please show us the relevant part of the code that you tried. Also, by "the sum of distances" do you mean along the path defined by those elements, or (for three or more elements) the sum of distances between all pairs of points? (I.e. for `A B C` add distances `AB, AC, BC`.) — Rory Daulton, Dec 24 '16 at 00:01
How is the sum of distances in A,B,C less than 10? I see the distances of 3, 5, 7 with the sum of 15. — , Dec 24 '16 at 03:16
@zaq Maybe OP meant: `A->B + A->C = 5 + 3 = 8 < 10`. (Then `C->A->B` would be valid)... I agree that the question is confusing... — Dev-iL, Dec 24 '16 at 06:41

score 1 · Answer 1 · answered Dec 24 '16 at 08:25

Several general points:

Since the original question was tagged with matlab, I will show a solution which I tested there.
This solution uses the functions VChooseK and VChooseKRO found on FEX, which need to be compiled into MEX using an appropriate compiler.
Even though the question talks about distances, and there's little sense in adding up discontinuous paths (i.e. A->C + B->D), since this is not specified explicitly in the question as something invalid, the solution below outputs them as well.
The solution is shown for the example given in the OP. It should be modified slightly to output readable results for 250 nodes, (i.e. change the node "names" from letters to numbers seeing how 26 < 250).
Outputs are currently only printed. Some modifications need to be made (in the form of temporary variables) if further computations are required on the result.

function q41308927
%% Initialization:
nodes = char((0:4) + 'A');
D = [0  5   3   8   15;
     5  0   7   5   20;
     3  7   0   12  12;
     8  5   12  0   8;
     7  20  12  8   0];
thresh = 10;
d = triu(D); % The problem is symmetric (undirected), so we only consider the upper half.
% Also keep track of the "letter form":
C = reshape(cellstr(VChooseKRO(nodes,2)), size(D)).'; % "C" for "Combinations"
%{
C = 

  5×5 cell array

    'AA'    'AB'    'AC'    'AD'    'AE'
    'BA'    'BB'    'BC'    'BD'    'BE'
    'CA'    'CB'    'CC'    'CD'    'CE'
    'DA'    'DB'    'DC'    'DD'    'DE'
    'EA'    'EB'    'EC'    'ED'    'EE'
%}
C = C(d>0); d = d(d>0);
assert(numel(C) == numel(d)); % This is important to check
%% Find eligible sets of size n
for k = 1:numel(nodes)  
  if numel(d)<k
    break
  end
  % Enumerate combinations:
  C = C(VChooseK(1:numel(C),k));
  d = sum(VChooseK(d,k),2);  
  % Filter combinations:
  if any(d < thresh)    
    C(d >= thresh,:) = [];
    d = d(d < thresh);
    disp(sortrows(C)); % This is just to show it works like the manual example
  else
    break  
  end    
end

The output of the above is:

'AB'
'AC'
'AD'
'BC'
'BD'
'DE'

'AB'    'AC'
'AC'    'BD'

score 0 · Accepted Answer · answered Dec 24 '16 at 23:17

This is a plain Octave (or Matlab) solution. The matrix Distances is as in the question. The algorithm builds a 0-1 matrix a in which each column encodes a set with sum of distances less than limit (for example 10).

The matrix a is initialized with identity, because all one-elements subsets are admissible (sum of distances is 0). Then each column is picked c = a(:,m); and the possibility of adding another element is investigated (cand = c; cand(k) = 1; means adding k-th element). Without loss of generality, it is enough to consider adding only the elements after the last current element of the set.

The product cand'*Distances*cand is twice the sum of distances of the candidate set cand. So, if it's less than twice the limit, the column is added: a = [a cand];. At the end, the matrix a is displayed in transposed form, for readability.

limit = 10;
n = length(Distances);    
a = eye(n, n);
col1 = 1;
col2 = n;

while (col1 <= col2)
  for m = col1:col2
    c = a(:,m);
    for k = max(find(c>0))+1:n
      cand = c;
      cand(k) = 1;
      if cand'*Distances*cand < 2*limit
        a = [a cand];
      end
    end
  end
  col1 = col2 + 1;
  col2 = length(a);
end

disp(a')

Output:

   1   0   0   0   0
   0   1   0   0   0
   0   0   1   0   0
   0   0   0   1   0
   0   0   0   0   1
   1   1   0   0   0
   1   0   1   0   0
   1   0   0   1   0
   0   1   1   0   0
   0   1   0   1   0
   0   0   0   1   1

With a 250 by 250 matrix, the performance will greatly depend on how large limit is. If it is so large that all or most of 2^250 sets qualify, this is going to run out of memory. (2^250 is more than 10^75, so I don't think you'd ever want to see anywhere near that many sets).

And this is to have output in a more readable form:

for m = 1:length(a)
  disp(find(a(:,m))')
end

Output:

Thanks a ton Zaq .This is very helpful. – Abhijith N Dec 25 '16 at 00:37 — Abhijith N, Dec 25 '16 at 00:37

Find all subsets of a finite metric space for which the sum of distances is less than a given number

2 Answers2