
I would like to have a function

powersetWithComplements :: [a] -> [([a], [a])]

such that, for example:

powersetWithComplements [1,2,3] = [([],[1,2,3]),([3],[1,2]),([2],[1,3]),([2,3],[1]),([1],[2,3]),([1,3],[2]),([1,2],[3]),([1,2,3],[])]

It is easy to obtain some implementation, for example:

import Control.Monad (filterM)

powerset :: [a] -> [[a]]
powerset = filterM (const [False, True])

powersetWithComplements s = let p = powerset s in zip p (reverse p)

Or

import Data.List ((\\))
powersetWithComplements s = [ (x, s \\ x) | x <- powerset s ]  -- (\\) needs an Eq constraint
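Either version reproduces the example output above:

Prelude Control.Monad Data.List> powersetWithComplements [1,2,3]
[([],[1,2,3]),([3],[1,2]),([2],[1,3]),([2,3],[1]),([1],[2,3]),([1,3],[2]),([1,2],[3]),([1,2,3],[])]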

But I estimate that the performance of both of these would be really poor. What would be an optimal approach? It is possible to use a different data structure than the [] list.

user1747134
  • It’s worth noting that if you want to find the entire powerset (i.e. no lazy shortcutting) of a list of length n, you need to do at least O(2^n) work. Thus such a function may never be “fast” for reasonably long lists. – Dan Robertson Dec 11 '17 at 15:03
  • 1
    @DanRobertson: that is correct. But if we calculate the complement for every of these items, we will thus (if the complement `(\\)` is calculated in *O(n^2)*), have a *O(n^2 2^n)* algorithm. I agree that *O(n^2)* will probably be small compared to *O(2^n)*, but it is still a factor to take into account. – Willem Van Onsem Dec 11 '17 at 15:08

3 Answers


Well, you should see a powerset like this: you enumerate over the items of the set, and for each item you decide whether to put it in the "selection" (the first item of the tuple) or not (the second item of the tuple). By enumerating these choices exhaustively, we obtain the powerset.

So we can do the same, for instance using recursion:

import Control.Arrow(first, second)

powersetWithComplements :: [a] -> [([a], [a])]
powersetWithComplements [] = [([],[])]
powersetWithComplements (x:xs) = map (second (x:)) rec ++ map (first (x:)) rec
    where rec = powersetWithComplements xs  -- enumerate the tail once, reuse it twice

So here map (second (x:)) prepends x to the second item of every tuple in rec, and map (first (x:)) does the same for the first item of the tuples of rec, where rec is the recursion on the tail of the list.

Prelude Control.Arrow> powersetWithComplements [1,2,3]
[([],[1,2,3]),([3],[1,2]),([2],[1,3]),([2,3],[1]),([1],[2,3]),([1,3],[2]),([1,2],[3]),([1,2,3],[])]

The advantage of this approach is that we do not generate a complement list for every list we generate: we build the selection and the complement concurrently. Furthermore, we can reuse the lists we construct in the recursion, which reduces the memory footprint.
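For example, unfolding the recursion for [1,2] shows that both halves of the result are built from the same rec, so the tails are shared rather than copied:

powersetWithComplements [1,2]
  = map (second (1:)) rec ++ map (first (1:)) rec   -- rec = powersetWithComplements [2]
  = map (second (1:)) [([],[2]),([2],[])] ++ map (first (1:)) [([],[2]),([2],[])]
  = [([],[1,2]),([2],[1])] ++ [([1],[2]),([1,2],[])]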

In both time and memory complexity, the powersetWithComplements function is equal to the powerset function (note that this is asymptotic complexity; in absolute terms it will of course require more processing time, since we do an extra amount of work): prepending to a list is done in O(1), and we now build two lists (and a tuple) instead of one for every subset.

Willem Van Onsem
  • Elegant solution. Using a [difference list](https://hackage.haskell.org/package/dlist-0.8.0.3/docs/Data-DList.html) will eliminate the appending overhead, which is significant here. – luqui Dec 12 '17 at 01:44
  • Actually, to my surprise, it isn't that much faster with dlist. [benchmark](https://github.com/luqui/experiments/blob/master/pset.hs) – luqui Dec 12 '17 at 02:19
  • But doing a "CPS transformation" of sorts improves things a lot. See my answer. – luqui Dec 12 '17 at 03:15

Since you are looking for a "fast" implementation, I thought I would share some benchmark experiments I did with Willem's solution.

I thought using a DList instead of a plain list would be a big improvement, since DLists have constant-time append, whereas appending lists is linear in the size of the left argument.
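For intuition, a difference list can be sketched in a few lines: it represents a list as a function that prepends its contents to a supplied tail, so appending is mere function composition. (This is only an illustrative sketch with ad-hoc names; the actual code below uses the real dlist package.)

newtype DL a = DL ([a] -> [a])

singletonDL :: a -> DL a
singletonDL x = DL (x:)

-- O(1) append: just compose the two prepend functions
appendDL :: DL a -> DL a -> DL a
appendDL (DL f) (DL g) = DL (f . g)

-- Pay the full traversal cost only once, at the end
toListDL :: DL a -> [a]
toListDL (DL f) = f []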

import Control.Arrow (first, second)
import Data.DList (toList)
import qualified Data.DList as DList

psetDL :: [a] -> [([a],[a])]
psetDL = toList . go
    where
    go [] = DList.singleton ([],[])
    go (x:xs) = (second (x:) <$> rec) <> (first (x:) <$> rec)
        where
        rec = go xs

But that did not have a significant effect.

I suspected this is because we are traversing both sublists anyway, due to the fmap (<$>). We can avoid the traversal by doing something similar to CPS-converting the function, passing down the accumulated sets as parameters rather than returning them.

psetTail :: [a] -> [([a],[a])]
psetTail = go [] []
    where
    go a b [] = [(a,b)]
    go a b (x:xs) = go a (x:b) xs <> go (x:a) b xs

This yielded a 220% improvement on a list of size 20. Now that we aren't traversing the lists with fmap, we can also get rid of the append traversal by using a DList:

psetTailDL :: [a] -> [([a],[a])]
psetTailDL = toList . go [] []
    where
    go a b [] = DList.singleton (a,b)
    go a b (x:xs) = go a (x:b) xs <> go (x:a) b xs

Which yields an additional 20% improvement.
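For reference, the timings above can be reproduced with a criterion harness along these lines (a minimal sketch, not the exact benchmark linked in the comments; it assumes the criterion package and the definitions above):

import Criterion.Main

main :: IO ()
main = defaultMain
    [ bench "psetDL"     $ nf psetDL     [1..20 :: Int]
    , bench "psetTail"   $ nf psetTail   [1..20 :: Int]
    , bench "psetTailDL" $ nf psetTailDL [1..20 :: Int]
    ]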

luqui

I guess the best is inspired by your reverse discovery:

import Control.Monad
partitions s=filterM(const[False,True])s
        `zip`filterM(const[True,False])s
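A quick GHCi check confirms that this reproduces exactly the ordering asked for in the question:

Prelude Control.Monad> partitions [1,2,3]
[([],[1,2,3]),([3],[1,2]),([2],[1,3]),([2,3],[1]),([1],[2,3]),([1,3],[2]),([1,2],[3]),([1,2,3],[])]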

rather than a likely stackoverflower

partitions[]=[([],[])]
partitions(x:xs)=[p|(l,r)<-partitions xs,p<-[(l,x:r),(x:l,r)]]

or a space-and-time-efficient finite list indexer

import Data.Array
import Data.Bits
import Data.List
partitions s=[(map(a!)f,map(a!)t)
             |n<-[length s],a<-[listArray(0,n-1)s],
              m<-[0..2^n-1],(f,t)<-[partition(testBit m)[0..n-1]]]
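Note that this bit-indexed version enumerates the pairs in a different order than the question's example, namely by increasing bitmask m:

*Main> partitions [1,2,3]
[([],[1,2,3]),([1],[2,3]),([2],[1,3]),([1,2],[3]),([3],[1,2]),([1,3],[2]),([2,3],[1]),([1,2,3],[])]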
Roman Czyborra