We are given a pattern string, 'foo', and a source string, 'foobaroofzaqofom', and we need to find all occurrences of the pattern in the source with its letters in any order. For the given example the solution looks like: ['foo', 'oof', 'ofo'].

I have a solution, but I'm not sure that it is the most efficient one:

  1. Create a hash_map of the chars of the pattern string, where each char is a key and each value is the number of times that char occurs in the pattern. For the given example it would be {f: 1, o: 2}
  2. Look through the source string and, if one of the elements from the hash_map is found, then try to find all the remaining elements of the pattern
  3. If all elements are found then it is our solution; if not, move forward

Here is an implementation in C++:

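#include <set>
#include <string>
#include <unordered_map>
using namespace std;

set<string> FindSubstringPermutations(const string& s, const string& p)
{
    set<string> result;
    unordered_map<char, int> um;  // how often each character occurs in p

    // Guard: s.size() - p.size() is unsigned and would wrap around otherwise.
    if (p.empty() || s.size() < p.size())
        return result;

    for (auto ch : p)
    {
        auto it = um.find(ch);
        if (it == um.end())
            um.insert({ ch, 1 });
        else
            it->second += 1;
    }

    for (size_t i = 0; i + p.size() <= s.size(); ++i)
    {
        auto it = um.find(s[i]);
        if (it != um.end())
        {
            // Work on a copy and tick off the characters of this window.
            decltype (um) um_c = um;
            um_c[s[i]] -= 1;
            for (size_t t = (i + 1); t < i + p.size(); ++t)
            {
                auto it = um_c.find(s[t]);
                if (it == um_c.end())
                    break;              // character is not in the pattern
                else if (it->second == 0)
                    break;              // character occurs too often
                else
                    it->second -= 1;
            }

            // All counters at zero means the window is a permutation of p.
            int sum = 0;
            for (auto c : um_c)
                sum += c.second;

            if (sum == 0)
                result.insert(s.substr(i, p.size()));
        }
    }

    return result;
}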

Complexity is near O(n); I don't know how to calculate it more precisely.

So the question: is there a more efficient solution? Using a hash_map feels like a bit of a hack, and I think there may be a more efficient solution using simple arrays and flags for the found elements.

2 Answers


You could use an order-invariant hash algorithm that works with a sliding window to optimize things a bit.

An example for such a hash-algorithm could be

int hash(string s){
    int result = 0;

    for(int i = 0; i < s.length(); i++)
        result += s[i];

    return result;
}

This algorithm is a bit over-simplistic and is rather horrible in all respects except performance (i.e. distribution and number of possible hash values), but that isn't too hard to change.

The advantage with such a hash-algorithm would be:

hash("abc") == hash("acb") == hash("bac") == ...

and using a sliding-window with this algorithm is pretty simple:

string s = "abcd";

hash(s.substr(0, 3)) + 'd' - 'a' == hash(s.substr(1, 3));
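
Both properties are easy to check in a few lines. The sketch below is only an illustration: `sumHash` is a made-up name for the additive hash shown above (chosen so it cannot clash with `std::hash`), and the window is cut out with C++'s `substr(pos, count)`.

```cpp
#include <string>

// Additive, order-invariant hash: the sum of the character values.
int sumHash(const std::string& s) {
    int result = 0;
    for (char c : s)
        result += c;
    return result;
}

// Property 1: every permutation of the same letters hashes to the same value.
bool orderInvariant() {
    return sumHash("abc") == sumHash("acb") && sumHash("acb") == sumHash("bca");
}

// Property 2: sliding the window by one position only needs the character
// that leaves the window and the character that enters it.
bool slidesInConstantTime() {
    const std::string s = "abcd";
    return sumHash(s.substr(0, 3)) - 'a' + 'd' == sumHash(s.substr(1, 3));
}
```

Both helper functions return true, so the hash of the next window can be derived from the previous one without rescanning the window.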

These two properties of such hashing approaches allow us to do something like this:

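#include <algorithm>
#include <numeric>
#include <string>

// Order-invariant hash: the sum of all character values.
int hash(const std::string& s){
    return std::accumulate(s.begin(), s.end(), 0);
}

// Update the hash in O(1) when the window moves one character to the right.
int slideHash(int oldHash, char slideOut, char slideIn){
    return oldHash - slideOut + slideIn;
}

// Helper assumed by findPermuted; any permutation test for equal-length strings works here.
bool isPermutation(const std::string& a, const std::string& b){
    return std::is_permutation(a.begin(), a.end(), b.begin());
}

// Returns the index of the first permuted occurrence of pattern in s, or -1.
int findPermuted(const std::string& s, const std::string& pattern){
    if(pattern.length() > s.length())
        return -1;

    int patternHash = hash(pattern);
    int slidingHash = hash(s.substr(0, pattern.length()));

    if(patternHash == slidingHash && isPermutation(pattern, s.substr(0, pattern.length())))
        return 0;

    for(std::size_t i = 0; i < s.length() - pattern.length(); i++){
        slidingHash = slideHash(slidingHash, s[i], s[i + pattern.length()]);

        if(patternHash == slidingHash)
            if(isPermutation(pattern, s.substr(i + 1, pattern.length())))
                return static_cast<int>(i + 1);
    }

    return -1;
}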

This is basically an altered version of the Rabin-Karp algorithm that works for permuted strings. The main advantage of this approach is that fewer strings actually have to be compared in full. This matters especially here, since the comparison (checking whether one string is a permutation of another) is itself quite expensive.
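
The permutation check that runs after a hash match can itself be kept linear in the pattern length. A minimal sketch, assuming 8-bit characters; the name `isPermutationByCount` is made up for this illustration and is not part of the pseudocode above:

```cpp
#include <array>
#include <string>

// Returns true if b is a rearrangement of the characters of a.
// Assumption: plain 8-bit chars, so a 256-entry histogram covers every value.
bool isPermutationByCount(const std::string& a, const std::string& b) {
    if (a.size() != b.size())
        return false;

    std::array<int, 256> counts{};      // zero-initialised
    for (unsigned char c : a)
        ++counts[c];                    // count the characters of a
    for (unsigned char c : b)
        if (--counts[c] < 0)            // b uses a character more often than a
            return false;
    return true;                        // equal lengths and no deficit
}
```

Each call still costs time proportional to the pattern length, which is why filtering the windows by hash first pays off.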

The above code is only meant as a demonstration of the idea. It aims at being easy to understand rather than at performance and shouldn't be used directly.

The above "implementation" of an order-invariant rolling hash shouldn't be used as is, since it performs extremely poorly in terms of data distribution. Such hashes are inherently constrained: the only input the hash can be generated from is the actual value of the characters (no indices!), and these values need to be accumulated using a reversible operation.
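
How poor the distribution is can be measured directly. The sketch below (with the made-up name `distinctSumHashes`, and assuming ASCII) counts the distinct additive hash values over all three-letter lowercase strings:

```cpp
#include <set>

// Counts how many different values the additive hash takes over all
// three-letter strings made of 'a'..'z'.
int distinctSumHashes() {
    std::set<int> seen;
    for (char a = 'a'; a <= 'z'; ++a)
        for (char b = 'a'; b <= 'z'; ++b)
            for (char c = 'a'; c <= 'z'; ++c)
                seen.insert(a + b + c);   // same sum for every ordering
    return static_cast<int>(seen.size());
}
```

It returns 76 (every sum from 3 * 'a' = 291 to 3 * 'z' = 366), although there are 3276 different multisets of three letters. Even two-letter strings such as "ad" and "bc" collide, because 'a' + 'd' == 'b' + 'c'.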

A better approach would be to map each character to a prime (don't use 2: even numbers have no multiplicative inverse modulo a power of two). Since all operations are performed modulo 2^(8 * sizeof(hashtype)) (integer overflow), we need to generate a table of the multiplicative inverses modulo 2^(8 * sizeof(hashtype)) for all used primes. I won't cover generating these tables, as there are plenty of resources available on that topic here already.

The final hash would then look like this:

map<char, unsigned int> primes = generatePrimeTable();
map<unsigned int, unsigned int> inverse = generateMultiplicativeInverses(primes);

unsigned int hash(string s){
    unsigned int result = 1;
    for(int i = 0; i < s.length(); i++)
        result *= primes[s[i]];

    return result;
}

unsigned int slideHash(unsigned int oldHash, char slideOut, char slideIn){
    return oldHash * inverse[primes[slideOut]] * primes[slideIn];
}

Keep in mind that this solution works with unsigned integers.
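
Putting the pieces together, the following sketch checks that the prime-product hash can still be slid after the product has overflowed. Everything named here (`kPrimes`, `primeOf`, `inverseOf`, `productHash`, `slideProductHash`) is hypothetical; a 32-bit hash type and lowercase ASCII input are assumed, and the inverses are computed on the fly with Newton's iteration instead of being read from a table:

```cpp
#include <cstdint>
#include <string>

// Hypothetical table: 'a'..'z' mapped to the first 26 odd primes (2 is skipped).
static const std::uint32_t kPrimes[26] = {
    3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43,
    47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103};

// Assumes c is in 'a'..'z'.
std::uint32_t primeOf(char c) { return kPrimes[c - 'a']; }

// Multiplicative inverse modulo 2^32 via Newton's iteration (odd values only).
// x = a is correct to 3 bits because a * a == 1 (mod 8) for odd a;
// every step doubles the number of correct bits.
std::uint32_t inverseOf(std::uint32_t a) {
    std::uint32_t x = a;
    for (int i = 0; i < 5; ++i)
        x *= 2u - a * x;
    return x;
}

// Order-invariant hash: product of the per-character primes, wrapping modulo 2^32.
std::uint32_t productHash(const std::string& s) {
    std::uint32_t h = 1;
    for (char c : s)
        h *= primeOf(c);
    return h;
}

// Divide out the leaving character (by multiplying with its inverse) and
// multiply in the entering one.
std::uint32_t slideProductHash(std::uint32_t oldHash, char slideOut, char slideIn) {
    return oldHash * inverseOf(primeOf(slideOut)) * primeOf(slideIn);
}
```

Under these assumptions, `slideProductHash(productHash("zyxwvuts"), 'z', 'r')` equals `productHash("yxwvutsr")`, even though both products exceed 32 bits.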

Typical rolling hash function for anagrams

  • using a product of primes
  • This will only work for relatively short patterns
  • The hash values for almost all normal words will fit into a 64-bit value without overflow.
  • Based on this anagram matcher
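
Whether a particular word still fits can be checked while multiplying. The sketch below is only an illustration: it reuses the letter-to-prime table `primes26` from the program in this answer (copied here as `kPrimes26`), assumes lowercase input, and reports overflow instead of silently wrapping; `productFits64` is a made-up name.

```cpp
#include <cstdint>
#include <limits>
#include <string>

// Same letter-to-prime table as primes26 in the answer's program.
static const unsigned char kPrimes26[26] = {
    5, 71, 79, 19, 2, 83, 31, 43, 11, 53, 37, 23, 41,
    3, 13, 73, 101, 17, 29, 7, 59, 47, 61, 97, 89, 67};

// Multiplies the primes of all letters in word and stores the product in *out.
// Returns false as soon as the product would no longer fit into 64 bits
// (*out is left untouched in that case).
// Assumption: word contains only 'a'..'z'.
bool productFits64(const std::string& word, std::uint64_t* out) {
    std::uint64_t val = 1;
    for (char c : word) {
        const std::uint64_t p = kPrimes26[c - 'a'];
        if (val > std::numeric_limits<std::uint64_t>::max() / p)
            return false;               // val * p would overflow
        val *= p;
    }
    *out = val;
    return true;
}
```

With this table the largest prime is 101 (for 'q'), so any word of up to nine letters fits, since 101^9 < 2^64, while a run of ten 'q's does not.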

/* braek; */
/* 'foobaroofzaqofom' */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

typedef unsigned long long HashVal;
static HashVal hashchar (unsigned char ch);
static HashVal hashmem (void *ptr, size_t len);

/* one prime per letter a..z */
unsigned char primes26[] =
{ 5,71,79,19,2,83,31,43,11,53,37,23,41,3,13,73,101,17,29,7,59,47,61,97,89,67, };

/* prime for a letter (case-insensitive), 1 for anything else */
static HashVal hashchar (unsigned char ch)
{
HashVal val=1;

if (ch >= 'A' && ch <= 'Z' ) val = primes26[ ch - 'A'];
else if (ch >= 'a' && ch <= 'z' ) val = primes26[ ch - 'a'];

return val;
}

/* product of the primes of all characters in the buffer */
static HashVal hashmem (void *ptr, size_t len)
{
size_t idx;
unsigned char *str = ptr;
HashVal val=1;

if (!len) return 0;
for (idx = 0; idx < len; idx++) {
        val *= hashchar ( str[idx] );
        }

return val;
}

unsigned char buff [4096];
int main (int argc, char **argv)
{
size_t patlen,len,pos,rotor;
int ch;
HashVal patval;
HashVal rothash=1;

if (argc < 2) return 1;
patlen = strlen(argv[1]);
        /* an empty pattern would divide by zero below; a long one would overrun buff */
if (!patlen || patlen > sizeof buff) return 1;
patval = hashmem( argv[1], patlen);
// fprintf(stderr, "Pat=%s, len=%zu, Hash=%llx\n", argv[1], patlen, patval);

for (rotor=pos=len =0; ; len++) {
        ch = getc(stdin);
        if (ch == EOF) break;

                /* a non-letter ends the current run: restart the window */
        if (ch < 'A' || ch > 'z') { pos = 0; rothash = 1; continue; }
        if (ch > 'Z' && ch < 'a') { pos = 0; rothash = 1; continue; }
                /* remove old char from rolling hash */
        if (pos >= patlen) { rothash /= hashchar(buff[rotor]); }
                /* add new char to rolling hash */
        buff[rotor] = ch;
        rothash *= hashchar(buff[rotor]);

        // fprintf(stderr, "%zu: [rot=%zu]pos=%zu, Hash=%llx\n", len, rotor, pos, rothash);

        rotor = (rotor+1) % patlen;
                /* matched enough characters ? */
        if (++pos < patlen) continue;
                /* correct hash value ? */
        if (rothash != patval) continue;
                /* len is the offset of the last character of the match */
        fprintf(stdout, "Pos=%zu\n", len);
        }

return 0;
}


$ ./a.out foo < anascan.c

Update. For people who don't like the product of primes, here is a taxicab-number-style sum of cubes (+ additional histogram check) implementation. This is also supposed to be 8-bit clean. Note that the cubes are not necessary; it works equally well with squares, or with just the sum (the final histogram check will then have some more work to do).

/* braek; */
/*  'foobaroofzaqofom' */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

typedef unsigned long long HashVal;
static HashVal hashchar (unsigned char ch);
static HashVal hashmem (void *ptr, size_t len);

/* cube of (character value + 1); defined for all 256 byte values */
static HashVal hashchar (unsigned char ch)
{
HashVal val=1+ch;

return val*val*val;
}

/* 1 + sum of the per-character hashes; the 1 matches the initial rothash */
static HashVal hashmem (void *ptr, size_t len)
{
size_t idx;
unsigned char *str = ptr;
HashVal val=1;

if (!len) return 0;
for (idx = 0; idx < len; idx++) {
        val += hashchar ( str[idx] );
        }

return val;
}

int main (int argc, char **argv)
{
size_t patlen,len,rotor;
int ch;
HashVal patval;
HashVal rothash=1;
unsigned char *patstr;
unsigned pathist[256] = {0};
unsigned rothist[256] = {0};
unsigned char cycbuff[1024];

if (argc < 2) return 1;
patstr = (unsigned char*) argv[1];
patlen = strlen((const char*) patstr);
        /* an empty pattern would divide by zero below; a long one would overrun cycbuff */
if (!patlen || patlen > sizeof cycbuff) return 1;
patval = hashmem( patstr, patlen);

for(rotor=0; rotor < patlen; rotor++) {
        pathist [ patstr[rotor] ] += 1;
        }
fprintf(stderr, "Pat=%s, len=%zu, Hash=%llx\n", argv[1], patlen, patval);

for (rotor=len =0; ; len++) {
        ch = getc(stdin);
        if (ch == EOF) break;

                /* remove old char from rolling hash */
        if (len >= patlen) {
                rothash -= hashchar(cycbuff[rotor]);
                rothist [ cycbuff[rotor] ] -= 1;
                }
                /* add new char to rolling hash */
        cycbuff[rotor] = ch;
        rothash += hashchar(cycbuff[rotor]);
        rothist [ cycbuff[rotor] ] += 1;

        // fprintf(stderr, "%zu: [rot=%zu], Hash=%llx\n", len, rotor, rothash);

        rotor = (rotor+1) % patlen;
                /* matched enough characters ? (the window is full once len+1 == patlen) */
        if (len+1 < patlen) continue;
                /* correct hash value ? */
        if (rothash != patval) continue;
                /* correct histogram? */
        if (memcmp(rothist,pathist, sizeof pathist)) continue;
                /* offset of the first character of the match */
        fprintf(stdout, "Pos=%zu\n", len+1-patlen);
        }

return 0;
}
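
For comparison, the histogram check on its own is already enough to solve the task from the question; the rolling hash only acts as a cheap pre-filter in front of it. A minimal C++ sketch, assuming 8-bit characters (`findPermutations` is a made-up name and appears in neither answer):

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Returns every substring of s that is a permutation of p, in order of
// appearance. Two 256-entry histograms are kept: one for the pattern and
// one for the current window, which is updated as the window slides.
std::vector<std::string> findPermutations(const std::string& s, const std::string& p) {
    std::vector<std::string> result;
    const std::size_t m = p.size();
    if (m == 0 || s.size() < m)
        return result;

    std::array<int, 256> want{};   // histogram of the pattern
    std::array<int, 256> have{};   // histogram of the current window
    for (unsigned char c : p)
        ++want[c];

    for (std::size_t i = 0; i < s.size(); ++i) {
        ++have[static_cast<unsigned char>(s[i])];            // character entering
        if (i >= m)
            --have[static_cast<unsigned char>(s[i - m])];    // character leaving
        if (i + 1 >= m && have == want)                      // window full and equal
            result.push_back(s.substr(i + 1 - m, m));
    }
    return result;
}
```

For the example from the question, `findPermutations("foobaroofzaqofom", "foo")` yields "foo", "oof" and "ofo". Comparing the two arrays costs a fixed 256 steps per window, so the running time grows linearly with the length of the source string.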

