What is the fastest way to find the positions of the first common entry in two vectors in c++?

Question

I have two vectors u = {32, 25, 13, 42, 55, 33} and v = {18, 72, 53, 39, 13, 12, 28}, and I would like to determine the positions of their first common entry, 13. For these example vectors, the positions are 3 and 5. What is the fastest way to find these positions? I have to do this operation many, many times.

What will the "first common entry" be for `{10, 20, 30}` and `{15, 30, 20}`? — MikeCAT, Mar 23 '21 at 13:15
How big will the vectors (the number of elements) and each element be? — MikeCAT, Mar 23 '21 at 13:16
My vectors form a tree, so I believe that such examples cannot occur. So we can say that the positions of both 30 and 20 are good answers. — ELTE Gaussian, Mar 23 '21 at 13:17
The vectors are not too long, they have 50-100 integer entries. — ELTE Gaussian, Mar 23 '21 at 13:19
@Jarod42 And then check what the position of the nearest common entry is in the second vector? — ELTE Gaussian, Mar 23 '21 at 13:25
@Jarod42 I think not, because the two vectors form two paths which cannot repeat vertices. — ELTE Gaussian, Mar 23 '21 at 13:30
You say *these positions are 3 and 5* - is that deliberate (using 1-based indexing)? — Adrian Mole, Mar 23 '21 at 14:02
@ELTEGaussian Honestly, why are you asking yourself this kind of question? Any algorithm will be efficient enough, even for large vectors (<100 000 elements). — Ben_LCDB, Mar 23 '21 at 14:10

Jarod42 · Accepted Answer · 2021-03-23T13:58:03.010

Assuming you don't have duplicates, you might use the following:

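#include <algorithm> // std::min
#include <cstddef>   // std::size_t
#include <map>
#include <utility>   // std::pair
#include <vector>

// Requires C++17 (structured bindings, `if` with initializer).
std::pair<std::size_t, std::size_t>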
common_entry_indexes(const std::vector<int>& u, const std::vector<int>& v)
{
    const std::size_t min_size = std::min(u.size(), v.size());
    std::map<int, std::size_t> s; // might be `std::unordered_map`

    for (std::size_t i = 0; i != min_size; ++i)
    {
         if (auto [it, inserted] = s.insert({u[i], i}); !inserted) { return {i, it->second}; }
         if (auto [it, inserted] = s.insert({v[i], i}); !inserted) { return {it->second, i}; }
    }
    for (std::size_t i = min_size; i != u.size(); ++i)
    {
         if (auto [it, inserted] = s.insert({u[i], i}); !inserted) { return {i, it->second}; }
    }
    for (std::size_t i = min_size; i != v.size(); ++i)
    {
         if (auto [it, inserted] = s.insert({v[i], i}); !inserted) { return {it->second, i}; }
    }
    return {-1, -1}; // Not found (-1 converts to the largest std::size_t value)
}

Duplicates might be handled with an extra check.
Complexity should be O(max(N, M) * log(N + M)) with `std::map` (and O(max(N, M)) on average with `std::unordered_map`).
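
The "extra check" for duplicates is not spelled out in the answer. One possible reading, as a minimal sketch: keep one map per vector, so that a repeated value inside the same vector is never mistaken for a common entry. The name `common_entry_indexes_dup` and the `std::optional` return type are assumptions made for this illustration, not part of the original answer.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical variant (not from the answer): tolerates repeated values inside u or v.
// Returns 0-based {index in u, index in v}, or std::nullopt when nothing is shared.
std::optional<std::pair<std::size_t, std::size_t>>
common_entry_indexes_dup(const std::vector<int>& u, const std::vector<int>& v)
{
    std::unordered_map<int, std::size_t> seen_u; // value -> first index in u
    std::unordered_map<int, std::size_t> seen_v; // value -> first index in v
    const std::size_t max_size = std::max(u.size(), v.size());

    for (std::size_t i = 0; i != max_size; ++i)
    {
        if (i < u.size())
        {
            // A hit only counts if the value was already seen in the *other* vector.
            if (auto it = seen_v.find(u[i]); it != seen_v.end()) { return std::make_pair(i, it->second); }
            seen_u.emplace(u[i], i); // emplace keeps the first index when a value repeats
        }
        if (i < v.size())
        {
            if (auto it = seen_u.find(v[i]); it != seen_u.end()) { return std::make_pair(it->second, i); }
            seen_v.emplace(v[i], i);
        }
    }
    return std::nullopt; // no common entry
}
```

Like the version above, this alternates between the two vectors, so it still stops at the earliest step at which a shared value becomes visible.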

Silly question: Why bother with `NumberWithIndex` to do this with a `set`, rather than just using `std::map` (or `std::unordered_map` instead) to avoid the need for a custom `struct` with custom comparator? — ShadowRanger, Mar 23 '21 at 13:52
@ShadowRanger: Indeed :-) (I might say naming, with `index` instead of `second`). — Jarod42, Mar 23 '21 at 13:55

Adrian Mole · Answer 2 · 2021-03-23T14:29:22.477

If not the fastest, then surely the simplest (assuming, as in your question, you want 1-based indexing, so we can use {0, 0} as a "not found" signal and the size_t type for the indexes):

#include <utility>   // For std::pair
#include <algorithm> // For std::find
#include <vector>
#include <iostream>

std::pair<size_t, size_t> FirstCommon(const std::vector<int>& a, const std::vector<int>& b)
{
    for (size_t i = 0; i < a.size(); ++i) {
        auto f = std::find(b.begin(), b.end(), a.at(i));
        if (f != b.end()) return { i + 1, f - b.begin() + 1 }; // Found a[i] in b
    }
    return { 0, 0 };
}

int main()
{
    std::vector<int> u = { 32, 25, 13, 42, 55, 33 };
    std::vector<int> v = { 18, 72, 53, 39, 13, 12, 28 };
    auto match = FirstCommon(u, v);
    std::cout << "Position is {" << match.first << "," << match.second << "}." << std::endl;
    return 0;
}

edited Mar 23 '21 at 14:29

answered Mar 23 '21 at 14:14


Note: for vector sizes as you have indicated (50 - 100 elements), then the overheads involved in loop- and data-initializations in more complex (but *technically* faster) algorithms may outweigh any performance increase those algorithms would confer. – Adrian Mole Mar 23 '21 at 14:25
I think OP expects for `{1, 2, 3, 4}`, `{4, x, x, 3}` to find element 4 and not element 3. – Jarod42 Mar 23 '21 at 14:28
@Jarod42 In the third comment on the question, seems that either will be acceptable (but unlikely to occur). – Adrian Mole Mar 23 '21 at 14:31
MikeCAT's example is really an edge case: index 2/3 vs 3/2, just an order change. 4/1 versus 2/3 is more "interesting". – Jarod42 Mar 23 '21 at 14:37
@Jarod42 Agreed. I guess my function could be called twice (swapping argument order) then the calling module can decide which pair is best. – Adrian Mole Mar 23 '21 at 14:38
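
The idea in the last comment (call the function with both argument orders and let the caller choose) might be sketched as follows. The comment does not say what "best" means, so the criterion used here, the smaller sum of the two positions, is purely an assumption, and `BestFirstCommon` is a hypothetical name. The search function is repeated so the sketch compiles on its own.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Same linear search as FirstCommon in the answer: 1-based positions, {0, 0} when nothing is shared.
std::pair<std::size_t, std::size_t> FirstCommon(const std::vector<int>& a, const std::vector<int>& b)
{
    for (std::size_t i = 0; i < a.size(); ++i) {
        auto f = std::find(b.begin(), b.end(), a[i]);
        if (f != b.end()) return { i + 1, static_cast<std::size_t>(f - b.begin()) + 1 };
    }
    return { 0, 0 };
}

// Hypothetical wrapper: tries both argument orders and keeps the pair with the
// smaller position sum (an assumed criterion, not stated in the thread).
std::pair<std::size_t, std::size_t> BestFirstCommon(const std::vector<int>& u, const std::vector<int>& v)
{
    const auto uv = FirstCommon(u, v); // {position in u, position in v}
    const auto vu = FirstCommon(v, u); // {position in v, position in u}
    const std::pair<std::size_t, std::size_t> swapped{ vu.second, vu.first };
    // If there is no common entry, both results are {0, 0} and uv is returned.
    return (swapped.first + swapped.second < uv.first + uv.second) ? swapped : uv;
}
```

For `{1, 2, 3, 4}` and `{4, 8, 9, 3}` the two orders give {3, 4} and {4, 1}; under the assumed criterion the wrapper keeps {4, 1}, i.e. the element 4.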

Ronald Souza · Answer 3 · 2021-05-21T17:59:53.333

In addition to "small-sized vectors" and "no duplicates", if your keys always fall within a small range (e.g. "no key will ever be greater than 10,000"), then you can leverage this extra information to achieve an O(max(N,M)) solution (@Jarod42's is O(max(N,M) * log(N+M)) and @Adrian's is O(N*M)).

First, establish a large enough prime (i.e. a prime larger than the largest key), then build a hashmap until the first collision happens. Keys are assumed to be non-negative here, since a negative key would produce a negative table index.

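#include <algorithm> // std::min
#include <climits>   // INT_MAX
#include <cstddef>   // size_t
#include <utility>   // std::pair
#include <vector>

std::pair<size_t, size_t> findFirstMatch(const std::vector<int>& u, const std::vector<int>& v, const int& prime) {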
    std::vector<size_t> hashmap(prime, INT_MAX);    // ---> 'INT_MAX' returned if no common entries found.
    size_t smallerSz = std::min(u.size(), v.size());
    std::pair<size_t, size_t> solution = { INT_MAX, INT_MAX };
    bool noCollision = true;

    // Alternate checking, to ensure minimal testing:
    for (size_t i = 0; i < smallerSz; ++i) {
        //One step for vector u:
        size_t& idx = hashmap[u[i] % prime];
        if (idx < INT_MAX) {    // ---> Collision!
            solution = { i, idx };
            noCollision = false;
            break;
        }
        idx = i;

        //One step for vector v (a separate reference, so that u's slot is not overwritten):
        size_t& vIdx = hashmap[v[i] % prime];
        if (vIdx < INT_MAX) {    // ---> Collision!
            solution = { vIdx, i };
            noCollision = false;
            break;
        }
        vIdx = i;
    }

    //If no collisions so far, then the remainder of the larger vector must still be checked:
    if(noCollision){
        bool uLarger = u.size() > v.size();
        const std::vector<int>& largerVec = (uLarger) ? u : v;
        for (size_t i = smallerSz; i < largerVec.size(); ++i) {
            size_t& idx = hashmap[largerVec[i] % prime];
            if (idx < INT_MAX) {    // ---> Collision!
                if (uLarger) solution = { i, idx };
                else         solution = { idx, i };
                break;
            }
            idx = i; 
        }
    }
    return solution;
}

USAGE

#include <iostream>

int main()
{
    std::vector<int> u = { 32, 25, 13, 42, 55, 33 }, v = { 18, 72, 53, 39, 13, 12, 28 };
    const int prime = 211; // ---> Some suitable prime...

    std::pair<size_t, size_t> S = findFirstMatch(u, v, prime);
    std::cout << "Solution = {" << S.first << "," << S.second << "}." << std::endl;
    return 0;
}

It outputs "{2,4}" instead of "{3,5}" because the indexes are 0-based. Feel free to modify it.
