23

A is an array of integers.

All the values are between 0 to A.Length-1

it means 0 <= A[i] <= A.Length-1

I am supposed to find repeating elements; and if there are several repeating elements, then choose the one that has lower index for the repeated item.

for example:

a = [3, 4, 2, 5, 2, 3]

then

result = 2

This was an interview question. I used another array to store items and check when it is repeating. Then it gave me time-out for some test cases. The interviewer advised to only loop over the array only once, and do not create any additional data structure.

Aryan Firouzian
  • 1,649
  • 5
  • 24
  • 32
Kiana Montazeri
  • 301
  • 1
  • 9
  • 5
    I'd suggest using a `HashSet` (but that is using an extra variable). When `Add` returns `false` you got your answer. Whether it is efficient enough - not sure without seeing the sample inputs and the benchmarking code. – mjwills Aug 29 '18 at 13:04
  • I also asked what if I create a hash table, and he declined that. I suppose it has sth to do with the value range. – Kiana Montazeri Aug 29 '18 at 13:07
  • 2
    You can use the input itself as a hashset. Every time you see a value, add A.Length to the item that correspond to that index. No need for another data structure. If you find an item that is already > A.length.. you have a repetition. Track them – Juan Leni Aug 29 '18 at 13:08
  • If their concern is only memory consumption and no duplication of the array, I think that the easiest and quickest solution is to iterate over each element in the array to see if you have seen it before. The should give you an `O(n^2)` algorithm which consumes the size of the array + the size of an int. – npinti Aug 29 '18 at 13:11
  • 3
    Have a look at this: https://www.geeksforgeeks.org/find-the-two-repeating-elements-in-a-given-array/ – Fabjan Aug 29 '18 at 13:29
  • 2
    Try following : int[] a = { 3, 4, 2, 5, 2, 3 }; int[] b = a.Select((x, i) => new { number = x, index = i }).GroupBy(x => x.number).Select(x => x.Min(y => y.index)).ToArray(); – jdweng Aug 29 '18 at 13:38
  • Are you looking for the index or the value (`2` is both of them in this example). – Olivier Jacot-Descombes Aug 29 '18 at 20:32

4 Answers4

22

No need for another data structure. You can use the input itself as a hashset.

Every time you see a value, add A.Length to the item that corresponds to that index. As values might have been already incremented, you should look at the value as A[i] mod A.length.

If you find an item that is already >= A.length.. you have a repetition. (Remember that the problem states that all items are in the interval [0, A.Length-1])

Track the lowest index that has been found as repeated.

This results in O(N) complexity (single pass) and no use of an additional data structure, i.e. Size O(1)

The key concept behind this approach is that hashsets work this way. Conceptually, this is indirectly related to the pigeonhole principle. https://en.wikipedia.org/wiki/Pigeonhole_principle

Note: During the interview it would be important to ask implementation specific questions, discuss limitations, assumptions, etc.: - What is the data type of the items in the list? - if values are in the range [0..A.length-1], are all items unsigned or can I use negative numbers if I wanted? - etc.

During the interview, I would not claim this is a perfect answer, instead, I would discuss the assumptions with the interviewer and adjust accordingly. For instance, another answer suggested using negative numbers but it is possible that the data type of items is an unsigned type, etc.

The interview is supposed to trigger a technical discussion to explore both your knowledge and creativity.

Juan Leni
  • 5,371
  • 2
  • 43
  • 69
6

Notice: Solution fails if there is element with value of zero. Olivier's solution can handle such cases.

Making element with index of A[i] negative. It only go through the loop once.

for(int i=0; i<A.Length; i++)
    {
        if (A[Math.Abs(A[i])] < 0){ return Math.Abs(A[i]);}
        A[Math.Abs(A[i])] = -A[Math.Abs(A[i])];
    }
Aryan Firouzian
  • 1,649
  • 5
  • 24
  • 32
3

I would like to refine @AryanFirouzian's solution and to return all duplicates by using yield return. Also, using a temp variable simplifies the code.

public static IEnumerable<int> FindDuplicates(int[] A)
{
    for (int i = 0; i < A.Length; i++) {
        int absAi = Math.Abs(A[i]);
        if (A[absAi] < 0) {
            yield return absAi;
        } else {
            A[absAi] *= -1;
        }
    }
}

However, this solution does not return the element with the lower index and if there are more than 2 identical copies then it will return the same value more than once. Another issue is that 0 cannot be made negative.

A better solution eliminates repeated results, but still returns the second index and has a problem with 0 values. It also returns the index itself to demonstrate the wrong index problem

public static IEnumerable<(int index, int value)> FindDuplicates(int[] A)
{
    for (int i = 0; i < A.Length; i++) {
        int x = A[i] % A.Length;
        if (A[x] / A.Length == 1) {
            yield return (i, x);
        }
        A[x] += A.Length;
    }
}

Tested with

var A = new int[] { 3, 4, 2, 5, 2, 3, 3 };
foreach (var item in FindDuplicates(A)) {
    Console.WriteLine($"[{item.index}] = {item.value}");
}

It returns

[4] = 2
[5] = 3

My final solution which eliminates all these problems (at least I hope so): It encodes the first index itself by adding (i + 1) * A.Length to the first occurrence of a value. (i + 1) because i can be 0. The index can then be decoded with the reverse operation (A[x] / A.Length) - 1.

Then, because we want to return a result only on the first repeating value, we set the value to a negative value to exclude it from further processing. Subsequently, the original value can be retrieved with Math.Abs(A[i]) % A.Length.

public static IEnumerable<(int index, int value)> FindDuplicates(int[] A)
{
    for (int i = 0; i < A.Length; i++) {
        int x = Math.Abs(A[i]) % A.Length;
        if (A[x] >= 0) {
            if (A[x] < A.Length) { // First occurrence.
                A[x] += (i + 1) * A.Length; // Encode the first index.
            } else { // Second occurrence.
                int firstIndex = (A[x] / A.Length) - 1; // Decode the first index.
                yield return (firstIndex, x);

                // Mark the value as handeled by making it negative;
                A[x] *= -1; // A[x] is always >= A.Length, so no zero problem.
            }
        }
    }
}

Returns the expected result

[2] = 2
[0] = 3

Our elements are ints that don't have an identity. I.e. we can return one of the duplicates at any index since two equal ints cannot be distinguished. In case the elements have an identity (they could be reference types with equal values but differing references or have additional fields not involved in equality testing), we would have to return the first occurrence with

yield return (firstIndex, Math.Abs(A[firstIndex]) % A.Length);

to satisfy all the requirements.

Olivier Jacot-Descombes
  • 86,431
  • 10
  • 121
  • 160
2

For who wants an implementation of the problem, I suggest two variant (in c# as in the tags), one using the accepted answer and one using the aproach of another answer, using the opposite of the elements. However the last solution has problem with the value zero and require some trick.

First Solution

using System;
public class Program
{
    public static void Main()
    {
        int[] a = {3, 4, 0, 5, 2, 3};
        int N = 6;
        int min_index = 0; 
        bool found = false;
        int index = -1;
        int i = 0;
        while(i < N && !found)
        {

            if(a[i] >= N) 
                index = a[i] % N;
            else
                index = a[i];

            if(a[index] >= N) //its a duplicated elements 
            {
                min_index = i;
                found = true;
            }else
            {
                a[index] += N;
            }
            i++;

        }

        Console.WriteLine("Result = " + a[min_index] % N);
    }
}

Second solution

    using System;
public class Program
{
    public static void Main()
    {
        int[] a = {3, 4, 2, 5, 2, 3};
        int N = 6;
        int min_index = N-1; 
        bool found = false;
        int index = -1;
        int i = 0;
        while(i < N && !found)
        {
            if(a[i] == -N+1) //it was 0
                index = 0;
            else
                index = Math.Abs(a[i]);

            if(a[index] < 0 || a[index] == -N+1) //its a duplicated elements 
            {
                min_index = i;
                found = true;
            }else
            {
                if(a[index] > 0)
                {
                    a[index] = -a[index];
                }else
                {
                    a[index] += -N+1;
                }
            }
            i++;
        }

        if(a[min_index] == -N+1)
            a[min_index] = 0;

        Console.WriteLine("Result = " + Math.Abs(a[min_index]));
    }
}
Andrea Bellizzi
  • 487
  • 4
  • 11