How to compare 2 lists and return a list of the greatest subset?

Question

I want to compare two ArrayLists in Java and return the greatest subset they have in common. So I want to compare parts of the lists, not just single values.

Example:

list 1       list 2
F            A
A            B
B            C
C            F
D            D
Z            Z
A
F
C

greatest subset:

ArrayList: [A,B,C]

The second greatest subset should be:

ArrayList: [D,Z]

How can I do this efficiently (without using more than 2 for loops)?

retainAll() doesn't work: it returns all equal values, not the largest subset.

Edit: As output I want the list before the greatest subset, the greatest subset, and the list after the greatest subset. For the example, the output should be:

[[F],[null]],[A,B,C],[[D,Z,A,F,C],[F,D,Z]]

@Ramanlfc retainAll() returns all equal values, not the largest subset — Cees Mandjes, Mar 31 '16 at 07:09
Possible duplicate of [this.](http://stackoverflow.com/questions/8954744/how-to-find-the-subset-with-the-greatest-number-of-items-in-common) However, that link provides no answer. — robotlos, Mar 31 '16 at 07:18
@robotlos, no, it is not a duplicate; by that example my output should be {c,d} because {c,d} matches across all arrays — Cees Mandjes, Mar 31 '16 at 07:25
Does it have to handle cases where there could be duplicate objects in the same list? — Maljam, Mar 31 '16 at 07:49

justAbit · Answer 1 · 2016-03-31T08:39:27.500

Assuming both Lists have String elements, you can use this:

    List<List<String>> beforeList = new ArrayList<>();
    List<List<String>> afterList = new ArrayList<>();
    List<String> commonSubsetList = new ArrayList<>();
    for (int i = 0; i < list1.size(); i++) {
        int k = i;
        List<String> tmpList = new ArrayList<>();
        List<String> tmpBeforeList1 = list1.subList(0, i); // container for before elements from list1
        List<String> tmpAfterList1 = new ArrayList<>(); // container for after elements from list1
        List<String> tmpBeforeList2 = new ArrayList<>(); // container for before elements from list2
        List<String> tmpAfterList2 = new ArrayList<>(); // container for after elements from list2

        for (int j = 0; j < list2.size();) {
            if(k < list1.size() && list1.get(k).equals(list2.get(j))) {
                // when common element is found, increment both counters and add element to tmp list
                tmpList.add(list2.get(j));
                k++;
                j++;
            } else {

                if(tmpList.size() > 0) {
                    tmpAfterList1 = list1.subList(k, list1.size());
                    tmpAfterList2 = list2.subList(j, list2.size());
                    break;
                } else {
                    tmpBeforeList2.add(list2.get(j));
                }

                j++;
            }
        }

        if(commonSubsetList.size() <= tmpList.size()) {
            // reset beforeList and afterList before adding new list
            beforeList.clear();
            afterList.clear();

            // add new lists
            beforeList.add(tmpBeforeList1);
            beforeList.add(tmpBeforeList2);
            afterList.add(tmpAfterList1);
            afterList.add(tmpAfterList2);
            commonSubsetList = new ArrayList<>(tmpList);
        }
    }

    System.out.println(beforeList + ", " + commonSubsetList + ", " + afterList);

This includes both before and after lists as well. Hope this is what you want.

I need also as output lists before subset and lists after subset — Cees Mandjes, Mar 31 '16 at 08:12

Sendhilkumar Alalasundaram · Answer 2 · 2016-03-31T08:23:21.917

The maximum size of the common list is the size of the smaller list. You can therefore check sublists for equality, starting at this maximum size and decreasing it until a match is found. See the following code for reference:

public static <T> List<List<T>> getLargestCommonListAndRest(List<T> list1, List<T> list2) {
    // a common sublist can be at most as long as the shorter list
    int beginSize = Math.min(list1.size(), list2.size());
    while (beginSize > 0) {
        for (int i = 0; i <= list1.size() - beginSize; i++) {
            // subList's end index is exclusive, so this view has exactly beginSize items
            List<T> subList1 = list1.subList(i, i + beginSize);
            for (int i1 = 0; i1 <= list2.size() - beginSize; i1++) {
                List<T> subList2 = list2.subList(i1, i1 + beginSize);
                if (subList1.equals(subList2)) {
                    // items of list1 before the match, the match, items of list1 after it
                    return Arrays.asList(list1.subList(0, i), subList1,
                            list1.subList(i + beginSize, list1.size()));
                }
            }
        }
        beginSize--;
    }
    return new ArrayList<>(); // nothing in common
}

edited Mar 31 '16 at 08:23

answered Mar 31 '16 at 07:25

Sendhilkumar Alalasundaram


Not sure if OP meant *all* loops when they stated `for` loops. If that's the case this technically wouldn't satisfy OP's post. – robotlos Mar 31 '16 at 07:28
I did not understand your doubt! can you rephrase it? – Sendhilkumar Alalasundaram Mar 31 '16 at 07:30
While you technically didn't use 2 `for` loops, you did use 3 loops. The validity of your answer would depend on whether OP meant specifically just `for` loops or any loop at all. – robotlos Mar 31 '16 at 07:31
I want (if possible) 2 for loops. I have edited the post, by the way. – Cees Mandjes Mar 31 '16 at 07:37

score 1 · Answer 3 · edited Jun 20 '20 at 09:12

Assuming your lists hold Strings, use

list#retainAll()

to get the elements the two lists have in common.

Example:

List<String> listA...
List<String> listB...
List<String> listC = new ArrayList<String>();    // new list to keep the originals unmodified.
listC.addAll(listA);   // add all the list a to c
listC.retainAll(listB); // keep the coincidences
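
For the lists in the question, this approach keeps every element of the first list that occurs anywhere in the second, so it does not isolate the contiguous run [A, B, C]. A minimal sketch of that behaviour, assuming the question's sample data (the class and method names `RetainAllDemo` and `common` are hypothetical):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RetainAllDemo {
    /** Copies listA and keeps only the elements that also occur somewhere in listB. */
    static <T> List<T> common(List<T> listA, List<T> listB) {
        List<T> listC = new ArrayList<>(listA); // copy so the originals stay unmodified
        listC.retainAll(listB);                 // position and adjacency are ignored
        return listC;
    }

    public static void main(String[] args) {
        List<String> list1 = Arrays.asList("F", "A", "B", "C", "D", "Z", "A", "F", "C");
        List<String> list2 = Arrays.asList("A", "B", "C", "F", "D", "Z");
        // every element of list1 occurs in list2, so nothing is removed
        System.out.println(common(list1, list2)); // prints [F, A, B, C, D, Z, A, F, C]
    }
}
```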

edited Jun 20 '20 at 09:12

Community


answered Mar 31 '16 at 07:27

ΦXocę 웃 Пepeúpa ツ


Thats not the answer – Cees Mandjes Mar 31 '16 at 07:55

Braj · Answer 4 · 2016-03-31T08:44:30.507

It is very simple. You need just two loops to find the greatest common subset of two lists.

Steps

loop over the first list
loop over the second list inside the first loop
compare each value of the second list with the value at index k of the first list
increment the index k when there is a match
otherwise reset the index k to its starting index i in the first list

The complexity of the sample program below is O(n^2). You can further reduce the complexity.

Sample code:

List<Character> list1 = Arrays.asList(new Character[] { 'F', 'A', 'B', 'C', 'D', 'Z', 'A', 'F', 'C' });
List<Character> list2 = Arrays.asList(new Character[] { 'A', 'B', 'C', 'F', 'D', 'Z' });
List<List<Character>> sublists = new ArrayList<>();

for (int i = 0; i < list1.size(); i++)
{
    int k = i;
    for (int j = 0; j < list2.size() && k < list1.size(); j++)
    {
        if (list1.get(k).equals(list2.get(j))) // equals(), not ==, compares the boxed values
        {
            k++;
        }
        else if (k > i)
        {
            sublists.add(list1.subList(i, k));
            k = i;
        }
    }

    if (k > i)
    {
        sublists.add(list1.subList(i, k));
    }
}

System.out.println(sublists);

edited Mar 31 '16 at 08:44

answered Mar 31 '16 at 07:39

Braj


I have edited my post, i want to have this output: [[F],[null]],[A,B,C],[[D,Z],[F,D,Z]] – Cees Mandjes Mar 31 '16 at 07:44
I have updated the post to keep the track of `fromIndex` – Braj Mar 31 '16 at 07:45
as per your question title, you want to find just greatest common sub-list. you can modify the code if you need all the matches. – Braj Mar 31 '16 at 08:12
@CeesMandjes I have updated the code to fulfill your requirement. – Braj Mar 31 '16 at 08:39

magooup · Answer 5 · 2016-03-31T08:47:22.240

See this:

public static void main(String[] args) {
    ArrayList<String> list1 = new ArrayList<String>(Arrays.asList(new String[]{"F", "A", "B", "C", "D", "Z", "A", "F", "C"}));
    ArrayList<String> list2 = new ArrayList<String>(Arrays.asList(new String[]{"A", "B", "C", "F", "D", "Z"}));

    ArrayList<String> result = null;
    if (Arrays.equals(list1.toArray(), list2.toArray())) {
        result = list1;
    } else {
        for (int i = 0; i < list1.size(); i++) {
            String word = list1.get(i);
            //int index = list2.indexOf(word); // if list2 has repeated words, this cannot give an exact result.
            for (int index : indicesOf(list2, word)) { // supports repeated words in list2, at the cost of a small extra loop.
                if (index >= 0) {
                    int ori = i;
                    ArrayList<String> temp = new ArrayList<String>();
                    temp.add(word);
                    //while (true) {
                    //    int pos1 = (i + 1) % list1.size();
                    //    int pos2 = (index + 1) % list2.size();
                    //    if (list1.get(pos1).equals(list2.get(pos2))) {
                    while (index < list2.size() - 1) {
                        if (i + 1 < list1.size() && list1.get(i + 1).equals(list2.get(index + 1))) {
                            temp.add(list1.get(i + 1));
                            i++;
                            index++;
                        } else {
                            break;
                        }
                    }
                    System.out.println(String.format("Found a subset: %s", temp));
                    if (null == result || temp.size() > result.size()) {
                        result = temp;
                    }
                }
            }
        }
    }
    if (null != result) {
        System.out.println("The greatest subset is: " + result);
    } else {
        System.out.println("No subset found.");
    }
}

static Integer[] indicesOf(ArrayList<String> list, String obj) {
    List<Integer> indices = new ArrayList<Integer>();
    for (int i = 0; i < list.size(); i++) {
        if (obj.equals(list.get(i))) {
            indices.add(i);
        }
    }
    return indices.toArray(new Integer[]{});
}

Output is:

Found a subset: [F]
Found a subset: [A, B, C]
Found a subset: [D, Z]
Found a subset: [A]
Found a subset: [F]
Found a subset: [C]
The greatest subset is: [A, B, C]

-----------------edit----------------------

You said you do not want [D,Z,A]; that result appeared because I treated the list as circular (tail connected to head). Without that it is simpler, so I changed the code.

I also fixed my code to account for your lists allowing repeated words.

If the input is {"F", "A", "B", "C", "D", "Z", "A" , "F", "C"} , {"A", "B", "C", "F", "D", "Z"} The output is: Found a subset: [F] Found a subset: [A, B, C] Found a subset: [D, Z, A] Found a subset: [F] Found a subset: [C, F] The greatest subset is: [A, B, C]. [D,Z,A] is not a subset. — Cees Mandjes, Mar 31 '16 at 08:18
@CeesMandjes About [D,Z,A] cause i treat the list is a tail-head connect construct. Without this would be more easy, i have edited my code. And — magooup, Mar 31 '16 at 08:53
@CeesMandjes if you make two list: {"F", "A", "B", "C", "D", "Z", "A", "F", "C", "E"} and {"A", "B", "C", "A", "F", "C", "E"}, check the answer you accepted, its wrong.Then see which answer is correct! — magooup, Mar 31 '16 at 09:02
@Braj what is the complexity of code? Do you really understand the code yet — magooup, Mar 31 '16 at 09:08
OP has clearly mentioned in the question "How can I do this efficiently?(without using more than 2 for loops)". I think you are using more than 2 for loop? Isn't it? — Braj, Mar 31 '16 at 11:40
@Braj So you think 2 loops must be more efficient than 3 loops? — magooup, Mar 31 '16 at 12:11
The outermost loop will run `list1.size` times [N], then the immediate inner loop `indicesOf` will run `list2.size` times [M], then there is one more `while` loop that will run `list2.size` times [M] in the worst case. Total `N * M * M`. The impact is not visible for small lists, but if both lists contain millions of records with no match then it matters. You can run a performance test for your code by simply computing `endTime - startTime`. — Braj, Mar 31 '16 at 12:16

Maljam · Answer 6 · 2016-03-31T08:46:41.907

Here's a nice solution with complexity of O(n) (correct me if I'm wrong) exploiting HashMap (I'm using String for readability and simplicity; the same logic can be applied to a List):

public static String greatestSubset(String list1, String list2) {
    int shift = -1, maxCount = -1, index1 = -1, index2 = -1;
    HashMap<Integer, Integer> shiftMap = new HashMap<Integer, Integer>();
    HashMap<Integer, Boolean> aliveShiftMap = new HashMap<Integer, Boolean>();

    for(int i = 0 ; i < list1.length() ; i++) {
        char c = list1.charAt(i);
        int index;

        //calculate shifts; if the shift exists, increment it, otherwise add it with count=1
        for( shift = i-(index=list2.indexOf(c)) ; index != -1 ; shift = i-(index=list2.indexOf(c, index+1)) ) {
            if(shiftMap.containsKey(shift)) {
                shiftMap.replace(shift, shiftMap.get(shift)+1);
                aliveShiftMap.replace(shift, true);
            } else {
                shiftMap.put(shift, 1);
                aliveShiftMap.put(shift, true);
            }
        }

        for (Entry<Integer, Boolean> entry : aliveShiftMap.entrySet()) {
            if(!entry.getValue()) { //if shift not incremented, terminate
                if(shiftMap.get(entry.getKey()) > maxCount) {
                    maxCount = shiftMap.get(entry.getKey());
                    index1 = i-maxCount;
                    index2 = i;
                }

                shiftMap.remove(entry.getKey());
                aliveShiftMap.put(entry.getKey(), true);
            } else { // else keep for next iteration
                aliveShiftMap.put(entry.getKey(), false);
            }
        }

        //remove all non-incremented shifts
        aliveShiftMap.values().removeAll(Collections.singleton(true));
    }

    return list1.substring(index1, index2);
}

Note that the HashMap complication is only necessary to account for duplicates of objects in the same list, otherwise you only need a few primitive int variables.

Here's a summary of the algorithm:

Iterate through the chars of list1 and calculate the shift required to match the same char in list2.
If that shift is already present in shiftMap, increment its count; otherwise add it with a count of 1.
If a given shift was not incremented in the current iteration, terminate it, and keep its count as maxCount (recording index1 and index2) if it exceeds the current maximum.
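
The same idea of following runs along a fixed shift can be sketched for Lists with a rolling array of run lengths. This is not the code above: it is a plain dynamic-programming variant that runs in O(n * m) time rather than O(n), and the names `LongestCommonRun` and `longestCommonRun` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class LongestCommonRun {
    /** Returns the first longest contiguous run shared by both lists (empty if none). */
    static <T> List<T> longestCommonRun(List<T> a, List<T> b) {
        // prev[j + 1] holds the length of the common run ending at a[i - 1] and b[j]
        int[] prev = new int[b.size() + 1];
        int bestLen = 0;
        int bestEnd = 0; // exclusive end index of the best run in a
        for (int i = 0; i < a.size(); i++) {
            int[] curr = new int[b.size() + 1];
            for (int j = 0; j < b.size(); j++) {
                if (Objects.equals(a.get(i), b.get(j))) {
                    curr[j + 1] = prev[j] + 1; // extend the run on the same shift i - j
                    if (curr[j + 1] > bestLen) {
                        bestLen = curr[j + 1];
                        bestEnd = i + 1;
                    }
                }
            }
            prev = curr;
        }
        return a.subList(bestEnd - bestLen, bestEnd);
    }

    public static void main(String[] args) {
        List<String> list1 = Arrays.asList("F", "A", "B", "C", "D", "Z", "A", "F", "C");
        List<String> list2 = Arrays.asList("A", "B", "C", "F", "D", "Z");
        System.out.println(longestCommonRun(list1, list2)); // prints [A, B, C]
    }
}
```

Because only the strictly longer run replaces the current best, ties are resolved in favour of the run that ends first in the first list.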

S.D. · Accepted Answer · 2016-04-04T08:30:30.833

You'll have to consider all possible pairs of items across the lists. When a pair matches, try to construct a subset from those indices onwards. This subset replaces the current candidate if it is larger.

One optimization is to exit early once the current subset is as long as the smaller list, since no larger subset can exist.

You can modify the example below to collect all subsets, along with their index information.

Example:

http://ideone.com/DehDwk

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Main {
    /**
     * Holds information about a sub set
     *
     * @param <T> type of subset items
     */
    private static class SubSet<T> {
        List<T> items; // items of subset
        int startIndex1; // start index in list 1
        int endIndex1; // end index in list 1
        int startIndex2; // start index in list 2
        int endIndex2; // end index in list 2
    }

    /**
     * Run main example.
     *
     * @param args arguments - none honored.
     * @throws java.lang.Exception - in case of any error.
     */
    public static void main(String[] args) throws java.lang.Exception {
        // define 2 lists
        List<Integer> list1 = Arrays.asList(1, 2, 3, 4, 5, 6, 3, 2, 5, 6, 7, 3, 8);
        List<Integer> list2 = Arrays.asList(2, 8, 7, 2, 3, 4, 5, 3, 2, 5, 1, 5);

        // print the lists
        System.out.println("First list: " + Arrays.toString(list1.toArray()));
        System.out.println("Second list: " + Arrays.toString(list2.toArray()));

        // get largest sub set
        SubSet<Integer> largest = getLargestSubSet(list1, list2);


        if (largest == null) {
            // nothing found
            System.out.println("No subset found.");
        } else {
            // print info about subset

            System.out.println("Largest subset: " + Arrays.toString(largest.items.toArray()));

            if (largest.startIndex1 > 0) {
                List<Integer> beforeList1 = list1.subList(0, largest.startIndex1);
                System.out.println("Items before largest subset in first list: "
                        + Arrays.toString(beforeList1.toArray()));
            }

            if (largest.endIndex1 < list1.size() - 1) {
                List<Integer> afterList1 = list1.subList(largest.endIndex1 + 1, list1.size());
                System.out.println("Items after largest subset in first list: "
                        + Arrays.toString(afterList1.toArray()));
            }

            if (largest.startIndex2 > 0) {
                List<Integer> beforeList2 = list2.subList(0, largest.startIndex2);
                System.out.println("Items before largest subset in second list: "
                        + Arrays.toString(beforeList2.toArray()));
            }

            if (largest.endIndex2 < list2.size() - 1) {
                List<Integer> afterList2 = list2.subList(largest.endIndex2 + 1, list2.size());
                System.out.println("Items after largest subset in second list: "
                        + Arrays.toString(afterList2.toArray()));
            }

        }


    }

    /**
     * Equality check for items.
     *
     * @param obj1 first item.
     * @param obj2 second item.
     * @param <T>  item type.
     * @return true if equal,false if not.
     */
    private static <T> boolean areEqual(T obj1, T obj2) {
        return obj1 == null ? obj2 == null : obj1.equals(obj2); // null-safe value equality
    }

    /**
     * Get largest subset (first occurrence) for given lists.
     *
     * @param list1 first list.
     * @param list2 second list.
     * @param <T>   list item type.
     * @return largest subset, or null if the lists have nothing in common.
     */
    private static <T> SubSet<T> getLargestSubSet(List<T> list1, List<T> list2) {
        SubSet<T> output = null;

        for (int i = 0; i < list1.size(); i++) {
            for (int j = 0; j < list2.size(); j++) {

                // optimisation : exit early, nothing can be longer than the shorter list
                if (output != null && output.items.size() >= Math.min(list1.size(), list2.size())) {
                    return output;
                }

                if (areEqual(list1.get(i), list2.get(j))) {
                    // inspect sub sequence from this (i,j) onwards
                    output = inspectSubSet(list1, list2, i, j, output);
                }
            }
        }

        return output;
    }

    /**
     * For given starting indices, inspect if there is a larger subset, than given one.
     *
     * @param list1     first list.
     * @param list2     second list.
     * @param index1    first index.
     * @param index2    second index.
     * @param oldSubSet existing largest subset, for comparison.
     * @param <T>       list item type.
     * @return larger subset, if found, else existing one is returned as is.
     */
    private static <T> SubSet<T> inspectSubSet(List<T> list1, List<T> list2,
                                               int index1, int index2, SubSet<T> oldSubSet) {
        // new subset candidate
        SubSet<T> newSubSet = new SubSet<T>();
        newSubSet.items = new ArrayList<T>();
        newSubSet.startIndex1 = index1;
        newSubSet.endIndex1 = index1;
        newSubSet.startIndex2 = index2;
        newSubSet.endIndex2 = index2;

        // keep building subset as subsequent items keep matching
        do {
            newSubSet.items.add(list1.get(index1));
            newSubSet.endIndex1 = index1;
            newSubSet.endIndex2 = index2;
            index1++;
            index2++;
        } while (index1 < list1.size() && index2 < list2.size()
                && areEqual(list1.get(index1), list2.get(index2)));

        // return first, larger or same.
        if (oldSubSet == null) {
            return newSubSet;
        } else if (newSubSet.items.size() > oldSubSet.items.size()) {
            return newSubSet;
        } else {
            return oldSubSet;
        }
    }

}

Output:

First list: [1, 2, 3, 4, 5, 6, 3, 2, 5, 6, 7, 3, 8]
Second list: [2, 8, 7, 2, 3, 4, 5, 3, 2, 5, 1, 5]
Largest subset: [2, 3, 4, 5]
Items before largest subset in first list: [1]
Items after largest subset in first list: [6, 3, 2, 5, 6, 7, 3, 8]
Items before largest subset in second list: [2, 8, 7]
Items after largest subset in second list: [3, 2, 5, 1, 5]
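
To get the nested shape requested in the question (lists before, greatest subset, lists after), the start indices and the length of the match are enough. A minimal sketch, assuming those three values are already known; the names `SplitAroundRun`, `split`, `start1`, `start2` and `length` are hypothetical, and an empty list is returned where the question shows [null]:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitAroundRun {
    /**
     * Splits both lists around a common run that starts at start1 in list1 and
     * at start2 in list2 and spans length items.
     * Returns [[before1, before2], [run], [after1, after2]].
     */
    static <T> List<List<List<T>>> split(List<T> list1, List<T> list2,
                                         int start1, int start2, int length) {
        List<List<T>> before = new ArrayList<>();
        before.add(list1.subList(0, start1));
        before.add(list2.subList(0, start2));

        List<List<T>> run = new ArrayList<>();
        run.add(list1.subList(start1, start1 + length));

        List<List<T>> after = new ArrayList<>();
        after.add(list1.subList(start1 + length, list1.size()));
        after.add(list2.subList(start2 + length, list2.size()));

        List<List<List<T>>> result = new ArrayList<>();
        result.add(before);
        result.add(run);
        result.add(after);
        return result;
    }

    public static void main(String[] args) {
        List<String> list1 = Arrays.asList("F", "A", "B", "C", "D", "Z", "A", "F", "C");
        List<String> list2 = Arrays.asList("A", "B", "C", "F", "D", "Z");
        // [A, B, C] starts at index 1 in list1 and index 0 in list2
        System.out.println(split(list1, list2, 1, 0, 3));
        // prints [[[F], []], [[A, B, C]], [[D, Z, A, F, C], [F, D, Z]]]
    }
}
```

The returned sublists are views of the input lists, so copy them with `new ArrayList<>(...)` if the originals may change afterwards.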

@CeesMandjes see update. Small changes allow you to collect indices. Similarly, you can collect all subsets, (in case there are many of same size) and later sort them. — S.D., Apr 04 '16 at 08:25
