9

I want to compare two ArrayLists and return the greatest subset of similarities in Java. So I want to compare parts of the list not just single values.

Example:

list 1       list 2
F            A
A            B
B            C
C            F
D            D
Z            Z
A
F
C

greatest subset:

Arraylist: [A,B,C]

The second greatest subset should be:

ArrayList: [D,Z]

How can I do this efficiently?(without using more than 2 for loops)

retainAll() doesn't work, retainAll() returns the equal values, not the largest subset.

Edit I want as output, List before greatest subset, greatest subset, list after greatest subset. By the example the output should be:

[[F],[null]],[A,B,C],[[D,Z,A,F,C],[F,D,Z]]
Bergi
  • 513,640
  • 108
  • 821
  • 1,164
Cees Mandjes
  • 195
  • 12

7 Answers7

5

Assuming both Lists have String elements, you can use this:

    List<List<String>> beforeList = new ArrayList<>();
    List<List<String>> afterList = new ArrayList<>();
    List<String> commonSubsetList = new ArrayList<>();
    for (int i = 0; i < list1.size(); i++) {
        int k = i;
        List<String> tmpList = new ArrayList<>();
        List<String> tmpBeforeList1 = list1.subList(0, i); // container for before elements from list1
        List<String> tmpAfterList1 = new ArrayList<>(); // container for after elements from list1
        List<String> tmpBeforeList2 = new ArrayList<>(); // container for before elements from list2
        List<String> tmpAfterList2 = new ArrayList<>(); // container for after elements from list2

        for (int j = 0; j < list2.size();) {
            if(k < list1.size() && list1.get(k).equals(list2.get(j))) {
                // when common element is found, increment both counters and add element to tmp list
                tmpList.add(list2.get(j));
                k++;
                j++;
            } else {

                if(tmpList.size() > 0) {
                    tmpAfterList1 = list1.subList(k, list1.size());
                    tmpAfterList2 = list2.subList(j, list2.size());
                    break;
                } else {
                    tmpBeforeList2.add(list2.get(j));
                }

                j++;
            }
        }

        if(commonSubsetList.size() <= tmpList.size()) {
            // reset beforeList and afterList before adding new list
            beforeList.clear();
            afterList.clear();

            // add new lists
            beforeList.add(tmpBeforeList1);
            beforeList.add(tmpBeforeList2);
            afterList.add(tmpAfterList1);
            afterList.add(tmpAfterList2);
            commonSubsetList = new ArrayList<>(tmpList);
        }
    }

    System.out.println(beforeList + ", " + commonSubsetList + ", " + afterList);

This includes both before and after lists as well. Hope this is what you want.

justAbit
  • 4,066
  • 1
  • 17
  • 32
1

The maximum size of the common list will be the size of the smaller list. You can subsequently check equality of sublists of size lesser than or equal to this maximum value. Check the following code for reference:

public static <T> List<List<T>> getLargestCommonListAndRest(List<T> list1, List<T> list2) {
    int beginSize = list1.size() < list2.size() ? list1.size() : list2.size();
    while (beginSize > 0) {
        for (int i = 0; i <= list1.size() - beginSize; i++) {
            List<T> subList1 = list1.subList(i, i + beginSize - 1);
            for (int i1 = 0; i1 <= list2.size() - beginSize; i1++) {
                List<T> subList2 = list2.subList(i1, i1 + beginSize - 1);
                if (subList1.equals(subList2))
                    return Arrays.asList(list1.subList(0, Integer.max(0, i)), subList1,
                            list1.subList(i + beginSize - 1, list1.size()));
            }
        }
        beginSize--;
    }
    return new ArrayList();
}
1

Assuming your list are Strings typed Use the

list#retainAll()

to get the coincidence between those list

Example:

List<String> listA...
List<String> listB...
List<String> listC = new ArrayList<String>();    // new list to keep the originals unmodified.
listC.addAll(listA);   // add all the list a to c
listC.retainAll(listB); // keep the coincidences
Community
  • 1
  • 1
ΦXocę 웃 Пepeúpa ツ
  • 43,054
  • 16
  • 58
  • 83
1

It is very simple. You need just two loops to find out the greatest common subset between two list.

Steps

  • loop over first list
  • loop over second list inside first loop
  • compare each value of second list with increment index k of first list
  • increment the index k when there is match
  • else reset the index k back to its starting index i of first list

The complexity of below sample program is O(n^2). You can further reduce the complexity.

Sample code:

List<Character> list1 = Arrays.asList(new Character[] { 'F', 'A', 'B', 'C', 'D', 'Z', 'A', 'F', 'C' });
List<Character> list2 = Arrays.asList(new Character[] { 'A', 'B', 'C', 'F', 'D', 'Z' });
List<List<Character>> sublists = new ArrayList<>();

for (int i = 0; i < list1.size(); i++)
{
    int k = i;
    for (int j = 0; j < list2.size() && k < list1.size(); j++)
    {
        if (list1.get(k) == list2.get(j))
        {
            k++;
        }
        else if (k > i)
        {
            sublists.add(list1.subList(i, k));
            k = i;
        }
    }

    if (k > i)
    {
        sublists.add(list1.subList(i, k));
    }
}

System.out.println(sublists);
Braj
  • 44,339
  • 5
  • 51
  • 69
1

See this:

public static void main(String[] args) {
    ArrayList<String> list1 = new ArrayList<String>(Arrays.asList(new String[]{"F", "A", "B", "C", "D", "Z", "A", "F", "C"}));
    ArrayList<String> list2 = new ArrayList<String>(Arrays.asList(new String[]{"A", "B", "C", "F", "D", "Z"}));

    ArrayList<String> result = null;
    if (Arrays.equals(list1.toArray(), list2.toArray())) {
        result = list1;
    } else {
        for (int i = 0; i < list1.size(); i++) {
            String word = list1.get(i);
            //int index = list2.indexOf(word); // if list2 has repeat words, this can not give a exact result.
            for (int index : indicesOf(list2, word)) { // support repeat words in list2, but need a small loop.
                if (index >= 0) {
                    int ori = i;
                    ArrayList<String> temp = new ArrayList<String>();
                    temp.add(word);
                    //while (true) {
                    //    int pos1 = (i + 1) % list1.size();
                    //    int pos2 = (index + 1) % list2.size();
                    //    if (list1.get(pos1).equals(list2.get(pos2))) {
                    while (index < list2.size() - 1) {
                        if (i + 1 < list1.size() && list1.get(i + 1).equals(list2.get(index + 1))) {
                            temp.add(list1.get(i + 1));
                            i++;
                            index++;
                        } else {
                            break;
                        }
                    }
                    System.out.println(String.format("Found a subset: %s", temp));
                    if (null == result || temp.size() > result.size()) {
                        result = temp;
                    }
                }
            }
        }
    }
    if (null != result) {
        System.out.println("The greatest subset is: " + result);
    } else {
        System.out.println("No subset found.");
    }
}

static Integer[] indicesOf(ArrayList<String> list, String obj) {
    List<Integer> indices = new ArrayList<Integer>();
    for (int i = 0; i < list.size(); i++) {
        if (obj.equals(list.get(i))) {
            indices.add(i);
        }
    }
    return indices.toArray(new Integer[]{});
}

Output is:

Found a subset: [F]
Found a subset: [A, B, C]
Found a subset: [D, Z]
Found a subset: [A]
Found a subset: [F]
Found a subset: [C]
The greatest subset is: [A, B, C]

-----------------edit----------------------

You said donot want [D,Z,A], because i treated the list as a tail-head loop. Without this would be more easy, i had changed the code.

And, i fixed my code considering about that your list allow repeat word.

magooup
  • 134
  • 6
  • If the input is {"F", "A", "B", "C", "D", "Z", "A" , "F", "C"} , {"A", "B", "C", "F", "D", "Z"} The output is: Found a subset: [F] Found a subset: [A, B, C] Found a subset: [D, Z, A] Found a subset: [F] Found a subset: [C, F] The greatest subset is: [A, B, C]. [D,Z,A] is not a subset. – Cees Mandjes Mar 31 '16 at 08:18
  • @CeesMandjes About [D,Z,A] cause i treat the list is a tail-head connect construct. Without this would be more easy, i have edited my code. And – magooup Mar 31 '16 at 08:53
  • @CeesMandjes if you make two list: {"F", "A", "B", "C", "D", "Z", "A", "F", "C", "E"} and {"A", "B", "C", "A", "F", "C", "E"}, check the answer you accepted, its wrong.Then see which answer is correct! – magooup Mar 31 '16 at 09:02
  • @Braj what is the complexity of code? Do you really understand the code yet – magooup Mar 31 '16 at 09:08
  • OP has clearly mentioned in the question "How can I do this efficiently?(without using more than 2 for loops)". I think you are using more than 2 for loop? Isn't it? – Braj Mar 31 '16 at 11:40
  • @Braj So you think 2 loop must be more efficiently than 3 loop? – magooup Mar 31 '16 at 12:11
  • The outer most loop will run for `list1.size` times [N] then immediate inner loop `indicesOf` will run `list2.size` times [M] then there is one more `while` loop that will run `list2.size` times [M] is worst case. Total `N * M * M`. The impact is not visible for small set of list but if both the list contains millions of records with no match then it matter. You can run a performance test for your code by simply differentiating `endTime- startTime`. – Braj Mar 31 '16 at 12:16
1

Here's a nice solution with complexity of O(n) (correct me if I'm wrong) exploiting HashMap (I'm using String for readability and simplicity sake, the same logic can be applied to List):

public static String greatestSubset(String list1, String list2) {
    int shift = -1, maxCount = -1, index1 = -1, index2 = -1;
    HashMap<Integer, Integer> shiftMap = new HashMap<Integer, Integer>();
    HashMap<Integer, Boolean> aliveShiftMap = new HashMap<Integer, Boolean>();

    for(int i = 0 ; i < list1.length() ; i++) {
        char c = list1.charAt(i);
        int index;

        //calculate shifts, if exists increments, otherwise add with count=1 
        for( shift = i-(index=list2.indexOf(c)) ; index != -1 ; shift = i-(index=list2.indexOf(c, index+1)) ) {
            if(shiftMap.containsKey(shift)) {
                shiftMap.replace(shift, shiftMap.get(shift)+1);
                aliveShiftMap.replace(shift, true);
            } else {
                shiftMap.put(shift, 1);
                aliveShiftMap.put(shift, true);
            }
        }

        for (Entry<Integer, Boolean> entry : aliveShiftMap.entrySet()) {
            if(!entry.getValue()) { //if shift not incremented, terminate
                if(shiftMap.get(entry.getKey()) > maxCount) {
                    maxCount = shiftMap.get(entry.getKey());
                    index1 = i-maxCount;
                    index2 = i;
                }

                shiftMap.remove(entry.getKey());
                aliveShiftMap.put(entry.getKey(), true);
            } else { // else keep for next iteration
                aliveShiftMap.put(entry.getKey(), false);
            }
        }

        //remove all non-incrementedn shifts
        aliveShiftMap.values().removeAll(Collections.singleton(true));
    }

    return list1.substring(index1, index2);
}

Note that the HashMap complication is only necessary to account for duplicates of objects in the same list, otherwise you only need a few primitive int variables.

Here's a summary of the algorithm:

  1. Increment though the chars of list1, and calculate what is the shift required to match the same char on list2.
  2. If that shift is already present in shiftMap, increment, otherwise add it with a count of 1
  3. If a given shift was not incremented in the current iteration, then terminate it, and keep it as the maxCount (record the index1 and index2) if it exceeds the current max
Maljam
  • 6,048
  • 3
  • 14
  • 30
1

You'll have to consider all possible pairs of item across lists. When a pair matches, then try to construct a subset from those indices on-wards. This subset replaces current candidate if its larger than it.

One optimization is to exit when there is a subset larger than half of the smaller list's length.

You can modify below example to collect all subsets, with their index information as well.

Example:

http://ideone.com/DehDwk

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Main {
    /**
     * Holds information about a sub set
     *
     * @param <T> type of subset items
     */
    private static class SubSet<T> {
        List<T> items; // items of subset
        int startIndex1; // start index in list 1
        int endIndex1; // end index in list 1
        int startIndex2; // start index in list 2
        int endIndex2; // end index in list 2
    }

    /**
     * Run main example.
     *
     * @param args arguments - none honored.
     * @throws java.lang.Exception - in case of any error.
     */
    public static void main(String[] args) throws java.lang.Exception {
        // define 2 lists
        List<Integer> list1 = Arrays.asList(1, 2, 3, 4, 5, 6, 3, 2, 5, 6, 7, 3, 8);
        List<Integer> list2 = Arrays.asList(2, 8, 7, 2, 3, 4, 5, 3, 2, 5, 1, 5);

        // print the lists
        System.out.println("First list: " + Arrays.toString(list1.toArray()));
        System.out.println("Second list: " + Arrays.toString(list2.toArray()));

        // get largest sub set
        SubSet<Integer> largest = getLargestSubSet(list1, list2);


        if (largest == null) {
            // nothing found
            System.out.println("No subset found.");
        } else {
            // print info about subset

            System.out.println("Largest subset: " + Arrays.toString(largest.items.toArray()));

            if (largest.startIndex1 > 0) {
                List<Integer> beforeList1 = list1.subList(0, largest.startIndex1);
                System.out.println("Items before largest subset in first list: "
                        + Arrays.toString(beforeList1.toArray()));
            }

            if (largest.endIndex1 < list1.size() - 1) {
                List<Integer> afterList1 = list1.subList(largest.endIndex1 + 1, list1.size());
                System.out.println("Items after largest subset in first list: "
                        + Arrays.toString(afterList1.toArray()));
            }

            if (largest.startIndex2 > 0) {
                List<Integer> beforeList2 = list2.subList(0, largest.startIndex2);
                System.out.println("Items before largest subset in second list: "
                        + Arrays.toString(beforeList2.toArray()));
            }

            if (largest.endIndex2 < list2.size() - 1) {
                List<Integer> afterList2 = list2.subList(largest.endIndex2 + 1, list2.size());
                System.out.println("Items after largest subset in second list: "
                        + Arrays.toString(afterList2.toArray()));
            }

        }


    }

    /**
     * Equality check for items.
     *
     * @param obj1 first item.
     * @param obj2 second item.
     * @param <T>  item type.
     * @return true if equal,false if not.
     */
    private static <T> boolean areEqual(T obj1, T obj2) {
        return obj1 == obj2; // naive comparison
    }

    /**
     * Get largest subset (first occurrence) for given lists.
     *
     * @param list1 first list.
     * @param list2 second list.
     * @param <T>   list item type.
     * @return Largest sub sequence list, or empty list.
     */
    private static <T> SubSet<T> getLargestSubSet(List<T> list1, List<T> list2) {
        SubSet<T> output = null;

        for (int i = 0; i < list1.size(); i++) {
            for (int j = 0; j < list2.size(); j++) {

                // optimisation : exit early
                if (output != null && output.items.size() > Math.min(list1.size(), list2.size())) {
                    return output;
                }

                if (areEqual(list1.get(i), list2.get(j))) {
                    // inspect sub sequence from this (i,j) onwards
                    output = inspectSubSet(list1, list2, i, j, output);
                }
            }
        }

        return output;
    }

    /**
     * For given starting indices, inspect if there is a larger subset, than given one.
     *
     * @param list1     first list.
     * @param list2     second list.
     * @param index1    first index.
     * @param index2    second index.
     * @param oldSubSet existing largest subset, for comparison.
     * @param <T>       list item type.
     * @return larger subset, if found, else existing one is returned as is.
     */
    private static <T> SubSet<T> inspectSubSet(List<T> list1, List<T> list2,
                                               int index1, int index2, SubSet<T> oldSubSet) {
        // new subset candidate
        SubSet<T> newSubSet = new SubSet<T>();
        newSubSet.items = new ArrayList<T>();
        newSubSet.startIndex1 = index1;
        newSubSet.endIndex1 = index1;
        newSubSet.startIndex2 = index2;
        newSubSet.endIndex2 = index2;

        // keep building subset as subsequent items keep matching
        do {
            newSubSet.items.add(list1.get(index1));
            newSubSet.endIndex1 = index1;
            newSubSet.endIndex2 = index2;
            index1++;
            index2++;
        } while (index1 < list1.size() && index2 < list2.size()
                && areEqual(list1.get(index1), list2.get(index2)));

        // return first, larger or same.
        if (oldSubSet == null) {
            return newSubSet;
        } else if (newSubSet.items.size() > oldSubSet.items.size()) {
            return newSubSet;
        } else {
            return oldSubSet;
        }
    }

}

Output:

First list: [1, 2, 3, 4, 5, 6, 3, 2, 5, 6, 7, 3, 8]
Second list: [2, 8, 7, 2, 3, 4, 5, 3, 2, 5, 1, 5]
Largest subset: [2, 3, 4, 5]
Items before largest subset in first list: [1]
Items after largest subset in first list: [6, 3, 2, 5, 6, 7, 3, 8]
Items before largest subset in second list: [2, 8, 7]
Items after largest subset in second list: [3, 2, 5, 1, 5]
S.D.
  • 28,125
  • 2
  • 75
  • 122
  • The before- and afterlist is missing – Cees Mandjes Apr 04 '16 at 07:31
  • @CeesMandjes see update. Small changes allow you to collect indices. Similarly, you can collect all subsets, (in case there are many of same size) and later sort them. – S.D. Apr 04 '16 at 08:25