4

I need clarification whether my approach is right or wrong any modifications required.
Let me explain clearly. I will have a excel file in which there will be country code country name years(mm/yyyy) as extra 10 columns

countrycode country Name    12/2000 11/2000 10/2000 09/2000 08/2000 07/2000 06/2000 05/2000 04/2000 03/2000 02/2000 01/2000
     IND    India           10.1    10.2    10.3    10.4    10.5    10.6    10.7    10.8    10.9    11.1    11.2    11.3
     USA    Uinted States   8.1     8.2     8.3      8.4    8.5     8.6     8.7     8.8      8.9     9.1    9.2      9.3

In a row if anyof the price is repeated for that particular year and country , i need to show message as Duplicate present in Excel file.

For the above , i implemented by this way. For a VO i override the hashCode() with the hashcode of (coutrycode + year + price) and equals method too and while inserting in database i pass this VO to HashSet and I eliminate duplicate and compare the size of original list size with HashSet size.

But sometime if there is unique price also I am getting message as duplicate.

Please suggest me my approach is right or wrong or another way I can implement.

dbw
  • 5,842
  • 2
  • 21
  • 56
developer
  • 9,003
  • 29
  • 80
  • 144

4 Answers4

4

Buddy you have taken the right thought and approach to solve the problem but just missing a little edge (information) to solve the problem correctly.

I would like to provide a little hint, that I believe can help and rectify the problem and understand the basics really very well.

If you look at the documentation (or the source code) of hashCode for the String and Double variables, it states

STRING

Returns a hash code for this string. The hash code for a String object is computed as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)
Returns: a hash code value for this object.

DOUBLE

Returns a hash code for this Double object. The result is the exclusive OR of the two halves of the long integer bit representation, exactly as produced by the method doubleToLongBits(double), of the primitive double value represented by this Double object. That is, the hash code is the value of the expression:
(int)(v^(v>>>32))
where v is defined by:
long v = Double.doubleToLongBits(this.doubleValue());
Returns: a hash code value for this object.

So the hashCode() function returns a unique value in most case, but there are so many cases when the it returns the same int value for the two objects.

I think you are also getting caught in the same scenario.

A little more hint, you can use the HashMap<Integer,List<String>> where the Integer value is hashCode as you calculated and the List<String> is the collection of actual value got by forming the String from coutrycode + year + price .

And the last part is comparison, you can get the List<String> at the calculated hashCode() of new value and check if the same String value do exists in the List.

dbw
  • 5,842
  • 2
  • 21
  • 56
0

Hashbased collections depends on the hashcode() and equals() methods to correctly identify duplicates. If you modify these to fit exactly one usecase you are probably likely to have all sorts of side-effects in other use cases.

To say it more explicitly. If you change the methods of your VO to use only a subset of the data, you are likely to encounter unforeseen problems some where else where you might store VOs in hashbased collections.

You should keep hashcode() and equals() consistent with data equality, i.e. using all attributes for tests, as suggested in many sources (Source generators in eclipse, @EqualsAndHashcode annotations from Lombok, 'Effective Java' by Joshua Bloch, etc.).

In your explicit case you could create a specific wrapper to calculate your hashcodes and equality based on the subset.

As an example:

public void doit(List<VO> vos) {
    Set<VOWrapper> dups = new HashSet<>();
    for (VO vo : vos) {
        if (dups.contains(new VOWrapper(vo))) {
            System.out.println("Found a duplicate");
        } else {
            dups.add(new VOWrapper(vo));
            // Process vo
        }
    }
}

Based on this VO

@Data // Lombok generates getters/setters/equals/hashcode (using all fields)
public class VO {
    private String countrycode;
    private String country;
    private int month;
    private int year;
    private double price;
}

And this wrapper

public class VOWrapper {
private final VO vo;

public VOWrapper(VO vo) { this.vo = vo; }

// Equals method with only 3 fields used
@Override
public boolean equals(Object obj) {
    if (this == obj)
        return true;
    if (obj == null)
        return false;
    if (getClass() != obj.getClass())
        return false;
    VO other = ((VOWrapper) obj).vo;
    if (vo.getCountry() == null) {
        if (other.getCountry() != null)
            return false;
    } else if (!vo.getCountry().equals(other.getCountry()))
        return false;
    if (vo.getCountrycode() == null) {
        if (other.getCountrycode() != null)
            return false;
    } else if (!vo.getCountrycode().equals(other.getCountrycode()))
        return false;
    if (Double.doubleToLongBits(vo.getPrice()) != Double.doubleToLongBits(other.getPrice()))
        return false;
    return true;
}

//Hashcode method with only 3 fields used
@Override
public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + ((vo.getCountry() == null) ? 0 : vo.getCountry().hashCode());
    result = prime * result + ((vo.getCountrycode() == null) ? 0 : vo.getCountrycode().hashCode());
    long temp;
    temp = Double.doubleToLongBits(vo.getPrice());
    result = prime * result + (int) (temp ^ (temp >>> 32));
    return result;
}
}
Niels Bech Nielsen
  • 4,459
  • 1
  • 18
  • 42
0

It is perfectly valid to write code like:

List<CountryInstance> list = ...;
Set<CountryInstance> set = new HashSet<CountryInstance>(list);
if(set.size() < list.size()){
    /* There are duplicates */

For it to work you need value class instances. To create one you need to override equals and hashcode. Before you do that read What issues should be considered when overriding equals and hashCode in Java?

Community
  • 1
  • 1
Margus
  • 18,332
  • 12
  • 51
  • 101
0

If you are just parsing all the values into Strings then your approach sounds logical to me.

I read your description. You seem to say that a unexpected duplicates are detected. So this really means that 'equals' method is not behaving as you expect I think. If 'hashCode' was incorrect, I think you would get the opposite problem (duplicate NOT detected).

If you are still experiencing issues then attach the implementation of 'hashCode' and 'equals' and it might help to quickly answer the problem.

One more thing. I assume that all sample countries are unique in the file? I mean no countries are duplicated later on in the file?

welterw8
  • 106
  • 1
  • 7