Why is this code saying two series are more similar than they actually are

Question

    import pandas as pd

    series1_values = ['risk no', 'No', 'No', 'No', 'No', 'Yes', 'No', 'Yes',
                      'Medium rare', 'Female', '18-29', '$25,000 - $49,999',
                      'High school degree', 'South Atlantic']

    series1 = pd.Series(series1_values)

    series2 = pd.Series(['risk no', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes',
                         'Medium rare', 'Female', '60+', '$25,000 - $49,999',
                         'High school degree', 'South Atlantic'])

    series1.isin(series2)


    0      True
    1      True
    2      True
    3      True
    4      True
    5      True
    6      True
    7      True
    8      True
    9      True
    10    False
    11     True
    12     True
    13     True
    dtype: bool

This code says that the two series share 13 values (the sum of the `True`s), but compared position by position they only have 11 values in common. Where are the extra two coming from?

Indices 2 and 3 should also come out `False`, if you see what I mean.
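
A minimal sketch (assuming only that pandas is installed) of what `isin` actually checks: for each element of `series1` it asks whether that value appears *anywhere* in `series2`, ignoring position. Because `'No'` occurs somewhere in `series2`, indices 2 and 3 come back `True`; the only value of `series1` absent from `series2` is `'18-29'`. The plain-Python set version is shown only to make the semantics explicit.

```python
import pandas as pd

series1 = pd.Series(['risk no', 'No', 'No', 'No', 'No', 'Yes', 'No', 'Yes',
                     'Medium rare', 'Female', '18-29', '$25,000 - $49,999',
                     'High school degree', 'South Atlantic'])
series2 = pd.Series(['risk no', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes',
                     'Medium rare', 'Female', '60+', '$25,000 - $49,999',
                     'High school degree', 'South Atlantic'])

# isin is a membership test against all values of series2;
# the position of the value in series2 plays no role.
membership = series1.isin(series2)

# The same membership test written in plain Python with a set.
values2 = set(series2)
manual = pd.Series([v in values2 for v in series1])

print(membership.sum())               # 13
print(series1[~membership].tolist())  # ['18-29']
print(membership.equals(manual))      # True
```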

`isin` is the wrong function; you want an elementwise comparison instead, isn't that right? `(series1 == series2).sum()` — cs95, Nov 28 '20 at 22:31
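
A short sketch of the elementwise comparison suggested above, using the question's data. `==` compares position by position and gives 11 matches, with mismatches at indices 2, 3 and 10. Note that `==` between two Series requires identically labeled indexes; `Series.eq` performs the same comparison but aligns on index labels first, which only matters if the indexes differ.

```python
import pandas as pd

series1 = pd.Series(['risk no', 'No', 'No', 'No', 'No', 'Yes', 'No', 'Yes',
                     'Medium rare', 'Female', '18-29', '$25,000 - $49,999',
                     'High school degree', 'South Atlantic'])
series2 = pd.Series(['risk no', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes',
                     'Medium rare', 'Female', '60+', '$25,000 - $49,999',
                     'High school degree', 'South Atlantic'])

# Elementwise equality: element i of series1 is compared with element i of series2.
matches = series1 == series2

print(matches.sum())                     # 11
print(matches[~matches].index.tolist())  # [2, 3, 10]

# .eq() aligns on index labels before comparing; same result here
# because both Series share the default RangeIndex.
print(series1.eq(series2).sum())         # 11
```
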
@cs95 just realized I don't know any way to do elementwise containment, do you? Basically a vectorized `series1 in series2`; it's not `series1.str.contains(series2)`. — juanpa.arrivillaga, Nov 28 '20 at 22:41
@juanpa.arrivillaga It's easier if you're checking string containment here; that's covered in my post [here](https://stackoverflow.com/a/55335207/4909087) (under "multiple substring search"). Otherwise, for numeric types I think the next best option is a list comprehension or something similar, where you convert the second series into a set for comparison. — cs95, Nov 28 '20 at 22:46
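
A minimal sketch of elementwise containment with a list comprehension, as discussed in this thread. The data (`needles`, `haystacks`, `codes`, `allowed`) is hypothetical. `zip` pairs elements strictly by position and ignores index labels, so this assumes both Series are already in matching order. The same pattern works when each element of the second Series is a collection such as a set.

```python
import pandas as pd

# Hypothetical data: is needles[i] a substring of haystacks[i]?
needles = pd.Series(['ab', 'x', 'cat'])
haystacks = pd.Series(['abc', 'yyy', 'concatenate'])

# Pairwise `in` via zip: element 0 with element 0, 1 with 1, and so on.
contained = pd.Series([n in h for n, h in zip(needles, haystacks)],
                      index=needles.index)
print(contained.tolist())  # [True, False, True]

# Hypothetical numeric case: each element of `allowed` is a set.
codes = pd.Series([1, 5])
allowed = pd.Series([{1, 2}, {3, 4}])
in_allowed = [c in a for c, a in zip(codes, allowed)]
print(in_allowed)          # [True, False]
```
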
@cs95 yeah, but I don't think "multiple substring search" is what I'm talking about; I mean each string is checked for containment elementwise. I guess if you want index alignment, just creating a DataFrame and using `apply` would work. — juanpa.arrivillaga, Nov 28 '20 at 22:47
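
A sketch of the index-aligned variant mentioned above, with hypothetical data whose labels appear in different orders. Building a DataFrame from a dict of Series aligns the Series on their index labels, and `apply(..., axis=1)` then runs the `in` test row by row. This assumes every label exists in both Series; a missing label would produce `NaN`, and `'ab' in NaN` would raise a `TypeError`.

```python
import pandas as pd

# Hypothetical Series with the same labels in a different order.
needles = pd.Series(['ab', 'x', 'cat'], index=['r1', 'r2', 'r3'])
haystacks = pd.Series(['concatenate', 'abc', 'yyy'], index=['r3', 'r1', 'r2'])

# The DataFrame constructor aligns both Series on their index labels,
# so each row pairs a needle with the haystack that shares its label.
df = pd.DataFrame({'needle': needles, 'haystack': haystacks})

# axis=1 passes each row to the lambda; `in` performs the containment test.
contained = df.apply(lambda row: row['needle'] in row['haystack'], axis=1)
print(contained.to_dict())  # {'r1': True, 'r2': False, 'r3': True}
```
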
Did you try `series1 == series2`? I know that sounds too simple to work, but it actually does. — Karl Knechtel, Nov 28 '20 at 23:12
