Comparing values yields different results

Question

I have a script which reads data from a csv into a pandas DataFrame. It then iterates over each row and passes the row as a pandas Series to another module. There, one of the columns is evaluated to see if it is bigger than a value contained in another pandas Series, eg:

df_1:

col_A, col_B, col_C
234.0, 563.2, 565.5
565.7, 324.3, 5676.4

df_2:

col_X, col_Y, col_Z
124.1, 763.5, 562.1

In the above example, the first row of the dataframe is selected and sent to a function which checks to see if df_1['col_A'] (ie: 234.0) is bigger than df_2['col_X'] (ie: 124.1). This all works perfectly.

My problem comes now that I have changed the script to read in the original dataframe from a PostgreSQL db instead of a csv file. Everything else has remained the same. The comparison appears to be doing nothing: it doesn't evaluate to True or False, it just skips the evaluation completely.

The original code to compare the two values (each contained in a pd series) which worked correctly when reading in from csv is:

if df_1['col_A'] > df_2['col_X']:
    #do something

I have checked the types of the two values both when reading in from csv and from postgresql. It is comparing:

<class 'float'> and <class 'numpy.float64'>

The values stored in the database are of type numeric(10,2).

I have tried the following to no avail:

if df_1.loc['col_A'] > df_2.loc['col_X']:
and
if Decimal(df_1.loc['col_A']) > Decimal(df_2.loc['col_X']):
and
if abs(df_1.loc['col_A']) > abs(df_2.loc['col_X']):

I'm completely stumped, since the only thing that has changed is getting the data from a database instead of a csv. The resulting datatypes are still the same, ie: float compared against numpy.float64.

score 1 · Answer 1 · answered Apr 20 '15 at 13:21

It works fine on my machine. This code:

import numpy as np

df_1 = {'col_A': 234.0}
df_2 = {'col_X': np.float64(124.1)}

print(type(df_1['col_A']), type(df_2['col_X']))

if df_1['col_A'] > df_2['col_X']:
    #do something
    print(df_1['col_A'], 'is greater than', df_2['col_X'])

Prints this:

<class 'float'> <class 'numpy.float64'>
234.0 is greater than 124.1

What version of Python and numpy are you using?
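In case it helps to answer that, here is a minimal sketch for reporting both versions from inside the script. The variable names are only illustrative.

```python
import sys

import numpy as np

# sys.version starts with the interpreter version, followed by build details.
python_version = sys.version.split()[0]
# Every numpy release exposes its version string as numpy.__version__.
numpy_version = np.__version__

print("Python", python_version)
print("numpy", numpy_version)
```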

score 1 · Accepted Answer · edited May 23 '17 at 11:50

NumPy, because of its C roots, has a more complex type system than pure Python. When your (presumably non-NumPy) code hands a built-in 'float' to a NumPy type, the NumPy type may not treat it as one of its own, and the operation may not behave as you expect. As @tommy-carstensen has pointed out, whether this actually happens depends on the versions of Python and NumPy.

You need to make sure all of your variables are of the same type before performing comparisons or arithmetic operations on them. See the question "Converting numpy dtypes to native python types" for discussion.
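As a rough sketch of that advice, the snippet below converts both operands to the built-in float before comparing. The helper name to_native_float is hypothetical. The Decimal input is an assumption about what a database driver might return for a numeric(10,2) column; the question itself only reports float and numpy.float64.

```python
from decimal import Decimal

import numpy as np


def to_native_float(value):
    """Return value as a built-in Python float (hypothetical helper)."""
    # NumPy scalars expose .item(), which returns the equivalent native Python object.
    if isinstance(value, np.generic):
        value = value.item()
    # float() also accepts Decimal, int, and numeric strings.
    return float(value)


# Assumed inputs: one NumPy scalar and one Decimal, standing in for the two Series values.
a = to_native_float(np.float64(234.0))
b = to_native_float(Decimal("124.10"))

print(type(a), type(b))
if a > b:
    print(a, "is greater than", b)
```

With both sides converted, the comparison no longer depends on how a particular NumPy or driver type implements its comparison operators. Note that converting a Decimal to float can lose precision, which may matter for values that are very close together.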
