
I am trying to use TensorFlow to implement a nonlinear regression with 4 linear terms and 4 nonlinear terms based on tanh(x).

The sum of squared errors, which is supposed to be minimized, only increases. After relatively few training steps, the weights and bias become "inf".

There ought to be a straightforward solution, similar to the OLS coefficients.
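
(For the purely linear part, the closed-form least-squares solution is just the normal equations; a minimal numpy sketch, with stand-in data rather than the real file:)

    import numpy as np
    X = np.random.randn(1323, 4)                    # stand-in for the four features
    y = np.random.randn(1323, 1)                    # stand-in for the dependent variable
    X1 = np.column_stack([X, np.ones(len(X))])      # add an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)   # closed-form least-squares coefficients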

DETAIL: Inputs1.csv is a 1323x5 table. The dependent variable (Y) is the first column; the remaining columns (X1 through X4) are the four features.

The first few rows are shown here

[Inputs1 sample]
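
(For reference, a quick sanity check of the file layout when loaded — the column names X1..X4 come from the description above and are assumptions, not verified against the file:)

    import pandas as pd
    df = pd.read_csv(r'C:\Ajax\DS\inputs1.csv', sep="|")
    print(df.shape)           # expected: (1323, 5)
    print(list(df.columns))   # expected: ['Y', 'X1', 'X2', 'X3', 'X4']
    print(df.head())          # first few rows: Y followed by the four features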

The code computes and compares two models: a) a multivariate linear model fitted with OLS, and b) a mixed "Q" model that uses the 4 linear inputs plus 4 nonlinear (tanh) terms. The Q model is implemented in TensorFlow.
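
(To make the Q model concrete: the prediction is y_hat = [X, tanh(X)] · w + b, i.e. 8 weights plus a single bias. A small numpy sketch of that forward pass, with illustrative names:)

    import numpy as np

    def q_forward(X, w, b):
        # X: (n, 4) features, w: (8, 1) weights, b: scalar bias
        agg = np.concatenate([X, np.tanh(X)], axis=1)   # (n, 8): linear + tanh terms
        return agg @ w + b                              # (n, 1) predictions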

The Python code is listed below. It is based partly on this Stack Overflow question.

    import pandas as pd, numpy as np, tensorflow as tf
    from tqdm import tqdm as metered #progress bar
    import matplotlib.pyplot as plt
    import statsmodels.api as sm
    from sklearn import preprocessing

    # pandas data
    df_train = pd.read_csv(r'C:\Ajax\DS\inputs1.csv',sep="|")
    obs=df_train.shape[0]
    cols=df_train.shape[1]-1
    dblcols = cols*2

    graph = tf.get_default_graph()

    # tf variables
    x_ = tf.placeholder(name="input", shape=[None, cols], dtype=np.float32) 
    y_ = tf.placeholder(name="output", shape=[None, 1], dtype=np.float32)

    wts = tf.Variable(tf.random_normal([dblcols,1]),    name='weight')
    b = tf.Variable(tf.random_normal([]), name='bias')

    dependents= df_train["Y"].values.reshape(-1, 1)
    feats = df_train.iloc[:, 1:1+cols].values.reshape(-1, cols)

This section implements OLS (Multivariate Linear Regression)

    # append a column of ones so the OLS fit includes an intercept
    ones = np.ones(obs).reshape(obs,1)
    feats1 = np.concatenate((feats, ones), axis=1)
    model1 = sm.OLS(dependents, feats1).fit()
    print (model1.summary())
    OLS_Yhat = model1.predict(feats1)

This section implements the Q model containing 4 linear terms and 4 nonlinear terms

    # concatenate the 4 raw features with their tanh transforms -> 8 inputs
    agg = tf.concat((x_, tf.tanh(x_)), axis=1)
    Qmodel = tf.add(tf.matmul(agg, wts), b)
    # summed squared error over all rows (a mean-based alternative is sketched just before the training loop)
    ssq = tf.square(y_ - Qmodel, name='cost')
    ssq1 = tf.reduce_sum(ssq)
    LR = .01
    train_op = tf.train.GradientDescentOptimizer(LR).minimize(ssq1)

    # scale each feature to [-1, 1], then center at zero (the target y is left unscaled)
    nz = preprocessing.MaxAbsScaler()
    Zfeats1 = nz.fit_transform(feats)
    Zfeats = Zfeats1 - np.mean(Zfeats1, axis=0)

    print("\nNormalized feats\n", Zfeats[:9,:],"\nstdev=",np.std(Zfeats),"\n" )

    n_epochs = 10000
    train_errors, nt_errors, weights, biases = [],[],[],[]

    config = tf.ConfigProto(device_count = {'GPU': 0})

    fig, ax = plt.subplots()
    fig = plt.gcf()
    fig.set_size_inches(7, 7)
    ax.set_ylabel(r'Prediction', fontsize=15)
    ax.set_xlabel(r'Actual', fontsize=15)
    ax.set_title('OLS and Q predictors')
    ax.grid(True)
    fig.tight_layout()
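
(A side note on the cost defined above: `ssq1` sums the squared errors over all 1323 rows, so the gradient magnitude scales with the number of observations. For reference, a mean-based variant of the same cost — a sketch only; I have not confirmed that this by itself stops the divergence:)

    # same model, but averaging instead of summing the squared errors,
    # so the gradient magnitude does not grow with the number of rows
    mse = tf.reduce_mean(tf.square(y_ - Qmodel), name='mse_cost')
    train_op_mse = tf.train.GradientDescentOptimizer(LR).minimize(mse)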

TRAINING LOOP:

    with tf.Session(config=config) as sess:
        sess.run(tf.global_variables_initializer())

        for i in metered(range(n_epochs)):

            uu, err2, weight, bias = sess.run([train_op, ssq1, wts, b],
                                  feed_dict={x_: Zfeats, y_: dependents})
            train_errors.append(uu)
            nt_errors.append(err2) 
            weights.append(weight)
            biases.append(bias)

        NN_yhat = sess.run(Qmodel, feed_dict={x_: Zfeats})
        ax.scatter(dependents, NN_yhat, c='red', label='Q')
        ax.scatter(dependents, OLS_Yhat, c='blue', label='OLS')

    plt.legend()
    plt.show()

The code is designed to compare the predictions generated by the two models (on the vertical axis) with the actual data (on the horizontal axis).

Note that the weights and biases increase exponentially and become inf or NaN after 10 to 30 iterations. The sum of squared errors also increases exponentially with epochs.

Reducing the learning rate from .01 to .001 does not help much. It takes 3x as many epochs, but the error still increases monotonically and the weights ultimately become inf.
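
(A sketch of the diagnostic I plan to run next: print the loss at a few early steps, with a much smaller learning rate such as 1e-5, to see exactly when it starts to blow up — same variable names as the script above:)

    # diagnostic: print the loss at selected early steps
    check_steps = {1, 10, 100, 1000}
    with tf.Session(config=config) as sess:
        sess.run(tf.global_variables_initializer())
        for i in range(n_epochs):
            _, err = sess.run([train_op, ssq1],
                              feed_dict={x_: Zfeats, y_: dependents})
            if i + 1 in check_steps:
                print("step", i + 1, "ssq1 =", err)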

The OLS model can be seen in blue, but the Q model fails to display, apparently because a 1323x1 array of NaNs is returned as NN_yhat.

[graph generated by the code]

I would also like to understand why None is returned for uu.
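
(My current guess, which I would like confirmed, is that `train_op` is a `tf.Operation` rather than a tensor, so fetching it in `sess.run` simply yields `None`. A minimal standalone check:)

    import tensorflow as tf
    x = tf.Variable(1.0)
    loss = tf.square(x)
    step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)  # returns an Operation
    with tf.Session() as s:
        s.run(tf.global_variables_initializer())
        print(s.run([step, loss]))   # [None, <loss value>] -- the Operation fetch is None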

  • Your return value should be a tuple of (None, ssq, wts, b), at least wts can't be None, it's a variable. I suspect you're just looking at the first value of the tuple and seeing none. A standard first debugging step is to start with a smaller learning rate, 0.00001 is a good safe number, increase it once things are running. Also, plot (and post) the value of ssq, your loss function. What is it on the 1st step? 10th? 100th? – David Parks Apr 08 '18 at 19:05
  • I edited the question to reflect your input, @DavidParks. Thanks – Lcat Apr 09 '18 at 00:21
