I am trying to use TensorFlow to implement a nonlinear regression with 4 linear terms and 4 nonlinear terms based on tanh(x).
The sum of squared errors, which is supposed to be minimized, only increases. After relatively few training steps, the weights and bias become "inf".
There ought to be a straightforward solution, similar to the OLS coefficients.
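By a "straightforward solution" I mean something like the closed-form least-squares fit. For the purely linear part it would look roughly like this (a minimal sketch; X and y here stand for my 1323x4 feature matrix and 1323x1 dependent variable):
import numpy as np
X1 = np.column_stack([X, np.ones(len(X))])      # append a column of ones for the intercept
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)   # closed-form least-squares coefficients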
DETAIL: Inputs.csv is a 1323x5 table. The dependent variable (Y) is the first column; the remaining columns (X1 through X4) are the four features.
The first few rows are shown here (table omitted).
The code computes and compares two models: (a) a multivariate linear model fitted with OLS, and (b) a mixed "Q" model using the 4 linear terms plus 4 nonlinear (tanh) terms; this second model is implemented in TensorFlow.
The Python code is listed below. It is based partly on this Stack Overflow question.
import pandas as pd, numpy as np, tensorflow as tf
from tqdm import tqdm as metered #progress bar
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn import preprocessing
# pandas data
df_train = pd.read_csv(r'C:\Ajax\DS\inputs1.csv', sep="|")
obs=df_train.shape[0]
cols=df_train.shape[1]-1
dblcols = cols*2
graph = tf.get_default_graph()
# tf variables
x_ = tf.placeholder(name="input", shape=[None, cols], dtype=np.float32)
y_ = tf.placeholder(name="output", shape=[None, 1], dtype=np.float32)
wts = tf.Variable(tf.random_normal([dblcols,1]), name='weight')
b = tf.Variable(tf.random_normal([]), name='bias')
dependents= df_train["Y"].values.reshape(-1, 1)
feats = df_train.iloc[:, 1:1+cols].values.reshape(-1, cols)
This section implements OLS (Multivariate Linear Regression)
ones = np.ones(obs).reshape(obs,1)
feats1 = np.concatenate((feats, ones), axis=1)
model1 = sm.OLS(dependents, feats1).fit()
print (model1.summary())
OLS_Yhat = model1.predict(feats1)
This section implements the Q model containing 4 linear terms and 4 nonlinear terms
agg = tf.concat((x_, tf.tanh(x_)), axis=1)
Qmodel = tf.add(tf.matmul(agg, wts), b)
ssq = tf.square(y_ - Qmodel, name='cost')
ssq1 = tf.reduce_sum(ssq)
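# Note (for clarity): Qmodel is yhat = [X, tanh(X)] @ wts + b, i.e. 4 linear terms
# plus 4 tanh terms sharing a single bias, and ssq1 is the un-normalized sum of
# squared errors over all 1323 observations.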
LR=.01
train_op = tf.train.GradientDescentOptimizer(LR).minimize(ssq1)
nz = preprocessing.MaxAbsScaler()
Zfeats1 = nz.fit_transform(feats)
Zfeats = Zfeats1 - np.mean(Zfeats1,axis=0)
print("\nNormalized feats\n", Zfeats[:9,:],"\nstdev=",np.std(Zfeats),"\n" )
n_epochs = 10000
train_errors, nt_errors, weights, biases = [],[],[],[]
config = tf.ConfigProto(device_count = {'GPU': 0})
fig, ax = plt.subplots()
fig = plt.gcf()
fig.set_size_inches(7, 7)
ax.set_ylabel(r'Prediction', fontsize=15)
ax.set_xlabel(r'Actual', fontsize=15)
ax.set_title('OLS and Q predictors')
ax.grid(True)
fig.tight_layout()
TRAINING LOOP:
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    for i in metered(range(n_epochs)):
        # one gradient-descent step; also fetch the current cost, weights and bias
        uu, err2, weight, bias = sess.run([train_op, ssq1, wts, b],
                                          feed_dict={x_: Zfeats, y_: dependents})
        train_errors.append(uu)
        nt_errors.append(err2)
        weights.append(weight)
        biases.append(bias)
    NN_yhat = sess.run(Qmodel, feed_dict={x_: Zfeats})
ax.scatter(dependents, NN_yhat, c='red', label='Q')
ax.scatter(dependents, OLS_Yhat, c='blue', label='OLS')
plt.legend()
plt.show()
The code is designed to compare the predictions generated by the two models (on the vertical axis) with the actual data (on the horizontal axis).
Note that the weights and bias grow exponentially and become inf or NaN after 10 to 30 iterations. The sum of squared errors increases exponentially with epochs.
Reducing the learning rate from .01 to .001 does not help much: it takes about 3x as many epochs, but the error still increases monotonically and the weights ultimately become inf.
The OLS predictions can be seen in blue, but the Q model fails to display, apparently because a 1323x1 array of NaNs is returned as NN_yhat.
I would also like to understand why None is returned as uu.