
How do I implement a custom activation function (an RBF kernel with means and variances adjusted by gradient descent) in Neupy or Theano, for use in Neupy?

(Quick background: gradient descent works with every parameter in the network. I want to make a specialized feature space whose feature parameters are themselves optimized, so that Neupy tunes them during training.)

I think my problem is in the creation of the parameters: how they are sized and how they are all connected.

Primary functions of interest.

Activation Function Class

class RBF(layers.ActivationLayer):
    def initialize(self):
        super(RBF, self).initialize()
        self.add_parameter(name='mean', shape=(1,),
                       value=init.Normal(), trainable=True)
        self.add_parameter(name='std_dev', shape=(1,),
                       value=init.Normal(), trainable=True)
    def output(self, input_value):
        return rbf(input_value, self.parameters)

RBF Function

def rbf(input_value, parameters):
    K = _outer_substract(input_value, parameters['mean'])
    return np.exp(- np.linalg.norm(K)/parameters['std_dev'])

Function to shape?

def _outer_substract(x, y):
    return (x - y.T).T

Help would be much appreciated, as this will provide great insight into how to customize neupy networks. The documentation could use some work in some areas, to say the least...

10donovanr

2 Answers


When a layer changes the shape of the input variable, it has to inform the subsequent layers about the change. For this case it must have a customized output_shape property. For example:

import numpy as np
from neupy import layers
from neupy.utils import as_tuple
import theano.tensor as T

class Flatten(layers.BaseLayer):
    """
    Slight modification of the Reshape layer from the neupy library:
    https://github.com/itdxer/neupy/blob/master/neupy/layers/reshape.py
    """
    @property 
    def output_shape(self):
        # The number of output features depends on the input shape.
        # When the layer receives an input with shape (10, 3, 4),
        # the output will be (10, 12). The first number, 10, is the
        # number of samples, which you typically don't need to
        # change during propagation
        n_output_features = np.prod(self.input_shape)
        return (n_output_features,)

    def output(self, input_value):
        n_samples = input_value.shape[0]
        return T.reshape(input_value, as_tuple(n_samples, self.output_shape))

If you run it in a terminal, you will see that it works:

>>> network = layers.Input((3, 4)) > Flatten()
>>> predict = network.compile()
>>> predict(np.random.random((10, 3, 4))).shape
(10, 12)

In your example I can see a few issues:

  1. The rbf function doesn't return a Theano expression. It should fail during function compilation.
  2. Functions like np.linalg.norm will return a scalar if you don't specify the axis along which you want to calculate the norm (see the snippet below).
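
A quick NumPy-only check of the second point (the array shapes here are just an illustration):

import numpy as np

x = np.random.random((10, 4))

# Without an axis the whole matrix collapses into a single scalar
print(np.linalg.norm(x).shape)          # ()

# With an explicit axis you get one norm per row/sample
print(np.linalg.norm(x, axis=1).shape)  # (10,)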

The following solution should work for you

import numpy as np
from neupy import layers, init
import theano.tensor as T


def norm(value, axis=None):
    return T.sqrt(T.sum(T.square(value), axis=axis))


class RBF(layers.BaseLayer):
    def initialize(self):
        super(RBF, self).initialize()

        # It's more flexible when the shape of the parameters
        # depends on the input shape
        self.add_parameter(
            name='mean', shape=self.input_shape,
            value=init.Constant(0.), trainable=True)

        self.add_parameter(
            name='std_dev', shape=self.input_shape,
            value=init.Constant(1.), trainable=True)

    def output(self, input_value):
        K = input_value - self.mean
        return T.exp(-norm(K, axis=0) / self.std_dev)


network = layers.Input(1) > RBF()
predict = network.compile()
print(predict(np.random.random((10, 1))))

network = layers.Input(4) > RBF()
predict = network.compile()
print(predict(np.random.random((10, 4))))
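
As a small sanity check (just a sketch that assumes the RBF class above), you can confirm that mean and std_dev were registered as trainable Theano shared variables, which is what lets gradient descent update them:

rbf_layer = RBF()
network = layers.Input(4) > rbf_layer
predict = network.compile()

# add_parameter attaches Theano shared variables to the layer,
# so they can be inspected (and will be updated during training)
print(rbf_layer.mean.get_value())     # should start as zeros, shape (4,)
print(rbf_layer.std_dev.get_value())  # should start as ones, shape (4,)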
itdxer
  • Two recommendations: (1) adding a comment/demonstration on trainability; and (2) plots always help. But this works fantastic and is very instructional. Thanks! – 10donovanr Apr 19 '18 at 20:04

Although itdxer answered the question sufficiently, I would like to add the exact solution to this problem.

Creation of Architecture

network = layers.Input(size) > RBF() > layers.Softmax(num_out)

Activation Function

# Elementwise Gaussian (RBF)
def rbf(value, mean, std):
    return T.exp(-.5*T.sqr(value-mean)/T.sqr(std))/(std*T.sqrt(2*np.pi))

RBF Class

class RBF(layers.BaseLayer):

    def initialize(self):

        # Begin by initializing the base layer.
        super(RBF, self).initialize()

        # Add the parameters to train.
        self.add_parameter(name='means', shape=self.input_shape,
                           value=init.Normal(), trainable=True)
        self.add_parameter(name='std_dev', shape=self.input_shape,
                           value=init.Normal(), trainable=True)

    # Define the output function for the RBF layer.
    def output(self, input_value):
        return rbf(input_value, self.means, self.std_dev)

Training

If you are interested in training, it is as simple as:

from neupy import algorithms

# Set the training algorithm
gdnet = algorithms.Momentum(
    network,
    momentum=0.1
)

# Train.
gdnet.train(x, y, epochs=100)

This compiles with the proper input and target, and the means and variances are updated elementwise.
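
Putting the pieces together, a minimal end-to-end sketch might look like this (the sizes, data, and number of classes are made up for illustration, and rbf/RBF are the definitions above):

import numpy as np
from neupy import layers, algorithms

# rbf and RBF defined as above

size, num_out = 4, 3
x = np.random.random((100, size))
y = np.eye(num_out)[np.random.randint(0, num_out, 100)]  # one-hot targets

network = layers.Input(size) > RBF() > layers.Softmax(num_out)

gdnet = algorithms.Momentum(network, momentum=0.1)
gdnet.train(x, y, epochs=100)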

10donovanr