I am working on a neural network, based on the NEAT algorithm, that learns to play an Atari Breakout clone in Python 2.7. All of the pieces are working, but I think the evolution could be greatly improved with a better algorithm for calculating species fitness.

The inputs to the neural network are:

  • X coordinate of the center of the paddle
  • X coordinate of the center of the ball
  • Y coordinate of the center of the ball
  • ball's dx (velocity in X)
  • ball's dy (velocity in Y)

The outputs are:

  • Move paddle left
  • Move paddle right
  • Do not move paddle
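The post does not say how the three outputs are decoded into a single move. One common choice, shown here only as a minimal sketch, is to take the output with the largest activation. The `ACTIONS` labels and the `choose_action` helper are hypothetical names used for illustration, not part of the original code.

```python
# Hypothetical action labels; the order mirrors the outputs listed above.
ACTIONS = ("left", "right", "stay")


def choose_action(outputs):
    """Return the action whose output activation is largest.

    Ties are resolved in favor of the earliest output.
    """
    if len(outputs) != len(ACTIONS):
        raise ValueError("expected %d outputs, got %d" % (len(ACTIONS), len(outputs)))
    best_index = max(range(len(outputs)), key=lambda i: outputs[i])
    return ACTIONS[best_index]
```

For example, `choose_action([0.1, 0.7, 0.2])` selects `"right"`.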

The parameters I have available to the species fitness calculation are:

  • breakout_model.score - int: the final score of the game played by the species
  • breakout_model.num_times_hit_paddle - int: the number of times the paddle hit the ball
  • breakout_model.hits_per_life - list of int: the number of times the paddle hit the ball in each life; e.g. the first element is the value for the first life, the 2nd element is the value for the 2nd life, and so on, up to 4 entries
  • breakout_model.avg_paddle_offset_from_ball - decimal: the average linear distance in the X direction between the ball and the center of the paddle
  • breakout_model.avg_paddle_offset_from_center - decimal: the average linear distance in the X direction between the center of the frame and the center of the paddle
  • breakout_model.time - int: the total duration of the game, measured in frames
  • breakout_model.stale - boolean: whether or not the game was artificially terminated due to staleness (e.g. ball gets stuck bouncing directly vertical and paddle not moving)

If you think I need more data about the final state of the game than just these, I can likely implement a way to get it very easily.

Here is my current fitness calculation, which I don't think is very good:

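def calculate_fitness(self):
    # Start from the final game score.
    self.fitness = self.breakout_model.score
    # Small bonus per paddle hit, small penalty for never hitting the ball.
    # (10.0 rather than 10 avoids integer division in Python 2.7.)
    if self.breakout_model.num_times_hit_paddle != 0:
        self.fitness += self.breakout_model.num_times_hit_paddle / 10.0
    else:
        self.fitness -= 0.5
    # Subtracts 100 / offset, so a smaller average offset gives a larger penalty.
    if self.breakout_model.avg_paddle_offset_from_ball != 0:
        self.fitness -= (1.0 / self.breakout_model.avg_paddle_offset_from_ball) * 100
    # Penalize each life in which the paddle never hit the ball.
    for hits in self.breakout_model.hits_per_life:
        if hits == 0:
            self.fitness -= 0.2
    # A game terminated for staleness has its fitness negated.
    if self.breakout_model.stale:
        self.fitness = 0 - self.fitness
    return self.fitness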

Here is what I think the fitness calculation should do, semantically:

  • The score, obviously, should have the most significant impact on the overall fitness. Maybe a score of 0 should slightly negatively affect the fitness?
  • The number of times that the paddle hit the ball per life should have some effect, but not as significant a contribution/weight. E.g. if that number is 0, it didn't really try to hit the ball at all during that life, so it should have a negative effect
  • The total number of times that the paddle hit the ball should also have some effect, and its contribution should be based on the score. E.g. if it didn't hit the ball many times and also didn't score many points, that should have a significant negative effect; if it didn't hit the ball many times but scored a high number of points, that should have a significant positive effect. Overall, (I think) the closer this value is to the game score, the less contribution/weight it should have on fitness
  • The average distance in the X direction between the center of the frame and the center of the paddle should basically encourage a central "resting" position for the paddle
  • If the game was ended artificially due to staleness, either this should have a significant negative effect, or it should automatically force the fitness to be 0.0; I'm not sure which case would be better

I'm not sure how to operate on all these values to make them affect the overall fitness appropriately.

Thanks in advance for any help you can provide.

Mat Jones
  • Maybe the best criterion for fitness is something like the following: (score / number of paddle hits) - number of paddle misses. I.e. you want to maximize scoring per paddle hit and minimize the number of paddle misses. – Alex Jan 08 '17 at 02:26
  • You should first define what you think "playing well" means for the game before you can reasonably define a fitness function for it. Do you want to maximize score per paddle hit as Alex suggested? – Doomed Mind Jan 09 '17 at 09:10
  • @Alex I originally had something as simple as that, but it caused it to have behavior that seemed like it would intentionally miss the ball if it hit more than one block from one paddle hit, and would get stuck in that local minimum; I wasn't sure how to help it escape this minimum, so I slowly started making the fitness function more and more complex in hopes of solving this. – Mat Jones Jan 09 '17 at 14:00
  • @mjones.udri Given that hitting more than one block from one paddle hit helps to maximize "score per paddle hit", whereas "intentionally miss the ball" contributes negatively to both "score per paddle hit" and minimizing the number of paddle misses, I don't quite understand the behavior you are describing, unless it is due to some bug. – Alex Jan 11 '17 at 00:51
  • @Alex It's a local minimum. Say it breaks enough blocks for the ball to get above the wall of blocks, and the ball then bounces a few times between the blocks and the ceiling; the minimum number of paddle hits for this to occur is 3 (because there are 3 rows of bricks). Then it sees that it hit the ball only 4 times and broke a bunch of bricks, so `score / (num_paddle_hits)` is a high ratio, so it then intentionally misses (at least, it does some behavior that *seems* like it's intentionally missing). – Mat Jones Jan 11 '17 at 14:11
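
To put numbers on the behavior described in the last comment, here is a small worked example of the criterion suggested in the first comment, (score / number of paddle hits) - number of paddle misses. The figures (30 points after 4 paddle hits) are hypothetical and chosen only for illustration; `ratio_fitness` is an illustrative name, not code from the thread.

```python
def ratio_fitness(score, paddle_hits, paddle_misses):
    """(score / paddle hits) - paddle misses, as suggested in the comments."""
    return float(score) / paddle_hits - paddle_misses


# Hypothetical state: 30 points after 4 paddle hits and no misses.
before = ratio_fitness(30, 4, 0)               # 30 / 4 = 7.5

# Option A: return the ball once more, but it breaks no new bricks.
hit_without_scoring = ratio_fitness(30, 5, 0)  # 30 / 5 = 6.0

# Option B: let the ball pass the paddle instead.
miss = ratio_fitness(30, 4, 1)                 # 7.5 - 1 = 6.5
```

With these numbers, a return that scores nothing lowers the fitness by 1.5 while a miss lowers it by only 1.0, so once the ratio is high this formula can favor missing over a non-scoring hit. Whether that fully explains the observed behavior is not established in the thread.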

1 Answer

I would minimize the conditional logic in your fitness function, using it only in those cases where you want to force the fitness score to 0 or a major penalty. I would just decide how much weight each component of the score should have and multiply. Negative components just add complexity to understanding the fitness function, with no real benefit; the model learns from the relative difference in scores. So my version of the function would look something like this:

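def fitness(game_score, max_score, total_hits, game_score_per_life, hits_per_life):
    # game_score_per_life and hits_per_life are assumed here to be single
    # numbers (e.g. per-life averages), not lists.
    if total_hits == 0:
        return 0
    # float() avoids integer division in Python 2.7.
    return (float(game_score) / max_score) * .7 \
           + float(game_score) / total_hits * .2 \
           + float(game_score_per_life) / hits_per_life * .1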

(Aside: I didn't include "distance from center of frame" because I think that's cheating; if staying near the center is a good thing to do to maximize play efficiency, then the agent should learn that on its own. If you sneak all the intelligence into the fitness function, then your agent isn't intelligent at all.)
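
As a rough sketch of how the weighted sum above could be fed from the values listed in the question, the function below keeps every component in the range 0 to 1 so that the weights are directly comparable. Several things here are assumptions rather than part of the answer: `max_score` has to be supplied by the caller, the points-per-hit term is squashed with x / (1 + x), the per-life term is replaced by the fraction of lives with at least one paddle hit (the question does not expose a per-life score), and a stale game is forced to 0, which is one of the two options the question mentions.

```python
def weighted_fitness(score, max_score, total_hits, hits_per_life, stale=False):
    """Weighted sum of components that each lie in [0, 1].

    The 0.7 / 0.2 / 0.1 weights follow the answer above; everything else
    is an illustrative assumption.
    """
    # Force the fitness to 0 for stale games or games without a paddle hit.
    if stale or total_hits == 0 or max_score <= 0:
        return 0.0

    # Share of the maximum achievable score, capped at 1.
    score_ratio = min(float(score) / max_score, 1.0)

    # Points per paddle hit, squashed into [0, 1) so it cannot dominate.
    points_per_hit = float(score) / total_hits
    efficiency = points_per_hit / (1.0 + points_per_hit)

    # Fraction of lives in which the paddle hit the ball at least once.
    if hits_per_life:
        active = sum(1 for hits in hits_per_life if hits > 0)
        active_lives = active / float(len(hits_per_life))
    else:
        active_lives = 0.0

    return 0.7 * score_ratio + 0.2 * efficiency + 0.1 * active_lives
```

A call might look like `weighted_fitness(model.score, max_score, model.num_times_hit_paddle, model.hits_per_life, model.stale)`, where `model` stands for the question's `breakout_model` and `max_score` is whatever the full brick wall is worth in the clone.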

matt2000