It is quite common with small life universes to make them wrap round on all sides (to make a toroidal universe) but this requires double buffering. In your case this requires 3KB RAM but you only have 2KB.
If you don't wrap then you don't need to double-buffer the whole universe. You just need to avoid overwriting cells before you have finished using them as input to the calculation.
Say your universe is laid out as a conventional bitmap. We are going to treat it as a series of rows arranges sequentially in memory. Say the universe has four rows numbered 0 to 3.
0
1
2
3
4
...
When you calculate the next generation, the new version of row 3 is calculated using rows 2, 3, and 4 (which is blank). You can write the new version of row 3 on top of row 4. Similarly, calculate the new row 2 from rows 1,2,3 and write it on top of row 3, since that data is no longer needed after calculating row 2. The new row 1 is calculated from rows 0,1,2 and written over row 2.
This means you only need memory for one extra row. 97*128 bits is 1552 bytes.
The disadvantage is that your universe scrolls through memory and you have to have some mechanism to deal with this. Either copy it back to its original position after each calculation (which is slow) or arrange to be able to display it from any position in memory, and ensure that your calculation and display systems can wrap around from high to low addresses.