I have some questions about saving computing power in calculations and about CPU architecture, because I want to create a 2D game with friends that can handle as many objects and parameters as possible, to learn about efficient computing. We're using pygame for now but can implement C and FASM functions.
So:
1. Can you save processing power by reducing the precision of a division, i.e. the number of binary places in the quotient?
This precision would have to be in binary fractional places of course, since 0.1 in decimal has a repeating fractional part in binary, for example; so let's say you set it to 0.125, which is 0.001 in binary = 3 fractional places. I mean, you could just use long division to get to a certain fractional place, but I guess that's not more efficient, since the processor would have to load every intermediate result into a register again. I couldn't figure this one out, since x86 processors have their own DIV instruction in four different operand sizes and I don't know how the CPU executes those. Is it possible to write an assembly function that takes a precision parameter to gain efficiency from this?
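For what it's worth, hardware DIV doesn't take a precision parameter, but you can control the number of binary fractional places yourself with fixed-point arithmetic: scale the dividend by 2**frac_bits (a cheap left shift) and do one integer division. A minimal sketch (all names here are my own, not from any library):

```python
def fixed_div(a: int, b: int, frac_bits: int) -> int:
    """Divide a by b, keeping frac_bits binary fractional places.

    The returned integer is the quotient scaled by 2**frac_bits,
    so one integer division yields exactly the precision you asked for.
    """
    return (a << frac_bits) // b

def to_float(x: int, frac_bits: int) -> float:
    """Convert a fixed-point value back to a float for display."""
    return x / (1 << frac_bits)

# 7/4 with 3 fractional bits: (7 << 3) // 4 = 14, i.e. 14/8 = 1.75
q = fixed_div(7, 4, 3)
# 1/10 with 3 fractional bits rounds down to 0, since 0.1 < 1/8
q2 = fixed_div(1, 10, 3)
```

In C or assembly the same trick maps to a shift plus one integer DIV, which is usually how you'd trade precision for range in a game without touching the FPU.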
Could it be useful to just use half-precision floats or other data types if no exact values above 2048 are needed (2048 being the largest range in which float16 represents every integer exactly)?
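To illustrate that 2048 cutoff: a float16 has an 11-bit significand, so every integer up to 2048 is exact, but above that the representable values start skipping. A quick check with numpy (the storage type is real; whether float16 arithmetic is faster depends on the hardware, since most x86 CPUs convert to float32 to compute):

```python
import numpy as np

# 2048 = 2**11 is still exactly representable in float16.
exact = np.float16(2048.0)

# 2049 is NOT representable: float16 rounds it to the nearest
# representable neighbour (2048 here, by round-to-nearest-even).
rounded = np.float16(2049.0)
```

So float16 is mainly a memory-bandwidth win (half the bytes per value), not automatically a compute win.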
On the other hand, is it useful to reduce the binary places of a constant like PI that could be used thousands of times per frame, since I don't need inch precision out beyond Jupiter's orbit, which is what the standard 15 decimal places give you?
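As a back-of-envelope check of that Jupiter claim (the figures below are my own rough assumptions, not from the post): Jupiter's orbital radius is about 7.8e11 m and an inch is 0.0254 m, so inch precision at that range needs about 45 significand bits. A float64 has 53, so it does have bits to spare; a float32's 24 would not be enough.

```python
import math

# Rough assumptions: Jupiter orbit ~7.8e11 m, one inch = 0.0254 m.
jupiter_orbit_m = 7.8e11
inch_m = 0.0254

# Bits of significand needed to resolve an inch at Jupiter distance.
bits_needed = math.log2(jupiter_orbit_m / inch_m)   # ~44.8 bits
```

Note that with hardware floats, a "shorter" constant doesn't make multiplication faster; the FPU processes all 64 bits either way, so truncating PI only changes accuracy, not speed.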
2. Reducing trigonometric function precision, or using a lookup table
I read that the CORDIC algorithm works bit by bit, so would a version where you can specify the number of binary places to calculate be more efficient? I don't know which algorithm Python's math module uses by default, or whether something like this is available in something like numpy/anaconda. Alternatively, would it be even faster to precalculate a table of sine results to the needed precision, or to interpolate between those results when more precision is needed?
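For reference, CPython's math.sin just calls the platform C library's sin, which on modern CPUs is typically polynomial-based rather than CORDIC (CORDIC shines on hardware without fast multipliers). The table idea is easy to prototype; here is a minimal sketch of both variants, nearest-entry lookup and linear interpolation, with names of my own choosing:

```python
import math

# Precompute one full period of sine; table size sets the precision.
TABLE_SIZE = 4096
SIN_TABLE = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]

def sin_lookup(x: float) -> float:
    """Nearest-entry lookup: worst-case error ~ pi/TABLE_SIZE."""
    i = round(x / (2 * math.pi) * TABLE_SIZE) % TABLE_SIZE
    return SIN_TABLE[i]

def sin_lerp(x: float) -> float:
    """Linear interpolation between the two nearest entries:
    error shrinks quadratically with table size."""
    pos = (x / (2 * math.pi) * TABLE_SIZE) % TABLE_SIZE
    i = int(pos)
    frac = pos - i
    a = SIN_TABLE[i]
    b = SIN_TABLE[(i + 1) % TABLE_SIZE]
    return a + (b - a) * frac
```

Whether this beats math.sin in pure Python is doubtful (interpreter overhead dominates), but the same table in C, or vectorized over a numpy array of angles, is a classic and real win.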
3. What is the most efficient collision algorithm you know?
I had the idea that instead of checking objects against objects, which ends up with a number of checks quadratic in the object count, you could write the positions of objects into a matrix (numpy array) at a resolution you choose, so for 2D it could be pixel-perfect; the size then just depends on how big your play area is, e.g. a 1920x1080 matrix for one screen. Each cell of the matrix holds a number that references an object in an object pool, and while writing your new position when moving, you just check whether something is already there.

This also has the advantage that bitmap collision comes at zero extra cost compared to rectangle collision, because you can just write the shape into the matrix as-is. I could imagine this runs more efficiently at a very large number of objects; I got a basic version of this working, but it would need more optimization. I know you can also do spatial subdivision for object-against-object checks, but I don't know which yields the better result.
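The occupancy-grid idea above can be sketched in a few lines (all names are mine; this handles only 1-pixel objects, so it's a starting point, not a full implementation). One thing to keep in mind: a 1920x1080 uint32 grid is about 8 MB, far larger than L1/L2 cache, so scattered writes will be memory-bound; a coarser grid of cells (spatial hashing) trades pixel perfection for much better cache behaviour.

```python
import numpy as np

# Grid holds 0 for empty, or (object_id + 1) for an occupied pixel,
# so object id 0 is distinguishable from "empty".
W, H = 1920, 1080
grid = np.zeros((H, W), dtype=np.uint32)

def try_move(obj_id: int, old: tuple, new: tuple) -> bool:
    """Move a 1-pixel object from old=(x, y) to new=(x, y).

    Writing the new position doubles as the collision check:
    returns False (and moves nothing) if another object is there.
    """
    ox, oy = old
    nx, ny = new
    occupant = grid[ny, nx]
    if occupant != 0 and occupant != obj_id + 1:
        return False              # collision: cell held by someone else
    grid[oy, ox] = 0              # vacate the old cell
    grid[ny, nx] = obj_id + 1     # claim the new cell
    return True

try_move(0, (0, 0), (5, 5))      # object 0 claims pixel (5, 5)
try_move(1, (1, 0), (5, 5))      # object 1 collides -> False
```

For many small moving objects, the usual recommendation is exactly the spatial subdivision you mention (uniform grid of, say, 32x32-pixel cells), with the bitmap test only as a narrow-phase check between objects that share a cell.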
Lastly, do you have any book recommendations that help with understanding how the processor executes its operations and uses the cache/memory, to help with this kind of stuff?
I hope someone can do something with this brick of questions, cheers!