It is not a simple thing to do. Most games work around the problem by using regions (as you surmised with your Bejeweled example) or color maps, which maintain a second, hidden drawing that maps each position, by color, to a particular item.
Outside of games, routines use matrix transformations to identify edges and vertices, which tends to reduce the amount of data under consideration for possible matching. A trivial example is a filter with a kernel like
    kernel = [ -1 -1 -1 ]
             [ -1  8 -1 ]
             [ -1 -1 -1 ]
to emphasize any region that doesn't balance with its neighbours. From that you can attempt to detect lines and vertices, greatly reducing the number of items to consider in a match. If you want to detect "near" matches, you can then use a linear transformation to describe the distance to a match by measuring the displacement of the vertices, and set up criteria for deciding when a candidate is too far off to count as the same shape.
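As a rough sketch of the kernel idea above, here is a minimal hand-rolled 2-D convolution in Python with NumPy (the function name `convolve2d` and the toy images are illustrative, not from any particular library): a flat region produces zero response, while a pixel that doesn't balance with its neighbours produces a strong one.

```python
import numpy as np

# The 3x3 edge-emphasis kernel from the text: it cancels out over flat
# regions and responds strongly wherever a pixel differs from its
# eight neighbours.
KERNEL = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]])

def convolve2d(image, kernel):
    """Naive 2-D convolution over a grayscale image (valid region only)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A uniform 5x5 image: every kernel response is exactly zero.
flat = np.ones((5, 5))

# The same image with one brighter pixel in the middle: the response
# at that position is 8*2 - 8*1 = 8, flagging the imbalance.
spot = flat.copy()
spot[2, 2] = 2.0
```

In practice you would use an optimized routine rather than the double loop, but the arithmetic is the same; the point is that the filter output is near zero almost everywhere, so only a small set of "interesting" positions survives into the matching stage.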
A trivial solution, but one that only works with "perfect" data, is simply to xor the bitmap against the original at every possible offset. If the image is known to contain the exact bitmap, the xor will produce a zero field the same size as the bitmap wherever it matches. This technique can be improved somewhat by checking a few chosen pixels for an exact match before attempting the more expensive xor-and-verify pass, but its performance still degrades very badly as the search space grows.
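The xor-with-precheck approach might look like the following sketch in Python with NumPy (the function name `find_bitmap` and the probe scheme are my own illustration, not an established API): a few fixed probe pixels are compared first, and only offsets that pass get the full xor-and-verify test.

```python
import numpy as np

def find_bitmap(screen, template, probe_count=4):
    """Return (row, col) offsets where `template` appears exactly in `screen`.

    Both arrays must be integer-typed.  For each candidate offset, a
    handful of probe pixels is compared first (cheap rejection); only
    candidates that pass get the full xor check, where an all-zero xor
    field means an exact match.
    """
    th, tw = template.shape
    sh, sw = screen.shape
    # A few fixed probe positions inside the template, chosen once.
    rng = np.random.default_rng(0)
    probes = [(int(rng.integers(th)), int(rng.integers(tw)))
              for _ in range(probe_count)]
    matches = []
    for y in range(sh - th + 1):
        for x in range(sw - tw + 1):
            # Cheap pre-check: every probe pixel must match exactly.
            if any(screen[y + py, x + px] != template[py, px]
                   for py, px in probes):
                continue
            # Expensive check: xor of the region against the template
            # is a zero field iff the pixels are identical.
            if not np.any(screen[y:y + th, x:x + tw] ^ template):
                matches.append((y, x))
    return matches

# Toy usage: paste the template into a larger "screen" and find it back.
tpl = np.arange(9, dtype=np.uint8).reshape(3, 3)
scr = np.full((10, 10), 200, dtype=np.uint8)
scr[3:6, 5:8] = tpl
```

Even with the probe pre-check, the outer double loop makes this O(width x height) candidate offsets, which is exactly the undesirable scaling the text warns about, and any anti-aliasing or compression noise breaks the exact-match assumption entirely.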