While implementing a specialized memcache server for a certain datatype, we realized that the storage backend could be made more memory efficient, and lookup time could be decreased, by using bitwise lookup operations (i.e., O(1) lookups).
I wrote the whole protocol implementation and the event-driven daemon part in Python within 2 days. That gave us enough time to test functionality and focus on performance while the team was validating protocol conformance and other bits.
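To give a feel for how little Python such a daemon needs, here is a minimal sketch of an event-driven server answering a memcached-style `get`. It is not the original code: it uses the standard library's asyncio as a stand-in event loop, and `STORE`, `handle_line`, `serve_client`, `main` and the port number are all hypothetical names and values chosen for illustration. Only a single-key `get` is handled.

```python
import asyncio

# Hypothetical in-memory data; the real server used a specialized backend.
STORE = {b"greeting": b"hello"}


def handle_line(line):
    """Answer one memcached-style text request line ('get <key>')."""
    parts = line.split()
    if len(parts) == 2 and parts[0] == b"get":
        key = parts[1]
        value = STORE.get(key)
        if value is None:
            # A miss is reported as an empty result set.
            return b"END\r\n"
        # Flags are fixed at 0 in this sketch.
        return b"VALUE %s 0 %d\r\n%s\r\nEND\r\n" % (key, len(value), value)
    return b"ERROR\r\n"


async def serve_client(reader, writer):
    """Serve one connection; the event loop interleaves many of these."""
    while True:
        line = await reader.readline()
        if not line:  # client closed the connection
            break
        writer.write(handle_line(line))
        await writer.drain()
    writer.close()
    await writer.wait_closed()


async def main(host="127.0.0.1", port=11311):
    # One process, one event loop, many simultaneous connections.
    server = await asyncio.start_server(serve_client, host, port)
    async with server:
        await server.serve_forever()
```

Calling `asyncio.run(main())` starts the server. Keeping the protocol logic in a plain function like `handle_line` makes conformance testing possible without opening a socket.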
Given tools like Pyrex, implementing C extensions for Python is next to trivial for any developer with a bit of C experience. I rewrote the radix-tree-based storage backend in C and turned it into a Python module with Pyrex within a day. Memory usage for 475K prefixes went down from 90MB to 8MB, and we got a 1200% jump in query performance.
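For readers unfamiliar with the idea, here is a rough pure-Python sketch of a bitwise prefix lookup. It is an uncompressed binary trie, not a true radix tree (which would collapse single-child paths), and it is not the C backend described above. The `BitTrie` class, its method names and the assumption that keys are fixed-width integers are all mine.

```python
class BitTrie:
    """Binary trie keyed on the leading bits of a fixed-width integer."""

    def __init__(self, width=32):
        self.width = width
        # Each node is a list: [child for bit 0, child for bit 1, value].
        self.root = [None, None, None]

    def insert(self, key, prefix_len, value):
        """Store value under the first prefix_len bits of key."""
        node = self.root
        for i in range(prefix_len):
            bit = (key >> (self.width - 1 - i)) & 1
            if node[bit] is None:
                node[bit] = [None, None, None]
            node = node[bit]
        node[2] = value

    def longest_match(self, key):
        """Return the value of the longest stored prefix matching key."""
        node = self.root
        best = node[2]
        for i in range(self.width):
            bit = (key >> (self.width - 1 - i)) & 1
            node = node[bit]
            if node is None:
                break
            if node[2] is not None:
                best = node[2]
        return best


trie = BitTrie(width=32)
trie.insert(0x0A000000, 8, "a")   # 8-bit prefix
trie.insert(0x0A010000, 16, "b")  # 16-bit prefix under the first one
```

A lookup walks at most `width` nodes no matter how many prefixes are stored, which is where the constant-time behaviour comes from. In pure Python each node is a full list object, so a structure like this is memory hungry; packing the nodes into C structs is the kind of change that could explain the memory drop.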
Today, this application runs with pyevent (a Python interface for libevent), and the new storage backend handles 8000 queries per second on a modest single-core server. It runs as a single-process daemon (thanks to libevent) consuming less than 40MB of memory (including the Python interpreter) while handling 300+ simultaneous connections.
That's a project designed and implemented to production quality in less than 5 days. Without Python and Pyrex, it would have taken longer.
We could have worked around the performance problem by just using more powerful servers and switching to a multiprocess/multi-instance model, but that would have complicated the code and administration tasks and come with a much larger memory footprint.
I think you're on the right track to go with Python.