Assembly is indeed what you would use for optimising selected sequences of code. But native code compilers generate machine code, usually via an intermediate assembly representation which is then run through an assembler. So the advantage of using a compiler to "magically convert" the section you want to optimise into assembly, which is then linked to the rest of the program, over simply compiling the whole program with that compiler, is about zero - you're using the same compiler for the conversion, after all. From that angle, a compiler is nothing other than a program which "magically converts" your code to assembly. For optimisation purposes, you would have to hand-code those sections - and you need to be good at it. Nowadays many compilers generate code which performs better than non-expert hand-crafted code, for various reasons. One is that target CPUs differ considerably in what the best-performing code for them looks like, and the rules determining what efficient code for a specific CPU must look like are often extremely complex. As a hand coder, you need to know those differences between CPUs in order to write code which performs well. Compilers have this knowledge built in, and can therefore generate code such that one or another CPU architecture or model benefits from the differences the compiler puts into its code generation.
Often much better performance gains can be achieved by choosing a more efficient algorithm. A better algorithm, coded in a high-level language, usually outperforms a less suitable algorithm hand-coded in assembly. I'd therefore look into making the hashing process itself faster by evaluating alternative, faster algorithms, rather than trying to improve speed with assembly at this stage - treat assembly optimisation as a last, final step, once other means of speeding up your code have been exhausted.