An inline assembly routine using "plain old" jz/jnz is unlikely to be faster than what you have; that said, you have a few inefficiencies in your code:
- you're retrieving tstring.length() once per loop iteration; that's unnecessary.
- you're using random indexing, tstring[len], which might be a more expensive operation than using a forward iterator.
- you're calling stuff() during the loop; depending on what exactly that does, it might be faster to let the loop build a list of locations within the string first (so that the scanned string and the scanning code stay cache-hot and are not evicted by whatever stuff() does), and only afterwards iterate over those results.
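The three points above can be sketched as follows. This is a minimal illustration, not your actual code: the helper name collect_positions and its parameters are hypothetical. It fetches the end iterator once, walks the string with a forward iterator, and only records match offsets instead of doing any per-match work inside the scan.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the question): returns the offsets of every
// occurrence of `needle` in `text`, without doing any other work in the loop.
std::vector<std::size_t> collect_positions(const std::string& text, char needle) {
    std::vector<std::size_t> positions;
    std::size_t index = 0;
    // The end iterator is obtained once, before the loop, rather than
    // querying the length on every iteration.
    for (auto it = text.cbegin(), end = text.cend(); it != end; ++it, ++index) {
        if (*it == needle) {
            positions.push_back(index);
        }
    }
    return positions;
}
```

A second loop over the returned vector would then call stuff() for each recorded offset, after the scan has finished. Whether this split actually helps depends on what stuff() does, so measure both variants.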
A standard library function that is likely already optimized at a low level, strchr(), exists for exactly that kind of scanning. The C++ standard library's std::string::find() is also likely to have been optimized for the purpose (and/or might use strchr() in the char specialization).
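As a rough sketch of what replacing the hand-written loop with std::string::find() could look like (the function name find_all is made up for this example), each call resumes the search one position past the previous match:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: collects every offset at which `needle` occurs,
// delegating the actual scanning to std::string::find().
std::vector<std::size_t> find_all(const std::string& text, char needle) {
    std::vector<std::size_t> positions;
    for (std::size_t pos = text.find(needle);
         pos != std::string::npos;
         pos = text.find(needle, pos + 1)) {  // resume just past the last hit
        positions.push_back(pos);
    }
    return positions;
}
```

How fast this is depends entirely on your standard library's implementation of find(), which is why checking the generated code is worthwhile.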
In particular, strchr() has SSE2 (using pcmpeqb, maskmov... and bsf) or SSE4.2 (using the string op pcmpistri) implementations; for examples/actual SSE code doing this, check e.g. strchr() in GNU libc (as used on Linux). See also the references and comments here (suitably named website ...).
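To give a feel for the SSE2 approach, here is a simplified sketch using compiler intrinsics rather than assembly. It is not the glibc code. It assumes a GCC/Clang-style compiler (for __builtin_ctz, which typically compiles to bsf or tzcnt), and it assumes pmovmskb (via _mm_movemask_epi8) for turning the byte-wise comparison result into a bit mask; that instruction choice is this sketch's, as is the name find_first_sse2. On targets without SSE2 it falls back to the plain loop.

```cpp
#include <cstddef>
#include <string>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical helper: offset of the first `needle` in `text`, or npos.
std::size_t find_first_sse2(const std::string& text, char needle) {
    const char* data = text.data();
    const std::size_t len = text.size();
    std::size_t i = 0;
#if defined(__SSE2__)
    const __m128i pattern = _mm_set1_epi8(needle);  // needle in all 16 lanes
    // Only full 16-byte chunks are loaded, so nothing past the string is read.
    for (; i + 16 <= len; i += 16) {
        const __m128i chunk =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
        // pcmpeqb: 0xFF in each matching lane; pmovmskb: one bit per lane.
        const int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, pattern));
        if (mask != 0) {
            // Lowest set bit = first matching byte within this chunk.
            return i + static_cast<std::size_t>(
                           __builtin_ctz(static_cast<unsigned>(mask)));
        }
    }
#endif
    // Scalar tail (and the whole string when SSE2 is unavailable).
    for (; i < len; ++i) {
        if (data[i] == needle) {
            return i;
        }
    }
    return std::string::npos;
}
```

Production implementations additionally handle alignment and the terminating NUL of C strings, which this sketch sidesteps by using the known length of the std::string.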
My advice: check your library implementation/documentation, and/or the actual generated assembly code for your program. You might well be using fast code already ... or would be if you switched from your hand-rolled, character-by-character search to just using std::string::find() or strchr().
If this is ultra-speed-critical, then inlining assembly code for strchr() as used by known/tested implementations (watch licensing) would eliminate function calls and gain a few cycles. Depends on your requirements ... code, benchmark, vary, benchmark again, ...
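For the "benchmark, vary, benchmark again" part, a very small timing harness along these lines may be enough to get started. All names here (count_by_loop, count_by_find, elapsed_ms) are invented for the example, and a single timed run like this is noisy, so repeat it several times with realistic input and optimization enabled before drawing conclusions.

```cpp
#include <chrono>
#include <cstddef>
#include <string>

// Candidate 1: plain character-by-character scan.
std::size_t count_by_loop(const std::string& text, char needle) {
    std::size_t count = 0;
    for (char c : text) {
        if (c == needle) {
            ++count;
        }
    }
    return count;
}

// Candidate 2: let std::string::find() do the scanning.
std::size_t count_by_find(const std::string& text, char needle) {
    std::size_t count = 0;
    for (std::size_t pos = text.find(needle);
         pos != std::string::npos;
         pos = text.find(needle, pos + 1)) {
        ++count;
    }
    return count;
}

// Runs `fn` once, stores its return value in `result`, and returns the
// elapsed wall-clock time in milliseconds.
template <typename Fn>
double elapsed_ms(Fn&& fn, std::size_t& result) {
    const auto start = std::chrono::steady_clock::now();
    result = fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Storing the result and comparing it across candidates both checks that the variants agree and keeps the compiler from optimizing the measured work away.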