
I've heard that modern compilers (Clang) and parsers (like simdjson) use SIMD instructions to speed up lexing. How is that possible?

I want to implement a lexer for a language in C++. How can I use SIMD to speed up lexing and parsing?
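
A minimal sketch of the basic idea, assuming x86-64 (where SSE2 is always available) and GCC or Clang (for `__builtin_ctz`): instead of testing one byte at a time, compare 16 input bytes against each whitespace character at once, collapse the results into a 16-bit mask with `_mm_movemask_epi8` (`pmovmskb`), and bit-scan that mask. The name `skip_whitespace` and the choice of whitespace set are hypothetical, just for illustration.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (baseline on x86-64)
#include <cassert>
#include <cstddef>

// Returns the index of the first byte in buf[0..len) that is not
// ' ', '\t', '\n' or '\r', or len if every byte is whitespace.
std::size_t skip_whitespace(const char* buf, std::size_t len) {
    assert(buf != nullptr || len == 0);
    const __m128i sp  = _mm_set1_epi8(' ');
    const __m128i tab = _mm_set1_epi8('\t');
    const __m128i nl  = _mm_set1_epi8('\n');
    const __m128i cr  = _mm_set1_epi8('\r');
    std::size_t i = 0;
    // Vector loop: 16 bytes per iteration while a full block is available.
    for (; i + 16 <= len; i += 16) {
        __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(buf + i));
        // Each pcmpeqb yields 0xFF in matching lanes and 0x00 elsewhere.
        __m128i ws = _mm_or_si128(
            _mm_or_si128(_mm_cmpeq_epi8(chunk, sp), _mm_cmpeq_epi8(chunk, tab)),
            _mm_or_si128(_mm_cmpeq_epi8(chunk, nl), _mm_cmpeq_epi8(chunk, cr)));
        // pmovmskb: bit k is set if byte k is whitespace.
        unsigned mask = static_cast<unsigned>(_mm_movemask_epi8(ws));
        if (mask != 0xFFFF) {
            // The lowest zero bit marks the first non-whitespace byte.
            return i + static_cast<std::size_t>(__builtin_ctz(~mask));
        }
    }
    // Scalar tail for the last (< 16) bytes.
    for (; i < len; ++i) {
        char c = buf[i];
        if (c != ' ' && c != '\t' && c != '\n' && c != '\r') return i;
    }
    return len;
}
```

The same compare/movemask/bit-scan pattern works for any small set of "interesting" bytes (quotes, newlines, operator characters); the scalar code then only runs at the positions the mask points to.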

Reza Mahdi
  • One use of SIMD in RapidJSON is [How to implement atoi using SIMD?](https://stackoverflow.com/q/35127060), but that's not lexing. Presumably x86 SSE2 `pcmpeqb` / `pmovmskb` can be used to look for a space in 16 bytes at once, or SSE4.2 `pcmpistri`/`pcmpistrm` to get the index or a bitmask of matches against character ranges or sets (https://www.strchr.com/strcmp_and_strlen_using_sse_4.2). Of course, once you have a bitmask, you can bit-scan it. – Peter Cordes May 03 '20 at 13:58
  • GCC uses SIMD to speed up finding the end of a comment. Since, between licenses, changelogs and documentation, comments may well make up the majority of some files, I guess that makes sense. But I haven't seen (or searched for) a benchmark that confirms its effectiveness. – rici May 03 '20 at 15:53
  • Anyway, lexical analysis is not the bottleneck in compilation, and optimisation is best left for after you get the basics working. Just a suggestion. – rici May 03 '20 at 15:57
  • @rici That's true, but I found the question interesting, since I've been looking for ways to implement DFAs without conditional branching. I'm still not sure it's possible, but if anyone *has* implemented a lexer using SIMD, I'd love to know. Just my $0.02 – honey the codewitch Jan 03 '21 at 13:40
  • Well, in general I'd say: let's look at SIMD as a tool. Not all languages are the same. For example, an nginx-like config file has just ten or twenty grammar rules, but Python is a complex one (maybe more complex than C++). It seems that SIMD is only applicable in situations where we expect a predefined pattern, like a regex, especially for the beginning and end of a string. So, consider SIMD for well-defined situations. – Reza Mahdi Jan 04 '21 at 11:50
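
To illustrate the kind of search rici mentions (finding the end of a `/* ... */` comment), here is a minimal SSE2 sketch, assuming x86-64 and GCC or Clang for `__builtin_ctz`. It is not GCC's actual code, and `find_comment_end` is a hypothetical name. The trick for a two-character delimiter is to load the data twice, at offsets 0 and 1, so lane k compares `buf[k]` with `'*'` and `buf[k + 1]` with `'/'`; ANDing the two compare results leaves a bit set only where `*/` starts.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (baseline on x86-64)
#include <cassert>
#include <cstddef>

// Returns the index of the first "*/" in buf[0..len), or len if there is none.
std::size_t find_comment_end(const char* buf, std::size_t len) {
    assert(buf != nullptr || len == 0);
    const __m128i star  = _mm_set1_epi8('*');
    const __m128i slash = _mm_set1_epi8('/');
    std::size_t i = 0;
    // Each iteration tests 16 candidate positions, which needs 17 bytes:
    // the 16 possible '*' bytes plus the byte after the last one.
    for (; i + 17 <= len; i += 16) {
        __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(buf + i));
        __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(buf + i + 1));
        // Lane k is 0xFF only if buf[i+k] == '*' and buf[i+k+1] == '/'.
        __m128i hit = _mm_and_si128(_mm_cmpeq_epi8(a, star), _mm_cmpeq_epi8(b, slash));
        unsigned mask = static_cast<unsigned>(_mm_movemask_epi8(hit));
        if (mask != 0) {
            return i + static_cast<std::size_t>(__builtin_ctz(mask));
        }
    }
    // Scalar tail for the remaining candidate positions.
    for (; i + 1 < len; ++i) {
        if (buf[i] == '*' && buf[i + 1] == '/') return i;
    }
    return len;
}
```

A string-literal scanner (the "end of a string" case in the last comment) could probably use the same pattern with `'"'` and `'\\'` as the interesting bytes, falling back to scalar code to handle escape sequences.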
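
On the question of running a DFA without conditional branching: one known SIMD trick, which works only for DFAs with at most 16 states, stores, for each input byte, a 16-byte row mapping "current state" to "next state" and applies it with a single SSSE3 `pshufb` (`_mm_shuffle_epi8`). If I remember right, Intel's Hyperscan has an engine built on this idea (called Sheng). The sketch below assumes x86-64, a CPU with SSSE3, and GCC or Clang (for `__attribute__((target("ssse3")))`). The toy DFA, which accepts non-empty runs of ASCII digits, and all the names in it are hypothetical. Note that it does not process 16 bytes in parallel: each shuffle depends on the previous one, so any win comes from avoiding mispredicted branches, not from throughput.

```cpp
#include <tmmintrin.h>  // SSSE3: _mm_shuffle_epi8 (also pulls in SSE2)
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical toy DFA states; they must fit in 0..15 to index a pshufb row.
enum : std::uint8_t { kStart = 0, kDigits = 1, kDead = 2 };

struct ShuffleDfa {
    // rows[c][s] = next state when in state s and reading byte c.
    alignas(16) std::uint8_t rows[256][16];
};

// Builds the table for "one or more ASCII digits".
ShuffleDfa make_integer_dfa() {
    ShuffleDfa d{};
    for (int c = 0; c < 256; ++c) {
        bool digit = c >= '0' && c <= '9';
        for (int s = 0; s < 16; ++s) {
            // Digits keep/enter kDigits from kStart or kDigits; everything
            // else (including unused states 3..15) goes to kDead.
            d.rows[c][s] = (digit && (s == kStart || s == kDigits)) ? kDigits : kDead;
        }
    }
    return d;
}

// One table load plus one pshufb per input byte; no data-dependent branch.
__attribute__((target("ssse3")))
std::uint8_t run_dfa(const ShuffleDfa& d, const char* buf, std::size_t len) {
    assert(buf != nullptr || len == 0);
    // Every lane holds the current state s, so every lane of
    // pshufb(row, state) becomes row[s].
    __m128i state = _mm_set1_epi8(kStart);
    for (std::size_t i = 0; i < len; ++i) {
        const std::uint8_t c = static_cast<std::uint8_t>(buf[i]);
        const __m128i row = _mm_load_si128(reinterpret_cast<const __m128i*>(d.rows[c]));
        state = _mm_shuffle_epi8(row, state);
    }
    // Read the final state out of lane 0.
    return static_cast<std::uint8_t>(_mm_cvtsi128_si32(state) & 0xFF);
}

bool is_decimal_integer(const ShuffleDfa& d, const char* buf, std::size_t len) {
    return run_dfa(d, buf, len) == kDigits;
}
```

A real lexer would also need to record where tokens end (e.g. by checking the state at each step), which reintroduces some branching or extra work; this only shows that the transition itself can be branch-free.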

0 Answers