Faster code-completion with clang

Question

I am investigating potential code-completion speedups while using clang's code-completion mechanism. The flow described below is what I found in rtags, by Anders Bakken.

Translation units are parsed by a daemon monitoring files for changes. This is done by calling clang_parseTranslationUnit and related functions (reparse*, dispose*). When the user requests a completion at a given line and column in a source file, the daemon passes the cached translation unit for the last saved version of the source file, together with the current source file, to clang_codeCompleteAt. (Clang CodeComplete docs).

The flags passed to clang_parseTranslationUnit (from CompletionThread::process, line 271) are CXTranslationUnit_PrecompiledPreamble|CXTranslationUnit_CacheCompletionResults|CXTranslationUnit_SkipFunctionBodies. The flags passed to clang_codeCompleteAt (from CompletionThread::process, line 305) are CXCodeComplete_IncludeMacros|CXCodeComplete_IncludeCodePatterns.

The call to clang_codeCompleteAt is very slow: it takes around 3-5 seconds to obtain a completion, even when the completion location is a legitimate member access expression, which is a subset of the intended use case mentioned in the documentation of clang_codeCompleteAt. This seems far too slow by IDE code-completion standards. Is there a way of speeding this up?

@Cameron The flags passed to `clang_parseTranslationUnit` (from [CompletionThread::process, line 271](https://github.com/Andersbakken/rtags/blob/master/src/CompletionThread.cpp)) are `CXTranslationUnit_PrecompiledPreamble|CXTranslationUnit_CacheCompletionResults|CXTranslationUnit_SkipFunctionBodies`. The flags passed to `clang_codeCompleteAt` (from [CompletionThread::process, line 305](https://github.com/Andersbakken/rtags/blob/master/src/CompletionThread.cpp)) are `CXCodeComplete_IncludeMacros|CXCodeComplete_IncludeCodePatterns`. — Pradhan, Nov 18 '14 at 20:30
Hmm. What sort of file is being completed -- does it include a lot of headers (e.g. Boost)? What are the compile options? Is your libclang compiled with optimizations? I'm going to try libclang's completion myself soon -- I'll check then if it's slow for me too. — Cameron, Nov 19 '14 at 14:34
I'd be glad to help you but we need more specifics. Example code would be good for a start — raph.amiard, Nov 30 '14 at 21:09
I just integrated libclang's completion into a project. It appears to work well, and quite fast (a few hundred ms at most), though I haven't tested it with heavy translation units yet. Have you tried without the `CXTranslationUnit_SkipFunctionBodies` flag? Perhaps that's incompatible with `completeAt`, leading to a full reparse each time? — Cameron, Dec 02 '14 at 21:51
@raph.amiard Thanks for your time. Let me get back to you with answers in a few hours. — Pradhan, Dec 02 '14 at 22:03
Also, don't pass `CXCodeComplete_IncludeMacros` if you're in a member context (`.`/`->`/`::`), though be aware that in some circumstances you may get macros back anyway. But that's just a light performance tweak, not the cause of your problem, I think. — Cameron, Dec 02 '14 at 22:51
Did you manage to solve this problem? It has been more than a few hours. :) — Yakk - Adam Nevraumont, Feb 24 '15 at 15:58
@Yakk Oops. I simply lost this thread :(. Will try to get back sometime this week. — Pradhan, Feb 24 '15 at 16:06
@Cameron Sorry about the long delay in getting back to you. I tried all 8 combinations of `CXTranslationUnit_SkipFunctionBodies`, `CXCodeComplete_IncludeMacros`, `CXCodeComplete_IncludeCodePatterns` and did not see a significant difference on the codebase I am working with. All of them average around 4 seconds per complete. I guess this is just because of the size of the TUs. `CXTranslationUnit_PrecompiledPreamble` ensures `reparseTU` is very fast. However, even with `CXTranslationUnit_CacheCompletionResults`, `clang_codeCompleteAt` is painfully slow for my use-case. — Pradhan, Apr 22 '15 at 06:34
@Yakk Pinging back since you had commented a couple of months ago. Haven't managed to solve it yet, but got negative results to Cameron's suggestions above. — Pradhan, Apr 22 '15 at 06:54
Hmm, that's unfortunate. Can you reproduce the completion slowness on a translation unit available to the public (e.g. open source)? It would help if we were able to reproduce this ourselves. The completion should be roughly as fast as the reparse, since that's what it does internally (it injects a special code-completion token and parses up to that point). — Cameron, Apr 22 '15 at 13:49
I recently moved from Rtags to clangd, which is a lot faster and more stable for vim! [clangd](https://clang.llvm.org/extra/clangd/) is a Language Server Protocol implementation that runs as a daemon. It can use memory or disk for pre-compiled header data. — solotim, Jan 01 '20 at 07:20
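
Cameron's suggestion above, dropping `CXCodeComplete_IncludeMacros` when completing after `.`, `->` or `::`, could be sketched roughly as follows. This is only an illustration: `is_member_context`, `completion_flags` and the `DEMO_*` constants are hypothetical names that stand in for the real libclang flags so the snippet compiles on its own, and the check is a plain text heuristic (it would also fire after the dot in a numeric literal such as `1.5`).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins so the sketch compiles without libclang. Real code would use
   CXCodeComplete_IncludeMacros and CXCodeComplete_IncludeCodePatterns
   from <clang-c/Index.h> instead. */
enum {
    DEMO_INCLUDE_MACROS        = 1u << 0,
    DEMO_INCLUDE_CODE_PATTERNS = 1u << 1
};

/* Returns 1 if the text left of the cursor ends in '.', '->' or '::',
   optionally followed by a partially typed identifier.
   `column` is the 1-based cursor column, as passed to clang_codeCompleteAt. */
static int is_member_context(const char *line, size_t column)
{
    size_t len = strlen(line);
    size_t i = column > 0 ? column - 1 : 0;  /* chars left of the cursor */
    if (i > len)
        i = len;

    /* Skip back over the identifier prefix the user has typed so far. */
    while (i > 0 && (isalnum((unsigned char)line[i - 1]) || line[i - 1] == '_'))
        i--;

    if (i >= 1 && line[i - 1] == '.')
        return 1;
    if (i >= 2 && line[i - 2] == '-' && line[i - 1] == '>')
        return 1;
    if (i >= 2 && line[i - 2] == ':' && line[i - 1] == ':')
        return 1;
    return 0;
}

/* Picks completion options: macros are requested only outside member context. */
static unsigned completion_flags(const char *line, size_t column)
{
    unsigned flags = DEMO_INCLUDE_CODE_PATTERNS;
    if (!is_member_context(line, column))
        flags |= DEMO_INCLUDE_MACROS;
    return flags;
}
```

With this sketch, `completion_flags("ptr->fo", 8)` leaves the macro bit clear, while `completion_flags("int x = FO", 11)` sets it. As Cameron notes, this is a light tweak and macros may still come back in some cases.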

score 6 · Answer 1 · GutiMac · 2015-06-22T20:06:06.523

The problem with clang_parseTranslationUnit is that the precompiled preamble is not reused the second time code completion is called. Computing the precompiled preamble takes more than 90% of that time, so you should make sure the precompiled preamble is reused as soon as possible.

By default it is reused from the third call to parse/reparse the translation unit.

Take a look at the variable 'PreambleRebuildCounter' in ASTUnit.cpp.

Another problem is that this preamble is saved in a temporary file. You can keep the precompiled preamble in memory instead of in a temporary file. It would be faster. :)
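
One way to act on this without patching clang might be to reparse the translation unit a couple of times right after the initial parse, so that the preamble already exists when the first completion request arrives. Below is a minimal sketch of that idea. The names `reparse_fn`, `warm_up_preamble`, `fake_tu` and `fake_reparse` are hypothetical, and the callback is an assumption that keeps the snippet free of libclang; how many rounds are needed depends on the libclang version (two extra rounds would match the "third time" described above).

```c
/* Hypothetical callback: performs one reparse of `tu` and returns 0 on
   success. With libclang it would typically forward to
   clang_reparseTranslationUnit(tu, 0, NULL, clang_defaultReparseOptions(tu)). */
typedef int (*reparse_fn)(void *tu);

/* Reparses `tu` up to `rounds` times right after the initial parse and
   stops at the first failure. Returns the number of successful reparses. */
static int warm_up_preamble(void *tu, reparse_fn reparse, int rounds)
{
    int done = 0;
    while (done < rounds) {
        if (reparse(tu) != 0)
            break;  /* the translation unit may be unusable after a failure */
        done++;
    }
    return done;
}

/* Stand-in used only to exercise the helper without libclang: it counts
   calls and starts failing once `fail_after` reparses have succeeded. */
struct fake_tu {
    int reparses;
    int fail_after;
};

static int fake_reparse(void *tu)
{
    struct fake_tu *t = tu;
    if (t->reparses >= t->fail_after)
        return 1;
    t->reparses++;
    return 0;
}
```

Timing clang_codeCompleteAt before and after such a warm-up should show whether the preamble really is the bottleneck for a given translation unit.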

edited Jun 22 '15 at 20:06

answered Jun 22 '15 at 20:04

Awesome! This sounds like it gets to the real issue. Will take a look at this and let you know. Thanks! – Pradhan Jun 22 '15 at 20:06
ok! let me know if it works for you! and if you have any questions feel free to ask me!!!! – GutiMac Jun 23 '15 at 18:13

score 4 · Answer 2 · answered Jun 04 '15 at 17:54

Sometimes delays of this magnitude are due to timeouts on network resources (NFS or CIFS shares on a file search path, or sockets). Try monitoring the time each system call takes to complete by prefixing the command you run with strace -Tf -o trace.out. Look at the numbers in angle brackets in trace.out for the system call that takes a long time to complete.

You can also monitor the time between system calls to see which file takes too long to process. To do this, prefix the command you run with strace -rf -o trace.out. Look at the number before each system call to find long intervals between system calls. Go backwards from that point looking for open calls to see which file was being processed.
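
Scanning a large trace.out by eye is tedious, so here is a small sketch that picks out the slowest call from strace -T output. It assumes the usual layout in which the duration is the last `<seconds>` field on each line. The helper names `syscall_duration` and `slowest_syscall` are made up for this example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Extracts the duration that strace -T appends as "<seconds>" at the end
   of a line. Returns -1.0 when the line carries no duration, for example
   "<unfinished ...>" lines or process exit markers. */
static double syscall_duration(const char *line)
{
    const char *open = strrchr(line, '<');
    char *end;
    double secs;

    if (open == NULL)
        return -1.0;
    secs = strtod(open + 1, &end);
    if (end == open + 1 || *end != '>')
        return -1.0;  /* not a number followed by '>' */
    return secs;
}

/* Scans a trace and copies the slowest line into `slowest` (at most `cap`
   bytes, including the terminator). Returns that line's duration, or -1.0
   if no line had one, in which case `slowest` is left untouched. */
static double slowest_syscall(FILE *trace, char *slowest, size_t cap)
{
    char line[4096];
    double worst = -1.0;

    while (fgets(line, sizeof line, trace) != NULL) {
        double secs = syscall_duration(line);
        if (secs > worst) {
            worst = secs;
            snprintf(slowest, cap, "%s", line);
        }
    }
    return worst;
}
```

Opening trace.out with fopen and passing it to slowest_syscall gives the single worst call. If that turns out to be an open, stat or connect on a network path, the timeout theory is worth pursuing.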

If this doesn't help, you can profile your process to see where it spends most of its time.
