
"Everything" is a file-searching program. Since its author hasn't released the source code, I am wondering how it works.

  • How could it index files so efficiently?
  • What data structures does it use for file searching?
  • How can its file searching be so fast?

To quote its FAQ,

"Everything" only indexes file and folder names and generally takes a few seconds to build its database. A fresh install of Windows 10 (about 120,000 files) will take about 1 second to index. 1,000,000 files will take about 1 minute.

If it takes only one second to index a whole Windows 10 installation, and only one minute to index 1,000,000 files, does this mean it can index 120,000 files per second?

To make the search fast, there must be a special data structure. Searching by file name doesn't only match from the start of the name, but in most cases also from the middle. That makes some widely used indexing structures, such as tries and red–black trees, ineffective.

The FAQ clarifies further.

Does "Everything" hog my system resources?

No, "Everything" uses very little system resources. A fresh install of Windows 10 (about 120,000 files) will use about 14 MB of ram and less than 9 MB of disk space. 1,000,000 files will use about 75 MB of ram and 45 MB of disk space.

Sraw
    [Supposedly it doesn't index file contents.](https://www.voidtools.com/faq/#does_everything_search_file_contents) This makes the mystery a lot less interesting. – Veedrac Dec 07 '17 at 02:37
  • Note that 1,000,000 files / 1 minute is only about 20k files per second; presumably a fresh Windows install allows some files to be skipped or handled faster. – Veedrac Dec 07 '17 at 02:38
  • A basic Regex search over a text file can run at many GB/s under optimal conditions; handling a query over a few tens of MB of memory isn't that impressive even with a linear scan. – Veedrac Dec 07 '17 at 02:46
  • Using just a simple text file? Hmm... that hardly seems like a good idea. – Sraw Dec 07 '17 at 02:50
  • Why not? If it works, it works. – Veedrac Dec 07 '17 at 02:51
  • I'm voting to close this question as off-topic because it's asking us how a specific closed-source application works. I would suggest rephrasing it to not ask what this program specifically does, but rather how to approach the problem in general (but you're asking multiple questions here, which is discouraged, and one of them appears to be purely a disk I/O question and has nothing to do with data structures and, for the other, you may be looking for a *"suffix tree"*). – Bernhard Barker Dec 08 '17 at 08:19

1 Answer

  • How could it index files so efficiently?

First, it indexes only file/directory names, not contents.

I don't know if it's efficient enough for your needs, but the ordinary way is with the FindFirstFile function. Write a simple C program to list folders/files recursively, and see if it's fast enough for you. The next optimization step would be running threads in parallel, but disk access may well be the bottleneck, in which case multiple threads would add little benefit.
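A minimal sketch of that recursive listing with FindFirstFile/FindNextFile might look like the following (Windows-only; error handling, Unicode paths, and reparse-point checks are omitted, and the starting directory is just an example):

```c
#include <windows.h>
#include <stdio.h>
#include <string.h>

/* Recursively print every file and folder under `dir`
   using the FindFirstFileA/FindNextFileA Win32 API. */
static void list_recursive(const char *dir)
{
    char pattern[MAX_PATH];
    WIN32_FIND_DATAA fd;
    HANDLE h;

    snprintf(pattern, sizeof pattern, "%s\\*", dir);
    h = FindFirstFileA(pattern, &fd);
    if (h == INVALID_HANDLE_VALUE)
        return;

    do {
        /* skip the "." and ".." pseudo-entries */
        if (strcmp(fd.cFileName, ".") == 0 || strcmp(fd.cFileName, "..") == 0)
            continue;
        printf("%s\\%s\n", dir, fd.cFileName);
        if (fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) {
            char sub[MAX_PATH];
            snprintf(sub, sizeof sub, "%s\\%s", dir, fd.cFileName);
            list_recursive(sub);   /* recurse into subdirectory */
        }
    } while (FindNextFileA(h, &fd));
    FindClose(h);
}

int main(void)
{
    list_recursive("C:\\Windows");  /* example root; pick your own */
    return 0;
}
```

Timing this against a whole volume gives you a baseline before trying threads or lower-level APIs.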

If this is not enough, you could finally try to dig into the even lower-level Native API functions; I have no experience with those, so I can't give you further advice. You'd be pretty close to the metal, and maybe the Linux NTFS project has some concepts you need to learn.

  • What data structures does it use for file searching?
  • How can its file searching be so fast?

Well, you know there are many different data structures designed for fast searching; the author probably ran a lot of benchmarks.

rodrigocfd
  • 2,904
  • 3
  • 16
  • 33