
I'm trying to write a backup utility that has to handle roughly 2 terabytes of data spread across a lot of folders.

I want it to perform actions on files when they are created, edited or deleted, and preferably also when they are moved or renamed.

I've experimented with fanotify, only to realize that it only reports file edits. I'd rather avoid inotify if I can, since I would have to raise the maximum number of file watches, which I don't want to do; I presume it would have a big performance impact.

Ideally I'd set a single watch that applies recursively to every file beneath a directory. Does anyone with experience in this know a good method? Should I go with inotify and just take the performance hit, or is there a different approach I can take?

blipman17
  • This smells like an XY problem. What exactly are you trying to do? I suspect you are trying to write a program that will keep two disks "in sync", so that file edits on one disk appear on the other. If that's your goal, consider setting up a RAID instead; it's more efficient and reliable. – Colonel Thirty Two Sep 03 '16 at 20:20
  • @ColonelThirtyTwo one could easily want to keep 2TB in sync between two machines that have an internet connection – Michael Oct 23 '17 at 21:53

2 Answers


I don't think there is a way to recursively watch for changes in a directory tree. On the other hand, with inotify you don't need to create one file descriptor per directory you watch: you create a single inotify instance and then add as many directories to it as you need with inotify_add_watch:

int inotify_add_watch(int fd, const char *pathname, uint32_t mask);
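A minimal sketch of that approach, assuming Linux with glibc: one inotify instance, with `nftw()` walking the tree once at startup and calling `inotify_add_watch` for every directory. The event mask and the name `open_tree_watcher` are illustrative choices of this sketch, not part of any API.

```c
#define _XOPEN_SOURCE 700   /* exposes nftw() in glibc */
#include <ftw.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <sys/stat.h>
#include <unistd.h>

/* Events a backup tool would plausibly want; this mask is an assumption. */
#define WATCH_MASK (IN_CREATE | IN_DELETE | IN_CLOSE_WRITE | \
                    IN_MOVED_FROM | IN_MOVED_TO | IN_ONLYDIR)

/* nftw() callbacks get no user-data pointer, so the walk state is global. */
static int g_inotify_fd = -1;
static long g_watch_count;

static int add_dir_watch(const char *path, const struct stat *sb,
                         int typeflag, struct FTW *ftwbuf)
{
    (void)sb;
    (void)ftwbuf;
    if (typeflag != FTW_D)
        return 0;   /* a directory watch already reports events for its files */
    if (inotify_add_watch(g_inotify_fd, path, WATCH_MASK) == -1) {
        perror(path);   /* ENOSPC usually means max_user_watches was reached */
        return -1;      /* a nonzero return stops the walk */
    }
    g_watch_count++;
    return 0;
}

/* Creates ONE inotify instance and adds a watch for every directory under
 * root, root included. On success returns the inotify fd and stores the
 * number of watches in *count; returns -1 on error. */
int open_tree_watcher(const char *root, long *count)
{
    int fd = inotify_init();
    if (fd == -1) {
        perror("inotify_init");
        return -1;
    }
    g_inotify_fd = fd;
    g_watch_count = 0;
    /* FTW_PHYS: don't follow symlinks; 64 caps the directories nftw() holds open */
    if (nftw(root, add_dir_watch, 64, FTW_PHYS) != 0) {
        close(fd);
        return -1;
    }
    *count = g_watch_count;
    return fd;
}
```

Usage: `long n; int fd = open_tree_watcher(root, &n);` for whatever root the backup covers, then `read()` events from `fd`. Only directories get watches, so the watch count tracks the number of directories rather than the number of files.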
redneb
  • Wouldn't that also add a lot of overhead? Don't I have to recursively scan the entire tree I want to watch on startup? – blipman17 Sep 02 '16 at 11:01
  • I guess I could scan it entirely once and write all the directories to a file, then load that single file on startup of the program. – blipman17 Sep 02 '16 at 11:03
  • You have to scan once, so that you can start watching for all directories you find. From then on, when you receive an event for a directory creation or deletion, you have to call `inotify_add_watch`/`inotify_rm_watch` accordingly. – redneb Sep 02 '16 at 11:03
  • And how does that perform? Would it slow a PC to a crawl with, let's say... a million directories? – blipman17 Sep 02 '16 at 11:06
  • I haven't tried it with such a large number, but I don't think there is a better option, so it might be worth trying it to find out. Let us know the results if you do. – redneb Sep 02 '16 at 11:07
  • I will; it seems like the only way to go. – blipman17 Sep 02 '16 at 11:09
  • On closer inspection, it's "only" 3140 directories and 74300 files to watch, which is apparently a manageable number (only the directories need watches, since a directory watch also reports events for the files inside it). According to that SO post I should expect about 1080 bytes per watch, so 3140 * 1080 bytes = 3,391,200 bytes ≈ 3.4 MB, which is also manageable. I guess inotify performance isn't dead on fairly big trees. – blipman17 Sep 02 '16 at 11:41
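The comments above describe keeping the watch set current after the initial scan. A rough sketch of one step of that event loop, assuming Linux; `add_watch`, `handle_events` and the fixed-size table are hypothetical names for illustration. When a watched directory is deleted, the kernel removes the watch by itself and reports `IN_IGNORED`, so the sketch only drops its own table entry in that case.

```c
#define _POSIX_C_SOURCE 200809L   /* PATH_MAX and ssize_t in glibc */
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/inotify.h>
#include <sys/stat.h>
#include <unistd.h>

#define MASK (IN_CREATE | IN_DELETE | IN_CLOSE_WRITE | \
              IN_MOVED_FROM | IN_MOVED_TO | IN_ONLYDIR)
#define MAX_WATCHES 8192   /* hypothetical fixed capacity for the sketch */

/* Events carry only a watch descriptor plus a name relative to the watched
 * directory, so the program keeps its own wd -> path table (linear scan
 * for brevity; a real tool would use a hash map). */
struct watch { int wd; char *path; };
static struct watch g_watches[MAX_WATCHES];
static int g_nwatches;

static const char *path_for_wd(int wd)
{
    for (int i = 0; i < g_nwatches; i++)
        if (g_watches[i].wd == wd)
            return g_watches[i].path;
    return NULL;
}

static void forget_wd(int wd)
{
    for (int i = 0; i < g_nwatches; i++)
        if (g_watches[i].wd == wd) {
            free(g_watches[i].path);
            g_watches[i] = g_watches[--g_nwatches];   /* swap-remove */
            return;
        }
}

/* Watches dir and records its path. Returns the watch descriptor or -1. */
int add_watch(int fd, const char *dir)
{
    if (g_nwatches == MAX_WATCHES)
        return -1;
    int wd = inotify_add_watch(fd, dir, MASK);
    if (wd == -1)
        return -1;
    if (path_for_wd(wd) != NULL)   /* already watched: the kernel returned the old wd */
        return wd;
    size_t len = strlen(dir) + 1;
    char *copy = malloc(len);
    if (copy == NULL) {
        inotify_rm_watch(fd, wd);
        return -1;
    }
    memcpy(copy, dir, len);
    g_watches[g_nwatches].wd = wd;
    g_watches[g_nwatches].path = copy;
    g_nwatches++;
    return wd;
}

/* Reads one batch of events and watches any subdirectory that was created
 * or moved into a watched directory. Returns the number of watches added,
 * or -1 if read() fails. */
int handle_events(int fd)
{
    _Alignas(struct inotify_event) char buf[4096];
    ssize_t len = read(fd, buf, sizeof buf);
    if (len <= 0)
        return -1;

    int added = 0;
    for (char *p = buf; p < buf + len;) {
        const struct inotify_event *ev = (const struct inotify_event *)p;
        p += sizeof(struct inotify_event) + ev->len;

        /* IN_Q_OVERFLOW means events were lost; a full rescan is needed (not shown). */
        if (ev->mask & IN_IGNORED) {   /* watch removed, e.g. its directory was deleted */
            forget_wd(ev->wd);
            continue;
        }
        if ((ev->mask & IN_ISDIR) && (ev->mask & (IN_CREATE | IN_MOVED_TO))) {
            const char *parent = path_for_wd(ev->wd);
            if (parent == NULL)
                continue;
            char child[PATH_MAX];
            int n = snprintf(child, sizeof child, "%s/%s", parent, ev->name);
            if (n < 0 || n >= (int)sizeof child)
                continue;
            /* Anything already inside child (or created before this call)
             * produced no event, so a real tool should rescan child too. */
            if (add_watch(fd, child) != -1)
                added++;
        }
        /* A backup tool would also queue ev here for copying or deletion. */
    }
    return added;
}
```

The rescan noted in the comment matters most for `IN_MOVED_TO`: a directory moved in from outside the tree arrives with its whole subtree, none of which is watched yet.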

I wrote a polling backup application for a tiny RAM drive, and I have plenty of experience restoring from it; doing so fixed a few issues. I'm not generally a fan of polling, but I was using Windows, and the std.file library is ancient: relative to Windows 10 it is five major versions behind, so don't worry about portability.

I was planning two modes, active and sleeping: the active mode would keep the disks from pointlessly spinning down once, before switching to a longer interval that lets them stay spun down. I have only begun to think about detecting and repairing damage, and ZFS's solution of replicating data across media seems naive, because WinRAR already lets you do just this with archives.
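One possible reading of the two-mode idea, sketched in C under loud assumptions: the interval values, the ten-idle-polls threshold and the names `next_poll_delay` and `struct poller` are all made up for illustration. The active interval is meant to sit below the disks' spin-down timeout so they are not spun down only to be woken again; after enough quiet polls the poller switches to a long interval that lets them stay spun down.

```c
#include <stdbool.h>

/* Hypothetical values: the active interval should sit below the disks'
 * spin-down timeout, the sleeping interval well above it. */
#define ACTIVE_INTERVAL_S       30
#define SLEEPING_INTERVAL_S     3600
#define IDLE_POLLS_BEFORE_SLEEP 10

enum poll_mode { MODE_ACTIVE, MODE_SLEEPING };

struct poller {
    enum poll_mode mode;
    int idle_polls;   /* consecutive active-mode polls that found no changes */
};

/* Call after every poll: updates the mode and returns the delay in seconds
 * before the next poll. Any change switches back to active mode. */
int next_poll_delay(struct poller *p, bool changes_found)
{
    if (changes_found) {
        p->mode = MODE_ACTIVE;
        p->idle_polls = 0;
    } else if (p->mode == MODE_ACTIVE &&
               ++p->idle_polls >= IDLE_POLLS_BEFORE_SLEEP) {
        p->mode = MODE_SLEEPING;
    }
    return p->mode == MODE_ACTIVE ? ACTIVE_INTERVAL_S : SLEEPING_INTERVAL_S;
}
```

A polling loop would then do something like `sleep(next_poll_delay(&p, found))`, where `found` is whatever the tool's own scan reports.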

I bought a new computer a month ago and I'm still trying to back up my files. Beware of using Phobos, i.e. std.anything: std.file's copy changes the creation time, which is incorrect for restored files. After copying, you need to read the original file's creation time, call the OS function yourself and set the creation time. On Windows all of that metadata comes back from a single system call, so that is a good starting point when researching the equivalent Linux system calls.
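On Linux the closest equivalent is to copy the data and then apply the source's permission bits and timestamps with `fchmod()` and `futimens()`. Linux has no system call for setting a file's creation (birth) time, so only the access and modification times can be carried over. A minimal POSIX sketch; `copy_preserving_times` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L   /* futimens() and st_atim/st_mtim */
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copies src to dst, then gives dst the permission bits and the access and
 * modification times of src. Linux has no call for setting a file's
 * creation (birth) time, so that timestamp cannot be carried over.
 * Returns 0 on success, -1 on error. */
int copy_preserving_times(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY);
    if (in == -1)
        return -1;
    struct stat st;   /* captured before reading, so it holds src's original atime */
    if (fstat(in, &st) == -1) {
        close(in);
        return -1;
    }
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (out == -1) {
        close(in);
        return -1;
    }

    char buf[65536];
    ssize_t n = 0;
    int rc = 0;
    while (rc == 0 && (n = read(in, buf, sizeof buf)) > 0) {
        /* write() may be partial, so loop until the whole chunk is written */
        for (ssize_t off = 0; off < n && rc == 0;) {
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w == -1)
                rc = -1;
            else
                off += w;
        }
    }
    if (n == -1)
        rc = -1;

    /* Apply metadata last: any further write would bump dst's mtime again.
     * fchmod is needed because open()'s mode is filtered by the umask and is
     * ignored entirely when dst already exists. */
    struct timespec times[2] = { st.st_atim, st.st_mtim };
    if (rc == 0 && (fchmod(out, st.st_mode & 07777) == -1 ||
                    futimens(out, times) == -1))
        rc = -1;

    close(in);
    if (close(out) == -1)
        rc = -1;
    return rc;
}
```

Ownership (`fchown`) and extended attributes are left out here; copying them usually needs extra privileges or calls.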

Now is probably a good time to revisit Python and try pathlib: https://docs.python.org/3/library/pathlib.html Spoiler: it is broken in 3.5.2.

AuoroP