
I am currently working on an application where a client depends on modules that need to be downloaded from a repository server. The modules can be any kind of archive: jar, dll, zip, etc.

The client first submits a set of properties (a set of key-value pairs already defined by the server) to the repository server. The server performs some computation based on those properties and returns all the modules that correspond to the client. If the client has a module that is outdated, the server sends the newest version to the client so that it can update it. The server also needs to compute the dependencies between modules and send them to the client, like Maven.

But the main difficulty is that I cannot make any assumptions about the properties sent by the client, because they are specific to the client environment.

The first idea I had was to have a matrix where each column represents a property and each row represents a module. It would be possible to add and remove properties in the matrix. In each cell of the matrix, I would put the value of that property for that module.

For instance, let's say that I have 2 modules and a set of properties {OS, Archive, Arch, Version, .Net}. For module1, the values are {OS="Windows 7", Archive="dll", Arch=32-bits, Version="2.0.0", .net=3.5}. For module2, the values are {OS="Windows 7", Archive="jar", Version="2.1", .net="4.0"}.

This works perfectly if each property contains only one value. If the client asks for all the modules that work on Windows 7 (module1 and module2) but are packaged as a dll archive and support .NET 3.5 or higher, module1 will be returned.

That works perfectly.

But what if each property can contain multiple values (which is our case)? For instance, in our previous example, suppose module1 can run on Windows 7, Vista, and XP. For the OS property, I would have to go through each property sent by the client and search for a matching value. That's a combinatorial calculation.

What I see in this process is very similar to a package management system, like apt or yum.

What is the best approach to this problem?

Brian Tompsett - 汤莱恩
Dimitri
  • If this were for a desktop app., I'd say [Java Web Start](http://stackoverflow.com/tags/java-web-start/info) is something you should look into. – Andrew Thompson Apr 29 '13 at 13:47
  • Looks very similar to this: http://stackoverflow.com/questions/15896591/algorithm-to-resolve-version-scope-based-dependency/15898608#15898608 - you can have a look at maven or osgi and how they work with that – SpaceTrucker Apr 29 '13 at 13:50
  • @SpaceTrucker yeah thanks, I am currently watching satisfiability problem and their implementation – Dimitri Apr 29 '13 at 13:54
  • +1 on the fact that this looks like maven. Don't build this yourself. – Max Charas Apr 29 '13 at 15:08
  • Maven does this for Java, NuGet does this for Microsoft environments, don't do this yourself, it won't be pretty ... –  Apr 29 '13 at 16:15
  • @JarrodRoberson Thanks for your comment. Maven/NuGet are mostly used during development (if I'm not mistaken). The thing is, I have an app that loads its functionality, contained in the modules, from the server at startup time – Dimitri Apr 29 '13 at 16:55
  • @Dimitri doesn't matter, an artifact repository is an artifact repository, when it is accessed is irrelevant to the problem domain –  Apr 29 '13 at 17:14
  • @JarrodRoberson Except that neither Maven nor NuGet have much in terms of a repository - it's just a bunch of files accessible over HTTP in a specific layout. The value comes from the `mvn` client itself that performs dependency resolution and downloads etc. This client might not be designed to be embedded in an app, and the model the application needs might not match the model for a tool. (For instance, I'm 80% certain Maven can't download a different artifact based on the running JDK. You have to use profiles, and handle all the combinations in your POM, and can't do this on the server.) – millimoose Jun 25 '13 at 21:45
  • Basically, at some point you might be hammering an octagonal peg into a round hole. Sure they look similar if you squint your eyes but it's still not a fit. – millimoose Jun 25 '13 at 21:47
  • @Dimitri: This might seem like a combinatorial calculation, but you can probably avoid much of the effort by using an index for the properties where you only do an exact comparison. (Maybe even multidimensional indices to get higher selectivity.) I know this is a vague hint since I don't have what you're trying to accomplish in my head as well as you do, but generally use a lot of hashes. – millimoose Jun 25 '13 at 21:50
  • Hi @millimoose, can you be more specific about your idea? – Dimitri Jun 27 '13 at 08:03
  • @Dimitri Not really since I don't know what the value domains for the attributes are or how you determine what matches given what input. – millimoose Jun 27 '13 at 10:02
  • What's wrong with a linear scan of the list of modules, and for each module checking if it matches the requirements? Then scan the list of dependencies of each module, and do the same for the dependencies. This is an O(N^2) algorithm, but in practice it should work in linear-ish time, because the modules don't usually have that many dependencies that are not yet installed. – maniek Jun 29 '13 at 10:09
  • Regarding the @maniek response, and depending on the complexity of the modules, it may be possible to order the attributes depending on their unlikeness. Then, I would go in a typical SQL-like manner selecting the entities that match attribute1, then attribute2, etc. – Daniel H. Jul 02 '13 at 19:48
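
The hashing idea from the comments can be sketched as an inverted index from (property, value) pairs to modules. This is only an illustration: the module data, the function names, and the rule that a property a module does not declare places no constraint on it are all assumptions, not something stated in the question.

```python
from collections import defaultdict

# Hypothetical metadata: each property maps to the set of values a module accepts.
MODULES = {
    "module1": {"OS": {"Windows 7", "Vista", "XP"}, "Archive": {"dll"}, "Arch": {"32-bits"}},
    "module2": {"OS": {"Windows 7"}, "Archive": {"jar"}},
}

def build_index(modules):
    """Build two lookup tables so that matching needs no scan over value lists."""
    by_value = defaultdict(set)  # (property, value) -> modules accepting that value
    declares = defaultdict(set)  # property -> modules that constrain it at all
    for name, props in modules.items():
        for prop, values in props.items():
            declares[prop].add(name)
            for value in values:
                by_value[(prop, value)].add(name)
    return by_value, declares

def find_modules(modules, by_value, declares, client_props):
    """Intersect, property by property, the modules compatible with the client."""
    candidates = set(modules)
    for prop, value in client_props.items():
        # Assumption: a module that does not declare a property accepts any value.
        unconstrained = set(modules) - declares.get(prop, set())
        candidates &= by_value.get((prop, value), set()) | unconstrained
    return candidates

BY_VALUE, DECLARES = build_index(MODULES)
print(sorted(find_modules(MODULES, BY_VALUE, DECLARES, {"OS": "XP", "Archive": "dll"})))
```

Each client property costs one hash lookup and one set intersection, so the work grows with the number of properties sent rather than with the number of value combinations.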

3 Answers


I'm wondering if you could set up Solr to index the metadata and return the binary files as search results. Then you would simply search using restrictions to retrieve the appropriate binary "documents" from Solr.

Setting up solr may be a bit of work, but it'll likely be less work (and less error prone) than creating your own repository manager. It sounds like there are people using Tika to index binary files (images, pdfs, docs) which shouldn't really be any different than a dll or jar. For instance, see this question SOLR - Tika - Store binary version of file for a hint on how to return the binary library.

Community
digitaljoel
  • Agreed this sounds a lot like a search problem. It sounds like he'd only need to index the metadata, so you could run Solr or Elasticsearch out of the box to do this with a thin front end to serve up the binaries from disk. – Richard Marr Jul 02 '13 at 08:49
  • That's what I thought too. If the index is going to be small, then solr would likely be overkill, but if it's going to be of any considerable size then solr may be a good fit. – digitaljoel Jul 02 '13 at 15:37
  • Taking a step down to Lucene might work as long as there was only one index, with one thread writing to it. – Richard Marr Jul 02 '13 at 19:14

I think you need a matching system that returns the modules matching a set of properties. You might consider a tag system, in which each module has some tags and each tag corresponds to one attribute value. For example, module1 {OS="Windows 7", Archive="dll", Arch=32-bits, Version="2.0.0", .net=3.5} has 5 tags: OS="Windows 7", Archive="dll", Arch=32-bits, Version="2.0.0", and .net=3.5. You then need to match the modules whose tags satisfy the request.

You can see complete database schemas and queries for implementing such a system here:

http://tagging.pui.ch/post/37027745720/tags-database-schemas
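
As a rough illustration of the tag idea, here is a minimal sketch using an in-memory SQLite database with a three-table layout (modules, tags, and a link table). The table names, the `property=value` tag labels, and the helper functions are assumptions made for this example and are not taken from the linked article.

```python
import sqlite3

# Hypothetical tags, one per attribute value, written as "property=value".
MODULES = {
    "module1": {"OS=Windows 7", "Archive=dll", "Arch=32-bits", "Version=2.0.0", ".net=3.5"},
    "module2": {"OS=Windows 7", "Archive=jar", "Version=2.1", ".net=4.0"},
}

def create_db(modules):
    """Create the three tables and load the module/tag relationships."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE module (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE tag (id INTEGER PRIMARY KEY, label TEXT UNIQUE);
        CREATE TABLE module_tag (module_id INTEGER, tag_id INTEGER,
                                 PRIMARY KEY (module_id, tag_id));
    """)
    for name, tags in modules.items():
        conn.execute("INSERT INTO module (name) VALUES (?)", (name,))
        for label in tags:
            conn.execute("INSERT OR IGNORE INTO tag (label) VALUES (?)", (label,))
            conn.execute(
                "INSERT INTO module_tag "
                "SELECT m.id, t.id FROM module m, tag t "
                "WHERE m.name = ? AND t.label = ?", (name, label))
    return conn

def modules_with_all_tags(conn, wanted):
    """Return the names of modules that carry every requested tag."""
    wanted = sorted(set(wanted))
    if not wanted:
        return []
    placeholders = ",".join("?" * len(wanted))
    rows = conn.execute(
        f"SELECT m.name FROM module m "
        f"JOIN module_tag mt ON mt.module_id = m.id "
        f"JOIN tag t ON t.id = mt.tag_id "
        f"WHERE t.label IN ({placeholders}) "
        f"GROUP BY m.name HAVING COUNT(DISTINCT t.label) = ? "
        f"ORDER BY m.name", (*wanted, len(wanted)))
    return [row[0] for row in rows]

CONN = create_db(MODULES)
print(modules_with_all_tags(CONN, ["OS=Windows 7", "Archive=dll"]))
```

A multi-valued property simply becomes several tags on the same module (for example `OS=Windows 7`, `OS=Vista`, `OS=XP`), so the query does not change.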


First of all I agree with some of the commenters that this is a solved problem. If you can spare the time, survey the existing approaches: maven, ivy, yum, gem, rpm, apt-get, gentoo, mac homebrew, perl cpan, cygwin, haskell cabal, python easy_install, java web start. Then categorise them into sets of approaches, and list the pros and cons of each approach.

I needed to implement such a system, and for me the maven approach worked well. But I didn't use pure maven, as determining the transitive dependency tree on the client is a bit complicated. Instead, I computed the list of dependencies at build time, using maven-dependency-plugin, and put them inside a properties file inside the top level jar.

Then when the client wanted to run something, it was given the maven coordinates and computed the path in the local repo, downloading if not present locally. It then looked at the properties file in the jar which gave it a list of other jars. The client then downloaded the ones that were not present, then set up the classpath and ran the code.
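
The client-side steps could look roughly like the sketch below. It only assumes the standard Maven repository layout (`group/with/slashes/artifact/version/artifact-version.jar`); the `group:artifact:version` coordinate format, the one-coordinate-per-line dependency list, and all function names are hypothetical, and the actual download and the extraction of the file from the jar are left out.

```python
import os
from pathlib import Path

def repo_path(coords, repo_root):
    """Path of a jar in the standard Maven layout for 'group:artifact:version'."""
    group, artifact, version = coords.split(":")
    return Path(repo_root, *group.split("."), artifact, version,
                f"{artifact}-{version}.jar")

def parse_dependency_list(text):
    """Read one coordinate per line (file format assumed for this sketch)."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]

def missing_artifacts(top_level, dependency_text, repo_root):
    """Coordinates that are not in the local repo yet and must be downloaded."""
    wanted = [top_level] + parse_dependency_list(dependency_text)
    return [c for c in wanted if not repo_path(c, repo_root).exists()]

def classpath(top_level, dependency_text, repo_root):
    """Classpath string covering the top-level jar and its precomputed dependencies."""
    wanted = [top_level] + parse_dependency_list(dependency_text)
    return os.pathsep.join(str(repo_path(c, repo_root)) for c in wanted)

# Hypothetical coordinates, for illustration only.
DEPENDENCIES = """
# generated at build time
org.example:core:1.2
org.example:util:0.9
"""
print(missing_artifacts("org.example:app:1.0", DEPENDENCIES, "local-repo"))
```

Because the dependency list is flattened at build time, the client never has to walk a dependency tree; it only checks which files are absent.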

Ok, so much for what worked for me, now back to @Dimitri.

First, it's not clear from your question that you are thinking of this as a tree of dependencies. Get used to thinking in terms of trees. Have a look at property graphs to get a feel for them.

Also, your question suggested it would be the client that asked the server to compute the dependencies, whereas it's normally the client that does this computation, and the server just serves meta-data and binaries.

You say:

But what if each property can contain multiple values (which is our case)? For instance, in our previous example, suppose module1 can run on Windows 7, Vista, and XP. For the OS property, I would have to go through each property sent by the client and search for a matching value. That's a combinatorial calculation.

Surely the client only runs one OS, so you just need to filter out all dependencies where the OS doesn't match?
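
A minimal sketch of that filter, assuming each module lists the set of values it accepts per property and the client sends exactly one value per property (the data and names below are made up for illustration):

```python
# Hypothetical metadata: module properties are sets, client properties are single values.
MODULES = {
    "module1": {"OS": {"Windows 7", "Vista", "XP"}, "Archive": {"dll"}},
    "module2": {"OS": {"Windows 7"}, "Archive": {"jar"}},
}

def matches(module_props, client_props):
    """True if every client value is accepted by the module.

    Assumption: a property the module does not declare places no constraint.
    """
    return all(
        value in module_props[prop]
        for prop, value in client_props.items()
        if prop in module_props
    )

def select_modules(modules, client_props):
    """Linear scan: keep the modules compatible with the client's environment."""
    return [name for name, props in modules.items() if matches(props, client_props)]

print(select_modules(MODULES, {"OS": "XP"}))
```

Since set membership is a constant-time check, the scan costs about (number of modules) x (number of client properties), which is not combinatorial.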

Where it gets complicated is when you need to match ranges of versions against each other. Ideally if you can avoid that it's better. Maven does support version ranges but I've never needed to use it.
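
To give a feel for what range matching involves, here is a simplified sketch that checks a version against a single Maven-style interval such as `[3.5,)` or `[1.0,2.0)`. It is an assumption-laden illustration, not Maven's own algorithm: it handles purely numeric dotted versions only, ignores qualifiers such as `-SNAPSHOT`, and does not support multiple intervals in one specification.

```python
def parse_version(text):
    """Turn '2.0.0' into a comparable tuple, treating trailing zeros as insignificant."""
    parts = [int(p) for p in text.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)

def in_range(version, spec):
    """Check a version against one interval, e.g. '[3.5,)', '(,2.0]' or '[1.0]'."""
    spec = spec.strip()
    lower_inclusive = spec[0] == "["
    upper_inclusive = spec[-1] == "]"
    body = spec[1:-1]
    v = parse_version(version)
    if "," not in body:  # '[1.0]' pins one exact version
        return v == parse_version(body)
    low, high = (s.strip() for s in body.split(","))
    if low:
        lo = parse_version(low)
        if v < lo or (v == lo and not lower_inclusive):
            return False
    if high:
        hi = parse_version(high)
        if v > hi or (v == hi and not upper_inclusive):
            return False
    return True

# ".NET 3.5 or higher" from the question, written as an interval.
print(in_range("4.0", "[3.5,)"))
```

Even this small version shows why avoiding ranges is simpler: once two modules each carry a range, the resolver has to find a version satisfying both.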

Community
David Roussel