12

I'm looking for an embeddable Java library that is suitable for collecting real-time streams of sensor data in a general-purpose way. I plan to use this to develop a "hub" application for reporting on multiple disparate sensor streams, running on a JVM-based server (I will also be using Clojure for this).

Key things it needs to have:

  • Interfaces for various common sensor types / APIs. I'm happy to build what I need myself, but it would be nice if some standard stuff comes out of the box.
  • Suitable for "soft real time" usage, i.e. fairly low latency and low overhead.
  • Ability to monitor and manage streams at runtime, gather statistics etc.
  • Open source under a reasonably permissive license so that I can integrate it with other code (Apache, EPL, BSD, LGPL all fine)
  • A reasonably active community / developer ecosystem

Is there something out there that fits this profile that you can recommend?

mikera
  • Why not just use Android? – James Black Feb 08 '13 at 03:11
  • I'll be running on the JVM - does Android provide a library that would work in this context? If so, then that might be a good answer. – mikera Feb 08 '13 at 03:14
  • You write the code in Java, but it runs on a Dalvik machine. You may want to look at this, as a starting point, and see if your requirements are going to be added to: http://www.opersys.com/downloads/cc-slides/embedded-android/embedded-android-120203.pdf – James Black Feb 08 '13 at 23:16

2 Answers

14

1. Round-robin database (wikipedia)

RRDtool (acronym for round-robin database tool) aims to handle time-series data like network bandwidth, temperatures, CPU load, etc. The data are stored in a round-robin database (circular buffer), thus the system storage footprint remains constant over time.
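
To make the circular-buffer idea concrete, here is a minimal sketch in plain Java. It is not RRDtool's or RRD4J's API: the class name `RoundRobinBuffer` and all of its methods are made up for illustration, and it only shows why the storage footprint stays constant (new samples overwrite the oldest ones once the buffer is full).

```java
import java.util.Arrays;

/** Hypothetical fixed-capacity ring buffer of (timestamp, value) samples. */
final class RoundRobinBuffer {
    private final long[] timestamps;
    private final double[] values;
    private int next = 0; // slot the next sample will be written to
    private int size = 0; // number of valid samples, never exceeds capacity

    RoundRobinBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        timestamps = new long[capacity];
        values = new double[capacity];
    }

    /** Stores a sample, overwriting the oldest one when the buffer is full. */
    void add(long timestamp, double value) {
        timestamps[next] = timestamp;
        values[next] = value;
        next = (next + 1) % values.length;
        if (size < values.length) {
            size++;
        }
    }

    int size() {
        return size;
    }

    /** Index of the oldest stored sample. */
    private int start() {
        return (next - size + values.length) % values.length;
    }

    /** Timestamp of the oldest sample still held in the buffer. */
    long oldestTimestamp() {
        if (size == 0) {
            throw new IllegalStateException("buffer is empty");
        }
        return timestamps[start()];
    }

    /** Returns a copy of the stored values, oldest first. */
    double[] valuesOldestFirst() {
        double[] out = new double[size];
        int start = start();
        for (int i = 0; i < size; i++) {
            out[i] = values[(start + i) % values.length];
        }
        return out;
    }

    public static void main(String[] args) {
        RoundRobinBuffer buffer = new RoundRobinBuffer(3);
        for (int i = 1; i <= 5; i++) {
            buffer.add(i * 1000L, i); // five samples into three slots
        }
        // prints [3.0, 4.0, 5.0]
        System.out.println(Arrays.toString(buffer.valuesOldestFirst()));
    }
}
```

A real round-robin database additionally consolidates samples (average, min, max) into coarser archives, which this sketch leaves out.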

This approach and database format is widely used, stable, and simple enough. Out of the box, it can also generate nice plots.


There is a Java implementation, RRD4J:

RRD4J is a high performance data logging and graphing system for time series data, implementing RRDTool's functionality in Java. It follows much of the same logic and uses the same data sources, archive types and definitions as RRDTool does. Open Source under Apache 2.0 License.

Update

I forgot to mention that there is also a Clojure RRD API (examples).

2. For some experiments with real-time data, I would suggest considering Perst

It is small, fast and reliable enough, but distributed under GPLv3. Perst provides several indexing algorithms:

  1. B-Tree
  2. T-Tree (optimized for in-memory database)
  3. R-Tree (spatial index)
  4. Patricia Trie (prefix search)
  5. KD-Tree (multidimensional index)
  6. Time series (large number of fixed size objects with timestamp)

The last one suits your needs very well.
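
As a rough illustration of what a timestamp-ordered index offers, here is a sketch built on the JDK's `TreeMap`. It is not Perst's API: the class `TimeSeriesIndex` and its methods are hypothetical, and it keeps everything in memory, so treat it only as a model of the range-by-time queries such an index supports.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical in-memory time series index keyed by timestamp (milliseconds). */
final class TimeSeriesIndex {
    private final NavigableMap<Long, Double> samples = new TreeMap<>();

    /** Stores a reading; a later reading with the same timestamp replaces it. */
    void put(long timestampMillis, double value) {
        samples.put(timestampMillis, value);
    }

    int size() {
        return samples.size();
    }

    /** Readings with from <= timestamp < to, ordered by timestamp. */
    NavigableMap<Long, Double> range(long from, long to) {
        return samples.subMap(from, true, to, false);
    }

    /** Drops all readings older than the cutoff and returns how many were removed. */
    int expireBefore(long cutoffMillis) {
        NavigableMap<Long, Double> old = samples.headMap(cutoffMillis, false);
        int removed = old.size();
        old.clear(); // headMap is a view, so this removes from the backing map
        return removed;
    }

    public static void main(String[] args) {
        TimeSeriesIndex index = new TimeSeriesIndex();
        index.put(1000L, 20.5);
        index.put(2000L, 20.7);
        index.put(3000L, 21.0);
        // Readings in the half-open window [1000, 3000)
        System.out.println(index.range(1000L, 3000L));
    }
}
```

Range lookups and expiry are both logarithmic in the number of stored readings here, which is the property that makes a sorted-by-time structure a good fit for sensor streams.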

3. Neo4J with Relationship indexes

A good example where this approach pays dividends is in time series data, where we have readings represented as a relationship per occurrence.

4. Oracle Berkeley DB Java Edition

Oracle Berkeley DB Java Edition is an open source, embeddable, transactional storage engine written entirely in Java. It takes full advantage of the Java environment to simplify development and deployment. The architecture of Oracle Berkeley DB Java Edition supports very high performance and concurrency for both read-intensive and write-intensive workloads.

Suggestion

Give RRD4J a try:

  1. It is simple enough
  2. It produces quite nice plots
  3. It has a Clojure API
  4. It supports several back-ends including Oracle Berkeley DB Java Edition
  5. It can store/visualize detailed data sets


Renat Gilmanov
  • I think you should add [http://lmax-exchange.github.com/disruptor](http://lmax-exchange.github.com/disruptor/) as he might need some message passing. – Adam Gent Feb 19 '13 at 01:36
2

For collecting real-time streams of sensor data, the following might help.

Have you checked the leJOS APIs? See http://lejos.sourceforge.net/nxt/nxj/api/index.html

It is also worth checking Oracle Java ME Embedded and the target markets it addresses: http://www.unitask.com/oracledaily/2012/10/04/at-the-java-demogrounds-oracle-java-me-embedded-enables-the-internet-of-things/

It can be downloaded from http://www.oracle.com/technetwork/java/embedded/downloads/javame/index.html

For storing time series data, I think Cassandra (http://cassandra.apache.org/) is hard to beat. For the reasons why, see http://www.datastax.com/why-cassandra

For accessing Cassandra from Java, see https://github.com/jmctee/Cassandra-Client-Tutorial, which is quite helpful. For applying the time series concept in Cassandra, see
http://www.datastax.com/wp-content/uploads/2012/08/C2012-ColumnsandEnoughTime-JohnAkred.pdf

Manish Singh