
I am working on a solution for centralized log file aggregation from our CentOS 6.x servers. After installing the Elasticsearch/Logstash/Kibana (ELK) stack, I came across the rsyslog omelasticsearch plugin, which can send messages from rsyslog to Elasticsearch in Logstash format, and I started asking myself why I need Logstash at all.

Logstash has a lot of different input plugins, including one that accepts rsyslog messages. Is there a reason why I would use Logstash for my use case, where I need to gather the contents of log files from multiple servers? Also, is there a benefit to sending messages from rsyslog to Logstash instead of sending them directly to Elasticsearch?
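For reference, this is roughly the kind of rsyslog configuration I am experimenting with, based on the omelasticsearch documentation (a sketch only; the JSON template fields are my own choices, and the Elasticsearch host, port and index name are placeholders):

```
module(load="omelasticsearch")

# Build a Logstash-style JSON document from each syslog message
template(name="logstash-json" type="list") {
    constant(value="{\"@timestamp\":\"")
    property(name="timereported" dateFormat="rfc3339")
    constant(value="\",\"host\":\"")
    property(name="hostname")
    constant(value="\",\"severity\":\"")
    property(name="syslogseverity-text")
    constant(value="\",\"program\":\"")
    property(name="programname")
    constant(value="\",\"message\":\"")
    property(name="msg" format="json")
    constant(value="\"}")
}

# Ship documents straight to Elasticsearch in bulk
# (a daily logstash-YYYY.MM.DD index name can be built with a second
#  template plus dynSearchIndex="on" instead of a static searchIndex)
action(type="omelasticsearch"
       server="localhost"
       serverport="9200"
       template="logstash-json"
       searchIndex="system-logs"
       bulkmode="on")
```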

alecswan

4 Answers


I would use Logstash in the middle if there's something I need from it that rsyslog doesn't have. For example, getting GeoIP from an IP address.

If, on the other hand, I just needed to get syslog or file contents indexed in Elasticsearch, I'd use rsyslog directly. It can do buffering (disk + memory) and filtering, you can choose what the document will look like (for example, you can put the textual severity instead of the number), and it can parse unstructured data. But the main advantage is performance, which is rsyslog's main focus. Here's a presentation with some numbers (and tips and tricks) on Logstash, rsyslog and Elasticsearch: http://blog.sematext.com/2015/05/18/tuning-elasticsearch-indexing-pipeline-for-logs/
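Roughly, the buffering and filtering I'm talking about look like this in rsyslog 7/8 syntax (a sketch only; the program name, severity threshold and queue sizes are illustrative, and the template would be a JSON one like the sketch in the question):

```
module(load="omelasticsearch")

# Filtering: only ship messages from one program at warning or worse
if ($programname == "myapp" and $syslogseverity <= 4) then {
    action(type="omelasticsearch"
           server="localhost"
           serverport="9200"
           template="logstash-json"          # a JSON template, e.g. the one sketched in the question
           bulkmode="on"
           # Buffering: a disk-assisted (memory + disk) queue
           queue.type="linkedlist"
           queue.filename="es_action_q"      # giving the queue a file name enables spooling to disk
           queue.maxdiskspace="1g"
           queue.saveonshutdown="on"
           action.resumeretrycount="-1")     # keep retrying if Elasticsearch is unreachable
    stop
}
```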

Radu Gheorghe

I would recommend Logstash. It is easier to set up, there are more examples, and the components are tested to fit together.

Also, there are some benefits: in Logstash you can filter and modify your logs.

  1. You can extend logs with useful data: server name, timestamp, ...
  2. Cast types (string to int, etc.), which helps Elasticsearch index fields correctly
  3. Filter out logs by some rules

Moreover, you can tune the batch size to optimize indexing into Elasticsearch. Another feature: if something goes wrong and there is a crazy amount of logs per second that Elasticsearch cannot process, you can configure Logstash to keep a queue of events or to drop the events that cannot be saved. A sketch of such a pipeline is below.
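For example, a minimal Logstash pipeline that does those three things might look roughly like this (a sketch only; the port, field names and index pattern are placeholders, and some option names differ between Logstash versions):

```
input {
  syslog { port => 5514 }                  # rsyslog forwards messages here
}

filter {
  # 1. extend events with useful data
  mutate { add_field => { "environment" => "production" } }

  # 2. cast types so Elasticsearch indexes them as numbers
  mutate { convert => { "response_time" => "integer" } }

  # 3. filter out events by rule
  if [severity_label] == "Debug" { drop { } }
}

output {
  # batch/flush sizing is configured separately and depends on the Logstash version
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash-%{+YYYY.MM.dd}"
  }
}
```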

Andrew
  • Thanks! Rsyslog can do #1 - [this article](http://www.rsyslog.com/output-to-elasticsearch-in-logstash-format-kibana-friendly) shows how to report the host name and timestamp. It can also do #3 - [this page](http://www.rsyslog.com/doc/v8-stable/configuration/basic_structure.html#rulesets-and-rules) shows how to configure rules. I don't think I have a use case for #2 - type casting. We don't have crazy amounts of logs per second. So I'm trying to weigh the cost of installing and managing an extra client (Logstash) on my VMs against the benefit I get from it. Thoughts? Links? Thanks again! – alecswan Aug 21 '15 at 20:27
  • Well, that's interesting; it looks like rsyslog can handle buffering as well as extending/filtering. That's probably a good way to go, and it seems that rsyslog + Elasticsearch work well together. If rsyslog works fine, give it a try. Logstash requires quite a lot of RAM and has some rough edges; it's not perfect. – Andrew Aug 22 '15 at 20:29

If you go straight from the server to Elasticsearch, you can get the basic documents in (assuming the source is JSON, etc.). For me, the power of Logstash is that it adds value to the logs by applying business logic to modify and extend them.

Here's one example: syslog provides a priority level (0-7). I don't want to have a pie chart where the values are 0-7, so I make a new field that contains the pretty names ("emerg", "debug", etc) that can be used for display.

Just one example...
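A sketch of one way to do that mapping in a Logstash filter (the field names are just examples, and this uses the newer ruby-filter event API; older Logstash versions access fields as event['field'] instead):

```
filter {
  ruby {
    code => '
      # Map the numeric syslog severity (0-7) to a readable label
      labels = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]
      sev = event.get("syslog_severity_code")
      if sev
        n = sev.to_i
        event.set("severity_name", labels[n]) if (0..7).include?(n)
      end
    '
  }
}
```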

Alain Collins
  • After learning how rsyslog works, I can say that business logic can be implemented relatively easily in the rsyslog configuration. For your specific example, you can use the syslogseverity-text rsyslog property instead of syslogseverity. – alecswan Dec 21 '15 at 16:22

Neither is a viable option on its own if you really need to rely on the system operating under load and being highly available.

We found that the best option is to use rsyslog to send logs to a centralized location, buffer them there using Redis or Kafka, and then let Logstash do its magic and ship them to Elasticsearch.
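The Logstash side of that pipeline, with Redis as the broker, looks roughly like this (a sketch only; the host names and the list key are placeholders, and rsyslog would push to the same key, e.g. via its omhiredis output):

```
input {
  redis {
    host      => "broker.example.com"
    data_type => "list"
    key       => "rsyslog"          # rsyslog (or another shipper) pushes JSON events onto this list
  }
}

output {
  elasticsearch {
    hosts => ["es.example.com:9200"]
    index => "logstash-%{+YYYY.MM.dd}"
  }
}
```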

Read our blog about it here - http://logz.io/blog/deploy-elk-production/

(Disclaimer - I am the VP product for logz.io and we offer ELK as a service)

Asaf Yigal