Top Ten Fluentd Tips from KubeCon + CloudNativeCon 2017

1. How much traffic can Fluentd handle?
Tons! Some users send 15,000 messages per node per second, but of course it depends on how much filtering and parsing you ask Fluentd to do: the more work it does, the fewer events it can handle. For example, Treasure Data uses Fluentd to power parts of our backend and routinely handles 2,000,000+ events per second.

2. Can Fluentd scale?
Absolutely, Fluentd can scale! Fluentd is currently handling logging for Waze, HBO’s services for Game of Thrones, and Amazon.

3. How secure is Fluentd?
You can encrypt data in transit between Fluentd nodes using the out_secure_forward plugin.
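Here’s a minimal sketch of the sending side (the hostnames, shared key, and CA certificate path are placeholders, and the receiving server needs a matching in_secure_forward source):

  # Forward matching events over TLS to an aggregator running in_secure_forward.
  <match secret.data.**>
    @type secure_forward
    shared_key secret_string            # must match the receiver's shared_key
    self_hostname client.fqdn.local
    secure yes
    ca_cert_path /path/to/ca_cert.pem   # CA used to verify the server certificate
    <server>
      host server.fqdn.local
      port 24284                        # secure_forward default port
    </server>
  </match>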

4. How can Fluentd collect data from my containers?
There are two different solutions depending on your stack.

  1. If you’re using Docker without Kubernetes, you can use Docker’s built-in Fluentd logging driver (see the sketch after this list): https://docs.docker.com/engine/admin/logging/fluentd/
  2. If you’re using Kubernetes, use a DaemonSet. Fluentd can collect the STDOUT/STDERR of containers as well as application logs written to files on a shared volume. A pre-configured config template is available here: https://docs.fluentd.org/v0.12/articles/kubernetes-fluentd
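For option 1, a minimal sketch of the Fluentd side might look like this (port 24224 is the Docker driver’s default; the docker.** tag pattern is an assumption):

  # Accept events sent by Docker's fluentd logging driver (default port 24224).
  <source>
    @type forward
    port 24224
    bind 0.0.0.0
  </source>

  # Print container logs to stdout for a quick sanity check.
  <match docker.**>
    @type stdout
  </match>

A container would then be started with something like docker run --log-driver=fluentd --log-opt tag="docker.{{.Name}}" so that its output matches the docker.** pattern.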

5. What is the difference between Logstash and Fluentd?
There are several differences, and we’d love to go on about them, but here is an impartial view from Loom Systems:
https://www.loomsystems.com/blog/single-post/2017/01/30/a-comparison-of-fluentd-vs-logstash-log-collector

6. What is the difference between Fluentd and Fluent Bit?
In a nutshell, Fluent Bit has a smaller footprint and replaces Fluentd in some scenarios. That said, Fluentd and Fluent Bit work well together: if you’re using Fluent Bit, you should still send logs to Fluentd, since it has more plugins and richer routing capabilities.
For example, use Fluent Bit on edge servers or devices, and configure it to simply forward all data to a Fluentd cluster. This two-tier configuration provides higher scalability and lets you handle complex message routing and data enrichment in Fluentd; a sketch of the aggregator side follows.
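A minimal sketch of the Fluentd aggregator (the edge.** tag, hostnames, and the Elasticsearch destination are assumptions, and the elasticsearch output requires fluent-plugin-elasticsearch):

  # Receive forward-protocol traffic sent by Fluent Bit edge nodes.
  <source>
    @type forward
    port 24224
  </source>

  # Enrich every record with the aggregator's hostname before routing.
  <filter edge.**>
    @type record_transformer
    <record>
      aggregator "#{Socket.gethostname}"
    </record>
  </filter>

  # Route the enriched events to Elasticsearch.
  <match edge.**>
    @type elasticsearch
    host elasticsearch.local
    port 9200
    logstash_format true
  </match>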

7. Does Fluentd connect to the tools I need? Is there a list of connections?
There are hundreds of connections/integrations for Fluentd!
Check out the full list at https://www.fluentd.org/plugins and check back for updates as new connectors are created.

8. Why do I need Enterprise Fluentd when I’m already using the Open Source Project?
We’re very proud of the open source Fluentd project, the community (check out the public Slack channel – https://slack.fluentd.org/), and the project’s reach (Fluentd is used by Pokémon GO, Netflix, Google, Microsoft, and Facebook!). But there are three major differences between open source Fluentd and Enterprise Fluentd:

  1. Security – Enterprise Fluentd encrypts data both in transit and at rest.
  2. Enterprise Connections – Enterprise Fluentd features stable, enterprise-grade connections to some of the most widely used tools (Splunk, Kafka, Kubernetes, and more).
  3. Support – With Enterprise Fluentd you have support from our troubleshooting team.

9. Where does Fluentd store my data?
It doesn’t, actually! Fluentd is a logging engine that you install on prem or in your containers (see #4) to reliably orchestrate your logs. Use Fluentd to send your logs to your favorite data lake or storage backend – or even to multiple destinations!
An important value-add to your infrastructure is that you can choose or change your log storage just by reconfiguring Fluentd. This includes advanced deployment patterns such as keeping a secondary store for failure recovery, or sending data to S3 for archiving while keeping the latest data in Elasticsearch; a sketch of that pattern follows.
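A minimal sketch of the archive-plus-search pattern (the app.** tag, hosts, bucket name, and credentials are placeholders; the elasticsearch and s3 outputs require fluent-plugin-elasticsearch and fluent-plugin-s3):

  # Copy each event to two destinations.
  <match app.**>
    @type copy
    # Keep the latest data searchable in Elasticsearch.
    <store>
      @type elasticsearch
      host elasticsearch.local
      port 9200
      logstash_format true
    </store>
    # Archive everything to S3.
    <store>
      @type s3
      aws_key_id YOUR_AWS_KEY_ID
      aws_sec_key YOUR_AWS_SECRET_KEY
      s3_bucket my-log-archive
      s3_region us-east-1
      path logs/
    </store>
  </match>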

10. What do I do if I’m having memory issues with Fluentd?
Fluentd uses “memory” buffering by default, but in a high-load environment it may consume too much memory, and buffered data is lost if the server crashes. The solution is to use “file” buffering instead.

Here’s the config to use:

  ...
  # Use file for buffering
  buffer_type file

  # Make sure fluentd has write access to the directory!
  buffer_path /var/fluentd/buffer/my_output

  # Flush buffer every 5 seconds or when a chunk exceeds 32MB.
  flush_interval 5s
  buffer_chunk_limit 32m

  # Buffer at most 128 buffer chunks. Make sure you have enough disk space
  # to store buffer_chunk_limit * buffer_queue_limit bytes. In this case, 32MB * 128 = 4096MB.
  buffer_queue_limit 128

  # Use 5 threads to flush buffers.
  num_threads 5
  ...

More info here: https://docs.fluentd.org/v0.12/articles/performance-tuning

Bonus Question: How does Fluentd relate to a Customer Data Platform?
A Customer Data Platform (CDP) enables the unification of data from different sources and reduces the need for data cleaning and preparation. Fluentd can be the mechanism that feeds data into your CDP. Because of the architecture used to ingest structured and semi-structured data, business rules can pull from the schema-on-read data and send it to almost any system, enabling automated, personalized marketing.

Eduardo Silva
Eduardo is an Open Source Developer at Treasure Data, focusing on Fluentd and related projects. He is the founder of the Monkey Project and Duda.io, a high-performance open-source web framework for Linux. He likes to make software more efficient.