Bringing Machine Learning to Elasticsearch with Logz.io and Treasure Data
Data analytics is a battle between order and chaos.
As organizations build up their data infrastructure, we often see divergent pipelines sprouting up to accommodate different needs. This leads to data silos as well as analytics silos: one team builds a data pipeline for its own needs while another team builds a separate, disjointed one.
By way of example, let’s consider log data. DevOps teams reach for Splunk or Elasticsearch for log analysis. They set up log collectors (Splunk forwarders for Splunk; Logstash or Fluentd for Elasticsearch), while data scientists visualize and analyze log data in Python, R, or Spark. Occasionally, data engineers (yet another group!) build a pipeline so that data scientists can access structured log data in databases. Either way, most data scientists hardly interact with Splunk or Elasticsearch directly.
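To make the "structured log data" step concrete, here is a minimal sketch of turning raw access-log lines into structured records in Python. The log format, sample line, and field names are illustrative assumptions (an nginx/Apache-style access log), not tied to any specific pipeline mentioned above.

```python
import re

# Matches an nginx/Apache-style access log line (illustrative format).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+)'
)

def parse_log_line(line):
    """Turn one raw access-log line into a dict of named fields, or None."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

# Hypothetical sample line for demonstration.
sample = '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /api/users HTTP/1.1" 200 512'
record = parse_log_line(sample)
print(record["method"], record["path"], record["status"])  # GET /api/users 200
```

Once logs are structured records like this, they can be loaded into a database or DataFrame, which is where data scientists typically pick them up.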
Lack of a shared toolchain results in poor analytics practices, duplicated effort, and weaker insights from data. For example, suppose a DevOps engineer wants to detect abusive users impacting the system’s stability, and simple rule-based heuristics aren’t cutting it. This is a perfect opportunity to apply data science to a DevOps problem. Data scientists can build statistical models that identify abusive users with anomaly detection algorithms and reduce false alarms, so that DevOps engineers can sleep better at night. This kind of collaboration, however, is rare – because of those very data silos and lack of communication!
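As a taste of what such a model might look like, here is a minimal anomaly detection sketch assuming per-user request counts have already been aggregated from log data. The user names, counts, and the modified z-score (median absolute deviation) approach with a 3.5 cutoff are illustrative assumptions, not the method from the webinar.

```python
from statistics import median

def flag_abusive_users(request_counts, threshold=3.5):
    """Flag users whose request volume is an outlier.

    Uses the modified z-score based on the median absolute deviation (MAD),
    which is robust to the very outliers we are trying to detect.
    """
    counts = list(request_counts.values())
    med = median(counts)
    mad = median(abs(c - med) for c in counts)
    if mad == 0:  # all users look alike; nothing to flag
        return []
    return sorted(
        user for user, c in request_counts.items()
        if 0.6745 * (c - med) / mad > threshold
    )

# Hypothetical per-user request counts: four typical users, one hammering the API.
counts = {"alice": 120, "bob": 95, "carol": 110, "dave": 105, "mallory": 5000}
print(flag_abusive_users(counts))  # ['mallory']
```

A robust statistic like MAD matters here: a plain mean/standard-deviation z-score gets inflated by the abusive user’s own traffic, which is exactly the kind of false-negative problem a data scientist can help a DevOps team avoid.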
So how do we make our data analytics efforts more effective and useful? In last month’s joint webinar with Logz.io, we explored one potential solution. By complementing Logz.io (the DevOps engineers’ tool) with Treasure Data (the data scientists’ and analysts’ service), one can gain deeper insights into operational data.
If your team is looking to improve operational intelligence by bringing a bit of data science into DevOps, I encourage you to check it out.