Data Engineering 101: Integrating MongoDB and Treasure Data


When trying out different database or data storage backends, the choice of which to use, and for which part of your architecture, can be a tradeoff: Do I want excellent latency control, or do I need literally effortless, and potentially infinite, scalability? Do I prefer to work with a sharding and implementation strategy in order to run lightning fast on bare metal, or can I tolerate a few extra milliseconds of latency to enjoy the benefits of automatic scaling in the cloud? Am I building a one-off app that works well as a standalone, or do I need to scale massively from the get-go? Do I prefer to ramp up my engineers to a powerful new query paradigm, leveraging Mongo’s impressive and helpful community, or am I comfortable enough with the knowledge transfer from SQL to its emerging and popular big data counterparts such as Presto and Hive?

Data Science 101: Interactive Analysis with Jupyter, Pandas and Treasure Data


In case you were wondering, the next time you overhear a data scientist talking excitedly about “Pandas on Jupyter”, s/he’s not citing the latest 2-bit sci-fi from the orthographically challenged!

Treasure Data gives you a cloud-based analytics infrastructure accessible via SQL. Our interactive engines like Presto give you the power to crunch billions of records with ease. As a data scientist, you’ll still need to learn how to write basic SQL queries, as well as how to use any external tool you choose, like Excel or Tableau, for visualization.


Apache Flink: General Analytics on a Streaming Dataflow Engine


This is a guest blog from Kostas Tzoumas, of dataArtisans and committer at Apache Flink.

Apache Flink® is a new approach to distributed data processing for the Hadoop ecosystem.

Flink’s approach is to offer familiar programming APIs on top of an engine that has built-in support for:

  1. Stream processing via pipelined execution, stream checkpointing, and mutable operator state
  2. Classic batch processing, including an optimizer and memory management
  3. Iterative processing for Machine Learning and other applications
  4. Stateful iterative processing, in particular for graph analytics
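To make the first item concrete, here is a toy sketch in plain Python of what pipelined execution, mutable operator state, and checkpointing mean in principle. It does not use Flink or its APIs; the names `KeyedCounter`, `run`, and `checkpoint_every` are hypothetical and chosen only for illustration.

```python
import copy
from collections import defaultdict


def source(events):
    """Yield records one at a time, the way an unbounded stream would."""
    for event in events:
        yield event


class KeyedCounter:
    """Toy stateful operator (not a Flink class): running count per key."""

    def __init__(self):
        self.counts = defaultdict(int)  # mutable operator state

    def process(self, stream):
        # Pipelined: each record is handled as soon as it arrives,
        # without waiting for the whole input to be materialized.
        for key in stream:
            self.counts[key] += 1
            yield key, self.counts[key]

    def snapshot(self):
        # Copy the state so later updates do not alter the checkpoint.
        return copy.deepcopy(dict(self.counts))

    def restore(self, state):
        self.counts = defaultdict(int, state)


def run(events, checkpoint_every=3):
    """Process events, snapshotting state every `checkpoint_every` records.

    Returns the emitted records and the latest (position, state) checkpoint.
    """
    op = KeyedCounter()
    checkpoint = None
    outputs = []
    for position, record in enumerate(op.process(source(events)), start=1):
        outputs.append(record)
        if position % checkpoint_every == 0:
            checkpoint = (position, op.snapshot())
    return outputs, checkpoint
```

After a simulated failure, a fresh `KeyedCounter` can call `restore` with the checkpointed state and replay only the records after the checkpointed position, rather than reprocessing the stream from the start. A real engine coordinates this across many parallel operators, which this sketch does not attempt.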


The 4 Important Things About Analyzing Data Part 3 & 4: Recognize You Can’t Do It All and Show Your Value


In this blog series highlighting the four important things about analyzing data, so far we’ve talked about the importance of delivering many obvious results and the need to accurately understand the purpose of the analysis. Throughout the discussions, we’ve used the word “analysis” to describe the activity, but in fact the analysis process involves many components, from identifying data sources through dashboards and visualization.

Join us for the Upcoming Webinar: Data Science at Pebble!



Are you curious about analytics within wearables companies?

We are happy to announce that we are hosting a webinar with smartwatch maker Pebble. Join us to hear about the importance and impact of data science and analytics at the company.

Over the last year, Pebble has expanded their data analytics effort to inform their product development. In this webinar, their Head of Analytics will give a glimpse of how they use data to make smarter watches.

We encourage you to sign up even if you can’t participate in the live event; we will record the session and make it available to all registrants!

Ruby 101 and Data Collection with Iron.io and Treasure Data


Here at Treasure Data, we aim to give you all the tools you need to ramp up with data collection, starting with the basics, using a programming language or environment of your choice, as well as the Treasure Data Service itself and integration with any number of third-party tools. (If there’s a technology or topic you’d like us to cover, please leave a note in the comments below.)

This post covers the basics of data collection in Ruby, and how to collect data from multiple Iron.io IronWorker tasks running in parallel.