Pandas: A Crash Course

Pandas: A Crash Course

“In Data Science, 80% of time spent prepare data, 20% of time spent complain about need for prepare data.” – @BigDataBorat

On July 1, Treasure Data hosted the latest round of the Silicon Valley Data Engineering  meetup, “Pandas: Lessons Learned for Open-Source Data Tools with Chang She”.

The talk was given by Chang She, at Cloudera, one of the best known data engineers around and a core contributor to Pandas.

Starting on the premise that data can be “..too small to be big data, but too big to be stupid about it,” we are taken through a Pandas crash course.   The talk touched on several areas, including:

Get Treasure Data blogs, news, use cases, and platform capabilities.

Thank you for subscribing to our blog!

  • Pandas API
  • Data Structures
  • Community

Initially built for financial quants,  the product has expanded to embrace software engineers, data engineers and, generally, a broader base of users.  Since the Pandas workflow encompasses core analytics, data munging, ingestion, and publishing/plotting, it’s not hard to imagine that the history of this toolset has had many interesting twists and turns!

One of the interesting problems of the tool is the problem of convenience vs. parameter gravity-well;  that is, there appear to be opposing pulls in the community between functions being convenient swiss-army knives  vs. keeping them concise and readable.

During the Q & A, we learned many interesting lessons (and heard many interesting stories) about Pandas evolution.

Join us at Silicon Valley Data Engineering!

There are more events coming in the Silicon Valley Data Engineering group!  Next up:   Fluentd, Docker and All That: Logging Infrastructure 2015 Edition on Thursday, July 23rd!

John Hammink
John Hammink
John Hammink is Chief Evangelist for Treasure Data. An 18-year veteran of the technology and startup scene, he enjoys travel to unusual places, as well as creating digital art and world music.