Treasure Data Python Client; MySQL, PostgreSQL, and Jira connectors; and more
Treasure Data has made Python the glue for your end-to-end data analytics pipeline. Now it’s easier than ever to link your data science toolkit directly to a robust and infinitely scalable data storage and analytics pipeline in the cloud and be up and running in minutes!
Maybe you’re brand new to data science, a manager looking for a fast cycle time, or an experienced old hand. In addition to connecting directly to staples like Jupyter and Pandas, you can now pipe in data directly from MySQL, PostgreSQL, and Jira, as well as engage Treasure Data Agent directly from Python. Whatever your skill level, you can leave the analytics infrastructure to us while you focus on your data!
This release also includes a few key Presto enhancements and a new JDBC driver.
Treasure Data Client for Python
Treasure Data is now supported by an API library for Python. From Python 2.7 or later, Python 3.3 or later, or PyPy, you can now list jobs, run jobs, import data (including bulk import), and kick off Presto or Hive queries. In the example below, on_waiting is a callback function that you define yourself; the client calls it while it waits for the query to finish:
import pandas
import tdclient
with tdclient.connect(db="sample_datasets", type="presto", wait_callback=on_waiting) as td:
    data = pandas.read_sql("SELECT symbol, COUNT(1) AS c FROM nasdaq GROUP BY symbol", td)
Get the library from GitHub.
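For running a query without Pandas, a minimal sketch might look like the following. The helper name run_query is hypothetical; the calls it makes (client.query, job.wait, job.result) are the ones shown in the library's README, and the helper takes the client as an argument, so it assumes you have already created one.

```python
def run_query(client, database, sql, engine="presto"):
    """Submit a query, block until the job finishes, and return its rows.

    run_query is a hypothetical helper; `client` is expected to be a
    tdclient.Client (or any object exposing the same query() method).
    """
    job = client.query(database, sql, type=engine)  # engine: "presto" or "hive"
    job.wait()                  # block until the job has finished
    return list(job.result())   # materialize the result rows into a list


# Usage (assumes the td-client package is installed and the TD_API_KEY
# environment variable holds your API key):
#
#   import tdclient
#   with tdclient.Client() as client:
#       rows = run_query(client, "sample_datasets",
#                        "SELECT symbol, COUNT(1) AS c FROM nasdaq GROUP BY symbol")
#       for row in rows:
#           print(row)
```

Passing the client in, rather than creating it inside the helper, keeps the API key handling in one place and makes the helper easy to exercise with a stand-in object.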
Treasure Data and Pandas Connector
With Treasure Data, you can now munge data and perform analytics directly from the powerhouse library Pandas. As detailed in this blog post, you can run Treasure Data queries and analyze your data right from Pandas and Jupyter (formerly the IPython Notebook).
Data Connectors for MySQL, PostgreSQL and Jira
Setting up each connector works similarly. First, you need to ensure that you’re running the latest Treasure Data Toolbelt:
$ td --version
Next, prepare a seed.yml, which is a YAML configuration file containing your access information for the respective service. Here’s the one for the MySQL connector:
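As a rough sketch only, a seed file for a MySQL source could be generated from Python as shown here. Every key name (in, type, host, port, user, password, database, table, out, mode) is an assumption about the layout the connector expects, and the host, credentials, and table names are placeholders; check the connector documentation for the authoritative format.

```python
import json


def mysql_seed(host, user, password, database, table, port=3306):
    """Return the text of a seed.yml for a MySQL source.

    All key names below are assumptions, not confirmed by this post.
    """
    def q(value):
        # A JSON string is also a valid YAML double-quoted scalar, so this
        # safely quotes values that contain characters such as ':' or '#'.
        return json.dumps(str(value))

    lines = [
        "in:",
        "  type: mysql",
        "  host: " + q(host),
        "  port: " + str(int(port)),
        "  user: " + q(user),
        "  password: " + q(password),
        "  database: " + q(database),
        "  table: " + q(table),
        "out:",
        "  mode: append",
    ]
    return "\n".join(lines) + "\n"


# Placeholder values for illustration only.
seed_text = mysql_seed("mysql.example.com", "test_user", "s3cret:#1",
                       "test_database", "test_table")
print(seed_text)
```

To use it, write seed_text to a file named seed.yml in the directory where you run the td commands.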
Treasure Data uses artificial intelligence to guess the file format you’ll be importing from, but you must prompt it first:
$ td connector:guess seed.yml -o load.yml
Then, you can preview how the system will parse the file:
Finally, you can either schedule import jobs or run them as standalone, one-off jobs.
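The guess, preview, and import steps above can be strung together from Python. The sketch below only builds and prints the commands by default. The connector:guess invocation is the one shown earlier; connector:preview and connector:issue, along with the --database and --table flags, are assumed to be available in your Toolbelt version, and my_db and my_table are placeholder names.

```python
import shlex
import subprocess


def connector_commands(seed="seed.yml", load="load.yml",
                       database="my_db", table="my_table"):
    """Build the td invocations for one standalone import, in order.

    The preview and issue subcommands and their flags are assumptions;
    confirm them against the help output of your installed Toolbelt.
    """
    return [
        ["td", "connector:guess", seed, "-o", load],   # infer the format
        ["td", "connector:preview", load],             # check the parsing
        ["td", "connector:issue", load,
         "--database", database, "--table", table],    # run the import once
    ]


def run_import(dry_run=True, **kwargs):
    """Print each command; execute it only when dry_run is False."""
    for cmd in connector_commands(**kwargs):
        print("$ " + " ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


run_import(dry_run=True)
```

Calling run_import(dry_run=False) would execute the three steps for real, so it needs the Toolbelt installed and a seed.yml in the working directory.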
Presto! New improvements + JDBC driver update
JDBC Driver v0.3.4 now supports the “SELECT 1” query that a number of BI tools issue.
The Presto engine also contains a number of significant improvements, including a new “black hole” connector that discards any data written to it, much like the /dev/null device on Unix, and keeps only table metadata in memory, along with some key performance optimizations. This version also cumulatively includes the patches from Presto versions 0.106 and 0.107.