Unlocking Data Science in the Enterprise

On-demand webinar series

Data science is a practice grounded in statistics but increasingly shaped by software and systems. Data scientists strive to take their work beyond simple research, but bridging the gap between the language of data science and that of distributed systems is proving increasingly difficult. Factor in a fast-evolving ecosystem of tools and libraries, many released weekly, and you have a recipe for distraction.

Cloudera has introduced the Data Science Workbench, an enterprise data science platform that accelerates analytics projects from exploration to production. It is a collaborative, scalable, and highly extensible platform for data exploration, analysis, modeling, and visualization. It includes powerful features to bring data scientists, analysts, and business teams together.

Find out more using the links below:

Watch on-demand

Thanks for your interest!

Browse the on-demand webinar series below and watch the video of your choice.

Part 1:
Introducing the Cloudera Data Science Workbench

Part 2:
A Visual Dive into Machine Learning and Deep Learning

Part 3:
Models in Production: A Look From Beginning to End

04/12/2017 to 04/14/2017

04/19/2017 to 04/21/2017

05/03/2017 to 05/05/2017

Webinars

Part 1: Introducing the Cloudera Data Science Workbench

Today, leading organizations struggle to make their data scientists productive with Hadoop clusters. Data scientists find it difficult to use their existing open source languages (e.g. Python, R) and libraries with Hadoop, especially when the clusters are secured with Kerberos. At the same time, IT doesn't want to give special access to these users, who require very diverse and specific environment configurations to run their experiments. As a result, most data science teams work away from the Hadoop cluster, often on their laptops or in other data silos. The negative business impacts are a lack of insight and agility for the most advanced users, and the security, governance, and cost issues that arise from data silos.

Cloudera Data Science Workbench is a new tool, under development, that will enable collaborative, customizable, self-service access by data scientists to secure Hadoop environments via Python, R, and Scala. It can be installed on any existing cluster, whether on-premises or in the cloud.

Matt Brandwein, Director of Product Management at Cloudera, and Tristan Zajonc, Senior Engineering Manager, discuss:

  • The emergence of open source tools for data science
  • Common gaps in the ecosystem
  • An introduction to a new tool from Cloudera
  • Demonstration
  • Q&A

Part 2: A Visual Dive into Machine Learning and Deep Learning

Machine learning and deep learning offer an opportunity to understand data beyond simple numbers and text. Data science practitioners want to adopt new machine learning and deep learning libraries quickly, but few enterprise analytics systems support these new tools. The Cloudera Data Science Workbench helps data scientists get ready access to Hadoop data, leverage the newest machine learning and deep learning frameworks, and deliver value much faster, all in a secure environment.

Join Sean Anderson, Senior Manager of Data Science Marketing at Cloudera, and Vartika Singh, Solutions Architect for Data Science at Cloudera, as they discuss:

  • An introduction to machine learning and deep learning
  • Common practices and tools
  • An introduction to a new tool from Cloudera
  • Demonstration
  • Q&A

Part 3: Models in Production: A Look From Beginning to End

"I've built a model -- now what?"

Developing a predictive model is only one part of a larger journey. Data scientists have to access and transform data and engineer features before exploratory modeling can happen. And a model doesn't do anything until it's applied to data, productionized, and deployed.

Apache Hadoop can support all stages of the data science lifecycle, but how this is done is still more art than science, as it requires coordinating different teams and technologies. This webinar will demonstrate a simple reference architecture for connecting the output of exploratory data science in Cloudera Data Science Workbench with production deployment on Hadoop. This includes data engineering with Spark, modeling with Spark MLlib, and production build and deployment via Git, Maven, and Spark Streaming.

Speakers

Matt Brandwein

Director, Product Management
Cloudera

Matt is Director of Product Management at Cloudera, driving the platform's experience for data science and data engineering users. Before that, he led Cloudera's product marketing team for three years, with roles spanning product, solution, and partner marketing. Prior to Cloudera, he built enterprise search and data discovery products at Endeca/Oracle. Matt holds degrees in Computer Science and Mathematics from the University of Massachusetts Amherst.

Tristan Zajonc

Senior Engineering Manager
Cloudera

Tristan Zajonc is a senior engineering manager at Cloudera. Previously, he was cofounder and CEO of Sense, a visiting fellow at Harvard’s Institute for Quantitative Social Science, and a consultant at the World Bank. Tristan holds a PhD in public policy and an MPA in international development from Harvard and a BA in economics from Pomona College.

Vartika Singh

Solutions Architect
Cloudera

Vartika Singh is a solutions architect at Cloudera with over 12 years of experience in applying machine learning technologies to industry problems ranging from advertising to imaging.

Sean Anderson

Senior Product Marketing Manager
Cloudera

Sean is a tenured infrastructure scaling and cloud strategy consultant with a strong focus on strategic partnerships and innovative hybrid technology. He has been part of major shifts in technology, including the rise of cloud computing, open source standardization, big data, and machine learning. Sean quickly became a go-to resource and speaker for data-specific workloads, focusing on technologies like machine learning, data science, Apache Hadoop, MongoDB, Redis, Elasticsearch, SQL, and data warehousing. At Rackspace Hosting, Sean helped bring to market open-source cloud platforms around Hadoop, MongoDB, and Redis. Sean is currently senior marketing manager for data science and data engineering at Cloudera, the pioneers of Apache Hadoop. He focuses on modern data science practices involving popular open-source languages like Python, R, and Scala, and speaks often about the convergence of big data and machine learning/AI.

Sean Owen

Director of Data Science
Cloudera

Sean is Director of Data Science at Cloudera, based in London. Before Cloudera, he founded Myrrix Ltd, a company commercializing large-scale real-time recommender systems on Apache Hadoop. He has been a primary committer and VP for Apache Mahout, and co-author of Mahout in Action. Previously, Sean was a senior engineer at Google. He holds an MBA from the London Business School and a BA in Computer Science from Harvard.