Cloudera | Ask Bigger Questions
 
February Highlights
You’re invited to The Cloudera Sessions
Your Inside Track to Big Data
The Cloudera Sessions

We are going on tour and will be in a city near you soon!

The Cloudera Sessions offer an inside track — based on our experience as the market leader in Apache Hadoop and Big Data — to help you identify where you are on your journey with Hadoop.

Attend the Sessions to discover how to:

  • Design a Blueprint for Data Management
    Pursue a practical, proven approach to implementing Hadoop for a full range of workloads while achieving gains in operational efficiency.
  • Meet Mission Critical SLAs with Big Data
    Optimize slow or complex ETL/ELT transformation workloads to meet critical SLAs.
  • Deliver Improved Business Value with Big Data
    Gain the advantage of ALL your data while employing your existing BI tools and practices.
  • Optimize Your Data Warehouse for Improved ROI & Data Utilization
    Derive incremental value from your existing data warehouse, accelerate query performance, and enable new workloads not previously viable.
  • Make Management Easier
    Understand how the latest advancements in Hadoop management, governance, and lineage can provide a more unified view of all your data.

Click on a city below to see its unique agenda.

Atlanta   |   Boston   |   Charlotte   |   Chicago   |   Columbus   |   Dallas   |   Houston
Los Angeles   |   Minneapolis   |   Seattle   |   Toronto   |   Washington, DC
 
 
UPCOMING EVENTS
Vancouver Hadoop User Group
Feb 20, Vancouver, BC
Jeff Lord presents Hadoop 101. More >
Oracle Developer Day: Big Data
Feb 20, Redwood Shores, CA
Cloudera solutions architect Greg Rahn provides a hands-on tutorial for managing a Big Data environment using Cloudera Manager. More >
Silicon Valley Java User Group
Feb 20, Mountain View, CA
Cloudera Engineer Eli Collins delivers a talk on the future of the Hadoop ecosystem. More >
Belgian Big Data Community
Feb 25, Leuven, Belgium
Cloudera Impala architect/tech lead Marcel Kornacker hosts an Impala deep-dive. More >
ApacheCon NA
Feb 26-28, Portland, OR
Cloudera is a sponsor, and several tech talks from Clouderans are on the agenda. More >
Strata Conference - Making Data Work
Feb 26-28, Santa Clara, CA
Cloudera is a Zettabyte Sponsor – visit our booth and catch a demo of MicroStrategy BI running on Cloudera Impala. More >
Gartner Business Intelligence & Analytics Summit
Mar 18-20, Grapevine, TX
Cloudera CTO Dr. Amr Awadallah will present Unlocking the Value of Business Data: How Hadoop is Poised for the Enterprise. More >
GigaOm Structure
Mar 20, New York, NY
Cloudera co-founder Jeff Hammerbacher will present Big Data Bottlenecks: What We Need To Be Aware Of... More >
 
IN THE NEWS
Forbes
Open-Source Solves Big-Data Problems: Talking to 'Mr. Hadoop,' Doug Cutting
Read the Article >
TechCrunch
The Enterprise Cool Kids
Read the Article >
TechCrunch
Congratulations Crunchies Winners! GitHub Wins Best Overall Startup
Read the Article >
TechCrunch
Founders Stories: Cloudera’s Jeff Hammerbacher On Building Big Data Systems
Read the Article >
CTO Vision
The CTO Vision Disruptive IT List: Firms we believe all enterprise technologists should be tracking
Read the Article >
Business Insider
25 Enterprise Startups To Bet Your Career On
Read the Article >
Computerworld
Big data means big IT job opportunities -- for the right people
Read the Article >
Are you new to Hadoop?

We've compiled a simple guide to free online resources to help you get up to speed with Hadoop and take advantage of Big Data. Learn, explore, and get started!

Check out our resources section on Cloudera.com to search the full library of videos, papers, reports, and tutorials.

Is Hadoop Developer training right for you or your team?

Are you new to Hadoop and need to start processing data fast and effectively? Have you been playing with CDH and are ready to move on to development supporting a technical or business use case? Are you prepared to unlock the full potential of all your data by building and deploying powerful Hadoop-based applications?

If you're wondering whether Cloudera's Developer Training for Apache Hadoop is right for you and your team, watch this webinar now! You will learn who is best suited to attend the live training, what prior knowledge you should have, and what topics the course covers. Cloudera Curriculum Manager Ian Wrigley discusses the skills you will attain during the course and how they will help you become a full-fledged Hadoop application developer.

During the session, Ian also presents a short portion of the actual Cloudera Developer course, discussing the difference between New and Old APIs, why there are different APIs, and which you should use when writing your MapReduce code. Following the presentation, Ian answers questions about this and other Cloudera training courses.
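
For a concrete taste of the distinction the webinar covers, here is a minimal, illustrative sketch (not taken from the course materials) of a word-count mapper written against the new API in org.apache.hadoop.mapreduce; the old-API equivalent implements the org.apache.hadoop.mapred.Mapper interface and emits through an OutputCollector instead of a Context. The class and variable names here are hypothetical.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // New-API mapper: Mapper is an abstract class you extend, and results
    // are emitted through the Context object passed to map().
    public class TokenCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

      private static final IntWritable ONE = new IntWritable(1);
      private final Text word = new Text();

      @Override
      protected void map(LongWritable offset, Text line, Context context)
          throws IOException, InterruptedException {
        for (String token : line.toString().split("\\s+")) {
          if (token.isEmpty()) continue;
          word.set(token);
          context.write(word, ONE);
          // Old API (org.apache.hadoop.mapred): the same emit would be
          // output.collect(word, ONE) via an OutputCollector parameter.
        }
      }
    }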

Take Hadoop for a test drive with Cloudera’s Hadoop demo VM

We created a set of virtual machines that make it simple to get started with Apache Hadoop. Our latest VM includes CDH4 (Cloudera’s Distribution Including Apache Hadoop) in its entirety and runs on CentOS 6.2.

Follow the links below and get started with Hadoop today!

Join us at Strata Santa Clara

Join us February 26 – 28 at Strata Santa Clara and learn how Cloudera has developed the de facto platform for Big Data.

Visit Cloudera in booth 701 and attend a series of presentations and demonstrations. Meet celebrated authors from the Big Data community who will be available to sign copies of their published works and discuss specific projects within the Big Data space. You will also have the opportunity to meet Doug Cutting, co-founder of the Apache Hadoop project.

Register for Strata Santa Clara with the code “CLOUDERA” to receive a 25% discount! We look forward to seeing you there!

SecureWorks slashes the cost of storage with Dell | Cloudera Solution

SecureWorks needed a highly scalable solution for collecting, processing, and analyzing massive amounts of data gathered from customer environments. The organization deployed the Dell | Cloudera® Solution with CDH, comprising the Dell Crowbar software framework, optimized PowerEdge™ C servers, Force10 networking, and Dell and Cloudera services, all based on a Dell reference architecture.

“Our storage cost per gigabyte is now approximately 21 cents. We thought we had great economics previously when we were spending about seventeen dollars per gigabyte,” said Robert Scudiere, Director of Engineering, SecureWorks.

Success Story: Syncsort Optimizes Hadoop Scalability

Here’s an excerpt of a guest post on the Cloudera Blog by Dave Nahmias, Pre-Sales and Partner Solutions Engineer at Syncsort, on his team’s private Cloudera Developer Training for Apache Hadoop experience.

The details about how Hadoop actually reads and writes data were particularly interesting to me. I knew Hadoop stored multiple replicas of data blocks, but I didn’t previously have any in-depth knowledge of the exact algorithms used or how they relate to scheduling and data integrity with respect to the checksums maintained. It was also very helpful to understand the scheduling techniques: the course clarified how Map and Reduce tasks are scheduled across nodes to minimize network traffic.

My colleagues agree that private training enabled us to be much more productive when working with Hadoop and to better visualize its possibilities. Specifically, training helped us understand how Syncsort might leverage the Hadoop architecture in conjunction with our software to help customers extend their usage of Hadoop. It will also help us position Syncsort to contribute to accelerating Hadoop acceptance by providing well-architected Big Data ETL and intra-platform data movement solutions. Even at a fraction of those expected outcomes, we will have realized a measurable return on investment from Cloudera training.

From a professional standpoint, the knowledge gained also prepares my team for Cloudera Certification, a necessity for anyone who needs to provide tangible evidence of Hadoop expertise in the field. We believe the private training also improved our collective ability to intelligently communicate with our partners and customers about Hadoop’s functionality, a skill that should never be underestimated.

Save 15% on Multi-Course Public Training Enrollments in February

Did you know Cloudera training can help you plan for the advanced stages of your Hadoop deployment? In addition to core training for aspiring Hadoop Developers and Administrators, Cloudera University also offers the best (and, in some cases, only) opportunity to learn in a classroom setting about the broader projects in the Hadoop ecosystem, including Training for Apache HBase, Training for Apache Hive and Pig, and Introduction to Data Science: Building Recommender Systems. Depending on your Big Data agenda, Cloudera training can help you increase the accessibility and queryability of your data, conduct business-critical analyses using familiar scripting languages, and build new applications and customer-facing products.

For a limited time, Cloudera University is offering a 15% discount when you register for two or more Hadoop training courses to help you build out and realize your Big Data plan. Cover the basics with Developer and Administrator training, move beyond the core by pairing Developer and HBase training, work towards machine learning with Hive and Pig training and Introduction to Data Science, or customize your own learning path. Just use discount code 15off2 when you register for multiple public training classes from Cloudera University. This offer is only available for new enrollments and is only valid for classes delivered by Cloudera and scheduled to begin before March 1, 2013.

Online Tutorials From Cloudera University

Are you eager to learn more about Hadoop, but can’t attend onsite training? Have you attended a training course and are looking for a refresher or a way to learn more about complementary topics and projects? Are you studying for a Cloudera Certification exam and need to revisit or dive deeper into key areas? Get started at your own convenience through Cloudera University’s online resources space, where you can watch full trainings on Cloudera Essentials for Apache Hadoop and Cloudera Manager, explore the Apache Hadoop Ecosystem, or get An Introduction to Impala. We add web-based tutorials regularly, so keep an eye out for new videos throughout the month!

Developer Community Center
Where to Find Cloudera Tech Talks in Early 2013

Clouderans are traveling the United States (and beyond) in droves during the first quarter of 2013 to present at developer meetups and conferences. Read on >

How-to: Deploy a CDH Cluster in Skytap Cloud

Anyone who wants to try out a CDH cluster, from small to large, can now do so easily in Skytap Cloud. Skytap’s Matt Sousley explains how. Read on >

Understanding MapReduce via Boggle, Part 2: Performance Optimization

In Part 1 of this series, you learned about MapReduce’s ability to process graphs via the example of Boggle. In this post, Jesse Anderson presents some optimizations to improve performance and scalability. Read on >

A Ruby Client for Impala

Thanks to Stripe’s Colin Marc for this guest post and for his work on the world’s first Ruby client for Cloudera Impala! Read on >

Featured Developer Content from the Cloudera Vault
How HDFS Protects Data (by Eric Sammer)

HDFS provides protection against data loss and corruption in a few ways:

  • Data written to files in HDFS is split up into chunks (usually 128MB in size). Each chunk (called a block) is replicated three times, by default, to three different machines. Multiple copies of a block are never written to the same machine. Replication level is configurable per file (see the sketch after this list).
  • HDFS actively monitors the number of available replicas of each block, compared to the intended replication level. If, for some reason, a disk or node in the cluster should become unavailable, the filesystem will repair the missing block(s) by creating new replicas from the remaining copies of the data.
  • HDFS can be (and normally is) configured to place block replicas across multiple racks of machines to protect against catastrophic failure of an entire rack or its constituent network infrastructure.
  • Each block has an associated checksum computed on write, which is verified on all subsequent reads. Additionally, to protect against "bit rot" of files (and their blocks) that are not regularly read, the filesystem automatically verifies all checksums of all blocks on a regular basis. Should any checksum not verify correctly, HDFS will automatically detect this, discard the bad block, and create a new replica of the block.
  • Filesystem metadata -- information about object ownership, permissions, replication level, path, and so on -- is served by a highly available pair of machines (i.e. namenodes) in CDH4. Updates to metadata are maintained in a traditional write-ahead transaction log that guarantees durability of changes to metadata information. The transaction log can be written to multiple physical disks and, in a highly available configuration, is written to multiple machines.
  • HDFS block replicas are written in a synchronous, in-line replication pipeline. That is, when a client application receives a successful response from the cluster indicating that a write was successful, it is true that at least a configurable minimum number of replicas are also complete. This eliminates the potential failure case of asynchronous replication where a client could complete a write to a node, receive a successful response, only for that one node to fail before it's able to replicate to another node.
  • HDFS is fully instrumented with metric collection and reporting so monitoring systems (such as Cloudera Manager) can generate alerts when faults are detected. Metrics related to data integrity include unresponsive nodes in the cluster, failed disks, missing blocks, corrupt blocks, under-replicated blocks, and so on. Cloudera Manager has extensive HDFS-specific monitoring configured out of the box.
  • HDFS supports directory-level filesystem quotas to protect against accidental denial of service that could otherwise cause critical applications to fail to write data to the cluster.
  • All higher level data storage and processing systems in CDH (MapReduce, HBase, Hive, Pig, Impala) use HDFS as their underlying storage substrate and, as a result, have the same data protection guarantees described above.
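
As a small illustration of the per-file replication described in the first bullet above, the sketch below uses the standard Hadoop FileSystem API to inspect and raise a single file's replication factor. This is a minimal example, assuming a reachable CDH cluster whose configuration files are on the classpath; the file path is hypothetical.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationDemo {
      public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/example/data.txt");  // hypothetical path
        FileStatus status = fs.getFileStatus(file);
        System.out.println("Block size:  " + status.getBlockSize());
        System.out.println("Replication: " + status.getReplication());

        // Raise this one file's replication factor to 5; the namenode
        // schedules creation of the additional replicas in the background.
        fs.setReplication(file, (short) 5);
      }
    }

The same operation is available from the shell as: hadoop fs -setrep 5 /user/example/data.txt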
Cloudera, Inc. 220 Portage Avenue, Palo Alto, CA 94306 USA   |   1-888-789-1488 or 1-650-362-0488   |   cloudera.com
©2013 Cloudera, Inc. All rights reserved. Cloudera and the Cloudera logo are trademarks or registered trademarks of Cloudera Inc. in the USA and other countries. All other trademarks are the property of their respective companies. Information is subject to change without notice.