Working with Big Data: Assembling Your Toolkit

When does it make sense to start up a Big Data program? If your email marketing system isn't talking to your sales force automation system, and neither is synced with your online purchase system, are you really ready to tackle a Big Data project? The answer may surprise you as we examine Big Data and its impact on the next-generation digital experience in this third installment of our ongoing series "Are You Ready for Big Data?"

The good news for those who want to tap into the power of Big Data: The rise of this data revolution has been powered by a number of prominent open source and low-cost cloud computing projects, in addition to an explosion of commercial offerings.

These projects are moving targets, constantly evolving (see Recommended Reading: Books, Reports, Blogs and Conferences for ways to keep up), but a familiarity with the primary pieces of the Big Data puzzle will help you get oriented.

Hadoop

The most important software in Big Data, and the one that sits at the white-hot center of this revolution, is Apache Hadoop, an open source project that runs on commodity Linux hardware.

Hadoop, named after a toy elephant that belonged to the son of its creator, Doug Cutting, was developed largely at Yahoo and was initially inspired by papers in which Google outlined its approach to handling an avalanche of data.

Hadoop implements a framework named MapReduce, in which an application is divided into many small fragments of work and each fragment is assigned to a node in the cluster. Hadoop also provides a distributed file system, HDFS, that spans all the nodes in a Hadoop cluster for data storage. HDFS links together the file systems on many local nodes to make them into one big file system.
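To make the MapReduce idea concrete, here is a minimal sketch in plain Python that counts words across several fragments of text. It is not Hadoop's API: it runs in a single process, and the function names (map_phase, shuffle, reduce_phase, word_count) and the sample fragments are made up for illustration. In a real cluster, each fragment would be handed to a different node and the grouping step would move data between machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(fragment):
    """Map: emit a (word, 1) pair for every word in one fragment of input."""
    return [(word.lower(), 1) for word in fragment.split()]


def shuffle(pairs):
    """Shuffle: group every emitted value under its key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(fragments):
    # Each fragment is mapped independently; this independence is what
    # would let a cluster run the map calls on different nodes.
    mapped = chain.from_iterable(map_phase(f) for f in fragments)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical input: three small fragments standing in for pieces of a large file.
fragments = ["big data big clusters", "data on many nodes", "big nodes"]
counts = word_count(fragments)
print(counts)
```

Because the map step never looks at more than one fragment and the reduce step never looks at more than one key, both steps can be spread across many machines without the pieces needing to coordinate with each other.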

Hadoop's strength is that it can process huge amounts of data in parallel across inexpensive, industry-standard servers that both store and process the data. Capacity grows by adding more servers to the cluster, which makes Hadoop well suited to working with ever-expanding sources of data.

Hadoop is also supplemented by an ecosystem of open source Apache projects, such as Pig, Hive, and ZooKeeper, that further extend the value of Hadoop and improve its usability.

Sorting out Big Data "Solutions"

Because Hadoop is open source, it has been incorporated into a wide variety of product offerings from the large, familiar enterprise vendors. Teradata has the Aster Big Analytics Appliance, EMC has Greenplum, IBM has InfoSphere BigInsights, Microsoft has its Big Data Solution, and Oracle offers a Big Data Appliance.

There are also new Hadoop-based companies like Cloudera, Hortonworks, and MapR.

Cloud computing has also come into the picture as an option for those considering a Big Data project. "Infrastructure as a Service" (or IaaS) providers let users rent computing time and install and configure their own software, such as a Hadoop cluster. Budget-constrained companies can use these services to launch a Big Data project without having to invest in expensive hardware.

The next level up: cloud services that provide an application layer. Some of these "Platform as a Service" (PaaS) providers have already implemented Big Data solutions.

Three major players are Amazon Web Services, Google Cloud Platform, and Microsoft Windows Azure.

Amazon Web Services and Microsoft’s Azure cross the boundaries between a service and platform, offering hybrid solutions. Google’s approach focuses on the application layer. California-based Joyent also offers hybrid products.

Because a lot of Big Data already lives in the cloud – such as data from social media and device sensors – cloud platforms are making more sense for hosting and analyzing Big Data.

However, merging this data with what a company has on-premises will continue to be a challenge in the near term.

Editor’s Note: This is the third post in the ongoing series “Are You Ready for Big Data?” by DC Denison. Download the complete "Are You Ready for Big Data" ebook to learn more about Big Data, its applications in creating the next-generation digital experience, and what it takes to get into the game.
