
The future of big data is open source

Tech Page One

When Merkle, a performance marketing company, was looking to help clients get a unified view of their customers’ data, it decided to build an open-source big data platform based on Apache Hadoop.

The Hadoop open-source framework can store and process massive amounts of multistructured data — data that arrives in many different formats rather than a single schema — distributed across large clusters of commodity hardware.

“We had to come up with a computing model that was way more scalable than what we had,” Shawn Streett, the vice president of managed hosting for Merkle, said in a case study. “Hadoop was the logical target for us.”

Merkle used a Hadoop cluster based on Dell PowerEdge servers to create its Foundational Marketing Platform, which gives clients the data they need for their marketing campaigns.

“Effective interaction with customers requires businesses to gain a deeper understanding of customers,” said Tony Giordano, the executive vice president of the technology solutions group at Merkle. “That means having a 360-degree view of customer data, including data gathered from both traditional and online sources.”

In addition to marketing, big data analytics is used in industries such as financial services, health care, telecommunications and retail.

Hadoop helps pull data from multiple sources

Hadoop and open-source technology are a big part of the big data market. Hadoop helps companies store, process and analyze hundreds of terabytes or petabytes of data. Hadoop can be cost-effective and offer a scalable and streamlined architecture, according to Cloudera.

Companies adopt the Hadoop data storage and processing system to find value in structured and unstructured data, which can include unorganized financial contracts, legal briefs or doctors’ notes. The Hadoop Distributed File System (HDFS) allows Hadoop to process data across multipetabyte clusters, helping marketers at Merkle make better decisions about how to run campaigns.
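Hadoop distributes work across a cluster using the MapReduce pattern: a map step emits key-value pairs from each record, and a reduce step aggregates them per key. The toy Python sketch below illustrates that pattern on a single machine; the field names (such as "campaign") are illustrative assumptions, not Merkle’s actual schema.

```python
from collections import defaultdict

def map_phase(records):
    """Map step: emit a (key, 1) pair for each record's campaign field."""
    for record in records:
        yield record["campaign"], 1

def reduce_phase(pairs):
    """Reduce step: sum the counts emitted for each key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

# Toy customer-interaction records; in a real Hadoop job these would be
# split across many nodes and the map/reduce steps run in parallel.
records = [
    {"campaign": "email", "customer": "a"},
    {"campaign": "social", "customer": "b"},
    {"campaign": "email", "customer": "c"},
]

print(reduce_phase(map_phase(records)))  # {'email': 2, 'social': 1}
```

On a cluster, the framework handles the partitioning, shuffling and fault tolerance that this single-process sketch omits.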

An open-source platform such as Hadoop enables companies to integrate data from multiple sources and manage it in an “easy infrastructure,” Jeff Cotrupe, big data and analytics industry director for Frost & Sullivan, told Power More.

The market for big data software will grow almost sixfold by 2019, according to Ovum.

Open source helps companies avoid hardware vendor lock-in

Companies turn to open-source big data software to avoid being locked in to a particular hardware vendor.

“They want to be able to reach their hands into the code and manipulate it — actually control the direction of the code base, or the functionality of the product,” Michael Cucchi, Pivotal’s vice president of products, told eWEEK. “The lock-in piece is something that burned customers over the last decade.”

With big data, “the expectation increasingly is open source,” Tony Baer, principal analyst at Ovum, told Power More. “Customers do not want proprietary lock-in.”

Amy O’Connor, a big data evangelist for Cloudera, agreed. “It does give the customer options where they don’t have vendor lock-in,” she said.

When a database is open source, companies can turn to more than one provider for support, which makes it more cost-effective, O’Connor explained, noting that in the 1980s and 1990s companies were often locked in to one database vendor. Cloudera’s Hadoop platform works with multiple vendors’ hardware, including Dell’s.

Companies locked in to a hardware vendor must stick with that vendor as it introduces new products according to its road map, and they lack the flexibility to seek support from other hardware companies.

In addition, because an open-source platform can exchange data in any format, organizations are able to share critical information more easily.

Children’s Healthcare of Atlanta, a pediatric facility, used Hadoop because it offered more flexibility in the types of data the hospital could collect for its bedside alarm study. The hospital captured machine data from bedside monitors so clinicians could improve pain management in premature babies.

“One of the very powerful things about Hadoop and the Cloudera distribution is that we can handle any type of data in any type of format, which means that people that are writing global, Web-based applications, people who are creating this big data, everything around the Internet of things, there’s innovation in that and there are not strict standards,” O’Connor said.
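Handling "any type of data in any type of format" typically means normalizing heterogeneous input — say, JSON from one monitor and CSV from another — into a common structure before analysis. Here is a minimal sketch of that idea; the record layout and field names (patient_id, heart_rate) are hypothetical, not the hospital’s actual schema.

```python
import csv
import io
import json

def parse_record(raw):
    """Accept one record as either JSON or CSV text and return a common dict."""
    raw = raw.strip()
    if raw.startswith("{"):
        return json.loads(raw)
    # Fall back to an assumed simple CSV layout: patient_id,heart_rate
    row = next(csv.reader(io.StringIO(raw)))
    return {"patient_id": row[0], "heart_rate": int(row[1])}

# Mixed-format input, as might arrive from different bedside devices.
mixed_input = [
    '{"patient_id": "p1", "heart_rate": 142}',
    "p2,138",
]
records = [parse_record(line) for line in mixed_input]
print(records)
```

In a Hadoop deployment this kind of parsing would happen in the map step, letting raw data land in HDFS first and be interpreted on read rather than forced into a rigid schema up front.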

A diversity of contributions fosters innovation

“One of the most powerful things about open source is the amount of innovation you get by having a community of people that aren’t just in one company contributing to open source,” O’Connor told Power More.

Companies such as LinkedIn can contribute unique code to the Cloudera Enterprise software, which combines with Dell servers and Intel chips to form the Dell In-Memory Appliance for Cloudera Enterprise. Another open-source platform, OpenStack, enables companies to innovate by managing multiple cloud deployments.

Whether it’s in finance, medicine or engineering, the open-source community is a place to bring new developments in the use of big data, O’Connor said.

When the core data path of software is open source, companies can incorporate more innovation into a product, O’Connor said.

“We’re continually innovating with other companies and leaders in open source to push this community ahead,” she said.


Brian T. Horowitz


Brian T. Horowitz has been a technology journalist since 1996 and has contributed to numerous publications, including Computer Shopper, CruxialCIO, eWEEK, Fast Company, NYSE magazine, ScientificAmerican.com and USA Weekend. He holds a B.A. from Hofstra University and is based in New York.

Tags: Data Center, Storage, Technology