Slashdot: Coping with Too Much Data: How Boeing, Nike and Others Did It

Businesses wring their hands over having too little data. But what happens when you have too much of the same data? Figuring out conflicting reports, deciding between different metrics, and removing duplicate entries can prove an enormous drain of time and resources—especially for some of the world’s largest companies, which have implemented too many data warehouses, or data marts, that tell different stories about the same business processes or events.

Every executive wants workers to run reports that present accurate and consistent information—no matter what the data’s origin. At this month’s Teradata User Conference in DC, I heard from a number of IT architects on how they handled the situation and got their data more “truthy,” as Colbert might say.

Here is my full report on Slashdot about how some companies have coped.

Slashdot: Segregate your data owners by personae

 

Positing particular personae (say that slowly) isn’t something new when it comes to website design: The FutureNow guys have been doing it for more than five years, and there are a number of other content engagement “experts” who have their own ways of better segmenting and understanding your ultimate audience. Using particular personae can be a way to develop websites that deliver higher click-through rates and an improved customer experience. All well and good, but what about improving the internal data access experience too?

That was the subject of a session at the Teradata Users Conference in Washington DC in October. I heard about how you can use personae to segregate and better target your data owners and data users. It is an intriguing concept, and one worth more exploration.

(An example of virtual data marts at eBay, more explanation below.)

The session was led by Gayatri Patel, who works in the Analytics Platform Delivery team at eBay and has been around the tech industry for many years. There aren’t too many places that have as much data as eBay has: each day they create 50 TB of it, and more than 100 PB per day is streamed back and forth from their servers. That is a lot of collectibles being traded at any given point. And something that I didn’t really understand before: eBay is a lot more than a marketplace. They have developed a large collection of their own mobile apps that are specific to buying cars, or fashion items, or concert tickets for their specific audiences. In the past they have had difficulties in trusting their data, because two different metrics would come up with different numbers for the same process, so meetings were often consumed by different groups presenting conflicting views on what was actually going on across their network.
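To make the conflicting-metrics problem concrete, here is a minimal sketch in Python. It is purely illustrative and is not eBay's method: the mart names, metric names, figures and the 1 percent tolerance are all hypothetical. It compares the same metrics as reported by two data marts and flags any that disagree by more than the tolerance.

```python
# Hypothetical figures: the same metrics as reported by two separate data marts.
finance_mart = {"daily_orders": 10_250, "active_sellers": 4_310, "refunds": 1_820}
marketing_mart = {"daily_orders": 10_250, "active_sellers": 4_702, "refunds": 1_825}


def find_conflicts(a, b, tolerance=0.01):
    """Return metrics present in both marts whose relative difference exceeds tolerance."""
    conflicts = {}
    for name in sorted(a.keys() & b.keys()):
        left, right = a[name], b[name]
        baseline = max(abs(left), abs(right))
        if baseline == 0:
            continue  # both values are zero, so there is nothing to compare
        if abs(left - right) / baseline > tolerance:
            conflicts[name] = (left, right)
    return conflicts


conflicts = find_conflicts(finance_mart, marketing_mart)
for name, (left, right) in conflicts.items():
    print(f"{name}: finance={left} marketing={right}")
```

A report like this does not say which number is right, only which metrics need a single agreed definition before the next meeting.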

Patel has come up with mechanisms to focus her team’s energies on particular use cases to better understand how they consume data, and to supply her end users with the right tools for their particular jobs. To get there, she has worked hard to develop a data-driven culture at eBay, to identify the data decision-makers and how to help them become more productive with the right kinds of data delivered at the right time to the right person.

Let’s look at how she partitions her company of data heavyweights:

  1. First are the business executives who are looking at top-line health and metrics of their particular units and have relatively simple needs. They want to drill down into particular areas, create operational metrics, and focus on narrower slices of particular data sets. Let’s say they want to see how weather-caused shipping delays from sellers are impacting their business. These folks need dashboards and portals that are one-stop shops where you can see everything at a glance, post your comments and share your thoughts quickly with your business unit team. Patel and her group created personal pages with a “DataHub” portal called Harmony, which makes sure all of their metrics are current and correct, and where the executives can bookmark particular graphs and share them with others.
  2. Second are product managers who are looking to learn more about their customers, and want to do more modeling and find the right algorithms to improve their marketplace experience. “We followed some of our managers around, attended their meetings and tried to understand how they use and don’t use data,” Patel said. Her team came up with what they call the “happy path” or what others have called the “golden path” – the walk that someone takes during their daily job to find the particular dataset and report that will help them do their job and make the best decisions. “Each product team has a slightly different path in how they interact with their data,” she said. “Our search development teams are more technical and data-savvy than the teams who work on eBay Motors, for example.” Her team has to constantly refine their algorithms to make the happy paths more evident, more useful and, well, happier for this group of users.
  3. Third are data researchers and data scientists. These folks want to go deep and understand how everything fits together, and are looking to make new discoveries about particular eBay data patterns. They want more analysis and are constantly creating ad hoc reports. Patel wanted to make this group more self-sufficient so they can concentrate on finding these new data relationships. Her team created better testing strategies, what she called “Test and Learn,” which has a collection of short behavioral tests that can be quickly deployed, as well as more longitudinal tests that can take place over the many days or weeks of a particular auction item on eBay. “We want to fail fast and early,” she said, an approach that is in vogue now but is still worth considering when building the right data access programs. Patel and her team have developed a centralized testing platform to make it easier to track company-wide testing activities and implement best practices.
  4. Fourth are the product and engineering teams. They build prototypes of new services and want to measure their results. These teams are creating their own analytics and constantly changing their metrics using methods that aren’t yet in production. For this group, Patel made it easier for anyone to create a “virtual data mart” which can be set up within a few minutes, so that each engineer can build their own apps and create specific views pertinent to their own needs. (A sample screen is shown above.)

eBay has three different enterprise data efforts to help support all of these different kinds of data users. They have a traditional data warehouse on Teradata, three of them in fact. They have a fourth warehouse, which is semi-structured and called “singularity,” that holds more behavioral data, for example. Finally, they use Hadoop for unstructured data that Java and C programs can access. The sizes of these things are staggering: Each of the traditional data warehouses is 8 TB and the other two are 42 and 50 PB respectively.

As you can see, the eBay data landscape is a rich and complex one with a lot of different moving parts and specific large-scale implementations that meet a wide variety of needs. I liked the way that Patel is viewing her data universe, and having these different personae is a great way to set her team’s focus on what kinds of data products they need to deliver for each particular group of users. You may want to try her exercise and see if it works for you, too.

How Liberty Mutual built their first mobile app with Mendix

One of the largest insurers in the US was looking to roll out a new mobile app for its group insurance customers. Chris Woodman, an IT manager at the firm, described at Mendix World the process they went through and how Mendix was a key element to their success.

“In 2011, we wanted to develop a mobile app, but we didn’t know what we were getting into, and we had no previous mobile development experience,” he said. “Two months later we had our app deployed.” Mendix named the project the outstanding effort of the year at the conference.

You can read more of my report on Liberty Mutual’s efforts from the Mendix blog here.

I authored other entries during the show. Mendix definitely has an interesting story to tell. Here are the original stories that I filed, which have since been taken off their blog.

  • How fast can you deploy your apps?
  • John Rymer from Forrester describes his favorite mobile apps
  • Wrap of the first day at the conference
  • Ron Tolido of Cap Gemini Europe spoke about whether your company has a business prevention department
  • The student programming competition
  • Wrap of the second day of the conference

SlashBI: How B.I. and Data Make a More Efficient Farm

Business intelligence is not just for big-city businesses anymore. B.I. has come to agribusiness, with farmers and cattle ranchers using many of the same tools found in numerous corporate cubicles. Thanks to everything from sophisticated tractor instruments to automated milking machines, farmers can collect all sorts of operational data in order to improve efficiency and keep production levels high.

You can read the full story over at Slashdot here, with several examples of what is going on down on the farm.

SlashBI: Do You Need a Chief Data Officer?

The city of Philadelphia recently hired Mark Headd as their Chief Data Officer, and we are beginning to see this title crop up more frequently since it first came into being several years ago. Why should your organization have one and what do they do that differs from the CIO, CFO or CTO?  

Headd is an interesting choice. He comes from Code for America and has run several open data projects in his career, and he is a big supporter of numerous civic hackathons and of using technology to open up government data to the public. <a href="http://www.newsworks.org/index.php/local//innovation/42657-meet-mark-headd-philadelphias-first-chief-data-officer">He is “making sure agencies aren’t reinventing the wheel, running into the same problems,” Headd told the NewsWorks blog here</a>. “You get someone who can take a holistic view across city government and be strategic about the city’s use of data,” he added.

And the title is taking hold. <a href="http://www.information-management.com/news/data-steward-chief-data-officer-goldensource-10022664-1.html">“Over 60 percent of firms surveyed are actively working towards creating specialized data stewards, and eventually Chief Data Officers, for their enterprise,” according to a recent survey by GoldenSource Corporation cited in Information Management Magazine here.</a>

In a presentation, Deepak Bhaskar, the Senior Data Governance Manager at Digital River, shows that this isn’t so simple, and that managing data across the entire enterprise can take on many dimensions.

Some organizations have even begun to identify their CDOs on their websites, as the <a href="http://www.fcc.gov/data/chief-data-officers">Federal Communications Commission does here</a>. They claim to be the first federal agency with the title, and have CDOs in each bureau or specialized office such as wireless or wireline communications. It is a notable effort.

Consulting firm Cap Gemini, among others, says that ideally one person should be focused on the quality, management, governance and availability of data. It is about treating data as a strategic asset. <a href="http://www.analytics-magazine.org/septemberoctober-2011/401-chief-data-officer-new-seat-in-the-c-suite">But most organizations don’t give data the same kind of attention as other corporate assets</a> such as people or expenses, wrote Rich Cohen and Ara Gopal in Analytics Magazine last year.

That is ironic, because as IT departments move towards bring-your-own-devices and cloud-based computing, they should be more focused on their data. As one corporate IT manager from a large manufacturing firm told me, “Nowadays we don’t own the devices, we don’t own the servers, we don’t own the networks that connect them, and we don’t own any of our apps that run on these devices. All that we have left to derive value from is our data.” How true.

Why consider a CDO now? Several reasons. “Enterprise data is no longer black and white,” say Cohen and Gopal, meaning that data can be found anywhere and everywhere. Indeed, in some cases, IT departments can’t even figure out where their data actually lives. Chris Wolf, an analyst at Gartner, spoke about one firm he interviewed that had recently upgraded its Microsoft Office suite from the 2003 to the 2010 version. In that upgrade, they lost support for an aging Access 2003 database that they weren’t aware of, yet that was used by dozens of employees in one department for their mission-critical data. Oops.

Just think about all your customer interactions today. You can have a collection of text messages, blog entries, social network posts, mobile applications, your own and customer-generated videos, Tweets, emails, and even Instant Messages. How can you track all of this data? Ultimately, one person needs to be in charge and see the entire data landscape.

Another reason is that “recent regulatory reforms have placed an even higher emphasis on data accuracy and the risks associated with the lack of end-to-end visibility,” say Cohen and Gopal.

What is the role of the CDO?

<a href="http://smartdatacollective.com/brett-stupakevich/52217/top-3-reasons-you-need-chief-data-officer">Brett Stupakevich, writing in a blog on Smart Data Collective</a>, thinks that CDOs should have three primary responsibilities: data stewardship, or being the chief owner of all enterprise data; data aggregation, or being responsible for “building bridges between business units and creating an enterprise focus for the data”; and communicating data schemas and eliminating any semantic differences among enterprise data. That is a nice way of thinking about it.

Cohen and Gopal add that the CDO should “develop capabilities to measure and predict risk and influence enterprise risk appetite at the executive table.” The CDO should also be watching the top-line revenue numbers and the bottom line as well.

“The one skill that helps me a lot in my current job is to interact well with people, so I lead my team efforts to better serve the business needs of my C-Level peers,” says Brazilian CDO Mario Faria on a LinkedIn forum. “Technical and business skills are necessary however not sufficient for this job.”

Whom should the CDO report to?

Cohen and Gopal ask a very good question: “Does the CIO report to the CEO or to the CFO; in other words, is the IT organization seen as an integral part of the corporate strategy or it is seen as a cost center that enables day-to-day business operations?” But there could be a larger issue, namely, does the entire C-suite buy into the CDO as another peer? Cohen and Gopal also say, “it can be difficult to secure executive backing unless the CDO initiative is seen as a direct response to a burning business problem.”

Depending on your own organization, a CIO with enough revenue authority, or a CFO with enough IT authority, might be called a CDO. What matters is not so much the title as what the person is actually responsible for and how much visibility into a corporation’s data footprint they actually have.

<a href="http://www.linkedin.com/groups/Can-present-CFO-CIO-play-1863121.S.110000183?qid=b6051d7e-21d2-481f-9107-a06726ab77cf&trk=group_most_popular-0-b-ttl&goback=.gmp_1863121">But not so fast, according to many members of the CDO LinkedIn group who responded to this very question</a>. “Each role (CIO and CDO) has its purpose and everything can not be under control thanks to a single function. And even more in big company,” said one poster.

It seems that the CDO title has yet to find its way to corporations involved in Big Data. Hilary Mason, chief scientist for Bit.ly, told me, “I don’t know that many people with the specific title of Chief Data Officer. It seems like that would be a title found at larger corporations, but I like that it implies that whoever controls the data actually reports directly to the CEO.”

SlashBI: Lessons Learned From Boeing’s Data Analytics Experiments

It wasn’t all that long ago that Boeing didn’t even have an IT department, let alone any processes in place to make use of the massive amount of data it collected to improve its aircraft manufacturing efforts.

But today’s aircraft couldn’t be manufactured without a significant amount of BI and data management. Boeing today moves 60 petabytes around its network, and the company is in the middle of several Big Data pilot projects as well. Let’s see what they’ve been cooking up.

You can read the full article here on Slashdot/BI.

Gartner Catalyst Conference: Tales of IT Derring-Do

I was fortunate enough to attend the Gartner annual Catalyst conference this past week, where I heard some interesting stories from IT managers about their innovative approaches. Here are links to them:

Slashdot: Public Data: Where to Test Your Next Big Data App

When building Big Data apps, you need to conduct a test run with someone else’s data before you put the software into production. Why? Because using an unfamiliar dataset can help illuminate any flaws in your code, perhaps making it easier to test and perfect your underlying algorithms. To that end, there are a number of public data sources freely available for use. Some of them are infamous, such as the Enron email archive used in court hearings about the malfeasance of that company. You can read more of my article that appeared today in Slashdot here.

SlashBI: B.I. and Big Data Can Play Together Nicely

Integrating a Big Data project with a traditional B.I. shop can take a lot of work, but a few suggestions could make the process easier. Here are several from last week’s Hadoop Summit conference, including many from Abe Taha, vice president of engineering at KarmaSphere.

A second article in Slashdot about best practices for Hadoop in enterprise deployments can be found here. There are lots of efforts underway to make Hadoop more suitable for large-scale business deployments—including the addition of integral elements such as high availability, referential integrity, failovers, and the like. My story goes into some of the details, including the ability to deploy the MapR version under Amazon’s Web Services (above).