Tag Archives: Data Science

Investing in your Customers with MIT’s Michael Schrage

What is the one question you should be asking your customers that will completely change the way you think about your business? Similar in power to Levitt’s famous “What business are you in?”, this new question, “The Ask,” will revolutionize the way you think about customers and their value. Author Michael Schrage, a Research Fellow at the MIT Center for Digital Business, explains what “The Ask” is and why it is so powerful. Along the way he shares examples from Apple, Dyson, McDonald’s, the pharmaceutical industry, and others that show how organizations can and should create value.

from Ambition Data

Creating Serendipity with Data: Interview with Jeff Steward

The fifth floor of the Harvard Art Museums is home to the Lightbox Gallery, a minimalist space with nine LCD monitors arrayed on the wall as a single large viewing plane. I had no idea this space existed until my host and fellow ASC member Jeff Steward invited me to experience this venue before our interview. Used for exploring digital art on a larger scale, the Lightbox Gallery is a small area of R&D for a Museum that is also home to the Forbes Pigment Collection and world-renowned conservation labs.

We descend to the lower level where Jeff’s team explores different uses for technology in the Museum.  An original (working) Atari 2600 is on display opposite a VR station for exploring augmented reality.  Jeff picks up a small plastic kylix that was printed from a 3D data file of the real object.  He’s quick to mention that the data is a small sample of what the Museum offers to the public…and that it took about six hours to print.  We settle into a conference room, and what follows is an edited version of our interview:

Jeff what are some of the key measures for a museum of this size and what do you see as the role of data and analytics?

It’s a very large question that drives right to the business of running the Museum. And it can be difficult to connect the value of analytics to these operational measures. But we have an obligation to make use of the collection, and the big question is whether a piece of art is earning its keep. Managing the cost of ownership is a big task for a museum of our size.

What does it mean to “make use” of the collection?  What are the different ways we can experience the Museum?

We have 250,000 works, but at any time only about 1,700 are on display. Art is handled by our staff in storage, experienced by museum guests, or viewed by students and faculty for research purposes. These are just a few examples. And our visitors range from people in and outside the Harvard community who are casually interested to actual conservators.

Wow…that’s a lot of art behind the scenes?!

Yes…yes it is!

Does the Museum collect data on these different Users, and if so how does it employ these data?

We have a ton of “log” data from our Collection Management System. Whenever a work of art is handled in storage, or viewed by students or researchers for study, we keep very detailed records of these touch-points. That said, we haven’t fully explored all of the uses for these data, and there is a real “void” in the data once a work has been installed in a gallery. We’ve compensated for that somewhat on the Web. The Museum has a history going back at least 10 years of maintaining a Web page for every work in the collection. We have been able to use Google Analytics to track things like visits and page views, and we started tracking events on the site as well, such as when someone reserves the Study Center online. This is actually the only way to reserve the Center. So we do have some insight into how people are exploring the collection online, and we use these data to influence search rankings on the site: we can promote or demote a work in the rankings based on the historical traffic its page has received.
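The promote/demote idea Jeff describes can be sketched as a simple re-ranking function. This is an illustration only, with hypothetical identifiers and a made-up weighting, not the Museum’s actual ranking logic:

```python
import math

def rank_results(works, pageviews, weight=0.25):
    """Re-rank search results, boosting works whose pages draw more traffic.

    `works` is a list of (object_id, relevance_score) pairs from the search
    engine; `pageviews` maps object_id -> historical page views. All names
    and the weight are hypothetical.
    """
    def score(item):
        object_id, relevance = item
        views = pageviews.get(object_id, 0)
        # Log damping so a few blockbuster pages don't drown out relevance.
        return relevance + weight * math.log1p(views)

    return sorted(works, key=score, reverse=True)

# Two equally relevant works; historical traffic breaks the tie.
results = rank_results(
    [("calder-mobile", 1.0), ("degas-sketch", 1.0)],
    {"calder-mobile": 5000, "degas-sketch": 12},
)
```

With equal relevance scores, the heavily visited work is promoted to the top of the list.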

What’s your best story of how data was used to directly influence the Museum’s operations?

Our Conservation group met with security and gave them slips of paper with instructions to record every “touch” to the collection they witnessed: every time someone bumped into a sculpture, accidentally touched a painting, or even when they noticed new paint flecks on the floor. After a while a lot of data had been collected, and the Conservation team used this information to decide where to hang the art. They moved those black strips of tape you see on the floor based on the “touch” data security had collected. There is a mobile by Calder in the gallery, and next to this work was a sign saying “the slightest breath will set it in motion.” As you can imagine, this work received the most marks from security, and the sign was edited to change behavior and prevent damage to the art.

What are some of the ways data and analytics are changing how we experience the collection?

The pigments from the Forbes collection are a reference collection used by conservation scientists globally. The pigments are cataloged in our database and will eventually be available to the public. One of our goals is to associate pigments with the art that actually contains them, so that once a conservator has done the analysis and found a shade of crimson in a work, for example, that association is available to the public. So in theory one day you could find online all of the Van Goghs around the world that share a common pigment.
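The pigment-to-work association Jeff describes is essentially an inverted index. Here is a minimal Python sketch, with invented records standing in for real conservation data (the field names and titles are illustrative, not the Museum’s schema):

```python
# Hypothetical records linking conservation analysis results to works.
analyses = [
    {"work": "Van Gogh, Self-Portrait (1887)", "pigment": "chrome yellow"},
    {"work": "Van Gogh, Three Pairs of Shoes (1887)", "pigment": "chrome yellow"},
    {"work": "Gauguin, Poemes Barbares (1896)", "pigment": "vermilion"},
]

def works_by_pigment(analyses):
    """Build an inverted index: pigment name -> list of works containing it."""
    index = {}
    for record in analyses:
        index.setdefault(record["pigment"], []).append(record["work"])
    return index

index = works_by_pigment(analyses)
# All works that share the pigment "chrome yellow":
matches = index["chrome yellow"]
```

Once each analysis result is cataloged this way, the “find every work sharing a pigment” query is a single dictionary lookup.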

Today we make collection data available via a public API, part of what is essentially a scholarly search interface into the collection. The data is available, but not especially accessible to a more general audience. So I am experimenting with using basic machine processing to do face detection, text detection, auto-tagging, and auto-captioning. It’s fascinating how wrong some of the tagging and captioning is, but you can think of the computer vision service as just another set of eyes looking at the art, even if that vision is flawed. Visitors are hoping for serendipity: odd quirks in the data you don’t expect but that are super interesting. So the thought is that computer vision builds more perspectives into the data to ultimately support non-scholarly interfaces. The Museum has a long-term commitment to collecting and cataloging the data, but we cannot support lots of interfaces into the collections. We can, however, support those with the interest and ability to build interfaces themselves.
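To make the “another set of eyes” idea concrete, here is a minimal sketch of how machine-generated tags might be kept alongside, but distinct from, curatorial keywords. All field names, labels, and the confidence threshold are hypothetical, not the Museums’ actual schema or pipeline:

```python
def merge_tags(curator_tags, machine_tags, min_confidence=0.6):
    """Combine curatorial keywords with machine-vision tags.

    `machine_tags` is a list of (label, confidence) pairs such as a cloud
    vision service might return. Machine tags are filtered by confidence
    and kept in a separate bucket, treating computer vision as an extra
    set of eyes rather than ground truth.
    """
    confident = {label for label, conf in machine_tags if conf >= min_confidence}
    return {
        "curatorial": sorted(set(curator_tags)),
        "machine": sorted(confident - set(curator_tags)),
    }

tags = merge_tags(
    curator_tags=["portrait", "oil on canvas"],
    machine_tags=[("portrait", 0.97), ("person", 0.91), ("sandwich", 0.12)],
)
```

Low-confidence quirks like “sandwich” are dropped, while novel machine tags surface as a distinct layer that a non-scholarly interface could browse for serendipity.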

Jeff Steward is the Director of Digital Infrastructure and Emerging Technology at The Harvard Art Museums in Cambridge, MA.  jeff_steward@harvard.edu.

CLV Transformation with Zack Anderson of Electronic Arts (EA)

from Ambition Data…

Zack Anderson, SVP and Chief Analytics Officer at EA (Electronic Arts), talks about how the company’s CLV model shifted its data analytics and, by extension, its culture. The new customer-centric approach better supports EA’s “Player First” mission: the shift reduced marketing spend from 22% of revenue to less than 12% and enabled the company to develop its most successful game to date, Battlefield 1. In this episode, Anderson explains that calculating CLV isn’t the same as acting on it, and shares his four key points for actualizing your own transformation.

Artificial Intelligence and Machine Learning for Publishers: A Primer

from Digi*Pub…

What is the difference between machine learning and artificial intelligence, and what does it mean for publishers and publishing? Publishing vet and software engineer Liza Daly arms us with definitions and takes us on a tour, showing us why this brave new world matters for publishers.

The Cambridge Women in Data Science Conference

I was recently browsing Harvard’s Institute of Applied Computational Science website and saw there was a Women in Data Science conference. Excited to attend, I set a reminder on my phone, and as soon as registration went live I forwarded a link to all of my colleagues. Not much later, I started hearing that the conference had sold out! It was thrilling to see so much interest. The conference was a great opportunity to hear how women in data science are leveraging machine learning to transform healthcare and advocating for open science to foster public debate of the big data algorithms that are influencing society. Here are some highlights:

When Regina Barzilay, MIT Professor of Electrical Engineering and Computer Science, was a breast cancer patient at MGH, she could see how machine learning might uncover insights in the vast collection of patient information, including mammogram scans, pathology reports, and family history. Today she is in remission and collaborates with MGH to train models that detect high-risk lesions sooner than ever imagined and predict their likelihood of being cancerous, reducing the number of unnecessary surgeries.

Heather Bell, who leads a digital and analytics department in biopharma, gave a big-picture talk on how various companies are using artificial intelligence to streamline the otherwise long and expensive R&D pipeline. One challenge is that it can take several months to recruit participants for clinical trials. In one example she shared, Clinithink developed an NLP platform that converts written doctor notes into structured data, allowing participants to be identified rapidly against trial criteria. The platform was shown to recruit 2.5 times more participants in 5% of the time. In another example Heather provided, wearables and web applications are now proving effective at monitoring health between doctor visits. In one study, lung cancer patients responded to a brief weekly questionnaire about health metrics like appetite and weight, and an algorithm developed by SIVAN Innovation alerted the patients’ doctors to any concerning change. Fifty percent more patients in the intervention cohort were alive, surviving about seven months longer than the regular follow-up cohort, and the trial was stopped early because the effect was so large.

Francesca Dominici, HSPH Professor of Biostatistics and Co-Director of Harvard’s Data Science Initiative, shared her powerful longitudinal study demonstrating an association between exposure to air pollution and mortality risk among all Medicare beneficiaries (~67 million per year). As the study has sparked media headlines and supports more stringent environmental policy at a time when such policy is hotly debated, Francesca espouses principled data science and an open science framework in which data are publicly available and results are reproducible. An inevitable concern in an open science framework is privacy, and here it is worth considering Cynthia Dwork’s invention, differential privacy: an effective tool that goes beyond simple de-identification to protect individuals’ identities in research databases. Coincidentally, Cynthia was also a speaker at WiDS, discussing her latest endeavor: developing a metric for algorithms that classify people as fairly as possible.

Cynthia discussed how subjective fairness is; in that sense the metric must be culturally aware, which is another rationale for open science.
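For readers curious what differential privacy looks like in practice, here is a minimal sketch of the classic Laplace mechanism for a counting query. This is my own illustration of the standard textbook construction, not anything presented at the conference:

```python
import math
import random

def laplace_noise(scale, rng):
    """Draw a Laplace(0, scale) sample by inverting the CDF."""
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(true_count, epsilon, rng):
    """Release a count with epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so Laplace(1/epsilon) noise suffices.
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Smaller epsilon means stronger privacy and a noisier answer.
rng = random.Random(42)
noisy = private_count(1000, epsilon=0.1, rng=rng)
```

The released value is close enough to be useful in aggregate, yet no individual’s presence or absence in the database can be confidently inferred from it.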

Rounding out an exciting day of data science, Tamara Broderick, MIT Assistant Professor of Computer Science, discussed achieving accurate Bayesian inference with optimization; I encourage you to watch her talk, as well as some of the others I’ve highlighted. It was inspirational to hear these accomplished women present some of their impactful research. I am really looking forward to next year’s conference, and I hope you are too.

To stay up-to-date on the Women in Data Science (WiDS) conference, go to https://www.widscambridge.org.