
12 November 2015

The recipe for Big Data


Ilkay Altintas, Chief Data Science Officer, Supercomputer Center


What do a cookbook and big data have to do with each other?

Quite a bit, said Ilkay Altintas, the chief data science officer for the San Diego Supercomputer Center, motioning to the book "How to Cook Everything Fast," which sits prominently on her office desk.

Although it might seem like an odd choice of literature for Altintas, she sees it as central to her work.

“That’s my dream project,” Altintas said.

She is working with her students to map all of the cookbook’s shortcuts and combine them with other information, such as types of ingredients and appliance brands, to create a roadmap — or a recipe — that will make cooking a meal even faster.

With that kind of data, Altintas explained enthusiastically, someone could build an app that lets users enter the ingredients they have on hand and quickly find a foolproof, time-efficient recipe.

However, it is not just the making of meals that gets Altintas excited. Much of her work at the San Diego Supercomputer Center rests on the same idea as creating a recipe, or workflow, that can be used again and again to slice and dice the ever-expanding world of big data.

“Workflows do for data what recipes do for food,” she said. “Once you have the process, or the recipe, in place, you can use it whenever you want.”

Think of it this way: When you cook a meal, there are several steps. You have to determine what you want to make, and that is the phase in which you pose a question and define the basic conceptual steps to solve it. You then need to shop for the ingredients, and that could be considered the collection of data. Then you need to process the ingredients by chopping, mashing, and mixing. After you cook the different ingredients, Altintas said, they “have transformed into something larger than its parts.”

That, she said, is the essence of her job.

Altintas said workflows allow researchers to analyze and interpret information more quickly and efficiently by turning their analysis steps into software applications that can run on high-performance and cloud-computing resources.

Creating these workflows became critical in the early 2000s, as more data and computing technologies became available and researchers looked for ways to speed up their analyses. But workflows aren’t just about analyzing data more quickly, Altintas said. They are also about building systems that are reusable and reproducible, so that others can vet and verify the data.

Altintas happened upon this burgeoning field almost by accident. She was working at Middle East Technical University in her native Turkey when she decided to apply for a position at the San Diego Supercomputer Center in 2001. The decision proved fortuitous: it allowed Altintas, who has a PhD from the University of Amsterdam in the Netherlands, to become a leader in the emerging field of workflows for coordinating scientific computing and data management. At the San Diego Supercomputer Center, Altintas also directs the Workflows for Data Science Center of Excellence and serves as a lecturer in computer science and engineering.

For her, workflows and big data analysis aren’t just about providing insight into the past or describing current conditions. When this type of structured analysis is done properly, researchers can use it to predict future outcomes in everything from personal health to hazard prevention. One of the projects Altintas is involved in is WIFIRE, which aims to develop an integrated workflow for rapid wildfire prediction models. The project draws on a variety of data sources (satellite imagery, photos from mountaintop cameras, and real-time measurements of wind, temperature, and humidity) to help predict how fast and where fires will spread. In the future, it could help firefighters make better-informed decisions about how to fight wildfires.

Altintas said evidence-based decision support will also become increasingly important in health care. Devices such as the Fitbit, which tracks a person’s daily activities, including exercise, meals, and sleep, are already providing important and actionable data. Going forward, she envisions processes that would analyze all aspects of a person’s health to produce personalized prescriptions for a healthier life.

“One thing is becoming clear: as we have more and more data sources becoming available, we need dynamic data-driven systems that enable data-driven decision-making,” she said.

As the types and amounts of data continue to grow, the ability to analyze that information quickly and effectively will become even more important, she explained. For those reasons, Altintas and the San Diego Supercomputer Center have teamed up with UC San Diego Extension to offer a wide range of courses that train people for these increasingly in-demand big data jobs.

“There is a huge demand for data analysis,” Altintas said. “Everything uses data.”