Prototyping in Data Science

A conversation between Jose A. Rodriguez and Fabien Girardin on how data scientists can think ahead of the curve

Published in

TDS Archive

11 min readMar 14, 2022

Image courtesy of generative artist Iskra Velitchkova.

The definition of a data scientist is in constant evolution, particularly with the increase in specializations in the domain. I have nothing against specialization, except that I believe that not all data scientists should be pigeonholed into roles for solving specific problems or automating a specific process. A fair part of the practice of data science is about exploring ideas that do not have a shape yet. It is about being curious, testing assumptions, looking where no one else is looking, and above all, it is about understanding what to do instead of following the current tech culture of “cranking stuff out”. Which raises the question of how to give room to data scientists to think ahead of the curve.

In this conversation with Jose A. Rodriguez-Serrano, Senior Data Scientist and Program Manager at BBVA AI Factory, we introduce the term “discovery-driven prototyping” to describe how prototyping in data science can lead to the exploration of possible paths for decision-making.

👋 Hello! My name is Fabien Girardin. Each week, I set aside a couple hours for video calls for anybody (e.g. students, colleagues, friends, other professionals, etc.) to talk about anything related to data science and/or design. These conversations offer time to think and reflect on our practices and projects away from the constant hype around the work on data and Artificial Intelligence. This is the approximate transcript of one of these behind-the-curtain discussions.

📅 Book a 45-minutes slot in my calendar

FABIEN: A while ago, you recommended me reading Creativity, Inc.: Overcoming the Unseen Forces That Stand in the Way of True Inspiration by Ed Catmull, the Co-founder of Pixar. There is one particular quote that stood out to me:

“It isn’t enough to pick a path — you must go down it. By doing so, you see things you couldn’t possibly see when you started out; you may not like what you see, some of it may be confusing, but at least you will have, as we like to say, ‘explored the neighborhood.’ The key point here is that even if you decide you’re in the wrong place, there is still time to head toward the right place.” — Ed Catmull

That mindset resonates well with how I practice data science. Along with the increase in specializations in the domain, I often hear a narrow definition of the role of a data science function in an organization. I observe teams with their nose to the grindstone, busy with small improvements or short-term requests. Their scope is limited to solving specific business problems or optimizing existing processes.

But what happens when the problem is based on assumptions or is not well defined? What happens if the design of the optimized system does not take into account its implications (e.g. bias, quality)? What happens when nobody really knows exactly what infrastructure or datasets are necessary? What happens when an idea has only been discussed with abstract opinions during a meeting?

I believe there is a part of the data science practice that aims at addressing these questions. It aims at ‘exploring the neighborhood’ when the path is unknown. It aims at discovering the unknown unknowns, at guiding decision-making through the creation of possible paths (e.g. prototypes, experiments).

Do you share that observation?

Exploring the neighborhood with data. Image by author.

JOSE: Totally. Creativity, Inc excels in arguing the need for exploration and experimentation in enterprises. What the quote you mention says, in plain words, is: you’ll see the value of certain ideas only after putting a good team to work on them, and not before.

The book convinced me that this approach makes sense when three specific conditions are met: first, working in an industry where creativity is a differentiator; second, working in a constantly changing environment; third, working with high-skilled teams. While the book talks about the film industry, most people working in the tech industry, and especially data scientists, will recognize these three conditions in their working environments.

Yet, I also do believe that data science teams can go a step beyond in their experimentation mindset. Similar to the arguments of Creativity, Inc, I think that there are two “depths” of exploration in data science projects. The first one is well-known: if we think of the “typical day” of a data scientist, experimentation comes naturally in tasks such as exploratory data analysis, validating new hypotheses or being creative about optimizing resources. Call it “solution-driven prototyping”. The second one is related to discovering where no one else is looking. Call it “discovery-driven prototyping”. That depth explores questions like: Are we sure there is not a completely different and better way to design this service? Are we overlooking a new product feature using our current data or capabilities?

Because most data scientists come from a scientific or engineering background, I think solution prototyping is natural to us. But, as you suggest, there is a lot of room for discovery-driven prototyping in data science. At BBVA AI Factory, we have several initiatives which are of exploratory nature. Recently, we have launched an initiative on data science prototypes to amplify this mindset and discover value out of the bounds of ongoing projects.

I know you have been advocating for a long time for a practice across the boundaries of data science and design, so does all this resonate with your experience?

FABIEN: Yes, however I had never thought about your categories of prototyping that way. This reminds me of how Alexander Osterwalder and other business theorists talk about the “exploit” and “explore” functions of an organization. Where exploit means improving a successfully established business and explore is about pushing the boundaries of the core activities of the organization. They describe a company that can achieve both as an “ambidextrous organization”.

So — yes, I have been thinking a lot about developing ambidextrous data science functions.

I also come from an engineering background, but I have learned to approach the work on data through the language of design. During my research at MIT, I realized the value of the prototype-driven mindset for technologists to think ahead of the curve. That’s when I developed the notion of “sketching with data” as an attempt to describe our practice there of quickly producing tangible results with data and software (e.g. visualizations, maps, algorithms, interactive interfaces, models) for organizations in need of “exploring the neighborhood”.

Sketching with data with the visual programming platform Quadrigram in 2014. Image by author.

In the realm of engineering and data science, prototyping means rapidly experimenting with technology to see if it works, but in the realm of design a good prototype also offers a secondary function. It helps a vision move beyond assumptions and loose ideas by creating a “thing”. The result tells a story about “what could be”. It is not necessarily intended to build a business case or to showcase a technological solution, but it must shed light on the problem space. For instance, years ago, at Near Future Laboratory, we prototyped for a client their first recommender system. The result was not a solution of a hard business problem, but some sort of Trojan horse within the organization to discover shortcomings in the data, the specific set of knowledge that they were lacking and the implications if one day the decision was made to develop a large-scale recommender system.

JOSE: First, I admit I really like your phrasing “sketching with data”. It brings back the concept of “sketch”, which in design denotes an artifact which is a very preliminary version of the product, produced quickly and in an inexpensive manner. Its goal is to “think” further about the product, and to share and discuss the product idea in order to refine it, as explained outstandingly in Buxton’s Sketching User Experiences.

While Buxton distinguishes between sketches and prototypes, to me data science prototypes belong to the group of quick and inexpensive, even disposable artifacts. Why? Coding and programming with data is easier, faster, and cheaper than 15 years ago (when Buxton’s book was written), so concepts that would otherwise go to a paper sketch can be tested for real on a computer screen with some data (see for instance, examples of a prototype chatbot interface or customer care insights based on visualization tools, python libraries like Flask and calls to third-party NLP APIs).

I think we are coming closer to an age where you can produce prototypes very early on in the ideation process, that are built with real software and use real data, as a way of thinking and discussing a product idea early on. Or as Eugene Yan, Applied Scientist at Amazon, puts it:

“Sometimes, well-thought-out proposals backed by extensive research and data will fail to convince decision-makers. But a demo of a simple prototype will get them excited and ready to commit. I’m surprised how often this happens.” — Eugene Yan

FABIEN: Or as designer and urbanist Dan Hill would say: “Let the prototype do the deciding”.

I work with an even wider definition of discovery-driven prototyping that goes beyond writing code. In fact, the prototype can be anything that tells a tangible story about a service that does not exist yet: a Quick Start Guide, an Unboxing Video, a City Guide, a Product Catalog, a Map, an FAQ Page, a Repair Manual, News Articles, etc. Recently, we produced for a start-up their Annual Report from the future. That prototype offered the co-founders a unique way to pre-visualize opportunities, risks and consequences of their ideas before writing any line of code. Inspired by my colleague Julian Bleecker, who wrote in 2020 his start-up’s Annual Report — from 2024, it encouraged detailed discussions for decision-making on the company’s use of data, on the role of data science. Rather than pitching forward into the future, you report back from the future as if it has happened. At the Near Future Laboratory, we call this approach Design Fiction.

The OMATA Annual Report from 2024… written in 2020. Image courtesy of Julian Bleecker.

Besides Design Fiction, I have observed different flavors of discovery-driven prototyping applied in major organizations. For instance, when acting as Engineering Director at Google, Alberto Savoia used to talk about pretotyping. Or Werner Vogels, CTO at Amazon, articulated in 2006 already the company’s “working backwards” routine.

What these organizations have in common is that they understand that decision-making is about framing the situation, not choosing the best options. They excel in seeing around corners, not in taking the best decisions. They explore to come up with the alternative to choose from. Yet, even with these examples, it is still extremely hard to convince people who only rely on ROIs and KPIs to run their organization.

The impact of discovery-driven prototyping is hard to report in a spreadsheet, but practice is essential for “de-risking” (e.g. quickly recognizing and correcting bad decisions). Think about the value of finding out early enough that your assumptions were wrong. Or, what’s the value for an organization to anticipate a blindspot in their data? What is the value of missing an opportunity? So, exploration has a cost: the cost of doing it versus the cost of not doing it. And, as you underlined before, the cost of prototyping things has never been cheaper than today.

I understand the constraints of a business with its short-term KPIs and incentives, but what is your experience of fighting skepticism?

JOSE: When choosing a prototyping strategy, there are many challenges to overcome, ranging from how teams are organized to communication. I explain that the main purpose of discovery-driven prototyping is to ensure that valuable ideas make it to the next phase (and, conversely, that not so good ideas can be stopped or refined before going too deep into the development phase). What works changes from organization to organization.

I’ve read about companies developing prototypes in 5-day “spikes” or others following well-established prototyping frameworks. These are punctual examples; I do believe every organization must experiment and find its preferred way of “harvesting” ideas and de-risking.

At BBVA AI Factory, I have been leading projects related to applied research, innovation or horizontal capabilities in the last years. I have come up with a list of good practices that can be applied before setting up a prototyping project which have an impact later on:

Shift the language: Focus on communicating the differential advantage that the prototype enables, not on the prototype itself or the underlying technology. Or, as illustrated by Hà Phan: “Fall in love with the question, not with the solution. There’s a huge difference between asking, ‘How might we optimize this shopping flow?’ vs, ‘How might we help users make repeat purchases?’.
Have an adoption plan: With the current assumptions, who will be the user of the prototyped artifact when it is finished? What benefit will they start experiencing?
Involve: Agree the plan with these potential users and involve them early on in the discussions, decision-taking or even as part of the prototype development team.
Expect accountability: At least one person must be designated to “care” about adoption of the prototype.

Finally, while that looks optional, I tend to advocate strongly for “democratizing” the outcome of prototyping initiatives (and, in general, any outcome of innovation and research initiatives). This can go from simple actions like an internal “show and tell” and external communication (I am a big fan of how Fast Forward Labs disseminates their findings), but it could also include offering the prototype for internal reuse through an inner source repository. Since prototyping is about learning, by democratizing the outcomes, you want others to build on your work and generate more learnings.

FABIEN: That last point reminds me how architect, designer and futurist Buckminster Fuller designed experiments as a route to knowledge and for network production and transmission of knowledge. That was 70 years ago (!) and he was already highlighting the issues with over-specialization:

“Of course, our failures are a consequence of many factors, but possibly one of the most important is the fact that society operates on the theory that specialization is the key to success, not realizing that specialization precludes comprehensive thinking.” — Buckminster Fuller

Buckminster Fuller experiment around geodesic domes. Image by author.

Continue the conversation…

You can get in touch with me through the regular internet endpoints. Each week I set aside a couple hours for Office Hours video call for anybody to talk about anything related to data science or design. We can discuss your early stage project, ways we might collaborate, a recent reading, a challenge you are facing, etc. 📅 You can book a 45-minutes slot in my calendar.

Fabien Girardin is an ambidextrous thinker/doer who bounces back and forth from foresight to technology and design. He is a founding partner of the Near Future Laboratory, a distributed network of accomplished practitioners best known as pioneers of Design Fiction. Former Co-CEO at BBVA Data & Analytics and Researcher at MIT. Father of 🧑🏻‍🚀 and🌙. You can follow him on Twitter and LinkedIn, or just get in touch.

Jose A. Rodriguez-Serrano is Senior Data Scientist and Program Manager at BBVA AI Factory. Formerly, he was Area Manager and Research Scientist at Xerox. Some years ago, we collaborated at the financial services company BBVA on the development of an “Edge” program. The ambition was to push the boundaries of the organization’s core capabilities through the development of prototypes with emerging trends in machine learning. Read some of the posts he co-authored, or follow him on Twitter and LinkedIn.

TDS Archive

Prototyping in Data Science

A conversation between Jose A. Rodriguez and Fabien Girardin on how data scientists can think ahead of the curve

Sign up to discover human stories that deepen your understanding of the world.

Free

Membership

Published in TDS Archive

Written by Fabien Girardin

No responses yet