Yesterday we published an April Fool: Navigating the Data Ecosystem: A Revolutionary Analytics Architecture. The idea of writing a piece about a new architecture that could instantly make the latest cool buzzwords obsolete had been rattling around in my mind for a few weeks. However, I was scared of the effort required to write something fun and, at the same time, just believable enough that readers who forgot to check the calendar might wonder whether it was serious.
I had a plot ready, but no plan to execute it and no resources (time) to invest.
Then, ChatGPT-4 came to the rescue.
My first prompt was:
Write an article about an innovative data analytics architecture that replaces the modern data warehouse. In this architecture, the existing data lake are integrated with data river, data glacier, and data sea. The data sea is the ultimate place where data is stored and retrieved for analytics.
Please note that I didn’t specify the roles of the data river, glacier, or sea. I took for granted that ChatGPT-4 had been trained on material about data lakes, and in my mind, the only connection was the fish (not easy to find in a glacier, but pretty common in the other places).
The title was not the final one:
Title: The Future of Data Analytics: A Revolutionary Architecture Integrating Data Lakes, Rivers, Glaciers, and Seas
However, to my surprise, the first answer already contained the correlations that seem obvious only after you read them:
- Data Lakes: The Familiar Foundation
ChatGPT’s training dataset likely includes the data lake concept, and I specified “existing” in my prompt. No big surprise there, but “the familiar foundation” is terrific. You start with a known place. My marketing soul is already smiling.
- Data Rivers: Fast-Flowing Data Streams
I felt dumb for not having thought of this in advance. It’s obvious that a river is the right place for data streams, right? This is where I started to pay attention. ChatGPT made a correlation that makes sense. Now my marketing soul was crying with happiness. How much brainstorming would it have taken to reach this obvious conclusion?
- Data Glaciers: Long-Term, Cold Storage
Another obvious consideration. A glacier is cold and long-term (ignore global warming for a moment, please). But relating this to storage and creating a perfect match is another level. This connection was not on my mind when I wrote the prompt.
- Data Seas: The Ultimate Analytics Hub
You are hired, whatever you are. It’s so evident that this is the ultimate goal: “enable organizations … derive insights … unlock the full potential …”. Just write what people want to read. Nobody wants to know about problems (unless they are nerds, which usually implies they have no purchase-decision power).
I saw only one problem: there were no fish. My initial idea was that fish would be the connection. So I tried a new prompt:
Rewrite the article including a metaphor of data like fish, without naming any fish but by providing it as an implicit element.
It went in the wrong direction:
Title: Navigating the Ocean of Data: A Revolutionary Analytics Architecture Inspired by Aquatic Ecosystems
In retrospect, I hadn’t specified that it should be credible. Let’s try to fix it.
Do not include aquatic. The metaphor should stick to data.
Still not going well:
Title: Navigating the Data Ecosystem: A Revolutionary Analytics Architecture Inspired by Nature
I needed to be more precise and more explicit.
The nature and aquatic terms are banned. Do not make direct references to fish as living objects but continue to treat data as it behaves like fish moving between the different layers of the architecture.
Then came a moment of disappointment:
Title: Navigating the Data Ecosystem: A Revolutionary Analytics Architecture Inspired by Fluid Dynamics
Are you kidding me? You replaced aquatic with fluid?
Do not use fluids and remove “inspired by” in the title.
We finally have a good title:
Title: Navigating the Data Ecosystem: A Revolutionary Analytics Architecture
The content was close to the final version, but it still lacked the fish as a metaphor for data.
Rewrite the article considering all the constraints described before and provide some more indirect reference to the concept that data is like fish moving between the layers of this architecture.
This prompt generated the article we published. Here is the fish (or something like that):
- Data moves gracefully within the lakes, adjusting to the ever-changing landscape of the data ecosystem.
- Data in these rivers flows continuously and effortlessly, adapting to the pace of the business environment
- Data in these glaciers remains dormant but preserved, ready to be awakened and utilized when needed, like hibernating creatures emerging when the time is right.
- Data in the sea moves seamlessly among the different components, creating a rich, interconnected ecosystem, much like diverse life forms coexisting and interacting in the depths of the ocean.
I didn’t ask for an eco-friendly architecture, but here we are: green, nature, harmony, business, everything works, no problems. I know, you nerds hate me now, but someone has to do the dirty job, and now we have ChatGPT!
I can only say that it was fun. I would never have enjoyed working on this April Fool without a buddy like ChatGPT-4. While I can clearly see its limitations (which I hope we help highlight by showing what happened behind the scenes), I also see that the correlations ChatGPT found are food for thought that can spark new ideas.
A final credit: the graphic representing the Data Ecosystem was altered with Stable Diffusion, starting from a graphical input by Alessandro Perilli, who writes Synthetic Work, a (mostly serious) newsletter that helps non-technical people understand how AI is changing the world.