Why It’s Extremely Hard to Start an AI Application Business with Large Language Models
The media hype surrounding AI might have you believe it’s soaring high above us, on the brink of rendering all human jobs obsolete and poised to rule the world. However, this is merely hype.
The truth is, starting an AI application business based on large language models is extraordinarily difficult. I believe that within six months, most startups purely focusing on large language model applications are likely to fail.
Feel free to disagree and comment below.
One major difference between today’s AI era and the past mobile internet age is that most essential user needs are already well met. From clothing and food to housing and entertainment — today’s market is oversaturated.
For AI to be impactful, it must address a genuine user need or pain point, which is increasingly hard to find. Most of today’s AI applications don’t address critical pain points; many seem somewhat superfluous, mostly tackling “problems” we convince ourselves exist.
Even if you identify a marginal pain point within a real scenario, AI can’t provide a 100% stable and reliable solution. Although models like GPT-4 are capable, they often fall short of our expectations. If you look carefully at the evaluation metrics, the accuracy in most cases is around 80% to 90%.
Many product functionalities require multiple LLM API calls. Thus, an 80% success rate squared results in only 64% reliability. If a product only works 7 or 8 times out of 10, do you think users will continue using it?
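The compounding effect above can be sketched with a few lines of Python. This is a simplified model that assumes each LLM call succeeds independently with the same probability; real pipelines have correlated failures, so treat it as a back-of-the-envelope estimate.

```python
def chained_reliability(per_call_success: float, num_calls: int) -> float:
    """End-to-end success rate when a feature chains several LLM calls,
    assuming each call succeeds independently with the same probability."""
    return per_call_success ** num_calls

# Two chained calls at 80% each: 0.8 * 0.8 ≈ 0.64
print(chained_reliability(0.8, 2))

# Even at 90% per call, a three-call chain drops to roughly 73%
print(chained_reliability(0.9, 3))
```

The takeaway: reliability decays geometrically with chain length, so every extra LLM call in a pipeline makes the overall product noticeably less dependable.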
In today’s oversupplied market, users are incredibly impatient, and their time is incredibly valuable. A low success rate will quickly drive them away from the product.
A common benchmark for innovation is that a new product should be 10 times better than existing solutions. Achieving this is exceedingly difficult when the effective success rate after chaining sits at only 70–80%.
Even if you find a critical pain point and can offer a solution that is ten times better, most users might achieve similar results with ChatGPT anyway. Another challenge today for AI applications is that everyone in this space is essentially competing against ChatGPT.
The problems most AI applications address can be solved directly in ChatGPT, which offers greater flexibility through conversational interaction. Wrapping a large language model in a fixed user interface often reduces that flexibility.
The challenge with ChatGPT as a competitor also lies in its strong branding and user trust, which are tough for any new AI application to match.
So, by the aforementioned benchmark: can your AI application outperform ChatGPT by ten times?
Consequently, many AI applications ultimately fail against ChatGPT.
Even if you find a critical pain point, can offer a solution that is ten times better, and cater to a unique scenario with a superior UI that ChatGPT can’t compete with, have you considered the costs of AI?
Given the suboptimal accuracy, raising the success rate of an AI application often requires extensive prompt engineering, prompt chaining, few-shot examples, and even multi-agent coordination. All of this consumes a significant number of tokens per interaction.
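To make the cost pressure concrete, here is a minimal sketch of the per-interaction arithmetic. All prices, token counts, and call counts below are illustrative assumptions, not real provider pricing.

```python
def cost_per_interaction(prompt_tokens: int, completion_tokens: int,
                         price_per_1k_prompt: float,
                         price_per_1k_completion: float,
                         num_calls: int) -> float:
    """Rough per-interaction cost for a pipeline that makes num_calls
    LLM API calls, each with the given token usage (illustrative model)."""
    per_call = (prompt_tokens / 1000) * price_per_1k_prompt \
             + (completion_tokens / 1000) * price_per_1k_completion
    return per_call * num_calls

# Hypothetical example: a 3,000-token few-shot prompt, 500-token completion,
# at $0.01 / $0.03 per 1K tokens, chained across 3 calls.
print(cost_per_interaction(3000, 500, 0.01, 0.03, 3))  # 0.135
```

Under these assumed numbers, a single user interaction costs over thirteen cents, which is hard to recoup when users expect AI features to be free.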
Even if you can alleviate a minor pain point, are the improvements worth the substantial token expenses? How will you generate the profit to cover these costs? Today’s users, spoiled by free offerings from large incumbents, are unlikely to pay much for AI services.
Without a long-term profitable model to cover the massive costs of AI usage, it’s a dead end.
Therefore, businesses built purely on large language model applications are fundamentally flawed.
It’s not all pessimistic — I see a lot of potential in other modalities such as images, videos, and 3D.
In fact, although not widely noticed, the most successful AI applications today are in voice synthesis. Voice synthesis doesn’t solve problems; it generates new traffic. And in today’s oversupplied world, generating traffic matters far more than solving a problem.
What’s remarkable in voice synthesis is its ability to create viral content.
What do you think? Do you agree? Welcome discussion!