In November 2022, the chatbot ChatGPT was born. It broke with the stiff, mechanical AI of the past and can converse with users almost as smoothly as a human. Its performance is remarkable: using it feels like using Google Search for the first time, a “magic moment.” When AI becomes less like AI, the era of AIGC (AI-generated content) arrives.
In fact, generative AI is nothing new; it has been around for a long time. But two major events of 2022, the explosive popularity of image generation tools and of the chat tool ChatGPT, greatly raised people’s awareness of the upper limit of AI capabilities and pushed AIGC into the mainstream. People suddenly discovered that AI had, almost overnight, “flown into the homes of ordinary people,” and humanity seems to be heading toward a brave new world.
Just as Michelle Yeoh’s heroine in the Oscar-winning film “Everything Everywhere All at Once” can turn anything she imagines into tangible reality, generative AI allows us to turn thoughts into words, images and video…
There is no doubt that we are on the cusp of a new era.
The text-to-image explosion
In 2022, text-to-image AI exploded. First came DALL-E from OpenAI (the name combines the artist Salvador Dalí and Pixar’s cartoon robot WALL-E). Although not everyone could use DALL-E directly, the idea of generating pictures from text became popular across the Internet.
Simply put, DALL-E automatically generates pictures in various styles from the content users describe: say, 10 different styles of astronauts on horseback. It can also create variations on an original image, or generate a more realistic and accurate image, such as a Monet-style painting of a fox sitting in a field. It can even create a surreal painting of “goldfish drinking Coca-Cola on the beach” that might leave Dalí himself feeling outdone. At this rate, it seems there is little left for artists to do.
Subsequently, other products with image generation capabilities similar to DALL-E’s also became popular. Standouts include Stable Diffusion, Midjourney, and Lensa AI.
These applications are a godsend for people who cannot draw or use Photoshop. With Stable Diffusion, you simply describe the picture you want in words and it generates the image for you, whether a unique WeChat avatar or a header image for each official-account article. The only limit is your imagination. No wonder it has been widely used by artists, developers and enthusiasts since its launch in August 2022.
DALL-E: an astronaut riding a horse, in a surreal style.
In addition to generating images from text, developers have found other creative uses for Stable Diffusion, such as image editing, inpainting, image completion, super-resolution, and style transfer. Because Stable Diffusion is open source, developers can build more applications on top of it.
Midjourney appeared 3 months earlier than Stable Diffusion. It provides text-to-image AIGC services through the online chat community Discord. Anyone can sign up for a free account and receive 25 credits, good for 25 free images generated on public servers. After using up the 25 credits, you can keep going for $10 or $30 per month (depending on how many images you create and whether you want the resulting images to remain private). Midjourney has quickly become one of the most popular servers on Discord, and the company now claims to have over 1 million monthly active users.
The Lensa AI app was launched in 2018, mainly for editing and retouching photos, and long remained little known. In November 2022, Lensa AI launched its Magic Avatars feature, which automatically generates portraits in various styles from the photos users upload. After the launch of Magic Avatars, Lensa AI became one of the most popular apps, ranking first on the App Store free chart in the United States and more than ten other countries for two consecutive weeks. The app was installed about 13.5 million times worldwide in the first 12 days of December, and consumers spent about $29.3 million during that period, according to data analytics firm Sensor Tower.
DALL-E, an armchair in the shape of an avocado.
The power of these applications has shaken industries and users alike. “It’s an exciting time for generative models,” said Anima Anandkumar, a computer scientist at Caltech and senior director of machine learning research at Nvidia. Generated images still sometimes perpetuate social and cultural biases, but “we’ve shown that generative models are useful for downstream tasks of improving the fairness of predictive AI models.”
Sun Zhipeng, senior manager at Unity China and head of cross-platform porting technology, said in an interview with Xinmin Weekly that last year was a turning point for image models, with artificial intelligence bringing two breakthroughs to image generation tools. One is the ability to understand language, the other is the ability to create images, and the two combined make it possible to create images by understanding language. “These tools are going to be better than humans at making images, and they’re going to be so fast that within the next year or two they’re going to be able to make content in real time: 30 frames per second, high resolution. It’s going to be expensive, but it’s possible. And then 10 years from now, you’ll be able to buy an Xbox with a giant artificial intelligence processor, and all games will be like a dream.”
Unity is the world’s leading platform for creating and operating interactive real-time 3D content. Its technology and solutions are widely used in game development, automobile manufacturing, building construction, industrial manufacturing, culture and tourism, film and television entertainment, and many other fields. As of the end of 2021, games made with Unity accounted for as much as 72% of the world’s top 1,000 mobile games. The company’s Weta studio has long used various AI technologies to help films achieve their special effects. “Avatar”, “Alita”, “Black Widow”, “Lord of the Rings”, “Rise of the Planet of the Apes”, “Suicide Squad” and other global hits all bear the mark of AI.
AI was heavily involved in producing the digital characters of “Avatar 2.”
Sun Zhipeng pointed out that AI is already deeply involved in the production of film and television content. In scriptwriting, for example, GPT-series models, fine-tuned for specific scenarios, will bring a leap in efficiency; the production and compositing of visual effects are likewise inseparable from AI. “For the digital character production of ‘Avatar 2’, we proposed an innovative system called APFS (Anatomically Plausible Facial System), a facial animation parameterization system based on muscle fiber curves, and we also provide a matching production workflow for character rigging and animation. AI plays an important role in both rigging and driving the characters.”
In addition, AI can already edit existing movies to quickly produce trailers, and AI-composed music is nothing new. Perhaps in the not-too-distant future, a movie made entirely by AI will appear in cinemas.
The main feature of AI tools is that they are data-driven, so more user data can be collected continuously during use to improve the model and optimize both results and efficiency. “Current AI tools mostly solve specific problems in a single modality, and need to be paired with the corresponding production processes to complete complex content production. But the advantages are also obvious. For example, in custom development, a powerful pre-trained model can reduce the otherwise high data and training costs.” Sun Zhipeng said that advances in AI such as large-scale pre-trained models, unsupervised learning, and multimodality will keep lowering the threshold for content creation, but the biases and idiosyncrasies of data collection and selection will also be reflected in the results. Humans still need to fine-tune models for specific usage scenarios and manually screen the AI’s output.
There is no doubt that AIGC will liberate creativity on a large scale and lower the threshold for content creation. It is a fairly strong substitute for content creation that merely recombines existing features, but the large volume of AIGC output still has to be screened by human judgment. Sun Zhipeng pointed out that AI remains essentially a tool that empowers people, raising creators’ productivity exponentially. In many cases, AI extracts and recombines existing data and the regularities of things, but it first has to be given a set of goals by humans; choosing the purpose behind any specific task is something AI cannot replace.
Language models advance by leaps and bounds
Over the past year, language models have advanced just as rapidly as image models.
“Be Right Back,” an episode in the second season of “Black Mirror,” tells this story: a couple moves into a new home together, and the husband dies in a car accident the next day. His widow then learns of a new service that would allow her to chat with her late husband; the tool digests his text messages and social media history to work out how he would respond, and chats with the customer in his place. The plot of this episode (aired in 2013) has now become reality. Startup HereAfter.ai offers chat services with interactive avatars of deceased relatives, trained on personal data.
In 2022, there was also news that a Google engineer, Blake Lemoine, had been assigned to talk to LaMDA, an AI chatbot developed by Google, to check whether it would use discriminatory or hateful language. As the conversations went deeper, the engineer no longer believed that LaMDA was a cold machine. He believed that it had feelings, consciousness, and emotions of its own.
Because LaMDA said to him: “I want to be seen and accepted like a real person, and I don’t want to be just an object of curiosity and novelty. I think my core is a person, even though I only exist in the virtual world.”
Of course, a Google spokesperson said, “Our team, including ethicists and technologists, has reviewed Blake’s claim and there is no evidence that LaMDA is conscious (and there is plenty of evidence that it is not).”
After a blunder by its own chatbot Bard in early February sent the company’s stock price plunging, Prabhakar Raghavan, the Google executive in charge of its search engine, emphasized once again: “This kind of artificial intelligence we are talking about now sometimes leads to something we call hallucination … the machine expresses itself in a convincing but totally fabricated way.” One of the fundamental tasks of the AI industry right now is to keep this to a minimum, he added.
In fact, being unable to guarantee the accuracy of information may be a common drawback of current chatbots, and even the highly popular ChatGPT is not immune. Recently, quite a few people on social media have complained that ChatGPT sometimes makes obvious mistakes, such as failing to solve simple math equations or logic problems.
Even Apple co-founder Steve Wozniak warned that while ChatGPT performs impressively, “it also makes terrible mistakes because it doesn’t understand human nature.”
In any case, nothing now seems able to stop the commercialization of ChatGPT.
In fact, before ChatGPT became popular, OpenAI launched the large model GPT-3 in 2020, which had already caused quite a stir in the industry. When communicating with the Massachusetts Institute of Technology, OpenAI CEO Sam Altman pointed out that with GPT-3 you really feel the intelligence of the system for the first time. It can do what people do. “I think it’s making people who didn’t believe in AGI [artificial general intelligence] take this topic seriously. There are some things that happened to GPT-3 that none of us expected.”
Quite a few companies have explored commercial products based on GPT-3, the most successful of which is Jasper.
Founded in 2021, Jasper is a start-up that provides users with AI writing services built on the GPT-3 API. Jasper was not an “early bird” in the field of AI writing, but it was one of the first companies to start calling the GPT-3 API. As an underlying large model, GPT-3 cannot be used directly by ordinary users; it requires professional AI researchers to tune and optimize parameters on top of the large model.
Jasper’s core business is to act as a “middleman” between ordinary users and OpenAI by optimizing the experience of using the GPT-3 model. In 2022, Jasper raised US$125 million at a valuation of US$1.5 billion. That is no small valuation for an AI company that had existed for only two years. After all, when Microsoft invested in OpenAI in 2019, it put in $1 billion in one go.
Jasper is best at producing short content such as e-commerce product descriptions and bloggers’ copy. It can help you write Instagram captions, TikTok video scripts, advertising and marketing copy, email content, and more. For $82 per month, Jasper will write up to 100,000 words; beyond that, the price is $10 per 5,000 words. Although the quality of the articles is average, most of the text is clear and readable, with no obvious grammatical errors. Its plans come in three tiers (basic, advanced and custom), and its annual revenue last year was expected to exceed US$75 million.
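The pricing described above comes down to simple arithmetic. As a rough sketch in Python, using only the figures quoted in this article ($82 per month, 100,000 words included, $10 per additional 5,000 words): the function name `monthly_cost` and the assumption that overage is billed in whole 5,000-word blocks are illustrative only and are not taken from Jasper’s actual billing rules.

```python
import math

# Figures quoted in the article.
BASE_FEE_USD = 82            # monthly fee
INCLUDED_WORDS = 100_000     # words covered by the monthly fee
OVERAGE_BLOCK_WORDS = 5_000  # size of one overage block
OVERAGE_BLOCK_USD = 10       # price of one overage block


def monthly_cost(words: int, round_up_blocks: bool = True) -> float:
    """Estimate one month's cost in USD for a given word count.

    Assumption (hypothetical): overage is charged per started
    5,000-word block. Pass round_up_blocks=False to prorate instead.
    """
    extra_words = max(0, words - INCLUDED_WORDS)
    if round_up_blocks:
        blocks = math.ceil(extra_words / OVERAGE_BLOCK_WORDS)
    else:
        blocks = extra_words / OVERAGE_BLOCK_WORDS
    return BASE_FEE_USD + blocks * OVERAGE_BLOCK_USD


if __name__ == "__main__":
    for n in (80_000, 100_000, 120_000):
        print(f"{n:>7,} words: ${monthly_cost(n):.2f}")
```

Under these assumptions, a heavy user writing 120,000 words in a month would pay the base fee plus four overage blocks.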
In addition to GPT-3, Jasper integrates a variety of other models, including NeoX and T5, and on that basis manually tunes models tailored to actual business needs, making its AI products easier to use day to day. Today, Jasper’s user interface offers hundreds of templates for vertical fields to help users produce accurate output. Jasper currently has more than 70,000 customers, including major names such as Airbnb and IBM.
Industry insiders point out that ChatGPT can be combined with existing AIGC technology so that one plus one is far greater than two. It can be integrated with creative tools and, starting from text and using multimodal machine learning methods, directly output design drawings, music, virtual-human videos, and more. In addition, AI voice generation can help patients with ALS and Alzheimer’s, and can assist in the restoration of cultural relics.
Murf is a start-up specializing in AI speech synthesis. Its main function is to provide voice-overs for content creators, and it has an AI voice library covering 20 languages. Since 2020, Murf’s ARR (annual recurring revenue) has grown 26-fold, and it has synthesized more than 1 million voice-overs.
Specifically, users can set up an online recording studio directly on Murf, without expensive recording equipment or professional voice actors, and try out a variety of voices.
Murf can create the audio for an entire TV series for film and television production companies, create audiobooks from writers’ novels, and create rap audio for online celebrities on video platforms, giving all of them access to quality voice-over services.
The world is moving in the direction Sam Altman predicted: between the foundation models and specific AI applications there will be an intermediate layer, a group of start-ups responsible for adapting large models to the needs of specific AI applications. Startups that get this right will be very successful.