

It feels like a replay of 2023. At the start of last year, ChatGPT set off a wave of public enthusiasm for conversational AI, followed by an investment arms race in large AI models both in China and abroad.

Now, at the start of the Year of the Dragon, OpenAI is once again setting the agenda, and this time text-to-video has captured the spotlight. In the early hours of February 16 (Beijing time), OpenAI unveiled its text-to-video model Sora. Sora can generate detailed videos from text prompts, extend existing videos, and create scenes from still images.

Such applications have existed in the field for some time, but Sora's debut is still striking. The main subject stays consistent throughout a video while the camera can switch between multiple angles. Sora has also made a breakthrough in length, generating videos of up to 60 seconds.

It is a testament to OpenAI's ingenuity. Although Sora is still at an early stage, its release already marks a turning point for generative AI.

Meanwhile, there is news on the financial side as well. Following its latest transaction, OpenAI's valuation has risen above $80 billion. The deal was a tender offer organized by Thrive Capital.

Neither OpenAI nor Thrive Capital has officially commented, but a post yesterday by the Xiaohongshu blogger “Shannon” captures the prevailing mood: “OpenAI’s Sora will feature prominently in every fund-related discussion tomorrow.”

Compared with last year, however, investor sentiment has clearly shifted. Not everyone gets the chance to invest in OpenAI, yet every expansion of what OpenAI can do has directly affected a large number of startups and the investors behind them.

“Natural evolution” versus “audacious innovation”

First, it should be made clear that text-to-video models are nothing new. Before OpenAI’s release, nearly every leading model developer had launched its own text-to-video model, including Google’s Lumiere and Stability AI’s SVD (Stable Video Diffusion). Unicorns specializing in multimedia content generation models had also already emerged. Notably, Runway, the company behind the video generation model Gen-2, was valued at more than $1.5 billion after completing its Series C financing at the end of June 2023, with participation from Google, Nvidia, and Salesforce.

Crucially, existing “video large models” already look somewhat like productivity tools.

Consider Runway. Unlike the many large-model startups founded by technologists, Runway’s three founders, Cristóbal Valenzuela, Alejandro Matamala, and Anastasis Germanidis, came out of New York University’s Tisch School of the Arts. Seeing “the potential of artificial intelligence in creativity,” they set out to build a suite of tools for filmmakers and photographers.

Given this background, Runway’s trajectory reads more like an “epic of Hengdian World Studios” (China’s largest film production base) than the usual tech “garage” story. First, it built a range of professional tools for creators covering video frame interpolation, background removal, blur effects, motion tracking, and audio classification. It then took part in developing the image generation model Stable Diffusion, building up expertise in still-image generation, and contributed to blockbuster productions such as “Everything Everywhere All at Once,” work it has detailed in its press releases. The film’s producers credited Runway with letting the post-production team work with unusual efficiency, and industry peers marveled that complex effects sequences (such as the talking rocks in “Everything Everywhere All at Once”) went from taking “several days” to mere “minutes.”

In February 2023, Runway released its first video generation model, Gen-1, available to the public for free on iOS. Besides filter-style features such as “real image to clay” and “real image to sketch,” it offered “text to video,” making Gen-1 the first commercially deployed large-scale text-to-video model. In June 2023, Runway followed with Gen-2, whose training data was expanded to 240 million images and 640,000 video clips.

In August 2023, the AIGC project “The Wandering Earth 3 Trailer” went viral on Bilibili, drawing more than 10 million views across platforms and praise from director Guo Fan; it was made with Gen-2. According to its creator, digitallifekazik, on his social media account, production had two stages: storyboard images were first generated in Midjourney, then expanded into 4-second clips with Gen-2. The final work used 693 base images and 185 supplementary clips and took five days to complete. Half a year later, digitallifekazik released another film, “The Last Goodbye,” a moving three-minute narrative short made with a “Midjourney V6 storyboards + Runway video generation” workflow. It was submitted to the second edition of Gen48, the AI film competition run by Runway Studios, the arm of Runway that serves enterprise clients.

All of this shows that text-to-video models had already gained substantial traction a year ago, and their user base today far exceeds early estimates.

After the Series C round, Runway co-founder Valenzuela said that its customers include not only large corporations such as New Balance but also a great many individual creators.

Nor is OpenAI alone in accurately simulating the physical world, as Sora does. On February 18, just days after Sora’s debut, Musk commented on the tech podcast “DrKnowItAll” that Tesla had achieved similar real-world video generation, though he added that it was not very exciting because the footage came from car cameras.

Meanwhile, OpenAI’s technical report “Video generation models as world simulators,” published alongside the launch, presents Sora as a kind of “data-driven physics engine” capable of simulating people, animals, and objects in physical and digital worlds with high fidelity. The report also acknowledges limitations common to models of this kind, such as difficulty accurately simulating complex physics and understanding cause and effect. For example, in a generated clip of a person eating a cookie, the cookie may not show a bite mark afterward.

What is truly striking is how fast Sora has advanced.

Technically, simulating the real world in line with physical laws, generating 60-second videos, and producing footage from multiple camera angles within a single video are all impressive achievements. Set them against Gen-2, which was limited to 4-second clips with obvious frame drops that made them look like a slow slideshow, and Gen-2 already seems dated, even though it launched in June 2023, only eight months before Sora.

In November 2023, Meta released its video generation model Emu Video, which appeared to surpass Gen-2, producing 512×512 videos at 16 frames per second. Just three months later, Sora overtook it. According to OpenAI’s technical report, Sora is a general-purpose video generator that can also edit images and videos, handling tasks ranging from creating seamless loops to extending videos in time and replacing backgrounds.

If you try to explain this extraordinary pace, short of invoking a “mysterious alien civilization,” the most convincing explanation may be the sheer amount of money being burned.

Stability AI, Runway’s partner on Stable Diffusion, has struggled on and off with cash-flow problems over the past two years. Rumors once circulated that senior management was actively trying to sell the company, while an internal letter from early investor Coatue Management expressed alarm at its financial condition and called for CEO Emad Mostaque to resign immediately. Most troubling was speculation that Mostaque had pledged personal assets as collateral to reassure Amazon over as much as $75 million in unpaid cloud-service bills.

In terms of financing, Stability AI seems to have hit its high point early: after raising more than $100 million in October 2022, it immediately became a unicorn. In a candid interview last July, an indignant Mostaque criticized the waste surrounding AI, pointing to Google’s Bard, whose error in promotional material had wiped more than $100 billion off Alphabet’s market value in a single day. He predicted that total investment in building AI infrastructure could reach $1 trillion, making it the biggest bubble in human history.

On Zhihu, an entrepreneur going by “Pixel Alchemist” voiced his worries after seeing Sora: “I have a creeping fear that the products of the tech giants will roll past like a juggernaut, and my own creations will be left behind like weeds by the roadside, barely leaving a trace, in an era when technology advances at dizzying speed.”

An $80 billion valuation and the boundaries of OpenAI

Either way, OpenAI has once again confirmed its status as the AI juggernaut. Its seemingly limitless capabilities have helped its valuation triple in less than 10 months. According to CB Insights, OpenAI is now one of the most valuable technology startups in the world, behind only ByteDance and SpaceX.

Beyond improving its models, OpenAI is also pursuing a diversified strategy. In semiconductors in particular, Altman has been in talks with potential investors, chipmakers, and energy suppliers, and is even considering setting up a new entity independent of OpenAI to enter the AI chip business.

The transaction also underscores that Altman remains indispensable to OpenAI. OpenAI originally intended to close this financing in November last year, but that was when Altman was ousted. Whether the deal was affected is unclear, but the upshot was that more than 700 of the company’s 770 employees signed a letter demanding his reinstatement.

A closer look at this round shows that, rather than issuing new shares, OpenAI is allowing employees to sell their existing shares. This is not new for OpenAI: in 2023, Thrive Capital, Sequoia Capital, Andreessen Horowitz, and K2 Global likewise bought in through an OpenAI tender offer, which valued the company at US$29 billion.

So where do OpenAI’s boundaries lie?

The question bears not only on OpenAI’s valuation but also on the prospects of generative AI startups, large and small.

To begin with, several overseas startups have already established themselves in video generation. Foremost is Runway, discussed above. Another front-runner is Pika, founded in April last year, which announced in November that it had raised a combined US$55 million across its angel and Series A rounds, at a valuation of US$250 million. Pika was co-founded by Guo Wenjing (Demi Guo) and Meng Chenlin (Chenlin Meng), both PhD candidates at the Stanford Artificial Intelligence Laboratory with impressive credentials; Guo is also known as a “Chinese-American prodigy.”

Will OpenAI hurt these companies? There are already early signs. Soon after Sora’s public debut, some overseas bloggers compared the companies’ products by feeding the same prompt to four models: Sora, Pika, Runway, and Stable Video. The consensus was that Sora holds clear advantages in video length and coherence.

Admittedly, these text-to-video companies have all built their own large models rather than merely building applications on top of other companies’ models. Even so, a technical moat does not make it easy to withstand OpenAI’s impact.

This does not mean that pure application companies have no future. Their difficulties may partly reflect the current stage of the industry’s development.

Last year, two Sequoia Capital partners wrote a follow-up article revisiting the market views they had published a year earlier. Among the predictions they admitted getting wrong was the expected separation between layers, which has yet to materialize: “We still believe there will be a separation between application-layer companies and foundation model providers, with model companies focusing on scale and research and application-layer companies focusing on product and UI. In reality, that separation hasn’t cleanly happened yet. The most successful early consumer-facing applications have come from vertically integrated companies.”

The situation in China mirrors this trend.

Some investors told me that an AIGC company they are tracking is also building its own model trained on industry-specific data, rather than simply calling other companies’ APIs. “Otherwise, it’s hard to expect them to achieve real differentiation at the application level.”

Looking back, each of OpenAI’s technical advances over the past year has expanded what capital can imagine while closing off paths for some startups.

“AGI poisoned the software industry last year, and it has been dying slowly ever since. The public is only now watching the poison take effect,” one entrepreneur wrote on social media while sharing news about Sora.

As a result, AI investing is hard, especially at the application layer. “The key is to distinguish clearly between what will benefit from the evolution of large models and what will be made obsolete by it,” one AI investor told me, without elaborating. Yet OpenAI’s dominance makes this crucial question anything but easy to answer.

Turning back to large-model developers, Zhipu AI’s valuation rose more than sixfold last year, with some investors putting it at 20 billion yuan. I have also recently heard that Baichuan Intelligent and MiniMax are raising new rounds. These companies will inevitably feel some anxiety about OpenAI’s latest moves, but fortunately they have ample resources. Sora’s arrival clearly marks the start of a new round of catching up.

With Sora’s growing popularity, AI concept stocks are likely to see another wave of speculation, and “shovel sellers” such as Nvidia stand to profit handsomely once again. For entrepreneurs and investors in the primary market, though, the only advice I can offer is this: for now, keep going.
