Nvidia uses artificial intelligence to turn 2D photos into 3D models that users can freely edit

  Nvidia is using artificial intelligence to let designers, game developers, and other creators work with 3D objects in a fraction of the usual time. With its proposed NVIDIA 3D MoMa method, creators can quickly import 3D objects, modify them, and change their materials.
  “Inverse rendering is a technique for reconstructing a series of still photos into a 3D model of an object or scene. This technique has long been critical to unifying computer vision and computer graphics,” said David Luebke, vice president of graphics research at NVIDIA. “By formulating every piece of the inverse rendering problem as a GPU-accelerated differentiable component, the NVIDIA 3D MoMa rendering pipeline uses the machinery of modern artificial intelligence and the raw computing power of NVIDIA GPUs to rapidly generate 3D objects that creators can import, edit, and extend without restriction in existing tools.”
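  The core idea behind a "differentiable component" is that the renderer can be differentiated with respect to the quantities being recovered, so they can be fitted by gradient descent. The following is a deliberately tiny sketch of that idea, not NVIDIA's pipeline: a toy Lambertian shader with one unknown material parameter (albedo), recovered from observed pixel brightness by descending an analytic gradient.

```python
def render(albedo, n_dot_l):
    """Toy Lambertian shading: brightness = albedo * max(0, n.l)."""
    return albedo * max(0.0, n_dot_l)

# "Observations": brightness of one surface point under several lighting
# angles, generated here from a ground-truth albedo of 0.7.
angles = [0.2, 0.5, 0.9, 1.0]
observed = [render(0.7, a) for a in angles]

albedo = 0.1  # initial guess for the unknown material parameter
lr = 0.1      # gradient-descent step size
for _ in range(200):
    # d(loss)/d(albedo) for loss = sum((render - observed)^2), written out
    # analytically since render() is differentiable in albedo
    grad = sum(2.0 * (render(albedo, a) - o) * max(0.0, a)
               for a, o in zip(angles, observed))
    albedo -= lr * grad

print(round(albedo, 3))  # converges to ~0.7, the true albedo
```

  3D MoMa applies the same principle at scale, with GPU-accelerated differentiable components for shape, materials, and lighting instead of a single scalar.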
  Traditionally, creating 3D objects has relied on photogrammetry, a multi-stage and rather time-consuming process. Creators must work through many software tools and manual adjustments to achieve the desired final 3D model.
  Current techniques such as neural radiance fields have clear advantages in generating 3D representations of objects or scenes and provide high-quality synthesis of novel views.

  However, these methods typically produce representations that entangle geometry, materials, and lighting within a neural network and cannot output a triangle mesh, making it difficult to support scene-editing operations. “The triangle mesh is the basic framework for defining shapes in 3D graphics and modeling, and is the common language used by such 3D tools,” the researchers wrote in a blog post on NVIDIA’s official website.
  In addition, to be practical, 3D objects should work in the many common tools creators already use, such as game engines, 3D modelers, and film renderers. Using neural representations in traditional graphics engines requires methods such as marching cubes to extract geometry from the network, which can result in poor surface quality, especially at low triangle counts. Likewise, materials encoded in a neural network cannot easily be edited or extracted into a form compatible with traditional game engines.
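  The "common language" point is concrete: a triangle mesh reduces geometry to vertex positions plus triangles that index them, which maps directly onto interchange formats every engine reads. A minimal sketch, serializing a tetrahedron to Wavefront OBJ text (OBJ face indices are 1-based):

```python
# Four vertices and four triangles describing a tetrahedron.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
            (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
faces = [(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]  # 1-based OBJ indices

def to_obj(vertices, faces):
    """Serialize a triangle mesh to Wavefront OBJ text."""
    lines = ["v %f %f %f" % v for v in vertices]   # vertex positions
    lines += ["f %d %d %d" % f for f in faces]     # triangles as index triples
    return "\n".join(lines) + "\n"

obj_text = to_obj(vertices, faces)
print(obj_text.splitlines()[0])  # -> "v 0.000000 0.000000 0.000000"
```

  Because 3D MoMa outputs this kind of explicit geometry directly, no lossy extraction step like marching cubes is needed before the result can travel between tools.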

  In contrast, this study reconstructs 3D content that is compatible with traditional graphics engines and supports relighting and scene editing. The resulting 3D models can be deployed without conversion on any device that supports triangle rendering, including mobile phones and web browsers, and can be used unmodified in standard game engines. Gradient-based optimization drives all stages of the pipeline.
  A related paper on the 3D reconstruction method, NVIDIA 3D MoMa, was recently presented at the 2022 Conference on Computer Vision and Pattern Recognition (CVPR) and posted on arXiv under the title “Extracting Triangular 3D Models, Materials, and Lighting From Images.”
  The researchers evaluated their system on a variety of applications, editing, relighting, and simulating existing objects to demonstrate that the approach explicitly decomposes a scene into triangle meshes and materials, and compared it against methods such as neural radiance fields and neural reflectance decomposition.

  It is worth mentioning that the researchers also produced a virtual-band video demo to showcase the power of NVIDIA 3D MoMa.
  First, they captured about 100 images of several musical instruments from different angles, then used the newly proposed method to reconstruct each set of still images into a 3D representation with a triangle mesh.

  Then, the objects were separated from their original scenes and imported into the NVIDIA Omniverse 3D simulation platform for editing. In widely used graphics engines, the material of a reconstructed shape can easily be replaced with a different one, such as gold or wood, much like dressing the mesh up in different outfits, and the object can be placed in any virtual scene, such as the Cornell box, a classic graphics test.
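  This kind of swap is easy precisely because geometry and material come out of the pipeline disentangled. A toy sketch of the idea (the names and material records here are hypothetical illustrations, not Omniverse APIs): the mesh stays fixed while a separate material record is exchanged.

```python
# Hypothetical material library in a metallic/roughness style workflow.
MATERIALS = {
    "gold": {"base_color": (1.00, 0.77, 0.34), "metallic": 1.0, "roughness": 0.2},
    "wood": {"base_color": (0.55, 0.36, 0.20), "metallic": 0.0, "roughness": 0.8},
}

# A reconstructed object: geometry and material stored separately.
drum = {"mesh": "drum.obj", "material": MATERIALS["wood"]}

def swap_material(obj, name):
    """Replace only the material record; the geometry is untouched."""
    obj["material"] = MATERIALS[name]
    return obj

swap_material(drum, "gold")
print(drum["material"]["metallic"])  # -> 1.0
```

  With an entangled neural representation, by contrast, there is no separate material record to swap, which is exactly the editing limitation the paper targets.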
  Different virtual instruments turned out to respond to light differently, much as they would in reality, and the newly generated objects can serve as building blocks for complex scenes.

  Notably, the researchers also state in the paper: “In order to speed up optimization, a simplified shading model was chosen, one that does not account for global illumination or shadowing. This choice is a limiting factor for material extraction and relighting. In future work, with current progress in differentiable path tracing, this limitation is expected to be lifted.”
  Overall, this study demonstrates an approach that matches state-of-the-art techniques on tasks such as view synthesis while also offering the advantages of optimized triangle meshes, compatibility with legacy graphics engines and modeling tools, and end-to-end optimization driven by the appearance of the rendered model.
  This simplifies many workflows for 3D content creators, saving them time and increasing efficiency. The method also acts as an appearance-aware converter that complements many recent techniques.