The Road to Substance Modeler: VR Roots, Desktop Reinvention

I volunteered to join the Substance Modeler team less than a year after it came to Adobe from Oculus. The team was really strong, and it had hired a couple of top-tier engineers to beef up the roster. On my end, I was a confident "toolsmith" and I really wanted them to succeed in their push to ship a usable Beta on a very tight schedule. A few months later they surprised me by asking me to stay on as tech lead. While I am usually quite confident in my skills, it sure felt intimidating to lead such a team, full of experts in VR and sculpting (I was neither). Nonetheless, I rolled up my sleeves and got to work, because building this gem wasn't going to happen overnight, and it wasn't going to be easy.

Modeler's engines

In my previous post I wrote about the implications of deferring topology. Working with such a system has a tactile immediacy that is quite delightful. The speed at which you can create and carve complex primary and secondary shapes feels amazing. But that creation speed only happens if we have a geometry engine that can support the level of detail an ambitious artist demands.

Modeler ships with a sculpting engine and a rendering engine. The two work together to define our real-time geometry pipeline. While Medium feels quite different from Modeler, these two components were both initially inherited from it, and improved with the brilliant and diligent work of our engineers.

Sculpting engine

Let's dig in a bit and see how the digital sausage is made, so to speak. Modeler's sculpting engine is a library used to author Clay layers, the basic building block of an art piece. A Clay layer is primarily a hierarchy of blocks containing voxel data - a small block is 8x8x8 voxels. The ground truth is captured in the voxels. Voxels represent a narrow-band signed distance field (SDF), meaning the value of each voxel is its signed distance to the closest surface, with zero marking the surface itself. The level set data is clamped between -2.0 and +2.0, so the many constant-field blocks far from the zero isosurface can be compressed to a single value. Clay objects also have an associated GPU polygon representation, generated with a "marching cubes" or a "dual marching cubes" technique.
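
To make this a bit more concrete, here is a minimal C++ sketch of a block structure along these lines. It is illustrative only (names, the hashing and the promotion logic are my own, not Modeler's actual code), but it shows the two key ideas: small dense 8x8x8 blocks of clamped distances, and constant blocks far from the surface collapsing to a single value.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <variant>

// One small block of 8x8x8 signed distances, clamped to the narrow band.
struct DenseBlock {
    static constexpr int N = 8;
    std::array<float, N * N * N> sdf; // signed distance, 0 == surface

    float& at(int x, int y, int z) { return sdf[(z * N + y) * N + x]; }
};

// Blocks far from the zero isosurface hold a single constant value,
// so they can be stored without the full 512-float payload.
using Block = std::variant<float /*constant*/, DenseBlock>;

struct BlockCoord { int32_t x, y, z; bool operator==(const BlockCoord&) const = default; };
struct BlockHash {
    size_t operator()(const BlockCoord& c) const {
        return std::hash<int64_t>()((int64_t(c.x) * 73856093) ^
                                    (int64_t(c.y) * 19349663) ^
                                    (int64_t(c.z) * 83492791));
    }
};

// A Clay layer: a sparse, logically unbounded grid of blocks.
struct ClayLayer {
    static constexpr float kBand = 2.0f; // narrow-band clamp, in voxels
    std::unordered_map<BlockCoord, Block, BlockHash> blocks;

    // Write a clamped distance sample; blocks are created on demand.
    void setVoxel(BlockCoord bc, int x, int y, int z, float d) {
        d = std::clamp(d, -kBand, kBand);
        Block& b = blocks.try_emplace(bc, kBand).first->second;
        if (auto* c = std::get_if<float>(&b)) {
            if (*c == d) return;               // still constant, nothing to do
            DenseBlock dense; dense.sdf.fill(*c);
            b = dense;                          // promote to a dense block
        }
        std::get<DenseBlock>(b).at(x, y, z) = d;
    }
};
```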

Next to this level set sits a color voxel grid, which is only defined on boundary areas (no volumetric color inside the objects). At a quick glance, this data structure is similar to OpenVDB, though the hierarchy details and the disk storage differ.

In terms of space, the data of each Clay layer is logically unbounded, which is a great improvement over Medium, where artists had to carefully manage their layer bounding boxes. In Modeler, they can simply explore and create shapes of any size without worrying about their distance from the origin, regardless of the voxel density.

When an artist operates on clay using one of Modeler's tools, the engine can quickly detect which blocks were affected and only recompute the necessary pieces of the associated polygon mesh. The LODs are computed immediately, too, by downsampling the SDF and using the same marching cubes algorithm.
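 
Continuing the sketch above, the dirty-block bookkeeping could look roughly like this. The mesher is stubbed out, and the LOD loop simply samples the SDF at coarser steps, which is the downsampling idea described here in its simplest form; it is a sketch, not the engine's actual code.

```cpp
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Placeholder mesh type and mesher, standing in for the real marching cubes step.
struct Mesh { std::vector<float> positions; };

Mesh meshBlock(const DenseBlock& block, int step /* 1 = full res, 2 = half, ... */) {
    // A real implementation would run marching cubes over the block's SDF,
    // sampling every `step` voxels to produce progressively coarser LOD meshes.
    (void)block; (void)step;
    return Mesh{};
}

struct SubMesh {
    std::vector<Mesh> lods; // lods[0] = full resolution, lods[1] = half, ...
};

struct MeshCache {
    std::unordered_map<BlockCoord, SubMesh, BlockHash> submeshes;
    std::unordered_set<BlockCoord, BlockHash> dirty;

    // Tools report which blocks they touched; everything else stays valid.
    void markDirty(BlockCoord bc) { dirty.insert(bc); }

    // Recompute only the submeshes (and their LODs) of the touched blocks.
    void rebuild(const ClayLayer& layer, int lodCount) {
        for (const BlockCoord& bc : dirty) {
            auto it = layer.blocks.find(bc);
            if (it == layer.blocks.end()) { submeshes.erase(bc); continue; }
            const auto* dense = std::get_if<DenseBlock>(&it->second);
            if (!dense) { submeshes.erase(bc); continue; } // constant block: no surface here
            SubMesh sm;
            for (int l = 0; l < lodCount; ++l)
                sm.lods.push_back(meshBlock(*dense, 1 << l));
            submeshes[bc] = std::move(sm);
        }
        dirty.clear();
    }
};
```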

Here's the kicker. Initially we thought that volume authoring was the most expressive tool for an artist, but it turned out that for high-precision tasks like micro-displacement and crease control, operating on the surface rather than the volume gives better control. So over time we rewrote some volume tools as surface tools and added new surface-based ones.

Here's how surface tools work in a world where the ground truth is still voxels: we work on the associated GPU polygon mesh directly. Small vector offsets are applied to each vertex in the tool's area of effect, then the voxel grid is quickly recomputed in the affected areas. The respective portion of the mesh is immediately re-extracted, so we always have a fresh mesh with uniform tessellation. During many sculpting operations this happens many times per second, reaching buttery-smooth real-time update rates. When the operations are very frequent and subtle, there is a minor numerical drift. Initially the drift caused some small blocky artifacts, but we were able to reduce it down to a slight smoothing effect.
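
A sketch of that surface-tool loop, still building on the structures above, could look like this. The brush falloff, the block-size math and the omitted SDF re-bake are assumptions for illustration, not the actual tool code.

```cpp
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

// One step of a hypothetical surface brush: displace the vertices inside the
// brush radius, then flag the voxel blocks under them for an SDF re-bake.
void applySurfaceBrush(std::vector<Vec3>& vertices,
                       const Vec3& brushCenter, float radius, const Vec3& offset,
                       MeshCache& cache) {
    for (Vec3& v : vertices) {
        float dx = v.x - brushCenter.x, dy = v.y - brushCenter.y, dz = v.z - brushCenter.z;
        float d = std::sqrt(dx * dx + dy * dy + dz * dz);
        if (d >= radius) continue;
        float falloff = 1.0f - d / radius;   // simple linear falloff toward the edge
        v.x += offset.x * falloff;
        v.y += offset.y * falloff;
        v.z += offset.z * falloff;
        // Each displaced vertex dirties the block it lands in. The SDF of those
        // blocks is then re-baked from the displaced mesh (not shown) and the
        // mesh re-extracted, so tessellation stays uniform. Repeated
        // mesh -> SDF -> mesh round trips are where the slight smoothing drift
        // mentioned above comes from.
        cache.markDirty(BlockCoord{ int32_t(std::floor(v.x / float(DenseBlock::N))),
                                    int32_t(std::floor(v.y / float(DenseBlock::N))),
                                    int32_t(std::floor(v.z / float(DenseBlock::N))) });
    }
}
```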

If a sculpt operation works on a very large, very dense mesh, however, it can take over a second. That is entirely acceptable and common in desktop applications, which come equipped with progress bars and dialogs, but on a VR device, any time you drop under 90 fps, bad things happen to the user, including nausea and dizziness. We don't let that happen, because of how our rendering engine works.

Rendering engine

Rendering and sculpting are thread-decoupled in Modeler. Rendering is entirely focused on hitting target frame rates on VR devices. During heavy sculpting operations, which happen on a separate set of threads, the engine keeps rendering the meshes representing the previously committed voxel state, unobstructed. The mesh being displayed is not the in-progress mesh. The new mesh replaces the old one on screen only once the volume is updated and the submeshes recomputed.
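
The hand-off pattern is simple enough to sketch. This is a minimal, mutex-based version of the idea (not Modeler's actual threading code): the sculpt threads publish a snapshot only when a volume update is fully committed, and the render thread always draws the latest one without ever waiting on an in-progress operation.

```cpp
#include <memory>
#include <mutex>

// Hypothetical render snapshot: the last fully committed set of submeshes.
struct RenderSnapshot { /* GPU buffers, LODs, transforms... */ };

class SnapshotExchange {
public:
    // Sculpt thread: publish a new snapshot only once the volume update and
    // the remeshing are complete.
    void publish(std::shared_ptr<const RenderSnapshot> snap) {
        std::lock_guard<std::mutex> lock(mutex_);
        latest_ = std::move(snap);
    }

    // Render thread: grab whatever is committed right now and draw it.
    // It never blocks on sculpting, so the frame rate stays pinned to the
    // headset's refresh rate even during second-long operations.
    std::shared_ptr<const RenderSnapshot> acquire() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return latest_;
    }

private:
    mutable std::mutex mutex_;
    std::shared_ptr<const RenderSnapshot> latest_;
};
```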

In Modeler, rendering large scenes is made possible by multiple techniques. A Modeler scene is made of layers assembled in a scene graph, much like in many DCC tools. The scene graph supports instancing, so much data can be explicitly deduplicated. However, instancing is a creative choice rather than an optimization, because we achieve the same savings implicitly.

Modeler makes broad use of copy-on-write (COW). A prime example is how our heavy voxel data is split into small hierarchical blocks (as described for the sculpting engine), but blocks with identical data are not duplicated, in memory or on disk. So if, say, a sculpt of a man is duplicated (not instanced), there is still only one copy of its data in memory. If one of the copies gets a remodel of the right hand, the rest of the body data remains shared between the two sculpts (even if they are each made of a single voxel object). The savings are implicit: the user does not need to create explicit instances of a model for this to work.
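
Here is a minimal sketch of this kind of content-addressed, copy-on-write block sharing, reusing the block types from the earlier sketches (the content hash is assumed to be computed elsewhere; none of the names are Modeler's actual code).

```cpp
#include <cstdint>
#include <memory>
#include <unordered_map>

// Hypothetical content-addressed block store: blocks with identical data are
// shared by every layer (and every duplicate of a layer) that references them.
struct BlockStore {
    std::unordered_map<uint64_t, std::weak_ptr<const DenseBlock>> byContent;

    std::shared_ptr<const DenseBlock> intern(const DenseBlock& data, uint64_t contentHash) {
        if (auto existing = byContent[contentHash].lock())
            return existing;                        // identical data already stored
        auto fresh = std::make_shared<const DenseBlock>(data);
        byContent[contentHash] = fresh;
        return fresh;
    }
};

struct CowLayer {
    std::unordered_map<BlockCoord, std::shared_ptr<const DenseBlock>, BlockHash> blocks;
};

// Duplicating a sculpt copies block *pointers*, not voxel data: the memory
// cost is near zero until one of the copies diverges.
CowLayer duplicate(const CowLayer& src) { return src; }

// Editing a block in one copy replaces just that pointer; every untouched
// block stays shared with the other copy.
void writeBlock(CowLayer& layer, BlockStore& store, BlockCoord bc,
                const DenseBlock& newData, uint64_t newContentHash) {
    layer.blocks[bc] = store.intern(newData, newContentHash);
}
```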

The submeshes and the LODs associated with each block are also deduplicated this way, so that even if we have to display incredibly detailed scenes, the rendering engine only has to compute a very small fraction of them at any given frame. Our LODs are selected and smoothly interpolated based on the submesh size on screen, using a variant of Transvoxel. Submeshes that are too small on screen get culled entirely. 
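
As a rough illustration of screen-size-driven LOD selection and small-submesh culling, here is a generic heuristic of my own, not the engine's actual logic (the Transvoxel-style blending between LODs is not shown):

```cpp
#include <cmath>

// Pick a LOD from a submesh's projected size on screen, and cull it entirely
// when it becomes too small to matter. Returns -1 for "culled".
int selectLod(float boundingRadius, float distanceToCamera, float fovY,
              int screenHeightPx, int lodCount, float cullThresholdPx = 2.0f) {
    // Approximate projected diameter of the submesh's bounding sphere, in pixels.
    float projectedPx = (2.0f * boundingRadius / distanceToCamera)
                      * (float(screenHeightPx) / (2.0f * std::tan(fovY * 0.5f)));
    if (projectedPx < cullThresholdPx)
        return -1;                                   // too small on screen: cull
    // Drop one level of detail every time the on-screen size halves.
    int lod = 0;
    for (float px = projectedPx; px < float(screenHeightPx) && lod < lodCount - 1; px *= 2.0f)
        ++lod;
    return lod;
}
```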

We then apply backface, frustum and occlusion culling. This all contributes to handling billions of polygons (in terms of capacity) in a massive sculpted scene. We had solid engines that would carry Modeler far, but the product was not ready to ship yet.

Adobe-fication of Medium

Taking a whimsical, fun tool such as Oculus Medium and turning it into a professional sculpting tool for the Substance Ecosystem at Adobe involved quite a few steps. We expected and looked forward to the product work - changing the UI, improving workflows, adding more advanced options. When it came to organizational work though, the workload was surprisingly substantial, especially for a team largely composed of hardcore graphics people.

We cheekily nicknamed these changes the "Adobe Tax". It included creating and aligning, across our Substance products, all the infrastructure related to build and CI, security, analytics, crash reports, licensing and legal. It involved rewriting or refactoring big swaths of code, and negotiating with many teams both in our organization and in the rest of Adobe.

Keep in mind, the Substance group as a whole was also new to Adobe, as we came from a relatively recent acquisition ourselves. The Adobe Tax felt like a burden at times, but being part of Adobe had incredible perks. We met an army of excited and renowned researchers. We suddenly had access to resources and allies in business, engineering, law and marketing. We were also able to use some Adobe funds to support open source projects we relied on, which was a big priority for us.

I ended up volunteering for several of these because I was the tech lead and I wanted to make sure the graphics talent on the team would stay focused and motivated. It wasn't always fun, but I learned a lot and got to meet incredible professionals in other teams who would still be strangers if I hadn't stuck my neck out.

VR and 6 Degrees Of Freedom

So after a year or so we got out of Alpha, and we had a voxel-based sculpting tool. There were a few such tools around in those days. Because we had a very compressed Alpha stage, Modeler was still quite rough around the edges and not an obvious sell. Based on what I learned presenting at conferences like GDC and meeting artists, it was clear that Modeler's (and Medium's) main differentiator was that it was first and foremost a Virtual Reality (VR) tool. VR turns the act of sculpting into a much more visceral experience, where you can see and feel your way around your creation in a way that a desktop tool just doesn't allow.

Even more than the visual immersion, what makes sculpting and painting in VR compelling is its 6 degrees of freedom (DOF) controllers, which approximate much more closely the way artists interact with physical sculpts. The result is that while desktop sculpting tools attract digital artists who are already familiar with 3D, VR sculpting tools are more compelling to traditional artists: those who would rather not touch a mouse and keyboard, and who would prefer sketching on paper or working with clay and marble.

Tribal Mech Warfare, Artwork by Tomi Väisänen

What really defines Modeler's success for our VR users is that it makes modeling fun again, a recurring quote from our users. It makes it easy and fast to transfer what is in your mind into the digital world. It removes the barriers that separate concept work from modeling work.

The catch is that VR, at the time of writing, is still a rather small niche in the creative tooling landscape. Being a VR-first tool really limited Modeler's user reach in those early days. It took a lot of strong will and optimism from our VP to invest in a specialized team such as ours. It's worth remembering that Adobe normally deals with millions of users in its more successful applications. So in our search for a larger market fit, while still improving our VR tooling, we had to pivot our focus away from VR and toward a desktop-first sculpting tool.

Bringing Modeler to desktop

Making a hybrid VR/desktop tool is no joke. While other applications in the Substance ecosystem are typically based on Qt and QML, over time we replaced the legacy Medium UI system with two custom UI systems. Both were "immediate", in the way ImGui is: little to no state is preserved across frames. These systems were meant to let us code our widgets, controls and dialogs once for both VR and desktop.
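
To give an idea of what "immediate" buys you here, this is a toy sketch of the pattern (all names are illustrative, not our actual UI code): the same button call runs every frame on both platforms, and only the input source and the draw backend differ.

```cpp
#include <functional>
#include <string>

// Pointer state for the current frame, filled either from the mouse or from a
// VR controller ray projected onto the panel.
struct UiInput {
    float pointerX = 0.0f, pointerY = 0.0f;
    bool  triggerDown = false;   // mouse button or controller trigger
};

struct UiContext {
    UiInput input;   // refreshed every frame by the VR or desktop backend

    // Backend hooks: one implementation rasterizes to the desktop window,
    // another renders quads onto an in-world VR panel.
    std::function<void(float, float, float, float, bool)> drawRect;
    std::function<void(const std::string&, float, float)> drawLabel;

    // An immediate-mode button: it draws itself and reports a click, and no
    // retained widget object survives across frames.
    bool button(const std::string& label, float x, float y, float w, float h) {
        bool hovered = input.pointerX >= x && input.pointerX <= x + w &&
                       input.pointerY >= y && input.pointerY <= y + h;
        if (drawRect)  drawRect(x, y, w, h, hovered);
        if (drawLabel) drawLabel(label, x, y);
        return hovered && input.triggerDown;
    }
};

// Per-frame UI code, identical on both platforms:
//   if (ui.button("Export", 10.0f, 10.0f, 120.0f, 32.0f)) { /* start export */ }
```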

By design we tried to maximize consistency between VR, which we understood well, and desktop, for familiarity's sake. But in time it became apparent that putting too much emphasis on consistency between the two worlds ended up doing a disservice to both. A successful UX for 6 DOF and VR does not map very well to a screen, mouse and keyboard. The reverse is even worse.

To start with a few trivial examples: in VR, too many panels crowd the working space, while on desktop they can be organized into dockable panels across multiple screens (which, to be clear, we didn't get to implement). On desktop, precise clicking is easy, but relying too much on dragging can induce repetitive strain injuries, something I'm well acquainted with. In VR, on the other hand, aiming precisely at sliders can be eye-straining and frustrating, while dragging objects around the scene feels very natural.

Our early VR versions had no hierarchy view or panel. The scene hierarchy felt intuitive enough through "direct scoping" in VR: entering and exiting "scopes" was as easy as a "hover and tap" action, accompanied by appropriate sound effects. Desktop users, however, felt the need for more familiar views such as a scene outliner. Adding an outliner for desktop effectively created needs we didn't have before, such as selecting objects that are not siblings. That created serious complications in how the engine was structured, from the UX down to the core.

The VR render of the detailed geometry was, relatively speaking, fairly crude. That was acceptable in VR, but the need for a quality render became apparent in the desktop version. To tackle that we added support for custom cameras and lights, as well as a whole new path tracer that would never have worked in a VR app. Restructuring our rendering engine to fit in the new features without affecting the frame rate in VR caused us some trouble.

Non-destructive Primitives

One other thing really helped make a desktop-first take on Modeler work well: a new type of data, the non-destructive primitives (simply called "primitives" in the app). These metaball-reminiscent, Neo-like SDF functions can be combined in arbitrary ways. They bring back a workflow that is procedural rather than manual, non-destructive rather than viscerally physical. Many desktop artists switched to using these almost exclusively.
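
The general flavor of composable SDF primitives can be sketched in a few lines. This is the textbook technique (a sphere distance function and a standard smooth union), not Modeler's actual primitive set:

```cpp
#include <algorithm>
#include <cmath>
#include <functional>

// A primitive is just a signed distance function of a point in space.
using Sdf = std::function<float(float, float, float)>;

Sdf sphere(float cx, float cy, float cz, float r) {
    return [=](float x, float y, float z) {
        return std::sqrt((x - cx) * (x - cx) + (y - cy) * (y - cy) + (z - cz) * (z - cz)) - r;
    };
}

// Smooth (metaball-like) union: blends two shapes instead of hard-joining them.
Sdf smoothUnion(Sdf a, Sdf b, float k) {
    return [=](float x, float y, float z) {
        float da = a(x, y, z), db = b(x, y, z);
        float h = std::clamp(0.5f + 0.5f * (db - da) / k, 0.0f, 1.0f);
        return db + (da - db) * h - k * h * (1.0f - h);
    };
}

// Because the composition stays editable, the same Sdf can be re-sampled into
// the clay voxel grid at any density the user picks, with no loss of quality.
// Sdf blob = smoothUnion(sphere(0, 0, 0, 1.0f), sphere(1.2f, 0, 0, 0.8f), 0.3f);
```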

Unlike Neo, they are still rendered as if they were made of Clay - maintaining the notions of volatile polygons and deferred topology. Compared to ray-marched SDF systems, this had advantages and disadvantages:

Among the advantages, we had a way to choose the render and compute quality of each component separately, in terms of voxel density. It helped keep the rendering pipeline clear, streamlined and extremely optimized. We could always up-res a set of primitives, and the clay version would be analytically recomputed with no loss of quality. It also gave a very good preview of what the mesh would look like if exported "raw".

Among the disadvantages, well, we had to choose the render and compute quality of each component separately. That meant the user has to make some quality/performance choices they wouldn't need to make in Shader Toy or Neo. It also meant the quality is determined by the voxel and mesh density rather than just the pixel resolution of the render, so when the camera gets really close, polygons become discernible.

Once the primitives shipped, the desktop version of Modeler finally started to feel like a real product. But switching to a desktop-first application was just one of the big adjustments we had to make. We started developing our Alpha version during the year of peak Metaverse hype. Then came other hype waves: NFTs, crypto, Gen-AI, the new Mac processors, the iPad Pro, to name a few. Each time it felt like we needed to adjust the roadmap and learn new skills. However, despite the excitement among some of the leadership, our product was built within our core values, and in the end we only worked on a couple of initiatives. Here they are.

Mac port

When we started working on Modeler, it was a Windows-only tool. In fact, since much of the team and Medium itself came from the Oculus division at Facebook (now known as Meta Reality Labs), Modeler ran only on Oculus hardware, whose PC software is Windows-only. Given these restrictions, it made sense at the time to have the project built with Visual Studio Pro.

Sentaro Emotional Support Device

When a port to other platforms became more likely in our quest to broaden the user base, we had to consider making the project more portable, so I started converting it to CMake. As I was making progress, I saw a demo of a more recent cross-platform meta-build system, internal to Adobe but rather similar to Bazel. I tried converting a few dependencies to it. The difference was stark: the declarative nature, dependency handling and binary caching had me making progress a lot faster, so I ended up abandoning the CMake port. The debate between CMake and that Adobe meta-build system is still ongoing for other products, and while there is no clear winner due to technical tradeoffs, Modeler stayed with the Adobe one.

Then the decision came to make a Mac port. We expected it, because "When is it coming for Mac?" was the most common question we got on the Discord servers. It was not clear to me, however, how many more users we would get, given our place in the Substance ecosystem at Adobe. Adobe's creative user base is largely composed of Mac users. In the Substance ecosystem, on the other hand, Mac adoption is rather small, because our biggest demographics were game developers (mostly on Windows) and VFX/animation (mostly on Linux). Also, a Mac port was going to be desktop-only, because the newly released Vision Pro had (at that time) some imposed rendering API constraints that made it impossible for us to achieve the experience we wanted. So betting on a Mac port was not a guaranteed win.

We hired a brilliant new engineer on our team who pretty much single-handedly prepared a version of Modeler that ran on Mac hardware and macOS. It took about a year, and I didn't follow all the details, but it required getting the code to compile with Clang (which has stricter warnings and differences in the STL implementation, among other things), dropping or replacing some Windows-only dependencies, and a lot of Metal shader work. I got to try it, and it worked really well.

The Mac build was in QA when it was decided not to ship it yet. It was a bit of a shock (we had been announcing the Mac version as coming soon for a while), and we sure felt a mix of disappointment and relief across the team at the beginning. But it was also the right call: we really needed to focus on the desktop and primitives work, which was critical to expanding our user base, and not spend our team's limited bandwidth on the inevitable issues that come up when a new port gets released.

Modeler and AI

"But what about AI, Davide?". Don't worry, I didn't forget you. Yes, all tools in my org and beyond were asked to consider using Generative AI and evaluate its impact on the users. And evaluate, we did. 

It turns out, to the surprise of nobody, that our user base of professional and enthusiast sculptors wasn't excited about Gen-AI when it was used to generate shapes. And that makes sense, since creating shapes is the fun and creative part of the process. We did have lots of people ask about Gen-AI UV generation though, which is always a bit of a chore. And since sculptors aren't always lighting and photography experts, they didn't mind the idea of having Gen-AI make a better render of their sculpture, as long as it was accurate. So we did a bit of experimenting around using 3D shapes to guide Gen-AI imagery, and we did get to something working and cool, but we also decided that Modeler was a 3D tool for 3D outputs (meshes), not 2D outputs (images). Adding features around rendering and Gen-AI that weren't helpful for the sculpting process itself was more of a distraction than an improvement, and would have detracted from the creative "flow" that Modeler strived to achieve.

Some AI did ship in Modeler though. It was my pet project, and the evolution of an old internal hackathon. It used a slightly older type of AI: deep learning models that embed 3D shapes into multi-dimensional vectors. This technology, built in collaboration with Adobe Research and nicknamed Block to Stock, is what powers the kit-bashing features of Modeler. The basic idea is: we indexed all of the Substance 3D Assets and stored their information in a small database shipped with Modeler. The user sketches some shapes, maybe types a word, and voila, the UI proposes a number of models (made by our crew of artists) from the Substance Source repository. With a click, you can swap your model (in place) with the proposed one. No geometry is generated; the AI is only used to find the closest matches to your sketch in the repository.
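
The lookup side of this can be sketched quite compactly. Everything here is simplified and illustrative: the embedding dimension, the brute-force scan and the cosine-similarity ranking stand in for whatever the shipped database actually does; the embedding model itself is not shown.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <string>
#include <utility>
#include <vector>

constexpr int kDim = 256;                  // illustrative embedding size
using Embedding = std::array<float, kDim>;

struct AssetEntry {
    std::string assetId;    // points into the Substance 3D Assets library
    Embedding   embedding;  // computed offline, shipped in the local database
};

float cosineSimilarity(const Embedding& a, const Embedding& b) {
    float dot = 0.0f, na = 0.0f, nb = 0.0f;
    for (int i = 0; i < kDim; ++i) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
    return dot / (std::sqrt(na) * std::sqrt(nb) + 1e-8f);
}

// Return the ids of the k assets whose embeddings are closest to the sketch's.
std::vector<std::string> closestAssets(const Embedding& sketchEmbedding,
                                       const std::vector<AssetEntry>& database, size_t k) {
    std::vector<std::pair<float, const AssetEntry*>> scored;
    scored.reserve(database.size());
    for (const AssetEntry& e : database)
        scored.emplace_back(cosineSimilarity(sketchEmbedding, e.embedding), &e);
    size_t top = std::min(k, scored.size());
    std::partial_sort(scored.begin(), scored.begin() + top, scored.end(),
                      [](const auto& a, const auto& b) { return a.first > b.first; });
    std::vector<std::string> ids;
    for (size_t i = 0; i < top; ++i)
        ids.push_back(scored[i].second->assetId);
    return ids;
}
```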

Block to Stock in action. Artwork by Gio Nakpil

To tell the truth, only a couple of years later, this tech already feels obsolete. Gen-AI models are looking more and more like the ones people make: perfect quads, believable UVs and good shapes. Searching a database of human-made models may start to feel limiting. In the end, we stuck to our choices, out of respect for the craft of our incredible users, whose talent and passion outshines anything an AI will generate for us any time soon.

Modeler taught me more than I expected: about VR, about graphics, about AI, about teams — and about how hard it is to build something that feels simple. More importantly, it reminded me that tools come and go, technologies rise and fall, but the spark that drives someone to create — that part stays. Modeler was built for that spark, and I’m proud of the part I played. 
