
Private infrastructure · 2024 — present
Bit-Depth Expansion Network
A surprising amount of source material in a modern VFX pipeline arrives at eight bits per channel. Frames pulled from compressed video. AI-generated imagery from standard diffusion models. Archival plates, web references, screenshots used as background elements. None of this material is sufficient for a comp — eight-bit content fractures the moment any meaningful color operation touches it — and throwing it away is rarely an option. The pragmatic path is to lift it: convert eight-bit source into a true sixteen-bit float container in a way that doesn't just zero-pad the missing precision but reconstructs the smooth gradients and continuous tonal transitions that the original eight-bit truncation destroyed. That is the problem this network solves.
The implementation is private. The shape of the work, and why it has earned a place in the broader HDR pipeline, is below.
01 — The problem
Eight-bit material has 256 discrete values per channel. In a small region of an image — a clean blue sky, a shadow rolling off a face, a smoke gradient — the actual visible tonal range often spans only ten or twenty of those values. Stretched across a screen-wide expanse of smooth gradient, the steps between adjacent values become visible as banding. The eye is, unfortunately, very good at finding these steps.
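The arithmetic of that failure is easy to demonstrate. A numpy sketch, with a hypothetical tonal span and frame width chosen to mirror the sky example:

```python
import numpy as np

# A horizontal "sky" gradient: smooth in the continuous signal,
# but spanning only ~17 of the 256 available 8-bit code values.
# (The span 180..196 and the 1920-pixel width are illustrative.)
width = 1920
signal = np.linspace(180, 196, width)       # the scene as it "really" is
plate = np.round(signal).astype(np.uint8)   # what 8-bit storage keeps

codes = np.unique(plate)
step_width = width / len(codes)             # pixels per quantization step
print(len(codes), step_width)               # each step is ~100+ pixels wide
```

At over a hundred pixels per step, each quantization boundary reads as a distinct band rather than dissolving into the gradient.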
A compositing pipeline makes the problem worse, not better. Every operation a comp applies — a curve, a grade, an exposure adjustment, a color balance — re-quantizes the eight-bit grid in non-uniform ways. A subtle highlight push can move a smooth gradient onto a coarser grid and create banding that wasn't visible in the source. By the time the plate has been through a comp's worth of operations, the banding is everywhere and there is nothing left in the eight-bit signal to recover from.
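A toy illustration of that re-quantization, using a hypothetical shadow gradient and a simple power curve standing in for a grade:

```python
import numpy as np

# The same kind of narrow gradient, this time in the shadows,
# pushed through a shadow-lifting curve while still stored at 8 bits.
# All numbers are illustrative.
plate = np.round(np.linspace(20, 36, 1920)).astype(np.uint8)

lifted = np.round(255.0 * (plate / 255.0) ** 0.6).astype(np.uint8)

# Adjacent code values in the source now map to non-adjacent outputs:
# the gaps between surviving codes are the new, coarser banding grid.
gaps = np.diff(np.unique(lifted.astype(int)))
print(gaps.max())   # some steps have widened
```

The curve stretches the shadow codes apart, so steps that were one code value wide in the source become two or more after the grade — banding the source never showed.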
The right fix is structural. The plate needs to be in a sixteen-bit float container before the comp touches it. But naive bit-depth expansion is not enough — converting an eight-bit value to a sixteen-bit float by multiplication preserves exactly the banding the work is meant to eliminate. The new container has the headroom to represent smooth gradients; the data inside it doesn't.
What's actually needed is a plausible reconstruction of the smooth gradients that would have existed if the source had been captured at higher precision in the first place. That is a learned mapping, not a deterministic transform.
A naive bit-depth lift gives you sixteen bits of room and eight bits of data. The network's job is to fill the other eight bits with values that are plausible rather than blank.
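The naive lift can be sketched in two lines; counting distinct values shows why it is not enough:

```python
import numpy as np

# Naive bit-depth expansion: normalize and cast. The container gains
# precision; the data does not. (The gradient values are illustrative.)
plate = np.round(np.linspace(20, 36, 1920)).astype(np.uint8)

lifted = (plate.astype(np.float32) / 255.0).astype(np.float16)

# Exactly as many distinct values as before: the banding is intact.
print(len(np.unique(plate)), len(np.unique(lifted)))
```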
02 — Why this isn't a smoothing filter
The first instinct, when describing the problem, is to reach for a smoothing filter. Apply a Gaussian blur, dither carefully, slightly widen the histogram. This works for about three minutes before the failure modes become obvious.
Smoothing destroys real detail along with the banding. A sky that is genuinely smooth and a face that is genuinely sharp need to be treated differently, and a filter that doesn't know the difference will either fail to deband the sky or destroy the texture of the face. Adaptive filters that try to distinguish gradient regions from textured regions get the easy cases right and the hard cases — the soft transitions inside an otherwise textured area — wrong in ways that look uncanny rather than uncorrected.
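A toy numpy experiment makes the trade-off concrete. A box blur stands in for any non-adaptive smoothing filter, and all numbers are illustrative, not the project's method:

```python
import numpy as np

rng = np.random.default_rng(0)

# A banded gradient (what we want to fix) next to fine texture
# (what we must not touch), both quantized to 8-bit code values.
gradient = np.round(np.linspace(20, 36, 960))
texture = np.round(np.clip(28 + rng.normal(0, 4, 960), 0, 255))
row = np.concatenate([gradient, texture])

# A 31-tap box blur does deband the gradient -- and flattens the texture.
kernel = np.ones(31) / 31
smoothed = np.convolve(row, kernel, mode="same")

tex_before = row[1000:1900].std()
tex_after = smoothed[1000:1900].std()
print(tex_before, tex_after)   # texture contrast collapses after the blur
```

The filter cannot tell the two regions apart, so the fix for one is the failure mode for the other.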
The deeper problem is that "what should this gradient look like at sixteen-bit precision" is a predictive question, not a smoothing one. A network that has seen enough sixteen-bit material learns priors about how light actually rolls off across a surface, how skin tones grade into shadow, how a sky transitions from horizon to zenith. A filter does not have those priors. It only has the eight-bit input.
Once the framing shifts from smoothing to prediction, the rest of the project's shape follows. A learned model. Trained on real sixteen-bit source data with synthetic eight-bit inputs derived from it. Optimized against losses that specifically penalize the artifacts the work is trying to eliminate. Validated on metrics a colorist would trust, rather than defaulting to PSNR and calling it a day.
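The training-pair idea can be hedged into a sketch. Nothing here is the private implementation; `make_training_pair` and its degradation choices are hypothetical stand-ins for the general recipe of quantizing real sixteen-bit material to produce synthetic eight-bit inputs:

```python
import numpy as np

def make_training_pair(frame16, rng):
    """Degrade a high-precision frame into a synthetic 8-bit input.

    frame16: float32 array in [0, 1], treated as ground truth.
    Returns (quantized input, ground-truth target).
    Hypothetical sketch -- the real degradation model is private.
    """
    # A random tone adjustment diversifies gradient shapes; it is applied
    # before quantization and kept in the target too, so the network
    # learns only to de-quantize, not to undo grades.
    if rng.random() < 0.5:
        frame16 = frame16 ** rng.uniform(0.7, 1.4)
    quantized = np.round(frame16 * 255.0) / 255.0   # snap to the 8-bit grid
    return quantized.astype(np.float32), frame16.astype(np.float32)

rng = np.random.default_rng(1)
truth = rng.random((64, 64, 3), dtype=np.float32)
inp, target = make_training_pair(truth, rng)
```

The appeal of this setup is that ground truth is free: any sixteen-bit frame yields a supervised pair by construction.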
03 — Where it sits in the pipeline
The network is part of a broader HDR pipeline rather than a standalone tool. It handles the cases the HDR diffusion model doesn't — material that comes in as eight-bit and needs to be lifted, rather than material being generated from scratch. Concretely, three workflows depend on it:
AI-generated content from non-HDR models. Standard diffusion output — anything from a stock FLUX or SDXL run — is eight-bit display-referred. The bit-depth network is the bridge that makes this content useful in a comp without forcing the artist to use only the HDR-finetuned model upstream.
Archival and web reference plates. A comp may need to integrate eight-bit reference material as a background or texture element. The network lifts the precision before the comp begins, removing the constant low-grade banding that would otherwise show up under exposure adjustment.
Source from compressed delivery formats. Footage is often delivered through an eight-bit codec for review or for low-bandwidth shoots, where the on-set capture was higher precision but the working copy is not.
In each case, the output of the network is a sixteen-bit float container in a working color space appropriate to the rest of the comp, ready to be color-managed through OCIO into whatever downstream representation the show requires.
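One way to see why the half-float container has the headroom the comp needs is to enumerate its bit patterns directly and count the representable values in the display range:

```python
import numpy as np

# Reinterpret every 16-bit pattern as a half float, then filter to [0, 1]:
# the container offers tens of times more tonal levels in the display
# range than the 256 levels of the 8-bit source.
all_bits = np.arange(65536, dtype=np.uint32).astype(np.uint16).view(np.float16)
finite = all_bits[np.isfinite(all_bits)]
in_range = finite[(finite >= 0) & (finite <= 1)]
print(len(np.unique(in_range)))   # roughly 60x the resolution of uint8
```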
04 — On not sharing the implementation
This is the only case study on the site whose implementation cannot be shared. The work was developed alongside my current employment, and personal projects under those terms require sign-off before public release. That sign-off was not granted, and the implementation will remain private.
The case study exists on this site as evidence that the work happened, not as a path for other people to reproduce it. The HDR generation work and the Nuke nodes for ComfyUI are deliberately public because the value to the community is highest when those projects are reproducible — and because the broader research conversation around HDR diffusion benefits from open contribution. This project lives on the other side of that boundary. The reasoning is contractual rather than strategic; my default with research-adjacent work is to publish, not to hold back.
05 — Reflection
The honest reflection on this project is that the most important decisions were made before any code was written. What is this network actually for? What does the failure mode look like in production, not on a benchmark? Which artifacts are worth designing the loss function around, and which are real but not pipeline-critical? These are not questions a researcher unfamiliar with VFX would think to ask. The architecture, in the end, is a small variation on a well-understood family. The dataset and the loss design — the parts that aren't being shared — are where the work actually lives.
This is consistent with the pattern across the rest of the site. The model is rarely the interesting part. The interesting part is knowing what function the model needs to approximate, which is a question of understanding the use case in unusual depth. For this project, that meant treating "what does a colorist consider acceptable" as the primary specification, not as a downstream verification step.
The network has been in active use for several months across the workflows above. It has not failed in production review. The cases where it produces output a colorist would notice are rare and predictable, and existing post-processing tools handle them. The next step is consolidating this work into a stable internal release rather than expanding its public surface — which is the right priority for a piece of infrastructure rather than a research project.