
The Zero-One Specialization Problem#

Created On: Sep 22, 2025 | Last Updated On: Sep 22, 2025

Before you read this section, you should understand the basics of dynamic shapes covered in the preceding dynamic shapes sections.

In torch.compile, we automatically specialize on inputs with sizes 0 or 1 and assume that any remaining symbolic inputs cannot be 0 or 1. This simplifies tasks like contiguity and broadcasting checks because it avoids adding extra guards. However, it can cause problems for sparse models with many symbolic integers whose tensors in practice have sizes of 0, 1, or 2, for example, a task such as collecting the number of likes on a page.
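As a minimal sketch of how this shows up in practice (the function and sizes below are illustrative, not taken from this page, and exact recompile behavior can vary across PyTorch versions): with dynamic shapes enabled, sizes greater than 1 can share one symbolic graph, while sizes 0 and 1 each trigger a fresh, specialized compilation. You can observe the recompiles by running with `TORCH_LOGS="recompiles"`.

```python
import torch

@torch.compile(dynamic=True)
def scale(x):
    # An elementwise op whose batch dimension is traced symbolically.
    return x * 2

# Sizes >= 2 are served by one dynamic graph, which implicitly assumes
# the symbolic size is not 0 or 1.
scale(torch.randn(8, 4))
scale(torch.randn(3, 4))   # reuses the dynamic graph

# Sizes 0 and 1 fall outside that assumption, so each typically triggers
# its own specialized compilation.
scale(torch.randn(1, 4))
scale(torch.randn(0, 4))
```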

While it’s possible to stop specializing on 0/1 upfront, executing normal PyTorch code often reintroduces 0/1 guards, because many conditions in PyTorch check whether a value is 0 or 1. Although models that work for N > 2 often generalize to N = 1, this isn’t guaranteed, especially with symbolic variables. For example, in hand tracking, a dimension size of N = 0, 1, or 2 may lead to different graph behaviors. Simply hoping that the N > 2 model generalizes can expose soundness issues.
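The broadcasting example below is an illustrative sketch (the names are made up for this page) of why a graph traced only for N > 2 cannot be soundly reused when N = 1: whether a multiplication broadcasts depends on a dimension being exactly 1, which is precisely the kind of condition that reintroduces a 0/1 guard during tracing.

```python
import torch

def weight_likes(likes, weight):
    # If `weight` has size 1 it broadcasts across all entries; if it has
    # the same size as `likes` it is applied elementwise. A traced graph
    # must therefore guard on whether weight.size(0) == 1.
    return likes * weight

likes = torch.tensor([3.0, 0.0, 7.0])

weight_likes(likes, torch.tensor([2.0]))            # broadcast: shape (3,)
weight_likes(likes, torch.tensor([2.0, 1.0, 0.5]))  # elementwise: shape (3,)
# A graph captured only for the second call cannot be soundly reused for
# the first, which is why PyTorch's own checks bring 0/1 guards back.
```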