Whether for artistic or pragmatic reasons, pixel art is often authored at a resolution smaller than that of the screen it is ultimately displayed on. Upscaling to the display resolution is usually done using nearest-neighbor point sampling, which keeps the pixels sharp but can cause distracting aliasing artifacts when the scaling ratio is not a whole number, or when the image is stretched or rotated,
which are especially noticeable in motion. For conventional 3D assets, texture aliasing
is usually mitigated by some combination of bilinear sampling, mipmapping,
and anisotropic filtering, all of which are implemented
directly on GPU hardware. While these filters do eliminate the aliasing,
they also smear out the result in a way that’s usually undesirable for pixel art assets. Ideally, what we want is to produce just
enough filtering to eliminate aliasing while maintaining the original sharpness. In this video, we’ll be developing
a method which is able to display arbitrarily scaled, rotated, offset, and distorted
pixel art textures without aliasing artifacts. I should also mention that this is
all possible without any additional draw calls, VRAM usage, texture samples, or indeed
any changes to the render pipeline. Initial research into this subject led
me to this 2017 blog post by Cole Cecil, describing a method for drawing pixel
art at non-integer texel-to-pixel ratios. In this context, a “texel”, or “texture element”, refers to each square element in the texture,
and a “pixel”, or “picture element” refers to each square element in the render target,
which we’ll assume to be the screen. Broadly explained, when drawing a
texture to screen, the texture’s coordinate system, typically called its UVs,
is transformed to the screen’s coordinate system. This transformation is carried out through
the polygons the texture is attached to: Each vertex of each polygon is associated with a
UV coordinate, and when the GPU draws the polygon, the UVs at its vertices are interpolated to
each pixel the polygon covers on the screen. Then, inside a pixel shader, the UVs at
each pixel are given to the sampler, which returns the texture color at that location,
and the result is drawn to that pixel on screen. In general, the interpolated UVs at a pixel
don’t fall exactly on the middle of a texel, but typically fall somewhere between texels. Nearest-neighbor sampling is so-named because it returns
the texel nearest to the sampled UV coordinate. Bilinear sampling is so-named because it
interpolates linearly between the nearest texels, first along one axis and then the other. If a pixel’s UVs line up exactly with the center of a texel, bilinear sampling returns exactly that texel’s color. The method described in Cole’s blog takes advantage of this fact by manipulating the UVs in the pixel shader before sampling. The UVs, which are normalized between 0 and 1, are first scaled by the texture size to get texel coordinates. Then the texel coordinates are snapped to the nearest texel center to produce a sharp sample, with an extra term that interpolates from the center of one texel to the next over a narrow band whose width is based on the ratio between pixel size and texel size. Finally, the texel coordinates are converted back to UVs and passed to the bilinear sampler to produce the final color.
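As a minimal sketch of that manipulation in HLSL (the function and parameter names here are placeholders, not code taken from the blog post):

```hlsl
// Sketch of the UV snapping described above. texelsPerPixel is the number of
// texels covered by one screen pixel along each axis, e.g. 0.25 at a 4x upscale.
float4 SampleSharp(sampler2D tex, float2 uv, float2 textureSize, float texelsPerPixel)
{
    float2 texel = uv * textureSize;          // UVs -> texel coordinates
    float2 seam  = floor(texel + 0.5);        // nearest boundary between texels
    // Snap to the nearest texel center, ramping linearly from one center to
    // the next over a band one screen pixel wide around each boundary.
    texel = seam + clamp((texel - seam) / texelsPerPixel, -0.5, 0.5);
    float2 sharpUV = texel / textureSize;     // texel coordinates -> UVs
    return tex2D(tex, sharpUV);               // the bilinear sampler does the rest
}
```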
When the upscale ratio is a whole number and the image is placed pixel-perfect on screen, the points evaluated by the pixel shader will just
happen to dodge all of the interpolation regions, meaning no interpolation will be visible, and
the result will be a perfectly sharp upscale. This method works great for uniform
scaling, and we can even extend it to support non-uniform stretching,
which is done by making the upscale ratio a 2-dimensional variable to separately handle
the horizontal and vertical texture dimensions. However, it’s not capable of handling
more complex distortions, such as the perspective distortion from 3D rotations,
which complicate what it means to calculate the upscaling ratio, because texel shape and size now vary across the image. In order to extend this method to
cover these types of distortions, we’ll first need a slightly deeper
understanding of the problem. Bilinear sampling is often thought of as
interpolation between the nearest 4 texels, but there’s another way of explaining what it does. If we think of the texels as individual
cells each with its own solid color, the bilinear sample at a point is equal to the average
color inside a single-texel-sized box centered at that point. We can say that the bilinear filter is a type of
box filter, with a box size of a single texel, and we can use it to sample a box
average at any texture location.
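Stated as a formula, with the texture T treated as a piecewise-constant (nearest-neighbor) function and all distances measured in texels, the bilinear sample at a point p is the average of T over a unit box centered on p:

```latex
\mathrm{bilinear}(p) \;=\; \int_{-1/2}^{1/2}\!\int_{-1/2}^{1/2} T(p + s)\,\mathrm{d}s_x\,\mathrm{d}s_y
```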
Conceptually, to render with the ideal amount of anti-aliasing, we need to find the average texture color inside every pixel, and for that we need a pixel-sized box filter. For example, to color this pixel, we need to
find the average texture color in this box, but there’s a way to stretch the box such that the
average color contained within it doesn’t change. If we stretch the box this way
until it’s the size of a texel, then we know from earlier that we can use the bilinear
sampler to compute the average color in this box. And since we arrived at this box from our original pixel
while carefully preserving the average color, that means the pixel color here is equal to the bilinear sample here. To render a pixel art texture
with correct anti-aliasing, we just have to repeat this
process for every remaining pixel. The shader code we had before
was secretly already doing this. Or rather, this is just another way of
thinking about what the code was doing. Instead of thinking about moving
around the interpolation regions, we can think of this code as performing the
calculations needed to compute some box average by converting it into a bilinear sample, and our shader works by passing in
the ratio between pixel size and texel size, which sets the box size to exactly one pixel. Now let’s look at the problem from before again
with this new perspective. When the texture is rotated, or distorted from perspective, we can’t compute the box size
directly from a pixel-per-texel ratio, because pixels and texels are no longer uniformly aligned. If we try to compute the average color in a pixel now, the region of the texture contained within the pixel is no longer an axis-aligned rectangle, and the trick we used before of stretching
the box into the size of a texel while preserving average color no longer works. Given that our stretching trick only
works for axis-aligned rectangles, we can compromise by using the axis-aligned
bounding box of our distorted pixel, which will be the smallest box filter
that we can evaluate with the bilinear sampler while still providing
enough anti-aliasing to cover the entire pixel. To compute this bounding box, we first need to
compute the shape of the pixel in texture space. We know the shape of the pixel in pixel
space is, by definition, a unit square. To transform it into texture space, we need to characterize the deformation of the texture space at the given pixel. Given that there are 2 pixel dimensions XY and 2 texture dimensions UV, the deformation
can be determined from 4 measurements: The change in U per X, the change in U per Y, the change in V per X, and the change in V per Y. These 4 rates of change define a 2x2 matrix, which I’ll call the deformation gradient from pixel space to texture space. We can multiply the deformation
gradient by any vector in pixel space to get the corresponding vector in texture space. Expanding the matrix multiplication gives us a formula for the UV components of a vector in terms of its XY components. If we evaluate this formula at the corners of a unit square, with XY components (0, 0), (1, 0), (1, 1), and (0, 1), we get the texture-space
coordinates of the distorted pixel. To compute the width and height
of the axis-aligned bounding box, we just need the difference between the
maximum and minimum values of U and V. When we simplify this expression,
we end up with the absolute value of du/dx plus the absolute value of
du/dy for the texture-space width, and the same expression with dv/dx and
dv/dy for the texture-space height. On the GPU, we can compute the 4 deformation
terms using the ddx and ddy functions, and compute both the u and v box size in parallel.
In fact, this operation is common enough that the shader language has a built-in function
called fwidth() that does the whole thing.
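Written out, the deformation gradient and the resulting bounding-box size are:

```latex
J =
\begin{pmatrix}
  \partial u/\partial x & \partial u/\partial y \\
  \partial v/\partial x & \partial v/\partial y
\end{pmatrix},
\qquad
\begin{pmatrix} \Delta u \\ \Delta v \end{pmatrix}
  = J \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix},
\qquad
\begin{aligned}
  w_{\mathrm{box}} &= \left|\partial u/\partial x\right| + \left|\partial u/\partial y\right| \\
  h_{\mathrm{box}} &= \left|\partial v/\partial x\right| + \left|\partial v/\partial y\right|
\end{aligned}
```

And in shader terms, assuming texel holds the texel coordinates from earlier, a minimal sketch is:

```hlsl
float2 texel     = uv * textureSize;                   // texel coordinates, as before
float2 boxSize   = abs(ddx(texel)) + abs(ddy(texel));  // pixel footprint in texels, per axis
float2 boxSizeFw = fwidth(texel);                      // the built-in computes the same thing
```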
Now that we’re computing the box size directly instead of a pixel-per-texel ratio, we can also simplify the sample offset code a bit. At this point, there are a few loose
ends that remain to be tied up. First, the box size we calculate needs
to be clamped to at most a single texel, because our box stretching trick assumes
we don’t cross any texel boundaries. The box size also needs to be larger than
zero, because we don’t want to divide by zero. Since the box size is capped at 1 texel, there will be aliasing in situations where the
desired filter amount is larger than 1 texel. To address this, we’ll take advantage of the other two filtering methods: mipmapping and anisotropic filtering. Because of the way we’ve manipulated the
UVs, the GPU’s usual way of calculating how to apply these filters doesn’t work reliably. To fix this, we just need to switch to a sampling function that explicitly takes the original UVs’ deformation gradient as a parameter. Next, since our axis-aligned bounding box
will tend to overestimate the amount of blurring required to perform anti-aliasing, we can purposefully compensate for this by using a slightly undersized box. Alternatively, what I prefer is to weight the center of the box more heavily than the edges when computing the average color. If we choose a quadratic averaging window, the implementation turns out to be quite cheap to compute, because integrating this window over a texel boundary results in a cubic transition, which can be computed directly using the smoothstep function.
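Putting the pieces together, a sketch of the complete sampling function might look like the following (the names are placeholders, and the exact lower clamp bound is a matter of taste):

```hlsl
float4 SamplePixelArtAA(sampler2D tex, float2 uv, float2 textureSize)
{
    float2 texel = uv * textureSize;                 // texel coordinates
    // Pixel footprint in texels, clamped to at most 1 texel (so the
    // box-stretching trick stays valid) and away from zero (so the
    // smoothstep edges below never coincide).
    float2 boxSize = clamp(fwidth(texel), 1e-5, 1.0);
    // Shift so the transition band is centered on each texel boundary.
    float2 tx = texel - 0.5 * boxSize;
    // Integrating a quadratic window across the boundary gives a cubic
    // transition, which smoothstep computes directly.
    float2 offset = smoothstep(1.0 - boxSize, 1.0, frac(tx));
    float2 sampleUV = (floor(tx) + 0.5 + offset) / textureSize;
    // Pass the original UV gradients explicitly so mipmapping and
    // anisotropic filtering keep working despite the manipulated UVs.
    return tex2Dgrad(tex, sampleUV, ddx(uv), ddy(uv));
}
```

Swapping the smoothstep for a plain linear ramp would give back the exact box average of the unweighted window.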
Lastly, I want to address an issue that can show up when using bilinear sampling on color sprites with transparency. When a texture is encoded with straight alpha blending, where the image opacity is stored independently of the image color, transparent texels have no well-defined color data, meaning that any box average that includes a fully-transparent texel is not well-defined either. Engines like Unity usually hide this problem by guessing the color of the transparent texels
based on their surrounding opaque texels, but pixel art textures often contain
regions where a transparent texel borders multiple differently colored opaque texels,
resulting in incorrect blending at these locations. The correct solution is to use pre-multiplied alpha blending, which is an alternative way of encoding transparency that bypasses this issue entirely by giving transparent texels a well-defined color, while otherwise producing exactly equivalent results.
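As a rough illustration, the conversion from straight to pre-multiplied alpha (normally done once, when the texture is imported or baked, rather than per frame) looks like this:

```hlsl
// A fully transparent texel becomes (0, 0, 0, 0), which is a well-defined
// color, so box averages that include it blend correctly.
float4 Premultiply(float4 straight)
{
    return float4(straight.rgb * straight.a, straight.a);
}

// The render state then blends with "Blend One OneMinusSrcAlpha"
// (Unity ShaderLab syntax) instead of the usual "Blend SrcAlpha OneMinusSrcAlpha".
```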
In my own game, which is primarily polygon- rather than sprite-based, I use the pixel art anti-aliasing method
described in this video for upscaling the game’s internal render resolution
to the game’s display resolution, as well as for UI and text elements
that are drawn directly to the display. While this works optimally for integer upscaling, it also allows for less conventional upscaling ratios, such as when running in windowed mode. Ironically, none of my current use
cases involve any of the rotating or 3D distortions discussed in this video, but I thought it was an interesting
enough idea to fully develop, and I think it could potentially be a
useful reference for others whose own design goals and use cases
may differ from my own. To that end, for those who have
expressed interest in supporting my work, I’ve opened an account on Patreon, where I plan on posting a bit more candidly
about the upcoming content pipeline, as well as sharing some supplementary
material to published videos, such as source code examples or blog
posts expanding on relevant ideas. To everyone else, thanks for watching.