Sitting at your desk looking at your monitor
during the day, there’s really nothing out of the ordinary. But if the sun sets, or I
just cut the lights, and the room darkens, you get this characteristic glow from the
monitor, contributing to that feeling of an overwhelmingly bright light in the darkness.
This is called a bloom effect, and at its heart, it can be both a simple and surprisingly
complex effect to implement. Do it right, and it looks subtle and sells that idea of an
overwhelmingly bright light, but do it wrong, and the scene ends up looking a bit wonky,
kinda straight out of the mid-2000s.
Let’s talk about how it used to be done, way
back in the day. There’s really no official “correct” way to do this, whatever works, but the
premise basically works like this. You’ve got a copy of your main framebuffer, and you’ll apply
some basic operations. You threshold the image, meaning you choose the bright parts to bloom out.
You downsample it, mostly for performance reasons. And you apply a bunch of blur to it, that’s
how you get that bleed over effect.
Once you’ve done all that, they can be combined in
some handwavy way that looks nice to end up with a bloomed out image. That’s the theory anyway.
It’s not really set in stone, this is kinda an artistic effect, so in reality you’re going to
just go with whatever works and looks nice.
Let’s talk about thresholding an image first.
So what most people do is take the image, and do some sort of luminance cutoff. The
reasoning behind this was simple. A lot of stuff only ran with 8 bits per channel. Nowadays
modern hardware is fast enough to support floating point buffers wherever you need them, but back
in the dark ages, you often went with 8-bit channels. The Xbox 360 had a 10-bit float format,
but I don't think the PS3 had an equivalent.
So yeah, you take your image and there’s a bunch
of ways people did this. You could contrast the heck out of your image, basically preserving
the bright areas and dropping off the rest.
You can also just set a luminance
threshold, so you calculate luminance per-pixel and then drop everything
below a certain threshold off towards 0.
The general idea is that you need to isolate some
part of the image for blooming, not all of it, because we’re going to be additively blending
this bad boy back on the screen later, and without dropping off some parts,
it’ll look like a bit of a disaster.
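To make that concrete, here's a toy sketch of a soft luminance threshold in Python with numpy; the threshold and knee values are made up, not pulled from any particular engine.

```python
import numpy as np

# Toy luminance threshold sketch. img is an HxWx3 float array in [0, 1].
# The threshold and knee values are arbitrary examples.
def threshold_bloom_source(img, threshold=0.8, knee=0.1):
    # Rec. 709 luminance per pixel
    luma = 0.2126 * img[..., 0] + 0.7152 * img[..., 1] + 0.0722 * img[..., 2]
    # Soft ramp towards 0 below the threshold, instead of a hard cut
    weight = np.clip((luma - threshold) / knee, 0.0, 1.0)
    return img * weight[..., None]
```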
You can use the full size main colour target
directly, but working with a smaller version of that is faster and cheaper. So it’s
normal to take your main colour target, and down sample that to a fraction of the size.
Every time you halve your resolution, the number of pixels processed gets cut by 75%, since half
the width times half the height is a quarter of the pixels, which is pretty
insane savings. The tradeoff here is that, as you progressively downsample, you risk shimmering
artefacts due to the lack of resolution.
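To make the savings concrete, here's the back-of-the-envelope arithmetic for a hypothetical 1920x1080 target:

```python
# Pixel counts as a 1920x1080 target gets halved repeatedly
width, height = 1920, 1080
for level in range(4):
    w, h = width >> level, height >> level
    print(f"level {level}: {w}x{h} = {w * h} pixels")
# level 0: 2,073,600 px; level 1: 518,400 px (25% of the previous); and so on
```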
Then there’s a blur pass, so you’ll take your main
colour, which has probably been downsampled, then you’ll blur it somehow. Now, a dead simple way is
good ol’ box blur, nothing beats that. The basic idea is to convolve the image with an equal weight
kernel, that might look something like this.
And that basically amounts to, if you were to
zoom in and look at individual pixels, for any given pixel in the input image, you’d sample the
nearby pixels and then average them all together to produce the final pixel in the output image.
Go and do that a few zillion times for every pixel in the image, and you’ll get a slightly blurry
image. This has some obvious limitations. For starters, to get a bigger radius, you can
either skip pixels, which just kinda looks bad, you can see the artefacts there pretty clearly.
Or you can expand the kernel, which looks better but comes at the steep cost of scaling with N² in the
radius you wanna blur. Bad options all around.
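Here's a rough CPU-side sketch of that naive box blur, just to pin down the idea; assume img is a floating point HxWx3 array.

```python
import numpy as np

# Naive box blur: every output pixel is the average of the (2r+1) x (2r+1)
# neighbourhood around it, so the per-pixel cost grows with the square of
# the radius. Purely illustrative.
def box_blur_naive(img, r=2):
    h, w = img.shape[:2]
    padded = np.pad(img, ((r, r), (r, r), (0, 0)), mode="edge")
    out = np.zeros_like(img)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out += padded[r + dy : r + dy + h, r + dx : r + dx + w]
    return out / (2 * r + 1) ** 2
```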
Luckily, the box filter is what’s referred
to as a separable filter in image processing, meaning that the 2D version can be
expressed as two 1-dimensional filters. That is, it can be done along the
horizontal axis, or the x axis, and then the output of that can be run through
another filter on the vertical or y axis.
So I can start with the source image, and we can
start by applying a box filter along the x-axis. Notice the blurring in the
image is strictly horizontal. Once we’ve finished doing that, we can apply a
2nd box filter along the y-axis, giving us the full box filter in 2N samples per pixel rather than N².
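As a sketch, the separable version looks like this: the same 1D averaging pass run twice, once along x and once along y.

```python
import numpy as np

# One 1D box blur pass along a chosen axis: 2r+1 taps per pixel.
def box_blur_1d(img, r, axis):
    pad = [(0, 0)] * img.ndim
    pad[axis] = (r, r)
    padded = np.pad(img, pad, mode="edge")
    n = img.shape[axis]
    out = np.zeros_like(img)
    for offset in range(2 * r + 1):
        out += np.take(padded, np.arange(offset, offset + n), axis=axis)
    return out / (2 * r + 1)

# Horizontal pass first, then vertical: 2*(2r+1) taps per pixel total,
# instead of (2r+1)^2 for the full 2D kernel.
def box_blur_separable(img, r=2):
    return box_blur_1d(box_blur_1d(img, r, axis=1), r, axis=0)
```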
An arguably higher quality blur is to do a gaussian, which is just changing
the weights around in the kernel. Instead of equal weights, you can bias
the weight towards the centre pixel.
The kernel basically looks like this, kind of
a hill with raised centre and a gentle falloff towards the sides. Contrast that to a box filter,
which is exactly what it says, it’s a box.
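If you want to see those weights, here's a tiny sketch that builds a 1D gaussian kernel; the radius and sigma are just example values.

```python
import numpy as np

# 1D gaussian kernel: heavy weight in the centre, gentle falloff to the sides.
def gaussian_kernel_1d(radius, sigma):
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x * x) / (2.0 * sigma * sigma))
    return k / k.sum()  # normalise so the blur doesn't change brightness

print(gaussian_kernel_1d(2, 1.0))
# roughly [0.054, 0.244, 0.403, 0.244, 0.054]
```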
This is separable, just like the box blur, so it’s
easy to get going. There’s other kernels here too, but I feel like the important takeaway
here is the general idea that you wanna make a blurry version of the scene.
And now that you’ve got your blurred out copy of the scene, you blend it back over the
screen as a post effect, generally an additive pass. And that’s it, that’s all it is.
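The composite really is about that simple; as a sketch, with a hypothetical intensity knob thrown in:

```python
# Additive composite sketch: scene and bloom are float images of the same
# size; intensity is a made-up artist-facing knob.
def composite_bloom(scene, bloom, intensity=1.0):
    return scene + bloom * intensity
```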
Problem is, doing it this way is part of the reason why games in the mid-2000s looked
so bloom heavy. They relied on using things like thresholds to bloom out. If you've got an 8-bit
per channel buffer, that means the maximum value is 255. If you choose to, say, use luminance
to threshold what's bright and what isn't, i.e. what blooms and what doesn't, that approach makes
zero distinction between a plain white object and a glowing hot one. The end result being,
it's super hard to get under control.
Back in the day, we developers thought it looked
awesome and soooo next-gen, and it kinda did, so how about shut up. But let’s look at how
they’ve improved the approach in some of the major engines since those bright, bloomy days.
So, in general, any current generation implementation of bloom is going to be
trying to accomplish a few things.
High quality, nuanced. We’re long past
the days where bloom is new and exciting, so bloom implementations want to give that
feeling of an overwhelmingly bright light, without beating you over the head with it.
Performant: This should be obvious, it should be fast, realllll fast.
Kawase’s Blur Filter
Let’s talk about new approaches, but before that,
way back at GDC 2003, this was I think before I even started in the industry, there was
a presentation by Masaki Kawase, Kawacy? Uh whatever, it was a very clever way to use gpu
hardware to get a really nice blur for cheap.
The idea is, take the main image, take 4
bilinear samples, and use those to downsample progressively to a smaller resolution, in the
talks I think they mentioned ¼ resolution.
After that, you’ll ping pong between 2
textures, each time taking 4 samples at a time with an increasing offset. By iterating
this over and over again, you can get a really good approximation to a large gaussian blur, for
really few texture samples. This is important, because this algorithm serves as the backbone
of pretty much all the modern approaches.
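Here's a very rough CPU approximation of that ping-pong with a growing offset, just to illustrate the shape of it; the real technique leans on the GPU's free bilinear filtering, which np.roll obviously doesn't capture.

```python
import numpy as np

# One Kawase-style pass: average 4 diagonal neighbours at a given offset.
def kawase_pass(img, offset):
    return 0.25 * (np.roll(img, ( offset,  offset), axis=(0, 1)) +
                   np.roll(img, ( offset, -offset), axis=(0, 1)) +
                   np.roll(img, (-offset,  offset), axis=(0, 1)) +
                   np.roll(img, (-offset, -offset), axis=(0, 1)))

# Iterating with an increasing offset approaches a wide gaussian blur for
# only 4 samples per pixel per pass.
def kawase_blur(img, passes=5):
    out = img
    for i in range(passes):
        out = kawase_pass(out, offset=i + 1)
    return out
```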
Let's start by taking a look at Unity's bloom implementation. It traces back to a 2014
presentation from the folks at Sledgehammer Games, who were behind the latest
(at the time) iteration of Call of Duty.
They outlined how their bloom implementation worked, and their technique was
an evolution of the Kawase filter we just covered. The idea is as follows:
You take your framebuffer, and there’s no
threshold step. With HDR commonplace now, this isn’t needed. Now you downsample it to
create basically a mip chain. The default way is to take a bilinear sample right in the
middle of every 4 texels, and the Kawase method showed you could take 4 taps instead of 1.
They decided, screw it, we’re taking 13 taps.
And it's really neat what they do here. Of those 13 taps,
9 form a 3x3 grid: the upper left 4 of them form a group,
the upper right 4 form another, and so on. Lastly,
there's one more group of 4 taps right in the middle, and that puts a large bias towards those
centre pixels. The idea being that this should help reduce pulsing artefacts.
These are all combined with a simple weighted sum: the centre group gets half the weight, and the
remaining four groups split the other half equally.
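As a sketch, the combine step would look something like this; the weights are my reading of the slides, and the names are mine.

```python
# Each argument is the average of one 4-tap group from the 13-tap pattern.
# Half the weight goes to the centre group, the rest is split evenly
# between the four corner groups (weights sum to 1).
def combine_groups(centre, top_left, top_right, bottom_left, bottom_right):
    return (0.5   * centre +
            0.125 * (top_left + top_right + bottom_left + bottom_right))
```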
One extra trick they added here is
that on the very first downsample, they take a Karis average, weighting each sample by its luminance,
which tames isolated super-bright pixels and helps prevent flickering.
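A sketch of what that weighting looks like, assuming the usual 1 / (1 + luminance) form:

```python
import numpy as np

# Karis-style weighted average: samples is a list of RGB numpy arrays.
# Weighting by 1 / (1 + luma) stops one outrageously bright sample from
# dominating the average and causing fireflies.
def karis_average(samples):
    def luma(c):
        return 0.2126 * c[0] + 0.7152 * c[1] + 0.0722 * c[2]
    weights = [1.0 / (1.0 + luma(c)) for c in samples]
    return sum(w * c for w, c in zip(weights, samples)) / sum(weights)
```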
Next, they progressively upsample, starting
with the smallest downsampled image. Each step, they apply a tent filter to the
downsampled version and also sum in the previous mip. I think. This is based on the slides.
So once you do that for a while, you’ll end up with a fully downsampled and then
re-upsampled image, which you then lerp with the main image to get a bloomed out
scene, and overall it looks pretty good.
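A very rough sketch of that upsample-and-accumulate loop; I've swapped the tent filter for a dumb nearest-neighbour upscale to keep it short, so treat it as the shape of the algorithm rather than the real thing.

```python
import numpy as np

# mips[0] is the largest downsample, mips[-1] the smallest. Walk back up,
# upscaling the accumulator and adding in each bigger mip as we go.
def upsample_chain(mips):
    acc = mips[-1]
    for mip in reversed(mips[:-1]):
        upsampled = np.repeat(np.repeat(acc, 2, axis=0), 2, axis=1)
        upsampled = upsampled[:mip.shape[0], :mip.shape[1]]  # match sizes
        acc = mip + upsampled
    return acc  # the bloom buffer you then lerp with the main image
```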
Notice that the scene isn’t too bloomed out
like in shots from earlier in the video, the bloom is more subtle and comes off
more as a glow rather than a weird haze.
So Unreal's approach seems to be mostly an
extension of those same older techniques. They go over the general approach way back in
their SIGGRAPH 2012 presentation on the technology behind the "Unreal Engine 4 Elemental Demo".
From what I can gather, they take the main image, the framebuffer, and from that they
progressively downsample a fixed number of times using a plain Kawase blur.
They then do a similar upscaling process as Unity, but from the sounds of it, they do a small
separable gaussian blur at each level, which is artist configurable. So you can modify the
size of the blur at each level to your taste.
They also add some knobs and dials for doing
things like applying tints to each level, giving you that extra level of artistic control.
So I could, for example, specifically choose to tint an individual layer of the bloom chain,
while leaving the rest as varying shades of grey, or something to that effect. I haven’t tried
profiling it, my guess is that it’s slightly more expensive than Unity’s approach but can
give higher quality and more flexible results. This is based on no actual research on my part.
So while most of these approaches are just tweaks and evolutions on Kawase’s approach from
way back, the newer versions of Unreal Engine come with a fancy new convolution
bloom, which is really interesting.
If you have a signal, let’s represent it
with a seemingly random function here, then you can use something called the Fourier
transform to decompose that function into a series of sinusoids, a series of sines and cosines
of varying amplitudes and frequencies.
If you take an image and imagine each
pixel as a data point of a 2D signal, of a 2-dimensional function, then we can use a
2D Fourier transform to convert this to frequency space, and then we can use the inverse Fourier transform
to bring this back to our original image.
The whole theory and background behind the
fourier transform deserves a long video by itself, which I clearly didn't do. There
are some phenomenal resources on Youtube, which I’ve got linked in the description. For
our purposes, I think it’s probably sufficient to know that this transformation exists and
what it buys you, in terms of functionality.
Once you’ve converted this to the
frequency domain, there’s some really cool manipulations you can do.
A high pass filter is one that preserves high frequency information while removing low
frequency information, and that's as simple as zeroing out the frequency-domain image near
the origin before doing the inverse FFT.
A low pass filter is the exact opposite,
preserving low frequency information while discarding high, which is the inverse of what we
just did, and that gives us a really blurry image. In fact, I can actually perform a gaussian blur
in a similar way, by using the fourier transform of a gaussian filter, which lets me perform a
gaussian blur for pretty much free. I mean, free after the FFT, which wasn’t even slightly free.
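As a toy example, here's a gaussian blur done entirely through numpy's FFT on a greyscale image; the mask being multiplied in is the Fourier transform of a gaussian, which is itself a gaussian.

```python
import numpy as np

# Gaussian blur via the frequency domain. img is a 2D greyscale float array,
# sigma is the blur radius in pixels (an example value, tweak to taste).
def gaussian_blur_fft(img, sigma=10.0):
    h, w = img.shape
    freq = np.fft.fft2(img)
    # Gaussian low-pass mask centred on the zero frequency
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    mask = np.exp(-2.0 * (np.pi * sigma) ** 2 * (fx ** 2 + fy ** 2))
    return np.real(np.fft.ifft2(freq * mask))
```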
That kind of leads us to an interesting property of the frequency domain: multiplying 2 images
together in the frequency domain is equivalent to convolving them in the spatial domain.
This is where Unreal's new convolution bloom is so neat and different. You
can create custom bloom kernel shapes, so here for example I can create a star-like
object, which I've done just by drawing some 2D SDFs. I've also added a bit of a gaussian blur
on top of that. Now, we can transform that kernel I just made into frequency space via the
FFT, and multiply it with our Fourier-transformed image.
Then when we take the inverse
of that using the inverse FFT, that ends up being our blurry bloom buffer. When
we combine that with the framebuffer, it can give us this neat streaking effect that would be
a lot more difficult to do with conventional methods but trivial using this approach.
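Here's a CPU sketch of that whole pipeline; it's definitely not Unreal's actual code, just the underlying idea: FFT the scene, FFT the (centred, normalised) kernel, multiply, inverse FFT, and add the result back on top.

```python
import numpy as np

# Convolution bloom sketch. scene and kernel are 2D float arrays of the same
# size; the kernel (e.g. a star shape) is centred in its image and normalised
# to sum to 1. intensity is a made-up blend knob.
def convolution_bloom(scene, kernel, intensity=0.2):
    scene_f = np.fft.fft2(scene)
    kernel_f = np.fft.fft2(np.fft.ifftshift(kernel))  # move centre to (0, 0)
    bloom = np.real(np.fft.ifft2(scene_f * kernel_f))
    return scene + intensity * bloom  # additive composite, same as before
```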
The unfortunate downside is that doing an FFT is a heavy operation, but I imagine it’s one that
we’ll probably see more often as graphics hardware undoubtedly gets more and more powerful. There was
a time when doing simple loops was unfathomable, so a few more iterations on hardware and
this will be a small blip on the profiler.
Until next time,
Cheers