## Fighting fireflies

### December 23, 2010

I’ve been playing around with path tracing on and off for longer than I care to admit. Although my dalliances never produced anything earth-shattering (and certainly nothing I’d be willing to post about on ompf), I’ve found it to be an endlessly fascinating subject. No matter how much I read about it, I only ever seem to be scratching the surface.

One of the biggest headaches I encountered was caused by “fireflies”: those bright pixels that can occur when sampling a strong response combined with a small PDF somewhere along the path. For a long time, I was “fixing” these by hand-painting over the offending pixels and pretending like nothing ever happened. Eventually though, the guilt of this gnawed away at me long enough to motivate finding some kind of better solution.

My first thought was to write a filter that estimated the variance in an image and replaced any “bad” pixels it found with a weighted average of their neighbours. Luckily, my second thought was of variance shadow maps, the only other context in which I’d read about variance before. Based on the ideas in that paper, I accumulate two separate per-pixel buffers: one storing the running sum of the samples and the other storing the sum of their squares. Having these two buffers then makes it trivial to compute the sample variance of each pixel in the image.

My path tracer already had support for progressive refinement, so it was straightforward to add a separate “variance reduction” pass that would run at the touch of a button. During this pass, the N pixels with the highest variance are identified and oversampled a few hundred times, which hopefully reduces their variance sufficiently. If not, I just run the pass again.
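The bookkeeping behind this is tiny. Here’s a sketch of the two-buffer idea; the struct and names are illustrative rather than lifted verbatim from my tracer:

```cpp
#include <cassert>
#include <cstddef>

// Per-pixel accumulators: running sum of samples and sum of their squares.
struct PixelStats
{
    double sum = 0.0;
    double sumSq = 0.0;
    std::size_t n = 0;

    void Accumulate(double sample)
    {
        sum += sample;
        sumSq += sample * sample;
        ++n;
    }

    double Mean() const { return n ? sum / n : 0.0; }

    // Sample variance via the sum-of-squares identity:
    // (sum(x^2) - n*mean^2) / (n - 1)
    double Variance() const
    {
        if (n < 2) return 0.0;
        double mean = Mean();
        return (sumSq - n * mean * mean) / (n - 1);
    }
};
```

In practice you keep one of these per pixel (per channel), then sort by `Variance()` to pick the N worst offenders for oversampling.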

As an example, here’s a render of the Manifold mesh from Torolf Sauermann’s awesome model repository, stopped after only a few paths have been traced per pixel:

I’ve highlighted a few areas that contain fireflies and below is a comparison of those areas before and after running the variance reduction pass:

And here’s the complete result of running the pass; it’s still noisy, but the pixels with particularly high variance have been cleaned up reasonably well:

I’m not too hot at statistics, but I would guess that this adds bias to the final result, which is frowned upon in some circles (but not others). Admittedly, a better solution would be to simply not generate so much variance in the first place, but this will do as a kludge until then. At least it’s better than painting pixels by hand!

Ok, since this post was mostly just an excuse to dump some results from my path tracer, here they are.

The XYZ RGB Dragon from Stanford’s 3D Scanning Repository:

A heat map of the BIH built for the same model.

The *other* Dragon from Stanford:

Manifold again, from jotero.com.

Some Stanford Bunnies:

Crytek’s updated version of Marko Dabrovic’s Sponza model:

And Stanford’s Lucy, just to prove that I can:

I’ve not been able to find any close-up renders of Lucy to compare with, but I believe that the “pimples” on the model are noise in the original dataset.

## Why my fluids don’t flow

### December 14, 2010

I have an unopened copy of Digital Color Management sitting on my desk. It’s staring at me accusingly.

In order to keep myself distracted from its dirty looks, I’ve been tinkering around with fluid simulation. Miles Macklin has done some great work with Eulerian (grid based) solvers, so in an effort to distance myself from the competition, I’m sticking to 2D Lagrangian (particle based) simulation.

Until recently, I’d always thought that particle based fluid simulation was complicated and involved *heavy maths*. This wasn’t helped by the fact that most of the papers on the subject have serious sounding names like Particle-based Viscoelastic Fluid Simulation, Weakly compressible SPH for free surface flows, or even Smoothed Particle Hydrodynamics and Magnetohydrodynamics.

It wasn’t until I finally took the plunge and tried writing my own Smoothed Particle Hydrodynamics simulation that I found that it can be quite easy, provided you work from the right papers. SPH has a couple of advantages over grid based methods: it is trivial to ensure that mass is exactly conserved, and free-surfaces (the boundary between fluid and non-fluid) come naturally. Unfortunately, SPH simulations have a tendency to explode if the time step is too large and getting satisfactory results is heavily dependent on finding “good” functions with which to model the inter-particle forces.

I had originally intended to write an introduction to SPH, but soon realised that it would make this post intolerably long, so instead I’ll refer to the papers that I used when writing my own sim. Pretty much every SPH paper comes with an introduction to the subject, invariably in section **2. Related Work**.

The first paper I tried implementing was Particle-Based Fluid Simulation for Interactive Applications by Müller et al. It serves as a great introduction to SPH, with a very good discussion of kernel weighting functions, but I had real difficulty getting decent results. In the paper, pressure, viscosity and surface tension forces are modeled using the following equations:

The pressure for each particle is calculated from its density using:

where ρ₀ is some non-zero rest density.
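For reference, the pressure model in question is the ideal-gas form, with k a stiffness constant:

```latex
p_i = k \, (\rho_i - \rho_0)
```

It’s the (ρ − ρ₀) term going negative for under-dense particles that causes the clustering problem described below.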

The first problem I encountered was with the pressure model; it only acts as a repulsive force if the particle density is greater than the rest density. If a particle has only a small number of neighbours, the pressure force will attract them to form a cluster of particles all sharing the same space. In my experiments, I often found large numbers of clusters of three or four particles all in the same position. It took me a while to figure out what was going on because Müller states that the value of the rest density “mathematically has no effect on pressure forces”, which is only true given a fairly uniform density of particles far from the boundary.

The second problem I found was with the surface tension force. It was originally developed for multiphase fluid situations with no free surfaces and doesn’t behave well near the surface boundary; in fact it can actually pull the fluid into concave shapes. Additionally, because it’s based on a Laplacian, it’s very sensitive to fluctuations in the particle density, which are the norm at the surface boundary.

After a week or so of trying, this was my best result:

From the outset, you can see the surface tension force is doing weird things. Even worse, once the fluid started to settle, the particles tended to stack on top of each other and form a very un-fluid blob.

On the up side, I did create possibly my best ever bug when implementing the surface tension model; I ended up with something resembling microscopic life floating around under the microscope:

The next paper I tried was Particle-based Viscoelastic Fluid Simulation by Clavet et al. I actually had a lot of success with their paper and had a working implementation of their basic model up and running in less than two hours, albeit minus the viscoelasticity. In addition to the pressure force described in Müller’s paper, they model “near” density and pressure, which are similar to their regular counterparts but with a zero rest density and different kernel functions:
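From the Clavet paper (quoting from memory, so do verify against the original), the near quantities look like:

```latex
\rho_i^{\mathrm{near}} = \sum_j \left(1 - \frac{r_{ij}}{h}\right)^3,
\qquad
P_i^{\mathrm{near}} = k^{\mathrm{near}} \, \rho_i^{\mathrm{near}}
```

Note that the near kernel is one power sharper than the regular density kernel, so it dominates at short range.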

This near pressure ensures a minimum spacing and as an added bonus performs a decent job of modelling surface tension too. This is the first simulation I ran using their pressure and viscosity forces:

Although initial results were promising, I struggled when tweaking the parameters to find a good balance between a fluid that was too compressible and one that was too viscous. Also, what I really wanted was to do multiphase fluid simulation. This wasn’t covered in the viscoelastic paper, so my next port of call was Weakly compressible SPH for free surface flows by Becker et al. In this paper, surface tension is modeled as:

They also discuss using Tait’s equation for the pressure force, rather than one based on the ideal gas law:

with
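From memory, Tait’s equation as Becker et al. use it is:

```latex
p = B\left(\left(\frac{\rho}{\rho_0}\right)^{\gamma} - 1\right),
\qquad \gamma = 7,
\qquad B = \frac{\rho_0 \, c_s^2}{\gamma}
```

where c_s is the speed of sound in the fluid; it’s that exponent of 7 that makes the pressure so stiff.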

I gave that a shot, but the large exponent caused the simulation to explode unless I used a *really* small time step. Instead, I found that modifying the pressure forces from the viscoelastic paper slightly gave a much less compressible fluid without the requirement for a tiny time step:

Here’s one of my more successful runs:

And here is a slightly simplified version of the code behind it. Be warned, it’s quite messy; I’m rather enjoying hacking code together these days:

```cpp
#include <float.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <memory.h>
#include <glut.h>

#define kScreenWidth 640
#define kScreenHeight 480
#define kViewWidth 10.0f
#define kViewHeight (kScreenHeight*kViewWidth/kScreenWidth)
#define kPi 3.1415926535f
#define kParticleCount 3000

#define kRestDensity 82.0f
#define kStiffness 0.08f
#define kNearStiffness 0.1f
#define kSurfaceTension 0.0004f
#define kLinearViscocity 0.5f
#define kQuadraticViscocity 1.0f

#define kParticleRadius 0.05f
#define kH (6*kParticleRadius)
#define kFrameRate 20
#define kSubSteps 7

#define kDt ((1.0f/kFrameRate) / kSubSteps)
#define kDt2 (kDt*kDt)
#define kNorm (20/(2*kPi*kH*kH))
#define kNearNorm (30/(2*kPi*kH*kH))

#define kEpsilon 0.0000001f
#define kEpsilon2 (kEpsilon*kEpsilon)

struct Particle
{
    float x;
    float y;
    float u;
    float v;
    float P;
    float nearP;
    float m;
    float density;
    float nearDensity;
    Particle* next;
};

struct Vector2
{
    Vector2() { }
    Vector2(float x, float y) : x(x), y(y) { }
    float x;
    float y;
};

struct Wall
{
    Wall() { }
    Wall(float _nx, float _ny, float _c) : nx(_nx), ny(_ny), c(_c) { }
    float nx;
    float ny;
    float c;
};

struct Rgba
{
    Rgba() { }
    Rgba(float r, float g, float b, float a) : r(r), g(g), b(b), a(a) { }
    float r, g, b, a;
};

struct Material
{
    Material() { }
    Material(const Rgba& colour, float mass, float scale, float bias)
        : colour(colour), mass(mass), scale(scale), bias(bias) { }
    Rgba colour;
    float mass;
    float scale;
    float bias;
};

#define kMaxNeighbourCount 64
struct Neighbours
{
    const Particle* particles[kMaxNeighbourCount];
    float r[kMaxNeighbourCount];
    size_t count;
};

size_t particleCount = 0;
Particle particles[kParticleCount];
Neighbours neighbours[kParticleCount];
Vector2 prevPos[kParticleCount];
Vector2 relaxedPos[kParticleCount];
Material particleMaterials[kParticleCount];
Rgba shadedParticleColours[kParticleCount];

#define kWallCount 4
Wall walls[kWallCount] =
{
    Wall( 1,  0, 0),
    Wall( 0,  1, 0),
    Wall(-1,  0, -kViewWidth),
    Wall( 0, -1, -kViewHeight)
};

#define kCellSize kH
const size_t kGridWidth = (size_t)(kViewWidth / kCellSize);
const size_t kGridHeight = (size_t)(kViewHeight / kCellSize);
const size_t kGridCellCount = kGridWidth * kGridHeight;
Particle* grid[kGridCellCount];
size_t gridCoords[kParticleCount*2];

struct Emitter
{
    Emitter(const Material& material, const Vector2& position,
            const Vector2& direction, float size, float speed, float delay)
        : material(material), position(position), direction(direction)
        , size(size), speed(speed), delay(delay), count(0)
    {
        float len = sqrt(direction.x*direction.x + direction.y*direction.y);
        this->direction.x /= len;
        this->direction.y /= len;
    }

    Material material;
    Vector2 position;
    Vector2 direction;
    float size;
    float speed;
    float delay;
    size_t count;
};

#define kEmitterCount 2
Emitter emitters[kEmitterCount] =
{
    Emitter(
        Material(Rgba(0.6f, 0.7f, 0.9f, 1), 1.0f, 0.08f, 0.9f),
        Vector2(0.05f*kViewWidth, 0.8f*kViewHeight),
        Vector2(4, 1), 0.2f, 5, 0),
    Emitter(
        Material(Rgba(0.1f, 0.05f, 0.3f, 1), 1.4f, 0.075f, 1.5f),
        Vector2(0.05f*kViewWidth, 0.9f*kViewHeight),
        Vector2(4, 1), 0.2f, 5, 6),
};

float Random01() { return (float)rand() / (float)(RAND_MAX-1); }
float Random(float a, float b) { return a + (b-a)*Random01(); }

void UpdateGrid()
{
    // Clear grid
    memset(grid, 0, kGridCellCount*sizeof(Particle*));

    // Add particles to grid
    for (size_t i=0; i<particleCount; ++i)
    {
        Particle& pi = particles[i];
        int x = pi.x / kCellSize;
        int y = pi.y / kCellSize;

        if (x < 1)
            x = 1;
        else if (x > kGridWidth-2)
            x = kGridWidth-2;

        if (y < 1)
            y = 1;
        else if (y > kGridHeight-2)
            y = kGridHeight-2;

        pi.next = grid[x+y*kGridWidth];
        grid[x+y*kGridWidth] = &pi;

        gridCoords[i*2] = x;
        gridCoords[i*2+1] = y;
    }
}

void ApplyBodyForces()
{
    for (size_t i=0; i<particleCount; ++i)
    {
        Particle& pi = particles[i];
        pi.v -= 9.8f*kDt;
    }
}

void Advance()
{
    for (size_t i=0; i<particleCount; ++i)
    {
        Particle& pi = particles[i];

        // preserve current position
        prevPos[i].x = pi.x;
        prevPos[i].y = pi.y;

        pi.x += kDt * pi.u;
        pi.y += kDt * pi.v;
    }
}

void CalculatePressure()
{
    for (size_t i=0; i<particleCount; ++i)
    {
        Particle& pi = particles[i];
        size_t gi = gridCoords[i*2];
        size_t gj = gridCoords[i*2+1]*kGridWidth;

        neighbours[i].count = 0;

        float density = 0;
        float nearDensity = 0;
        for (size_t ni=gi-1; ni<=gi+1; ++ni)
        {
            for (size_t nj=gj-kGridWidth; nj<=gj+kGridWidth; nj+=kGridWidth)
            {
                for (Particle* ppj=grid[ni+nj]; NULL!=ppj; ppj=ppj->next)
                {
                    const Particle& pj = *ppj;

                    float dx = pj.x - pi.x;
                    float dy = pj.y - pi.y;
                    float r2 = dx*dx + dy*dy;
                    if (r2 < kEpsilon2 || r2 > kH*kH)
                        continue;

                    float r = sqrt(r2);
                    float a = 1 - r/kH;
                    density += pj.m * a*a*a * kNorm;
                    nearDensity += pj.m * a*a*a*a * kNearNorm;

                    if (neighbours[i].count < kMaxNeighbourCount)
                    {
                        neighbours[i].particles[neighbours[i].count] = &pj;
                        neighbours[i].r[neighbours[i].count] = r;
                        ++neighbours[i].count;
                    }
                }
            }
        }

        pi.density = density;
        pi.nearDensity = nearDensity;
        pi.P = kStiffness * (density - pi.m*kRestDensity);
        pi.nearP = kNearStiffness * nearDensity;
    }
}

void CalculateRelaxedPositions()
{
    for (size_t i=0; i<particleCount; ++i)
    {
        const Particle& pi = particles[i];

        float x = pi.x;
        float y = pi.y;

        for (size_t j=0; j<neighbours[i].count; ++j)
        {
            const Particle& pj = *neighbours[i].particles[j];
            float r = neighbours[i].r[j];
            float dx = pj.x - pi.x;
            float dy = pj.y - pi.y;

            float a = 1 - r/kH;

            float d = kDt2 * ((pi.nearP+pj.nearP)*a*a*a*kNearNorm + (pi.P+pj.P)*a*a*kNorm) / 2;

            // relax
            x -= d * dx / (r*pi.m);
            y -= d * dy / (r*pi.m);

            // surface tension
            if (pi.m == pj.m)
            {
                x += (kSurfaceTension/pi.m) * pj.m*a*a*kNorm * dx;
                y += (kSurfaceTension/pi.m) * pj.m*a*a*kNorm * dy;
            }

            // viscocity
            float du = pi.u - pj.u;
            float dv = pi.v - pj.v;
            float u = du*dx + dv*dy;
            if (u > 0)
            {
                u /= r;

                float a = 1 - r/kH;
                float I = 0.5f * kDt * a * (kLinearViscocity*u + kQuadraticViscocity*u*u);

                x -= I * dx * kDt;
                y -= I * dy * kDt;
            }
        }

        relaxedPos[i].x = x;
        relaxedPos[i].y = y;
    }
}

void MoveToRelaxedPositions()
{
    for (size_t i=0; i<particleCount; ++i)
    {
        Particle& pi = particles[i];
        pi.x = relaxedPos[i].x;
        pi.y = relaxedPos[i].y;
        pi.u = (pi.x - prevPos[i].x) / kDt;
        pi.v = (pi.y - prevPos[i].y) / kDt;
    }
}

void ResolveCollisions()
{
    for (size_t i=0; i<particleCount; ++i)
    {
        Particle& pi = particles[i];
        for (size_t j=0; j<kWallCount; ++j)
        {
            const Wall& wall = walls[j];
            float dis = wall.nx*pi.x + wall.ny*pi.y - wall.c;
            if (dis < kParticleRadius)
            {
                float d = pi.u*wall.nx + pi.v*wall.ny;
                if (dis < 0)
                    dis = 0;
                pi.u += (kParticleRadius - dis) * wall.nx / kDt;
                pi.v += (kParticleRadius - dis) * wall.ny / kDt;
            }
        }
    }
}

void Render()
{
    glClearColor(0.02f, 0.01f, 0.01f, 1);
    glClear(GL_COLOR_BUFFER_BIT);

    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glOrtho(0, kViewWidth, 0, kViewHeight, 0, 1);

    glEnable(GL_POINT_SMOOTH);
    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);

    for (size_t i=0; i<particleCount; ++i)
    {
        const Particle& pi = particles[i];
        const Material& material = particleMaterials[i];
        Rgba& rgba = shadedParticleColours[i];
        rgba = material.colour;
        rgba.r *= material.bias + material.scale*pi.P;
        rgba.g *= material.bias + material.scale*pi.P;
        rgba.b *= material.bias + material.scale*pi.P;
    }

    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_COLOR_ARRAY);

    glPointSize(2.5f*kParticleRadius*kScreenWidth/kViewWidth);

    glColorPointer(4, GL_FLOAT, sizeof(Rgba), shadedParticleColours);
    glVertexPointer(2, GL_FLOAT, sizeof(Particle), particles);
    glDrawArrays(GL_POINTS, 0, particleCount);

    glDisableClientState(GL_COLOR_ARRAY);
    glDisableClientState(GL_VERTEX_ARRAY);

    glutSwapBuffers();
}

void EmitParticles()
{
    if (particleCount == kParticleCount)
        return;

    static int emitDelay = 0;
    if (++emitDelay < 3)
        return;

    for (size_t emitterIdx=0; emitterIdx<kEmitterCount; ++emitterIdx)
    {
        Emitter& emitter = emitters[emitterIdx];
        if (emitter.count >= kParticleCount/kEmitterCount)
            continue;

        emitter.delay -= kDt*emitDelay;
        if (emitter.delay > 0)
            continue;

        size_t steps = emitter.size / (2*kParticleRadius);
        for (size_t i=0; i<=steps && particleCount<kParticleCount; ++i)
        {
            Particle& pi = particles[particleCount];
            Material& material = particleMaterials[particleCount];
            ++particleCount;
            ++emitter.count;

            float ofs = (float)i / (float)steps - 0.5f;
            ofs *= emitter.size;
            pi.x = emitter.position.x - ofs*emitter.direction.y;
            pi.y = emitter.position.y + ofs*emitter.direction.x;
            pi.u = emitter.speed * emitter.direction.x*Random(0.9f, 1.1f);
            pi.v = emitter.speed * emitter.direction.y*Random(0.9f, 1.1f);
            pi.m = emitter.material.mass;

            material = emitter.material;
        }
    }

    emitDelay = 0;
}

void Update()
{
    for (size_t step=0; step<kSubSteps; ++step)
    {
        EmitParticles();
        ApplyBodyForces();
        Advance();
        UpdateGrid();
        CalculatePressure();
        CalculateRelaxedPositions();
        MoveToRelaxedPositions();
        UpdateGrid();
        ResolveCollisions();
    }
    glutPostRedisplay();
}

int main(int argc, char** argv)
{
    glutInitWindowSize(kScreenWidth, kScreenHeight);
    glutInit(&argc, argv);
    glutInitDisplayString("samples stencil>=3 rgb double depth");
    glutCreateWindow("SPH");

    glutDisplayFunc(Render);
    glutIdleFunc(Update);

    memset(particles, 0, kParticleCount*sizeof(Particle));
    UpdateGrid();

    glutMainLoop();
    return 0;
}
```

I’m pretty happy with the results, even if at three seconds per frame for the video above, my implementation isn’t exactly fast. Here are a few other videos from various stages of development:

## Why Reinhard desaturates my blacks

### August 19, 2010

I’m doing it wrong.

Now that TFU2 is *almost* out of the door, I’ve been catching up on this year’s GDC presentations. One that was of particular interest to me was John Hable’s Uncharted 2 HDR Lighting talk because I think we’re all in agreement about how awesome the game looks. That led me to checking out his blog and his discussions of various tone mapping operators.

I agree with him on most of his points and I really like the results of his operator, but I was a bit disappointed by the treatment of Erik Reinhard’s tone mapping operator.

In order to explain why, I’ve pinched the HDR photo from John’s blog and it’s accompanied by some colour ramps to illustrate the results of applying various operations. The ramps go from a luminance of 0 up to a luminance of er… *very large* and are in linear RGB space.

Shown below are the source images (the photo is exposed at +4 stops). Click through for less tiny versions:

In both his blog and GDC presentation, John describes a simplified version of Reinhard’s operator as applying the following function to each colour channel:

Let’s do that to our test image and see what happens:

The top end isn’t nearly so blown out, but where did all my colour go?! That’s no good at all!

Let’s check out how John’s operator does:

It’s *much* better, especially in the blacks, but it’s still rather desaturated towards the top end. Perhaps that’s the price one pays for compressing the dynamic range so heavily.

Actually it doesn’t have to be.

The problem with the tone mapping operators that John describes is that they all operate on the RGB channels independently. Applying *any* non-linear transform in this way will result in both hue and saturation shifts, which is something that should be performed during final colour grading, not tone mapping. Instead, Reinhard’s tone mapping operator should be applied on each pixel’s *luminance*, which will preserve both the hue and saturation of the original image.

There’s some confusion on the internet about the correct way to apply Reinhard’s operator. Some sources recommend converting from linear RGB into CIE xyY, a colour space derived from CIE XYZ. The advantage of this colour space is that luminance is stored in the Y channel, independently of chromaticity in xy. The idea is that you convert your image from RGB to xyY, perform the tone mapping on the Y channel only and then convert back to RGB.

While not complicated, the transform between linear RGB and xyY isn’t exactly trivial either. Here’s a simple implementation in C#, for a reference white of D65:

```csharp
void RGBtoxyY(double R, double G, double B,
              out double x, out double y, out double Y)
{
    // Convert from linear RGB to XYZ (sRGB primaries, D65 white)
    double X = R * 0.4124 + G * 0.3576 + B * 0.1805;
    Y        = R * 0.2126 + G * 0.7152 + B * 0.0722;
    double Z = R * 0.0193 + G * 0.1192 + B * 0.9505;

    // Convert from XYZ to xyY
    double L = X + Y + Z;
    x = X / L;
    y = Y / L;
}

void xyYtoRGB(double x, double y, double Y,
              out double R, out double G, out double B)
{
    // Convert from xyY to XYZ
    double X = x * (Y / y);
    double Z = (1 - x - y) * (Y / y);

    // Convert from XYZ to linear RGB
    R = X *  3.2406 + Y * -1.5372 + Z * -0.4986;
    G = X * -0.9689 + Y *  1.8758 + Z *  0.0415;
    B = X *  0.0557 + Y * -0.2040 + Z *  1.0570;
}
```

I was using this colour space transform for a couple of days, until my esteemed colleague Miles pointed out that I was doing it wrong. A much simpler approach is to calculate your luminance directly from the RGB values, perform the tone mapping on this value and then scale the original RGB values appropriately:

```csharp
double L = 0.2126 * R + 0.7152 * G + 0.0722 * B;
double nL = ToneMap(L);
double scale = nL / L;
R *= scale;
G *= scale;
B *= scale;
```

This yields the same results as the conversion to and from xyY and has fewer magic numbers, which is always a win.

Now let’s see what happens when we apply the same *x / (x+1)* to each pixel’s *luminance*:

Balls. This has preserved the colours, but at a terrible price; now all the whites are gone. The reason is that by preserving the hue and saturation, the operator prevents any colours being blown out to a full white. Luckily, Reinhard’s got our back. In his paper, the preceding operation is written as:
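In luminance terms, with L_d the display luminance, that operation is:

```latex
L_d(x, y) = \frac{L(x, y)}{1 + L(x, y)}
```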

Almost immediately after this equation, Reinhard goes on to say:

“This formulation is guaranteed to bring all luminances within displayable range. However, as mentioned in the previous section, this is not always desirable”

He then presents the following:
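From memory, that extended operator is:

```latex
L_d(x, y) = \frac{L(x, y)\left(1 + \dfrac{L(x, y)}{L_{\mathrm{white}}^2}\right)}{1 + L(x, y)}
```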

Here, L_white is the smallest luminance that will be mapped to 1.0.

Let’s give that a whirl, with an L_white of 4 for the colour ramps and an L_white of 2.4 for the condo photo:

Well that’s much better than the previous effort; whether it’s better than John’s results is up for debate. What I like about it is that because it mostly preserves the hues, you haven’t lost any information; if you want to crisp the blacks or desaturate the whites, you can do that in your final colour grade. If you’re colour grading via a volume texture lookup, this is for free!
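To tie it all together, the whole luminance-based version fits in a few lines. A sketch, in C++ this time, using the extended Reinhard curve and the same Rec. 709 weights as before (function names are mine):

```cpp
#include <cassert>
#include <cmath>

// Extended Reinhard curve: maps L = LWhite to exactly 1.0,
// and behaves like L/(1+L) for luminances well below LWhite.
double ToneMapLuminance(double L, double LWhite)
{
    return L * (1.0 + L / (LWhite * LWhite)) / (1.0 + L);
}

// Tone map the pixel's luminance, then rescale RGB by the same
// factor so hue and saturation are preserved.
void ToneMapPixel(double& R, double& G, double& B, double LWhite)
{
    double L = 0.2126 * R + 0.7152 * G + 0.0722 * B; // Rec. 709 luminance
    if (L <= 0.0)
        return;
    double scale = ToneMapLuminance(L, LWhite) / L;
    R *= scale;
    G *= scale;
    B *= scale;
}
```

Note that `ToneMapLuminance(LWhite, LWhite)` comes out to exactly 1.0, which is the “smallest luminance mapped to white” property in action.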

Having said all that, the TFU2 environment artists were most happy with a simple brightness & contrast adjustment with a small 0 – 0.05 toe, which in the end only costs one extra madd instruction at the end of our shaders. Whatever works for your game, makes the artists happy and means you get to go home at 6pm, right?

A full implementation of Reinhard’s operator adapts to the local changes in luminance around each pixel and is a little more involved than what I’ve described here, but hopefully I’ve managed to contribute something useful to the tone mapping in games discussion and not just reduced the signal to noise ratio on the subject.

## Hello, world!

### August 18, 2010

~~To use their parlance, I work as a “core generalist” engineer at LucasArts.~~

~~As such I tend to dabble in a lot of different fields without ever really understanding what it is that I’m doing. Here, I shall catalogue the many ways in which I do it wrong.~~

I work at Google these days, though I still don’t really know what I’m doing. Fortunately, they have a very thorough code review process.