Joe's blog

Self-aware struct-like types in C++11

updated July 5, 2012 11:49:30 PDT

Even with C++11, C++ offers inadequate metaprogramming facilities for user-defined types compared to other programming languages. Given an arbitrary struct type, you can't iterate its fields and get useful information like name, offset, and size, without implementing those facilities by hand. (Nearly every would-be C++ replacement language fixes this, but they unfortunately aren't always viable options.) The C++11 standard library introduced the tuple template as a general-purpose metaprogrammable composite type, but it sucks in a number of ways: its fields have no names and must be accessed by index through std::get, its interface looks nothing like a normal struct's, and it's unwieldy both to metaprogram with and to implement.

Here's an alternative approach I came up with that provides a user interface nearly equivalent to primitive structs, is much easier to metaprogram with than tuple, and is easier to implement as well, requiring about 150 lines of header-only code. I've put a sample implementation up on Github at https://github.com/jckarter/selfaware. Here's a rundown of how it works.

Self-aware field templates

The main idea is to derive the "struct" type from a set of field class templates, each of which defines a single field along with members that generically expose its name string, value, and type. For example, a field template named foo looks like this:

template<typename T>
struct foo {
    T foo;

    // field name
    constexpr static char const *name() { return "foo"; }

    // field type
    using type = T;

    // field value generic accessor
    T &value() & { return this->foo; }
    T const &value() const & { return this->foo; }
    T &&value() && { return static_cast<T&&>(this->foo); }
};

A preprocessor macro can generate these for us:

#define SELFAWARE_IDENTIFIER(NAME) \
    template<typename T> \
    struct NAME { \
        T NAME; \
        /* field name */ \
        constexpr static char const *name() { return #NAME; } \
        /* field type */ \
        using type = T; \
        /* field value generic accessor */ \
        T &value() & { return this->NAME; } \
        T const &value() const & { return this->NAME; } \
        T &&value() && { return static_cast<T&&>(this->NAME); } \
    };

The self-aware struct template

The "struct" template now needs only to inherit a set of field template instances and provide some constructors:

template<typename...Fields>
struct Struct : Fields... {
    // A convenience alias for subclasses
    using struct_type = Struct;

    // Preserve default constructors
    Struct() = default;
    Struct(Struct const &) = default;

    // Forwarding elementwise constructor
    template<typename...T>
    constexpr Struct(T &&...x) : Fields{static_cast<T&&>(x)}... {}
};

A Struct type can then be used either by aliasing a Struct instance or by inheriting an instance and its constructors. (As of Clang 3.1 and GCC 4.7, neither compiler yet supports inheriting constructors, so aliasing is currently more practical.)

SELFAWARE_IDENTIFIER(foo)
SELFAWARE_IDENTIFIER(bar)
// Aliasing a Struct instance
using FooBar = Struct<foo<int>, bar<double>>;
// Inheriting a Struct instance (requires inheriting constructors)
struct FooBar2 : Struct<foo<int>, bar<double>> { using struct_type::struct_type; };

Values of the type look like normal structs to user code:

FooBar frob(int x) {
    FooBar f = {x, 0.0};
    f.foo += 1;
    f.bar += 1.0;
    return f;
}

The type is trivial if its component types are trivial, and its instances can be used in compile-time calculations if its component types can, like primitive structs. (However, because it inherits multiple nonempty types, it's not standard-layout, and thus not quite POD.)

static_assert(std::is_trivial<FooBar>::value, "should be trivial");
static_assert(FooBar{2, 3.0}.foo + FooBar{2, 4.0}.foo == 4, "2 + 2 == 4");

Metaprogramming with self-aware structs

Since the fields of the Struct template are encoded in a template parameter pack, there's a lot you can do to it with unpack expressions and recursive templates. Here are a few examples:

Function application

Applying a function object to a Struct's unpacked fields is easy—just unpack the value() method of each field superclass into a function call expression:

template<typename Function, typename...Fields>
auto apply(Function &&f, Struct<Fields...> const &a_struct)
    -> decltype(f(a_struct.Fields::value()...))
{
    return f(a_struct.Fields::value()...);
}

double hypotenuse(double x, double y) { return sqrt(x*x + y*y); }

double fooBarHypotenuse(FooBar const &x) { return apply(hypotenuse, x); }

Interop with tuple

A Struct can be converted into a tuple or tie similarly:

template<typename...Fields>
auto structToTuple(Struct<Fields...> const &s)
    -> std::tuple<typename Fields::type...>
{
    return std::make_tuple(s.Fields::value()...);
}
template<typename...Fields>
void assignStructFromTuple(Struct<Fields...> &s,
                           std::tuple<typename Fields::type...> const &t)
{
    std::tie(s.Fields::value()...) = t;
}
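
As a quick sanity check, the conversion round-trips as you'd expect. This is a hypothetical usage example, reusing the FooBar alias from above:

FooBar fb = {1, 2.5};
std::tuple<int, double> t = structToTuple(fb);
assert(std::get<0>(t) == 1 && std::get<1>(t) == 2.5);

assignStructFromTuple(fb, std::make_tuple(3, 4.5));
assert(fb.foo == 3 && fb.bar == 4.5);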

Generating code from field metadata

The Struct template can implement a static method to iterate through its fields, passing the name string, offset, size, and type of each field to a function object in turn. (Getting the offset unfortunately relies on undefined behavior, because C++11 restricts offsetof to standard-layout types and provides no other well-defined means that I know of for determining offsets independent of an instance.)

template<typename...Fields>
struct Struct : Fields... {
    // ... see above ...

    // NB: relies on undefined behavior
    template<typename Field>
    static std::uintptr_t offset_of() {
        return reinterpret_cast<std::uintptr_t>(&static_cast<Struct*>(nullptr)->Field::value());
    }

    template<template<typename T> class Trait, typename Function>
    static void each_field(Function &&f)
    {
        // Unpack expressions are only allowed in argument lists and initialization lists,
        // so this expression unpacks the function call expression into the initializer list
        // for an unused array (which the optimizer is nice enough to discard)
        char pass[] = {
            (f(Fields::name(), offset_of<Fields>(), sizeof(typename Fields::type),
              Trait<typename Fields::type>::value()), '\0')...};
        (void)pass; // suppress unused variable warnings
    }
};
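
For example, pairing each_field with a trivial trait lets you dump a Struct's layout at runtime. This is a hypothetical sketch—SizeTrait and dumpFooBarLayout are illustrative names, not part of the sample implementation:

// needs <cstdio>, <cstdint>, <cstddef>
template<typename T>
struct SizeTrait {
    static std::size_t value() { return sizeof(T); }
};

void dumpFooBarLayout() {
    FooBar::each_field<SizeTrait>(
        [](char const *name, std::uintptr_t offset, std::size_t size, std::size_t) {
            std::printf("%s: offset %zu, size %zu\n",
                        name, static_cast<std::size_t>(offset), size);
        });
}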

Many libraries that deal with binary data have finicky APIs for describing struct layouts. A good example is OpenGL's glVertexAttribPointer interface, which is used to describe the format of vertex information in memory. The each_field function template, paired with a traits class, can generate the correct sequence of glVertexAttribPointer automatically from a Struct instance's metadata:

struct GLVertexType { GLuint size; GLenum type; GLboolean normalized; };

// A trait class to provide glVertexAttribPointer arguments appropriate for a type
template<typename> struct GLVertexTraits;
template<GLuint N>
struct GLVertexTraits<float[N]> {
    static GLVertexType value() { return {N, GL_FLOAT, GL_FALSE}; }
};
template<GLuint N>
struct GLVertexTraits<std::uint8_t[N]> {
    static GLVertexType value() { return {N, GL_UNSIGNED_BYTE, GL_TRUE}; }
};

template<typename Struct>
bool bindVertexAttributes(GLuint program)
{
    bool ok = true;
    Struct::template each_field<GLVertexTraits>(
        [program, &ok](char const *name, size_t offset, size_t size, GLVertexType info) {
            GLint location = glGetAttribLocation(program, name);
            if (location == -1) {
                ok = false;
                return;
            }
            glVertexAttribPointer(location, info.size, info.type, info.normalized,
                                  sizeof(Struct), reinterpret_cast<const GLvoid*>(offset));
            glEnableVertexAttribArray(location);
        });
    return ok;
}
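
Used with a self-aware vertex struct whose field types match the traits above, setting up the whole vertex format boils down to one call. The field names and layout here are made up for illustration:

SELFAWARE_IDENTIFIER(vertexPosition)
SELFAWARE_IDENTIFIER(vertexColor)

using MyVertex = Struct<vertexPosition<float[3]>, vertexColor<std::uint8_t[4]>>;

void setUpVertexFormat(GLuint program) {
    bindVertexAttributes<MyVertex>(program);
}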

Selecting a field at runtime by string name

A recursive template can generate code to pick a field at runtime from a string argument, passing the value through a function object to narrow the return type:

template<typename R, typename Field, typename...Fields, typename Visitor, typename...AllFields>
R _select_field(Visitor &&v, char const *name, Struct<AllFields...> const &a_struct)
{
    if (strcmp(name, Field::name()) == 0)
        return v(a_struct.Field::value());
    else
        return _select_field<R, Fields...>(static_cast<Visitor&&>(v), name, a_struct);
}

template<typename R, typename Visitor, typename...AllFields>
R _select_field(Visitor &&v, char const *name, Struct<AllFields...> const &a_struct)
{
    throw std::runtime_error("bad field name");
}

template<typename Visitor, typename Field, typename...AllFields>
auto select_field(Visitor &&v, char const *name, Struct<Field, AllFields...> const &a_struct)
    -> decltype(v(a_struct.Field::value()))
{
    return _select_field<decltype(v(a_struct.Field::value())), Field, AllFields...>
        (static_cast<Visitor&&>(v), name, a_struct);
}

select_field can then be used like this:

template<typename T>
struct converter {
    template<typename U> T operator()(U &&x) { return T(x); }
};

void testStructSelectField()
{
    FooBar x{11, 22.0};

    double foo = select_field(converter<double>(), "foo", x);
    double bar = select_field(converter<double>(), "bar", x);
    assert(foo == 11.0);
    assert(bar == 22.0);
}

Problems

This technique still isn't ideal. Most obviously, the field templates all need to be defined somewhere, which adds maintenance friction, and creating them relies on unseemly preprocessor magic. Fields and Struct instances could perhaps be instantiated together in one macro, possibly by pulling in boost::preprocessor. Compile time, always an issue with C++, also suffers from use of the Struct template: Clang 3.1 takes almost a second on this 2.4 GHz Core 2 Duo just to compile the 199-line selfaware-test.cpp test suite. And tuple, for all its faults, is standard, and will be available on any platform that purports to support C++11. Neither Struct nor tuple is standard-layout, so neither can interoperate with C in a standard-guaranteed, portable way. I'd love to hear about other approaches to enabling composite type metaprogramming in C++.

An intro to modern OpenGL. Chapter 4: Rendering a Dynamic 3D Scene with Phong Shading

updated July 15, 2010 08:06:09 PDT

« Chapter 3 | Table of Contents

At this point, we've seen the most important core parts of the OpenGL API and gotten a decent taste of the GLSL language. Now's a good time to start exercising OpenGL and implementing some graphic effects, introducing new nuances and specialized features of OpenGL and GLSL as we go. For the next few chapters, I've prepared a new demo program you can get from my Github ch4-flag repository. The flag demo renders a waving flag on a flagpole against a simple background:

With the flat, wallpaper-looking grass and brick textures and the unnatural lack of shadow cast by the flag, it looks like something a Nintendo 64 would have rendered, but it's a start. We'll improve the graphical fidelity of the demo over the next few chapters. For this chapter, we'll render the above image by implementing the Phong shading model, which will serve as the basis for more advanced effects we'll look at later on.

Overview of the flag program

I've organized flag into four C files and four headers. You've already seen a good amount of it in hello-gl: the file-util.c and file-util.h files contain the read_tga and file_contents functions, and gl-util.c and gl-util.h contain the make_texture, make_shader, and make_program functions we wrote in chapter 2. The vec-util.h header contains some basic vector math functions. flag.c looks a lot like hello-gl.c did: in main, we initialize GLUT and GLEW, set up callbacks for GLUT events, call a make_resources function to allocate a bunch of GL resources, and call out to glutMainLoop to start running the demo. However, the setup and rendering are a bit more involved than last time. Let's look at what's new and changed:

Mesh construction

The meshes.c file contains code that generates the vertex and element arrays, collectively called a mesh, for the flag, flagpole, ground, and wall objects that we'll be rendering. Most objects in the real world, including real flagpoles and flags, have smooth curving surfaces, but graphics cards deal with triangles. To render these objects, we have to approximate their surfaces as a collection of triangles. We do this by filling a vertex array with vertices placed along its surface, storing attributes of the surface with each vertex, and connecting the samples into triangles using the element array to give an approximation of the original surface.

The fundamental properties a mesh stores for each vertex are its position in world space and its normal, a vector perpendicular to the original surface. The normal is fundamental to shading calculations, as we'll see shortly. Normals should be unit vectors, that is, vectors whose length is one. Each vertex also has material parameters that indicate how the surface is shaded. The material can consist of a set of per-vertex values, texture coordinates that sample material information from a texture, or some combination of both.

For the flag demo, the material consists of a texture coordinate for sampling the diffuse color from the mesh texture, a specular color, and a shininess factor. We'll see how these parameters are used shortly. Our vertex buffer thus contains an array of flag_vertex structs looking like this:

struct flag_vertex {
    GLfloat position[4];
    GLfloat normal[4];
    GLfloat texcoord[2];
    GLfloat shininess;
    GLubyte specular[4];
};

Although the position and normal are three-dimensional vectors, we pad them out to four elements because most GPUs prefer to load vector data from 128-bit-aligned buffers, much as SIMD instruction sets such as SSE do. For each mesh, we collect the vertex buffer, element buffer, texture object, and element count into a flag_mesh struct. When we render, we set up glVertexAttribPointers to pass all of the flag_vertex attributes to the vertex shader:

struct flag_mesh {
    GLuint vertex_buffer, element_buffer;
    GLsizei element_count;
    GLuint texture;
};
static void render_mesh(struct flag_mesh const *mesh)
{
    glBindTexture(GL_TEXTURE_2D, mesh->texture);

    glBindBuffer(GL_ARRAY_BUFFER, mesh->vertex_buffer);
    glVertexAttribPointer(
        g_resources.flag_program.attributes.position,
        3, GL_FLOAT, GL_FALSE, sizeof(struct flag_vertex),
        (void*)offsetof(struct flag_vertex, position)
    );
    glVertexAttribPointer(
        g_resources.flag_program.attributes.normal,
        3, GL_FLOAT, GL_FALSE, sizeof(struct flag_vertex),
        (void*)offsetof(struct flag_vertex, normal)
    );
    glVertexAttribPointer(
        g_resources.flag_program.attributes.texcoord,
        2, GL_FLOAT, GL_FALSE, sizeof(struct flag_vertex),
        (void*)offsetof(struct flag_vertex, texcoord)
    );
    glVertexAttribPointer(
        g_resources.flag_program.attributes.shininess,
        1, GL_FLOAT, GL_FALSE, sizeof(struct flag_vertex),
        (void*)offsetof(struct flag_vertex, shininess)
    );
    glVertexAttribPointer(
        g_resources.flag_program.attributes.specular,
        4, GL_UNSIGNED_BYTE, GL_TRUE, sizeof(struct flag_vertex),
        (void*)offsetof(struct flag_vertex, specular)
    );

    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, mesh->element_buffer);
    glDrawElements(
        GL_TRIANGLES,
        mesh->element_count,
        GL_UNSIGNED_SHORT,
        (void*)0
    );
}

Note that the glVertexAttribPointer call for the specular color attribute passes GL_TRUE for the normalized argument. The specular colors are stored as four-component arrays of bytes between 0 and 255, much as they would be in a bitmap image, but with the normalized flag set, they'll be presented to the shaders as normalized floating-point values between 0.0 and 1.0.

The actual code to generate the meshes is fairly tedious, so I'll just describe it at a high level. We construct two distinct meshes: the background mesh, created by init_background_mesh, which consists of the static flagpole, ground, and wall objects; and the flag, set up by init_flag_mesh. The background mesh consists of two large rectangles for the ground and wall, and a thin cylinder with a pointed truck making the flagpole. The wall, ground, and flagpole are assigned texture coordinates to sample out of a single texture atlas image containing the grass, brick, and metal textures, stored in background.tga. This allows the entire background to be rendered in a single pass with the same active texture. The flagpole is additionally given a yellow specular color, which will give it a metallic sheen when we shade it. The flag is generated by evaluating the function calculate_flag_vertex at regular intervals between zero and one on the s and t parametric axes, generating something that looks sort of like a flag flapping in the breeze. The flag being a separate mesh makes it easy to update the mesh data as the flag animates, and lets us render it with its own texture, loaded from flag.tga.

Streaming dynamic mesh data

void update_flag_mesh(
    struct flag_mesh const *mesh,
    struct flag_vertex *vertex_data,
    GLfloat time
) {
    GLsizei s, t, i;
    for (t = 0, i = 0; t < FLAG_Y_RES; ++t)
        for (s = 0; s < FLAG_X_RES; ++s, ++i) {
            GLfloat ss = FLAG_S_STEP * s, tt = FLAG_T_STEP * t;

            calculate_flag_vertex(&vertex_data[i], ss, tt, time);
        }

    glBindBuffer(GL_ARRAY_BUFFER, mesh->vertex_buffer);
    glBufferData(
        GL_ARRAY_BUFFER,
        FLAG_VERTEX_COUNT * sizeof(struct flag_vertex),
        vertex_data,
        GL_STREAM_DRAW
    );
}

To animate the flag, we use our glutIdleFunc callback to recalculate the flag's vertices and update the contents of the vertex buffer. We update the buffer with the same glBufferData function we used to initialize it. However, both on initialization and on each update, we give the flag vertex data the GL_STREAM_DRAW hint instead of the GL_STATIC_DRAW hint we've been using until now. This tells the OpenGL driver to optimize for the fact that we'll be continuously replacing the buffer with new data. Since only the positions and normals of the vertices change, the element buffer for the flag can remain static: the connectivity of the vertices doesn't change.
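
A minimal sketch of what that idle callback can look like, feeding the elapsed GLUT time into update_flag_mesh and asking GLUT to redraw. The g_resources.flag and g_resources.flag_vertex_array names are assumptions for illustration; the real flag.c may name things differently:

/* hypothetical idle callback; field names in g_resources are assumed */
static void update(void)
{
    int milliseconds = glutGet(GLUT_ELAPSED_TIME);
    update_flag_mesh(&g_resources.flag, g_resources.flag_vertex_array,
                     (GLfloat)milliseconds * 0.001f);
    glutPostRedisplay();
}

/* in main: */
glutIdleFunc(&update);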

Using a depth buffer to order 3D objects

Since we're drawing multiple objects in 3d space, we need to ensure that objects closer to the viewer render on top of the objects behind them. An easy way to do this would be to just render the objects back-to-front—in our case, render the background mesh first, then the flag on top of it—but this is inefficient because of the overdraw it leads to: fragments get generated and processed by the fragment shader for background objects, only to be immediately overwritten by the foreground objects in front of them. Back-to-front rendering also can't handle mutually overlapping objects, such as two interlocked rings, on its own: whichever object is rendered second will appear entirely in front of the other.

Graphics cards use depth buffers to provide efficient and reliable ordering of 3d objects. A depth buffer is a part of the framebuffer that sits alongside the color buffer, and like the color buffer, is a two-dimensional array of pixel values. Instead of color values, the depth buffer stores a depth value, associating a projection-space z coordinate to each pixel. When a triangle is rasterized with depth testing enabled, each fragment's projected z value is compared to the z value currently stored in the depth buffer. If the fragment would be further away from the viewer than the current depth buffer value, the fragment is discarded. Otherwise, the fragment gets rendered to the color and depth buffers, the new z value replacing the old depth buffer value.

In addition to providing correct ordering of objects, depth buffering also minimizes the cost of overdraw if you render objects front-to-back. Although the rasterizer will still generate fragments for parts of objects obscured by already-rendered objects, modern GPUs can discard these obscured fragments before they get run through the fragment shader, reducing the number of overall fragment shader invocations the processor needs to execute. Since our flag mesh appears in front of the background mesh, we thus render the flag before the background so that the obscured parts of the background don't need to be shaded.

To use depth testing in our program, we need to ask for a depth buffer in our framebuffer and then enable depth testing in the OpenGL state. With GLUT, we can ask for a depth buffer for a window by passing the GLUT_DEPTH flag to glutInitDisplayMode:

int main(int argc, char* argv[])
{
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_RGB | GLUT_DEPTH | GLUT_DOUBLE);
    /* ... */
}

We enable and disable depth testing by calling glEnable or glDisable with GL_DEPTH_TEST:

static void init_gl_state(void)
{
    /* ... */
    glEnable(GL_DEPTH_TEST);
    /* ... */
}

When we start rendering our scene, we need to clear the depth buffer along with the color buffer to ensure that stale depth values don't affect rendering. We can clear both buffers with a single glClear call by passing it both GL_COLOR_BUFFER_BIT and GL_DEPTH_BUFFER_BIT:

static void render(void)
{
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    /* ... */
}

Back-face culling

Another potential source of overdraw comes from within an object. If you look at the cylindrical flagpole from any direction, you're going to see at most half of its surface. The front-facing triangles appear in front of the back-facing triangles, but they rasterize into the same pixels on screen. Depending on the ordering of triangles in the mesh, the front-facing triangles will either overdraw the back-facing triangles or the fragments of the back-facing triangles will fail the depth test, requiring some extra work from the GPU in either case.

However, we can get the GPU to cheaply and quickly discard back-facing triangles even before they get rasterized or depth-tested. If we enable back-face culling, the graphics card will classify every triangle as front- or back-facing after running the vertex shader and immediately prior to rasterization, completely discarding back-facing triangles. It does this by looking at the winding of each triangle in projection space. By default, triangles winding counterclockwise are considered front-facing. This works because transforming a triangle to face the opposite direction from the viewer reverses its winding. By constructing our meshes so that all of the triangles wind counterclockwise when viewed from the front, we can use back-face culling to eliminate most of the work of rasterizing those triangles when they face away from the viewer. Only the vertex shader will need to run for their vertices.

Back-face culling is enabled and disabled by passing GL_CULL_FACE to glEnable/glDisable:

static void init_gl_state(void)
{
    /* ... */
    glEnable(GL_CULL_FACE);
    /* ... */
}
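
If a mesh happens to be wound the other way, or you want to cull front faces instead, OpenGL lets you change the convention rather than rebuild the mesh. The flag demo doesn't need this, but for reference:

glFrontFace(GL_CW);    /* treat clockwise-wound triangles as front-facing (the default is GL_CCW) */
glCullFace(GL_BACK);   /* cull back-facing triangles (this is already the default) */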

Updating the projection matrix and viewport

If you go back a chapter and try resizing the hello-gl window, you'll notice that the image stretches to fit the new size of the window, ruining the aspect ratio we worked so hard to preserve. In order to maintain an accurate aspect ratio, we have to recalculate our projection matrix when the window size changes, taking the new aspect ratio into account. We also have to inform OpenGL of the new viewport size by calling glViewport. GLUT allows us to provide a callback that gets invoked when the window is resized using glutReshapeFunc:

static void reshape(int w, int h)
{
    g_resources.window_size[0] = w;
    g_resources.window_size[1] = h;
    update_p_matrix(g_resources.p_matrix, w, h);
    glViewport(0, 0, w, h);
}
int main(int argc, char* argv[])
{
    /* ... */
    glutReshapeFunc(&reshape);
    /* ... */
}

The update_p_matrix function implements the perspective matrix formula from last chapter and stores the new projection matrix in the g_resources.p_matrix array, from which we'll feed our shaders' p_matrix uniform variable.
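
The chapter doesn't show update_p_matrix itself, so here's a minimal sketch of what such a function can look like, assuming made-up NEAR_PLANE and FAR_PLANE constants and a viewer looking down the positive z axis; the real flag.c implementation may differ in its constants and conventions:

/* hypothetical sketch of update_p_matrix; constants and conventions assumed */
static void update_p_matrix(GLfloat *matrix, int w, int h)
{
    static const GLfloat NEAR_PLANE = 0.0625f, FAR_PLANE = 256.0f;

    /* scale x and y so the smaller window dimension spans the same world
     * distance regardless of aspect ratio */
    GLfloat wf = (GLfloat)w, hf = (GLfloat)h,
            r_xy = (wf < hf ? wf : hf),
            r_x  = r_xy / wf,
            r_y  = r_xy / hf;

    /* map [NEAR_PLANE, FAR_PLANE] on the z axis to [-1, 1] after the
     * perspective divide, and copy z into w so the divide happens */
    GLfloat r_zw = 1.0f / (FAR_PLANE - NEAR_PLANE),
            r_z  = (NEAR_PLANE + FAR_PLANE) * r_zw,
            r_w  = -2.0f * NEAR_PLANE * FAR_PLANE * r_zw;

    /* column-major layout, one column per group of four */
    matrix[ 0] = r_x;  matrix[ 1] = 0.0f; matrix[ 2] = 0.0f; matrix[ 3] = 0.0f;
    matrix[ 4] = 0.0f; matrix[ 5] = r_y;  matrix[ 6] = 0.0f; matrix[ 7] = 0.0f;
    matrix[ 8] = 0.0f; matrix[ 9] = 0.0f; matrix[10] = r_z;  matrix[11] = 1.0f;
    matrix[12] = 0.0f; matrix[13] = 0.0f; matrix[14] = r_w;  matrix[15] = 0.0f;
}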

Handling mouse and keyboard input with GLUT

GLUT provides extremely primitive support for mouse and keyboard input. In flag, I've made it so that dragging the mouse moves the view around, and the view snaps back to its original position when the mouse button is released. GLUT offers a glutMotionFunc callback that gets called when the mouse moves while a button is held down and a glutMouseFunc that gets called when a mouse button is pressed or released. (There's also glutPassiveMotionFunc to handle mouse motion when a button isn't pressed, which we don't use.) Our glutMotionFunc adjusts the model-view matrix relative to the distance from the center of the window, and our glutMouseFunc resets it when the mouse button is let go:

static void drag(int x, int y)
{
    float w = (float)g_resources.window_size[0];
    float h = (float)g_resources.window_size[1];
    g_resources.eye_offset[0] = (float)x/w - 0.5f;
    g_resources.eye_offset[1] = -(float)y/h + 0.5f;
    update_mv_matrix(g_resources.mv_matrix, g_resources.eye_offset);
}

static void mouse(int button, int state, int x, int y)
{
    if (button == GLUT_LEFT_BUTTON && state == GLUT_UP) {
        g_resources.eye_offset[0] = 0.0f;
        g_resources.eye_offset[1] = 0.0f;
        update_mv_matrix(g_resources.mv_matrix, g_resources.eye_offset);
    }
}
int main(int argc, char* argv[])
{
    /* ... */
    glutMotionFunc(&drag);
    glutMouseFunc(&mouse);
    /* ... */
}

The update_mv_matrix function is similar to update_p_matrix. It generates a translation matrix, following the formula from last chapter, and stores it to g_resources.mv_matrix, from which we feed the shaders' mv_matrix uniform variable.

I also rigged flag so you can reload the GLSL program from disk while the demo is running by pressing the R key. The glutKeyboardFunc callback gets called when a key is pressed. Our callback checks if the pressed key was R, and if so, calls update_flag_program:

static void keyboard(unsigned char key, int x, int y)
{
    if (key == 'r' || key == 'R') {
        update_flag_program();
    }
}
int main(int argc, char* argv[])
{
    /* ... */
    glutKeyboardFunc(&keyboard);
    /* ... */
}

update_flag_program attempts to load, compile, and link the flag.v.glsl and flag.f.glsl files from disk, and if successful, replaces the old shader and program objects.
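
A rough sketch of how update_flag_program can be structured on top of the make_shader and make_program helpers from gl-util.c. This is hypothetical; the exact g_resources bookkeeping in flag.c may differ:

static void update_flag_program(void)
{
    GLuint vertex_shader, fragment_shader, program;

    vertex_shader = make_shader(GL_VERTEX_SHADER, "flag.v.glsl");
    if (vertex_shader == 0)
        return;

    fragment_shader = make_shader(GL_FRAGMENT_SHADER, "flag.f.glsl");
    if (fragment_shader == 0) {
        glDeleteShader(vertex_shader);
        return;
    }

    program = make_program(vertex_shader, fragment_shader);
    if (program == 0) {
        glDeleteShader(vertex_shader);
        glDeleteShader(fragment_shader);
        return;
    }

    /* Only after everything compiled and linked do we delete the old shader
     * and program objects stored in g_resources, store the new ones, and
     * re-query the uniform and attribute locations for the new program
     * (details omitted; field names in flag.c may differ). */
}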

That covers the C code for the flag demo. The actual shading happens inside the GLSL code, which we'll look at next.

Phong shading

Physically accurate light simulation requires expensive algorithms that have only recently become possible for even high-end computer clusters to calculate in real time. Fortunately, human eyes don't require perfect physical accuracy, especially not for fast-moving animated graphics, and real-time computer graphics has come a long way rendering impressive graphics on typical consumer hardware using cheap tricks that approximate the behavior of light without simulating it perfectly. The most fundamental of these tricks is the Phong shading model, an inexpensive approximation of how light interacts with simple materials developed by computer graphics pioneer Bui Tuong Phong in the early 1970s. Phong shading is a local illumination simulation—it only considers the direct interaction between a light source and a single point. Because of this, Phong shading alone cannot calculate effects that involve the influence of other objects in a scene, such as shadows and mirror reflections. This is why the flag casts no shadow on the ground or wall behind it.

The Phong model involves three different lighting terms: diffuse, ambient, and specular reflection.

Diffuse and ambient reflection

If you hold a flat sheet of paper up to a lamp in a dark room, it will appear brightest when it faces the lamp head-on, and appear dimmer as you rotate it away from the light, reaching its darkest when it's perpendicular to the light. Curved surfaces behave the same way; if you roll up or crumple the paper, its surface will be brightest where it faces the light the most directly. The wider the angle between the surface normal and the light direction, the darker the paper appears. If the paper and light remain stationary but you move your head, the paper's apparent color and brightness won't change. Likewise, in the flag demo, if you drag the view with the mouse, you can see the flag's shading remains the same. The surface reflects light evenly in every direction, or "diffusely." This basic lighting effect is thus called diffuse reflection.

There's an inexpensive operation called the dot product that produces a scalar value from two vectors related to the angle between them. Given two unit vectors u and v, if their dot product u · v (pronounced "u dot v") is one, then the vectors face the exact same direction; if zero, they're perpendicular; and if negative one, they face exact opposite directions. Positive dot products indicate acute angles while negative dot products indicate obtuse angles. GLSL provides a function dot(u,v) to calculate the dot product of two same-sized vec values.
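
For reference, in terms of components the dot product of two three-dimensional vectors is

\[ u \cdot v \;=\; u_x v_x + u_y v_y + u_z v_z \;=\; \lVert u \rVert\,\lVert v \rVert \cos\theta \]

where θ is the angle between them; for unit vectors the lengths are one, so the result is simply cos θ.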

The dot product's behavior follows that of diffuse reflection: surfaces reflect more light the more parallel to a light source they become, or in other words, the closer the dot product of their normal and the light's direction gets to one. Perpendicular or back-facing surfaces reflect no light, and their dot product will be zero or negative. This relationship between the dot product and diffuse brightness was first observed by 18th-century physicist Johann Lambert and is referred to as Lambertian reflectance, and surfaces that exhibit the behavior are called Lambertian surfaces. Phong shading uses Lambertian reflectance to model diffuse reflection, taking the dot product of the surface normal and the direction from the surface to the light source. If the dot product is greater than zero, it is multiplied by the diffuse color of the light, and the result is multiplied with the surface diffuse color to get the shaded result. (Multiplying two color values involves multiplying their corresponding red, green, blue, and alpha components together, which is what GLSL's * operator does when given two vec4s.) If the dot product is zero or negative, the diffuse color will be zero.

However, in the real world, even when a surface isn't directly lit, it still won't appear pitch black. In any enclosed area, there will be a certain amount of ambient reflection bouncing around, dimly illuminating areas that the light sources don't directly hit. The Phong model simulates the ambient effect by assigning light sources a constant ambient color. This ambient color gets added to the light's diffuse color after it's been multiplied by the dot product. The sum of ambient and diffuse effect colors is then multiplied by the surface's diffuse color to give the shaded result.

Specular reflection

Not all surfaces reflect light uniformly; many materials, including metals, glass, hair, and skin, have a reflective sheen. Unlike with diffuse reflection, if the viewer moves while a light source and shiny object remain stationary, the shine will move along the surface with the viewer. You can see this simulated in the flag demo by looking at the flagpole: as you drag the view up and down, the gold sheen moves along the pole with you. Physically, an object appears shiny when its surface is covered in highly reflective microfacets. These facets face every direction, creating a bright shiny spot where the light source reflects directly toward the viewer. This effect is called specular reflection.

The specular effect is caused by reflection from the light source to the viewer, so Phong shading simulates the specular effect by reflecting the light direction around the surface normal to give a reflection direction. We can then take the dot product of the reflection direction and the direction from the surface to the viewer. Microfacets on a specular surface follow a normal distribution: a plurality of facets lie parallel to the surface, and there is an exponential dropoff in the number of facets at steeper angles from the surface. The dropoff is sharper for more polished surfaces, giving a smaller, tighter specular highlight. Phong shading approximates this distribution by raising the dot product to an exponent called the shininess factor, with higher shininess giving a more polished shine and lower factors giving a more diffuse sheen. This final specular factor is then multiplied by the specular colors of the light source and surface, and the result added to the diffuse and ambient colors to give the final color. Non-specular surfaces have a transparent specular color with red, green, blue, and alpha components set to zero, which eliminates the specular term from the shading equation.
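
Putting the three terms together, the shaded color the model produces for a fragment can be summarized as follows, with N the surface normal, L the direction from the surface to the light, R the light's reflection direction, V the direction from the surface to the viewer, and s the shininess factor. This is just a restatement of the description above:

\[ c \;=\; c_{\text{diff}}\,\bigl(l_{\text{amb}} + l_{\text{diff}}\,\max(N \cdot L,\, 0)\bigr) \;+\; c_{\text{spec}}\, l_{\text{spec}}\,\max(R \cdot V,\, 0)^{\,s} \]

Here c_diff and c_spec are the surface's diffuse and specular colors, and l_amb, l_diff, and l_spec are the light's ambient, diffuse, and specular colors.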

Implementing Phong shading in GLSL

Shading calculations are usually performed in the vertex and fragment shaders, where they can leverage the GPU's parallel processing power. (This is where the term "shader" for GPU programs comes from.) Let's bring back the graphics pipeline diagram to get an overview of the Phong shading dataflow:

For the best accuracy, we perform shading at a per-fragment level. (For better performance, shading can also be done in the vertex shader and the results interpolated between vertices, but this will lead to less accurate shading, especially for specular effects.) The vertex shader, flag.v.glsl, thus only performs transformation and projection, using the p_matrix and mv_matrix we pass in as uniforms. The shader forwards most of the material vertex attributes to varying variables for the fragment shader to use:

#version 110

uniform mat4 p_matrix, mv_matrix;
uniform sampler2D texture;

attribute vec3 position, normal;
attribute vec2 texcoord;
attribute float shininess;
attribute vec4 specular;

varying vec3 frag_position, frag_normal;
varying vec2 frag_texcoord;
varying float frag_shininess;
varying vec4 frag_specular;

void main()
{
    vec4 eye_position = mv_matrix * vec4(position, 1.0);
    gl_Position = p_matrix * eye_position;
    frag_position = eye_position.xyz;
    frag_normal   = (mv_matrix * vec4(normal, 0.0)).xyz;
    frag_texcoord = texcoord;
    frag_shininess = shininess;
    frag_specular = specular;
}

In addition to the texture coordinate, shininess, and specular color, the vertex shader also outputs to the fragment shader the model-view-transformed vertex position. The model-view matrix transforms the coordinate space so that the viewer is at the origin, so we can determine the surface-to-viewer direction needed by the specular calculation from this transformed position. We likewise transform the normal vector to keep it in the same frame of reference as the position. Since the normal is a directional vector without a position, we apply the matrix to it with a w component of zero, which cancels out the translation of the model-view matrix and only applies its rotation. With this set of varying values, the fragment shader, flag.f.glsl, can perform the actual Phong calculation:

#version 110

uniform mat4 p_matrix, mv_matrix;
uniform sampler2D texture;

varying vec3 frag_position, frag_normal;
varying vec2 frag_texcoord;
varying float frag_shininess;
varying vec4 frag_specular;

const vec3 light_direction = vec3(0.408248, -0.816497, 0.408248);
const vec4 light_diffuse = vec4(0.8, 0.8, 0.8, 0.0);
const vec4 light_ambient = vec4(0.2, 0.2, 0.2, 1.0);
const vec4 light_specular = vec4(1.0, 1.0, 1.0, 1.0);

void main()
{
    vec3 mv_light_direction = (mv_matrix * vec4(light_direction, 0.0)).xyz,
         normal = normalize(frag_normal),
         eye = normalize(frag_position),
         reflection = reflect(mv_light_direction, normal);

    vec4 frag_diffuse = texture2D(texture, frag_texcoord);
    vec4 diffuse_factor
        = max(-dot(normal, mv_light_direction), 0.0) * light_diffuse;
    vec4 ambient_diffuse_factor
        = diffuse_factor + light_ambient;
    vec4 specular_factor
        = max(pow(-dot(reflection, eye), frag_shininess), 0.0)
            * light_specular;
    
    gl_FragColor = specular_factor * frag_specular
        + ambient_diffuse_factor * frag_diffuse;
}

To keep things simple, the shader defines a single light source using const values in the shader source. A real renderer would likely feed these light parameters in as uniform values, so that lights can be moved or their material attributes changed from the host program. With the light attributes embedded in the GLSL as constants, it's easy to change them in the source, press R to reload the shader, and see the result. Our light source acts as if it were infinitely far away, shining from the same light_direction on every surface in the scene. The light is white, with a 20% baseline ambient light level; you can make it colored by changing the light_diffuse, light_ambient, and light_specular values.

The fragment shader uses several new GLSL functions we haven't seen before: normalize(v), which scales the vector v to unit length; reflect(i, n), which reflects the vector i around the normal n; max(x, y), which returns the greater of its two arguments; and pow(x, y), which raises x to the power y.

We transform our constant light_direction to put it in the same coordinate space as the normal and eye vectors. We then sample the surface's diffuse color from the mesh texture, combine the ambient, diffuse, and specular terms, and finally assign the shaded value to gl_FragColor to produce the final shaded fragment.

Tweaking the Phong model for stylistic effects

Before we wrap things up, let's take a quick look at how the Phong framework can be manipulated to give more stylized results. The classic Phong model is a photorealistic model: it attempts to model real-world light behavior. But photorealism isn't always desirable. Many games set themselves apart visually by using more stylized shading effects. These effects often use the basic Phong model of diffuse, ambient, and specular lighting, but they warp the individual factors before summing them together.

As a trivial example, we can get a brighter, softer shading effect if, instead of clamping the diffuse dot product of back-facing surfaces to zero, we scale it so that perpendicular surfaces receive half illumination, and back-facing surfaces scale linearly toward zero. Team Fortress 2 uses this "half Lambert" reflectance scale, so called because the standard Lambertian dropoff rate is halved, as a basis for its cartoonish but semi-photorealistic look (albeit heavily modified). Let's modify flag.f.glsl to warp the diffuse dot product:

float warp_diffuse(float d)
{
    return d * 0.5 + 0.5;
}

void main()
{
    // ...
    vec4 diffuse_factor
        = max(warp_diffuse(-dot(normal, mv_light_direction)), 0.0) * light_diffuse;
    // ...
}

A popular effect that builds from this half-Lambert scale is cel shading, in which a stair-step function is applied to the half-Lambert factor so that surfaces are shaded flatly with higher contrast between light and dark areas, in the style of traditional hand-drawn animation cels. Jet Set Radio pioneered this look, and it's since been used in countless games. Implementing it in GLSL is easy:

float cel(float d)
{
    return smoothstep(0.35, 0.37, d) * 0.4 + smoothstep(0.70, 0.72, d) * 0.6;
}

float warp_diffuse(float d)
{
    return cel(d * 0.5 + 0.5);
}

GLSL's smoothstep(lo,hi,x) function behaves like this: if x is less than lo, it returns 0.0; if greater than hi, it returns 1.0; and if in between, it transitions smoothly from zero to one along a Hermite curve. Our cel function above uses smoothstep to create three flat shading levels with short, smooth transitions in between.

There are other effects that can be performed by messing with the warp_diffuse function. For example, the function doesn't need to be float-to-float but could also map to a color scale; you could map greater dot products to warmer reddish colors while lesser products map to cooler bluish colors to give an artistic illustration effect. I encourage you to experiment with the fragment shader code to see what other effects you can create.

Conclusion

With Phong shading implemented, we can start adding additional effects to further improve the look of the flag scene. The most glaring problem is the lack of shadow cast by the flag, so next chapter we'll look at shadow mapping, a technique for rendering accurate shadows into a scene, and learn about off-screen framebuffer objects in the process. Meanwhile, if you're interested in learning more about real-time shading techniques on your own without an OpenGL bias, I highly recommend the book Real-Time Rendering.

« Chapter 3 | Table of Contents

An intro to modern OpenGL. Chapter 3: 3D transformation and projection

updated July 14, 2010 15:34:17 PDT

« Chapter 2.3 | Table of Contents | Chapter 4 »

The GPU's specialty, and by extension OpenGL's, is in rendering three-dimensional scenes. If you compare last chapter's hello-gl program to, say, Crysis, you might notice that our demo is missing one of those dimensions (among other things). In this chapter, I'm going to fix that. We'll cover the basic math that makes 3d rendering happen, looking at how transformations are done using matrices and how perspective projection works. Wikipedia does a great job going in-depth about the algorithmic details, so I'm going to spend most of my time talking at a high level about what math we use and why, linking to the relevant Wikipedia articles if you're interested in exploring further. As we look at different transformations, we're going to take the vertex shader from last chapter and extend it to implement those transformations, animating the "hello world" image by moving its rectangle around in 3d space.

Before we start, there are some changes we need to make to last chapter's hello-gl program so that it's easier to play around with. These changes will allow us to write different vertex shaders and supply them as command-line arguments when we run the program, like so:

./hello-gl hello-gl.v.glsl

You can pull these changes from my hello-gl-ch3 github repo.

Updating hello-gl

We'll start by expanding our vertex array to hold three-dimensional vectors. We'll actually pad them out to four components—the fourth component's purpose will become clear soon. For now, we'll just set all the fourth components to one. Let's update our vertex array data in hello-gl.c:

static const GLfloat g_vertex_buffer_data[] = { 
    -1.0f, -1.0f, 0.0f, 1.0f,
     1.0f, -1.0f, 0.0f, 1.0f,
    -1.0f,  1.0f, 0.0f, 1.0f,
     1.0f,  1.0f, 0.0f, 1.0f
};

and our glVertexAttribPointer call:

    glVertexAttribPointer(
        g_resources.attributes.position,  /* attribute */
        4,                                /* size */
        GL_FLOAT,                         /* type */
        GL_FALSE,                         /* normalized? */
        sizeof(GLfloat)*4,                /* stride */
        (void*)0                          /* array buffer offset */
    );

When we start transforming our rectangle, it will no longer completely cover the window, so let's add a glClear to our render function so we don't get garbage in the background. We'll set it to dark grey so it's distinct from the black background of our images:

static void render(void)
{
    glClearColor(0.1f, 0.1f, 0.1f, 1.0f);
    glClear(GL_COLOR_BUFFER_BIT);
    /* ... */
}

Now let's generalize a few things. First, we'll change our uniform state to include GLUT's timer value directly rather than a precalculated fade_factor. This will let our new vertex shaders perform additional time-based effects.

static void update_timer(void)
{
    int milliseconds = glutGet(GLUT_ELAPSED_TIME);
    g_resources.timer = (float)milliseconds * 0.001f;
    glutPostRedisplay();
}

You'll also have to search-and-replace all of the other references to fade_factor with timer. Once that's done, we'll change our main and make_resources functions so they can take the vertex shader filename as an argument. This way, we can easily switch between the different vertex shaders we'll be writing:

static int make_resources(const char *vertex_shader_file)
{
    /* ... */
    g_resources.vertex_shader = make_shader(
        GL_VERTEX_SHADER,
        vertex_shader_file
    );
    /* ... */
}
int main(int argc, char** argv)
{
    /* ... */
    if (!make_resources(argc >= 2 ? argv[1] : "hello-gl.v.glsl")) {
        fprintf(stderr, "Failed to load resources\n");
        return 1;
    }
    /* ... */
}

Now let's update our shaders to match our changes to the uniform state and vertex array. We can move the fade factor calculation into the vertex shader, which will pass it on to the fragment shader as a varying value. In hello-gl.v.glsl:

#version 110

uniform float timer;

attribute vec4 position;

varying vec2 texcoord;
varying float fade_factor;

void main()
{
    gl_Position = position;
    texcoord = position.xy * vec2(0.5) + vec2(0.5);
    fade_factor = sin(timer) * 0.5 + 0.5;
}

A new feature of GLSL I use here is vector swizzling: not only can you address the components of a vec type as if they were struct fields by using .x, .y, .z, and .w for the first through fourth components, you can also string together the element letters to collect multiple components in any order into a longer or shorter vector type. position.xy picks out as a vec2 the first two elements of our now four-component position vector. We can then feed that vec2 into the calculation for our texcoord, which remains two components long.

Finally, in hello-gl.f.glsl, we make fade_factor assume its new varying identity:

#version 110

uniform sampler2D textures[2];

varying float fade_factor;
varying vec2 texcoord;

void main()
{
    gl_FragColor = mix(
        texture2D(textures[0], texcoord),
        texture2D(textures[1], texcoord),
        fade_factor
    );
}

With those changes out of the way, we can recompile the executable once and not have to mess with C any more for the rest of the chapter. We can write new vertex shader files and execute them using ./hello-gl vertex-shader.v.glsl without recompiling anything. Now we're ready to do some math!

Projection and world space

The destination space for the vertex shader, which I've been informally referring to as "screen space" in the last couple of chapters, is more precisely called projection space. The visible part of projection space is the unit-radius cube from (–1, –1, –1) to (1, 1, 1). Anything outside of this cube gets clipped and thrown out. The x and y axes map across the viewport, the part of the screen in which any rendered output will be displayed, with (–1, –1, z) corresponding to the lower left corner, (1, 1, z) to the upper right, and (0, 0, z) to the center. The rasterizer uses the z coordinate to assign a depth value to every fragment it generates; if the framebuffer has a depth buffer, these depth values can be compared against the depth values of previously rendered fragments, allowing parts of newly-rendered objects to be hidden behind objects that have already been rendered into the framebuffer. (x, y, –1) is the near plane and maps to the nearest depth value. At the other end, (x, y, 1) is the far plane and maps to the farthest depth value. Fragments with z coordinates outside of that range get clipped against these planes just like they do the edges of the screen.

Projection space is computationally convenient for the GPU, but it's not very usable by itself for modeling vertices within a scene. Rather than input projection-space vertices directly to the pipeline, most programs use the vertex shader to project objects into it. The pre-projection coordinate system used by the program is called world space, and can be moved, scaled, and rotated relative to projection space in whatever way the program needs. Within world space, objects also need to move around, changing position, orientation, size, and shape. Both of these operations, mapping world space to projection space and positioning objects in world space, are accomplished by performing transformations with mathematical structures called matrices.

Linear transformations with matrices

Linear transformations are operations on an object that preserve the relative size and orientation of parts within the object while uniformly changing its overall size or orientation. They include rotation, scaling, and shearing. If you've ever used the "free transform" tool in Photoshop or GIMP, these are the sorts of transformations it performs. You can think of a linear transformation as taking the x, y, and z axes of your coordinate space and mapping them to a new set of arbitrary axes x', y', and z':

For clarity, the figure is two-dimensional, but the same idea applies to 3d. To represent a linear transformation numerically, we can take the vector values of those new axes and arrange them into a 3×3 matrix. We can then perform an operation called matrix multiplication to apply a linear transformation to a vector, or to combine two transformations into a single matrix that represents the combined transformation. In standard mathematical notation, matrices are represented so that the axes are represented as columns going left-to-right. In GLSL and in the OpenGL API, matrices are represented as an array of vectors, each vector representing a column in the matrix. In source code, this results in the values looking transposed from their mathematical notation. This is called column-major order (as opposed to row-major order, in which each vector element of the matrix array would be a row of the matrix). GLSL provides 2×2, 3×3, and 4×4 matrix types named mat2 through mat4. It also overloads its multiplication operator for use between matn values of the same type, and between matns and vecns, to perform matrix-matrix and matrix-vector multiplication.

A nice property of linear transformations is that they work well with the rasterizer's linear interpolation. If we transform all of the vertices of a triangle using the same linear transformation, every point on its surface will retain its relative position to the vertices, so textures and other varying values will transform with the vertices they fill out.

Note that all linear transformations occur relative to the origin, that is, the (0, 0, 0) point of the coordinate system, which remains constant through a linear transformation. Because of this, moving an object around in space, called translation in mathematical terms, is not a linear transformation, and cannot be represented with a 3×3 matrix or composed into other 3×3 linear transform matrices. We'll see how to integrate translation into transformation matrices shortly. For now, let's try some linear transformations:
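
To see why, note that a translation by a vector t maps every point v to v + t, but any 3×3 matrix M maps the origin to itself:

\[ T_t(v) = v + t, \qquad T_t(0) = t \neq 0 = M\,0 \quad \text{for any } 3\times 3 \text{ matrix } M, \]

so no 3×3 matrix can reproduce a nonzero translation.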

Rotation

We'll start by writing a shader that spins our rectangle around the z axis. Using the timer uniform value as a rotation angle, we'll construct a rotation matrix, using the sin and cos functions to rotate our matrix axes around the unit circle. The shader looks like this; it's in the repo as rotation.v.glsl:

#version 110

uniform float timer;

attribute vec4 position;

varying vec2 texcoord;
varying float fade_factor;

void main()
{
    mat3 rotation = mat3(
        vec3( cos(timer),  sin(timer),  0.0),
        vec3(-sin(timer),  cos(timer),  0.0),
        vec3(        0.0,         0.0,  1.0)
    );
    gl_Position = vec4(rotation * position.xyz, 1.0);
    texcoord = position.xy * vec2(0.5) + vec2(0.5);
    fade_factor = sin(timer) * 0.5 + 0.5;
}

(I'm going to be listing only the main function of the next few shaders; the uniform, attribute, and varying declarations will all remain the same from here.) With our changes to hello-gl we can run it like so:

./hello-gl rotation.v.glsl

And this is the result:

Scaling to fit the aspect ratio

You probably noticed that the rectangle appears to be horizontally distorted as it rotates. This is because our window is wider than it is tall, so the screen distance covered along a unit on the x axis of projection space is longer than the distance the same unit would cover along the y axis. The window is 400 pixels wide and 300 pixels high, giving it an aspect ratio of 4:3 (the width divided by the height). (This will change if we resize the window, but we won't worry about that for now.) We can compensate for this by applying a scaling matrix that scales the x axis by the reciprocal of the aspect ratio, as in window-scaled-rotation.v.glsl:

    mat3 window_scale = mat3(
        vec3(3.0/4.0, 0.0, 0.0),
        vec3(    0.0, 1.0, 0.0),
        vec3(    0.0, 0.0, 1.0)
    );
    mat3 rotation = mat3(
        vec3( cos(timer),  sin(timer),  0.0),
        vec3(-sin(timer),  cos(timer),  0.0),
        vec3(        0.0,         0.0,  1.0)
    );
    gl_Position = vec4(window_scale * rotation * position.xyz, 1.0);
    texcoord = position.xy * vec2(0.5) + vec2(0.5);
    fade_factor = sin(timer) * 0.5 + 0.5;

Note that the order in which we rotate and scale is important. Unlike scalar multiplication, matrix multiplication is noncommutative: Changing the order of the arguments gives different results. This should make intuitive sense: "rotate an object, then squish it horizontally" gives a different result from "squish an object horizontally, then rotate it". As matrix math, you write transformation sequences out right-to-left, backwards compared to English: scale * rotate * vector rotates the vector first, whereas rotate * scale * vector scales first.

Now that we've compensated for the distortion of our window's projection space, we've revealed a dirty secret. Our input rectangle is really a square, and it doesn't match the aspect ratio of our image, leaving it scrunched. We need to scale it again outward, this time before we rotate, as in window-object-scaled-rotation.v.glsl:

    mat3 window_scale = mat3(
        vec3(3.0/4.0, 0.0, 0.0),
        vec3(    0.0, 1.0, 0.0),
        vec3(    0.0, 0.0, 1.0)
    );
    mat3 rotation = mat3(
        vec3( cos(timer),  sin(timer),  0.0),
        vec3(-sin(timer),  cos(timer),  0.0),
        vec3(        0.0,         0.0,  1.0)
    );
    mat3 object_scale = mat3(
        vec3(4.0/3.0, 0.0, 0.0),
        vec3(    0.0, 1.0, 0.0),
        vec3(    0.0, 0.0, 1.0)
    );
    gl_Position = vec4(window_scale * rotation * object_scale * position.xyz, 1.0);
    texcoord = position.xy * vec2(0.5) + vec2(0.5);
    fade_factor = sin(timer) * 0.5 + 0.5;

(Alternately, we could change our vertex array and apply a scaling transformation to our generated texcoords. But I promised we wouldn't be changing the C anymore in this chapter.)

With this shader, our rectangle now rotates the way we would expect it to:

Projection and model-view matrices

The window_scale matrix conceptually serves a different purpose from the rotation and object_scale matrices. While the latter two matrices set up our input vertices to be where we want them in world space, the window_scale serves to project world space into projection space in a way that gives an undistorted final render. Matrices used to orient objects in world space, like our rotation and object_scale matrices, are called model-view matrices, because they are used both to transform models and to position them relative to the viewport. The matrix we use to project, in this case window_scale, is called the projection matrix. Although both kinds of matrix behave the same, and the line drawn between them is mathematically arbitrary, the distinction is useful because a 3d application will generally only need a few projection matrices that change rarely (usually only if the window size or screen resolution changes). On the other hand, there can be countless model-view matrices for all of the objects in a scene, which will update constantly as the objects animate.

Orthographic and perspective projection

Projecting with a scaling matrix, as we're doing here, produces an orthographic projection, in which objects in 3d space are rendered at a constant scale regardless of their distance from the viewport. Orthographic projections are useful for rendering two-dimensional display elements, such as the UI controls of a game or graphics tool, and in modeling applications where the artist needs to see the exact scales of different parts of a model, but they don't adequately present 3d scenes in a way most viewers expect. To demonstrate this, let's break out of the 2d plane and alter our shader to rotate the rectangle around the x axis, as in orthographic-rotation.v.glsl:

    const mat3 projection = mat3(
        vec3(3.0/4.0, 0.0, 0.0),
        vec3(    0.0, 1.0, 0.0),
        vec3(    0.0, 0.0, 1.0)
    );

    mat3 rotation = mat3(
        vec3(1.0,         0.0,         0.0),
        vec3(0.0,  cos(timer),  sin(timer)),
        vec3(0.0, -sin(timer),  cos(timer))
    );
    mat3 scale = mat3(
        vec3(4.0/3.0, 0.0, 0.0),
        vec3(    0.0, 1.0, 0.0),
        vec3(    0.0, 0.0, 1.0)
    );
    gl_Position = vec4(projection * rotation * scale * position.xyz, 1.0);
    texcoord = position.xy * vec2(0.5) + vec2(0.5);
    fade_factor = sin(timer) * 0.5 + 0.5;

With an orthographic projection, the rectangle doesn't very convincingly rotate in 3d space—it just sort of accordions up and down. This is because the top and bottom edges of the rectangle remain the same apparent size as they move toward and away from the view. In the real world, objects appear smaller in our field of view proportional to how far from our eyes they are. This effect is called perspective, and transforming objects to take perspective into account is called perspective projection. Perspective projection is accomplished by shrinking objects proportionally to their distance from the "eye". An easy way to do this is to divide each point's position by some function of its z coordinate. Let's arbitrarily decide that zero on the z axis remains unscaled, and that points elsewhere on the z axis scale by half their distance from zero. Correspondingly, let's also scale the z axis by half, so that the end of the rectangle coming toward us doesn't get clipped to the near plane as it gets magnified. We'll end up with the shader code in naive-perspective-rotation.v.glsl:

    const mat3 projection = mat3(
        vec3(3.0/4.0, 0.0, 0.0),
        vec3(    0.0, 1.0, 0.0),
        vec3(    0.0, 0.0, 0.5)
    );

    mat3 rotation = mat3(
        vec3(1.0,         0.0,         0.0),
        vec3(0.0,  cos(timer),  sin(timer)),
        vec3(0.0, -sin(timer),  cos(timer))
    );
    mat3 scale = mat3(
        vec3(4.0/3.0, 0.0, 0.0),
        vec3(    0.0, 1.0, 0.0),
        vec3(    0.0, 0.0, 1.0)
    );

    vec3 projected_position = projection * rotation * scale * position.xyz;
    float perspective_factor = projected_position.z * 0.5 + 1.0;

    gl_Position = vec4(projected_position/perspective_factor, 1.0);
    texcoord = position.xy * vec2(0.5) + vec2(0.5);
    fade_factor = sin(timer) * 0.5 + 0.5;

Now the overall shape of the rectangle appears to rotate in perspective, but the texture mapping is all kinky. This is because perspective projection is a nonlinear transformation—different parts of the rectangle get scaled differently depending on how far away they are. This interferes with the linear interpolation the rasterizer applies to the texture coordinates across the surface of our triangles. To properly project texture coordinates as well as other varying values in perspective, we need a different approach that takes the rasterizer into account.

Homogeneous coordinates

Directly applying perspective to an object may not be a linear transformation, but the divisor that perspective applies is a linear function of the perspective distance. If we stored the divisor out-of-band as an extra component of our vectors, we could apply perspective as a matrix transformation, and the rasterizer could linearly interpolate texture coordinates correctly before the perspective divisor is applied. This is in fact what that mysterious 1.0 we've been sticking in the fourth component of our vectors is for. The projection space that gl_Position addresses uses homogeneous coordinates. That fourth component, labeled w, divides the x, y, and z components when the coordinate is projected. In other words, the homogeneous coordinate [x:y:z:w] projects to the linear coordinate (x/w, y/w, z/w).
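
For a quick worked example (the numbers are my own, just to illustrate): the homogeneous coordinates [2:4:6:2] and [1:2:3:1] name the same point, since both project to (1, 2, 3).

    vec4 h = vec4(2.0, 4.0, 6.0, 2.0);
    vec3 projected = h.xyz / h.w;   // (1.0, 2.0, 3.0)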

With this trick, we can construct a perspective matrix that maps distances on the z axis to scales on the w axis. As I mentioned, the rasterizer also interpolates varying values in homogeneous space, before the coordinates are projected, so texture coordinates and other varying values will blend correctly over perspective-projected triangles using this matrix. The 3×3 linear transformation matrices we've covered extend to 4×4 easily—just extend the columns to four components and add a fourth column that leaves the w axis unchanged. Let's update our vertex shader to use a proper perspective matrix and mat4s to transform our rectangle, as in perspective-rotation.v.glsl:

    const mat4 projection = mat4(
        vec4(3.0/4.0, 0.0, 0.0, 0.0),
        vec4(    0.0, 1.0, 0.0, 0.0),
        vec4(    0.0, 0.0, 0.5, 0.5),
        vec4(    0.0, 0.0, 0.0, 1.0)
    );

    mat4 rotation = mat4(
        vec4(1.0,         0.0,         0.0, 0.0),
        vec4(0.0,  cos(timer),  sin(timer), 0.0),
        vec4(0.0, -sin(timer),  cos(timer), 0.0),
        vec4(0.0,         0.0,         0.0, 1.0)
    );
    mat4 scale = mat4(
        vec4(4.0/3.0, 0.0, 0.0, 0.0),
        vec4(    0.0, 1.0, 0.0, 0.0),
        vec4(    0.0, 0.0, 1.0, 0.0),
        vec4(    0.0, 0.0, 0.0, 1.0)
    );

    gl_Position = projection * rotation * scale * position;
    texcoord = position.xy * vec2(0.5) + vec2(0.5);
    fade_factor = sin(timer) * 0.5 + 0.5;

The texture coordinates now project correctly with the rectangle as it rotates in perspective.

Affine transformations

Homogeneous coordinates let us pull another trick using 4×4 matrices. Earlier, I noted that translation cannot be represented in a 3×3 linear transformation matrix. While translation can be achieved by simple vector addition, combinations of translations and linear transformations can't be easily composed that way. However, by using the w axis column of a 4×4 matrix to map the w axis value back onto the x, y, and z axes, we can set up a translation matrix. The combination of a linear transformation with a translation is referred to as an affine transformation. Like our 3×3 linear transformation matrices, 4×4 affine transformation matrices can be multiplied together to give new matrices combining their transformations.
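
For example, here's a matrix that translates by (1, 2, 3), written in GLSL's column order; it has the same layout as the translate helper we'll use in view-frustum-rotation.v.glsl below.

    // Translation by (1.0, 2.0, 3.0): the fourth column adds the offset,
    // scaled by the w component, onto the x, y, and z components.
    mat4 translation = mat4(
        vec4(1.0, 0.0, 0.0, 0.0),
        vec4(0.0, 1.0, 0.0, 0.0),
        vec4(0.0, 0.0, 1.0, 0.0),
        vec4(1.0, 2.0, 3.0, 1.0)
    );
    // translation * vec4(5.0, 5.0, 5.0, 1.0) == vec4(6.0, 7.0, 8.0, 1.0)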

Constructing a view frustum matrix

The perspective projection matrix we constructed above gets the job done, but it's a bit ad hoc. An easier way to think about projecting world space is to treat the origin as the camera position and project from there. Now that we know how to make translation matrices, we can leave it to the model-view matrix to position the camera within world space. Different programs will also want to control the angle of view (α) of the projection, and the distances of the near (z_n) and far (z_f) planes in world space. A narrower angle of view projects far-away objects to a scale closer to that of nearby objects, giving a zoomed-in effect, while a wider angle makes objects shrink more sharply with distance, giving a wider field of view. The ratio between the near and far planes affects the resolution of the depth buffer: if the planes are too far apart, or the near plane is too close to zero, you'll get z-fighting, where the z coordinates of projected triangles differ by less than the depth buffer can represent, and depth testing gives invalid results, causing nearby objects to "fight" for pixels along their shared edge.

From these variables, we can come up with a general function to construct a projection matrix for any view frustum. The math is a little hairy; I'll describe what it does in broad strokes. With the camera at the origin, we can project the z axis directly to w axis values. In an affine transformation matrix, the bottom row is always set to [0 0 0 1]. This leaves the w axis unchanged. Changing this bottom row will cause the x, y, or z axis values to project onto the w axis, giving a perspective effect along the specified axis. In our case, setting that last row to [0 0 1 0] projects the z axis value directly to the perspective scale on w.

We'll then need to remap the range on the z axis from z_n to z_f so that it projects into the space between the near (–1) and far (1) planes of projection space. Taking the effect of the w coordinate into account, we actually have to map into the range from –z_n (which, with a w coordinate of z_n, projects to –1) to z_f (which, with a w coordinate that's also z_f, projects to 1). We do this by translating and scaling the z axis to fit this new range. The angle of view is determined by how much we scale the x and y axes. A scale of one gives a 45° angle of view; shrinking the axes gives a wider field of view, and growing them gives a narrower one, inversely proportional to the tangent of the angle of view. So that our output isn't distorted, we also scale the y axis in proportion to the aspect ratio (r) of the viewport.
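
Putting those broad strokes together, the matrix that the view_frustum function in the shader below constructs looks like this in conventional row-major math notation (the GLSL mat4 constructor lists the same matrix column by column):

    \begin{pmatrix}
        \frac{1}{\tan \alpha} & 0 & 0 & 0 \\
        0 & \frac{r}{\tan \alpha} & 0 & 0 \\
        0 & 0 & \frac{z_f + z_n}{z_f - z_n} & \frac{-2 z_f z_n}{z_f - z_n} \\
        0 & 0 & 1 & 0
    \end{pmatrix}

Multiplying this by a point (x, y, z, 1) gives w' = z and z' = ((z_f + z_n)z – 2 z_f z_n)/(z_f – z_n); plugging in z = z_n makes z'/w' come out to –1, and z = z_f makes it come out to 1, exactly the remapping described above.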

Let's write one last shader using the view frustum matrix function. We'll translate the rectangle to set it 3 units in front of us. In addition to rotating around the x axis, we'll also change the translation over time to set it moving in a circle left to right and toward and away from us. Here's the code, from view-frustum-rotation.v.glsl:

#version 110

uniform float timer;

attribute vec4 position;

varying vec2 texcoord;
varying float fade_factor;

mat4 view_frustum(
    float angle_of_view,
    float aspect_ratio,
    float z_near,
    float z_far
) {
    return mat4(
        vec4(1.0/tan(angle_of_view),           0.0, 0.0, 0.0),
        vec4(0.0, aspect_ratio/tan(angle_of_view),  0.0, 0.0),
        vec4(0.0, 0.0,    (z_far+z_near)/(z_far-z_near), 1.0),
        vec4(0.0, 0.0, -2.0*z_far*z_near/(z_far-z_near), 0.0)
    );
}

mat4 scale(float x, float y, float z)
{
    return mat4(
        vec4(x,   0.0, 0.0, 0.0),
        vec4(0.0, y,   0.0, 0.0),
        vec4(0.0, 0.0, z,   0.0),
        vec4(0.0, 0.0, 0.0, 1.0)
    );
}

mat4 translate(float x, float y, float z)
{
    return mat4(
        vec4(1.0, 0.0, 0.0, 0.0),
        vec4(0.0, 1.0, 0.0, 0.0),
        vec4(0.0, 0.0, 1.0, 0.0),
        vec4(x,   y,   z,   1.0)
    );
}

mat4 rotate_x(float theta)
{
    return mat4(
        vec4(1.0,         0.0,         0.0, 0.0),
        vec4(0.0,  cos(theta),  sin(theta), 0.0),
        vec4(0.0, -sin(theta),  cos(theta), 0.0),
        vec4(0.0,         0.0,         0.0, 1.0)
    );
}

void main()
{
    gl_Position = view_frustum(radians(45.0), 4.0/3.0, 0.5, 5.0)
        * translate(cos(timer), 0.0, 3.0+sin(timer))
        * rotate_x(timer)
        * scale(4.0/3.0, 1.0, 1.0)
        * position;
    texcoord = position.xy * vec2(0.5) + vec2(0.5);
    fade_factor = sin(timer) * 0.5 + 0.5;
}

And this is what we get: the rectangle circles left to right and toward and away from us, rotating in perspective as it moves.

Conclusion

Matrix multiplication is by far the most common operation in a 3d rendering pipeline. The rotation, scaling, translation, and frustum matrices we've covered are the basic structures that make 3d graphics happen. With these fundamentals covered, we're now ready to start building 3d scenes. If you want to learn more about 3d math, the book 3d Math Primer for Graphics and Game Development gives excellent in-depth coverage beyond the basics I've touched on here.

In this chapter, I've been demonstrating matrix math by writing code entirely within the vertex shader. Constructing our matrices in the vertex shader causes them to be redundantly recalculated for every single vertex. For this simple four-vertex program, that's not a big deal; I stuck to GLSL because it has great support for matrix math built into the language, and demonstrating both the concepts of matrix math and a hoary C math library at the same time would make things even more confusing. Unfortunately, OpenGL provides no matrix or vector math through the C API, so we'd need to use a third-party library, such as libSIMDx86, to perform this math outside of shaders. In a real program with potentially thousands of vertices, the extra matrix math overhead in the vertex shader adds up. Projection matrices generally apply to an entire scene and only need to be recalculated when the window is resized or the screen resolution changes, and model-view matrices usually change only between frames and apply to whole sets of vertices, so it is more efficient to precalculate these matrices and feed them to the shader as uniforms or attributes. This is how we'll do things from now on.
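
Here's a rough sketch of what that will look like, assuming uniforms named projection and model_view (the names are placeholders of mine, not fixed by later chapters): the per-vertex work shrinks to a couple of matrix-vector multiplies.

#version 110

uniform mat4 projection;  // recalculated only when the window or resolution changes
uniform mat4 model_view;  // recalculated when objects move, typically once per frame

attribute vec4 position;

varying vec2 texcoord;

void main()
{
    gl_Position = projection * model_view * position;
    texcoord = position.xy * vec2(0.5) + vec2(0.5);
}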

In the next chapter, we'll leave this lame "hello world" program behind and write a program that renders a more sophisticated 3d scene. In the process, we'll look at the next most important aspect of 3d rendering after transformation and projection: lighting.

« Chapter 2.3 | Table of Contents | Chapter 4 »

An intro to modern OpenGL. Table of Contents

updated July 14, 2010 15:33:27 PDT

To make it easier for people jumping into my OpenGL tutorial from the middle, I'm going to keep this post up to date with the new articles as I post them.

An intro to modern OpenGL, in Chinese

updated April 25, 2010 19:25:51 PDT

Kang Songrui is in the process of translating my intro to modern OpenGL articles to Chinese. He recently posted his translation of the first chapter.

As for chapter 4 of the English tutorial, rest assured it's coming. I've had money-making-related projects getting in the way.


