Projection
A camera’s main job is to project the 3D scene onto a 2D image by defining a mapping between 2D image space and 3D camera or eye space (see spaces). The camera model refers to how a virtual camera is implemented to imitate a real camera. This may be as simple as defining a rectilinear projection, or it may include more complex optical distortion effects that appear with real camera lenses or lens systems.
A perspective projection is the most common. Objects further away appear smaller, straight lines in the scene remain straight in the image (it is rectilinear), and it can be defined by a projection matrix, unlike a fisheye projection for example. A typical perspective projection matrix models a perfect pinhole camera, visualized in the image below, where light from the scene passes through a single point (considered the camera’s origin, or position) and falls on the image plane. The image is formed by a bounded region of the image plane, where the film or digital sensor would be (sensor is a good term as it includes the concepts of pixels and image resolution). In a real pinhole camera the field of view is defined by how far the image plane is from the pinhole and how big the sensor is. However, these do not need to be modelled explicitly and the perspective projection can simply include a field of view attribute directly. Aspect ratio and the near and far planes are also needed to define a viewing volume, discussed further at projections.
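As a rough sketch, these attributes map directly onto a projection matrix helper. The example below assumes the GLM maths library; the field of view, aspect ratio and near/far plane values are hypothetical, chosen only for illustration.

```cpp
// Sketch of building a perspective projection matrix from a field of view,
// aspect ratio and near/far planes, assuming the GLM maths library.
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

glm::mat4 makeProjection()
{
    float fovY   = glm::radians(60.0f); // vertical field of view
    float aspect = 16.0f / 9.0f;        // image width / height
    float zNear  = 0.1f;                // near plane distance
    float zFar   = 100.0f;              // far plane distance
    return glm::perspective(fovY, aspect, zNear, zFar);
}
```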
Position
The next thing to discuss is positioning the camera in the virtual world. An x, y, z position and either a yaw, pitch, roll Euler rotation or an x, y, z, w quaternion (discussed later; also see rotations) is standard. Both camera position and rotation can be combined into a single transformation matrix which defines a mapping between world space and eye space.
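A minimal sketch of combining a pose into one matrix, assuming GLM and an illustrative position and quaternion rotation, might look like this (the inverse at the end is explained in the next paragraph):

```cpp
// Sketch of combining a camera position and rotation into one world-to-eye
// (view) matrix, assuming GLM. The pose values are illustrative.
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>
#include <glm/gtc/quaternion.hpp>

glm::mat4 makeView()
{
    glm::vec3 position(0.0f, 2.0f, 5.0f);
    // Orientation as a quaternion (here built from a yaw about the y axis).
    glm::quat rotation = glm::angleAxis(glm::radians(30.0f),
                                        glm::vec3(0.0f, 1.0f, 0.0f));

    // Camera-to-world transform: place the camera like any other object.
    glm::mat4 cameraToWorld = glm::translate(glm::mat4(1.0f), position)
                            * glm::mat4_cast(rotation);

    // The view matrix maps world space to eye space: the inverse.
    return glm::inverse(cameraToWorld);
}
```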
It is logical to think of the perspective projection’s viewing volume placed at the camera’s position and rotation in world space along with all the other objects in the scene, as in the image below. However, this may lead to some confusion when dealing with the camera’s transform. Rather than placing the camera at a position in world space, like other objects, the camera transform is the inverse. It may help to think of the camera transform as moving the world around it to position and orientate the camera correctly (i.e. rotate the world left to look right). To explain this further, rendering begins with geometry (primarily vertex positions) in object space, which is ultimately transformed into image space. The visualization here is in world space, so the chain is image ← eye ← world ← object. An important distinction is that the transform which places an object in world space goes in the opposite direction to the world-to-eye transform. The camera transform is the inverse of the transform that would place a model at the same spot in world space.
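The same idea can be written out explicitly: the view matrix applies the opposite translation and rotation in reverse order, which is exactly the inverse of the transform that would place a model at the camera’s pose. This is a sketch under the same assumptions as above.

```cpp
// Sketch of the "move the world the opposite way" view of the camera
// transform, assuming GLM and the same hypothetical pose as above.
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>
#include <glm/gtc/quaternion.hpp>

glm::mat4 makeViewExplicit(glm::vec3 position, glm::quat rotation)
{
    // A model at this pose would use: world = translate(position) * rotate.
    // The camera applies the opposite operations in reverse order: first
    // translate the world by -position, then rotate it by the inverse
    // (conjugate) of the camera's rotation.
    glm::mat4 invRotate    = glm::mat4_cast(glm::conjugate(rotation));
    glm::mat4 invTranslate = glm::translate(glm::mat4(1.0f), -position);
    return invRotate * invTranslate; // equals inverse(translate * rotate)
}
```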
Many 3D graphics tutorials start by introducing a “look-at” function, which places the camera at a given position in world space, looking towards a target position. An “up” direction vector is also specified to support arbitrary rotation. This may be fine for a fixed test view, but in most cases the camera’s orientation needs to be set directly for smooth movement and interpolation. Euler rotations are typically used for cameras with a fixed up direction, where the gimbal lock effect is at least human-relatable. Any orientation can still be described by Euler angles, although adding to the individual rotation components is not the same as composing arbitrary rotations. For example, a yaw rotation (horizontal rotation) becomes a roll (the image rotates about its centre) when the pitch (vertical rotation) is far from horizontal, pointing up or down.
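The sketch below contrasts the two approaches, again assuming GLM; glm::lookAt is an existing GLM helper, while the positions, target and angles are made up for illustration.

```cpp
// Sketch comparing a look-at view matrix with one driven by yaw/pitch Euler
// angles, assuming GLM. Positions, target and angles are illustrative.
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>
#include <glm/gtc/quaternion.hpp>

// Fixed test view: place the camera at 'eye' looking towards 'target'.
glm::mat4 lookAtView()
{
    glm::vec3 eye(0.0f, 2.0f, 5.0f);
    glm::vec3 target(0.0f, 0.0f, 0.0f);
    glm::vec3 up(0.0f, 1.0f, 0.0f); // world up direction
    return glm::lookAt(eye, target, up);
}

// Interactive camera: orientation set directly from yaw and pitch (radians)
// so it can be updated and interpolated smoothly each frame.
glm::mat4 eulerView(glm::vec3 position, float yaw, float pitch)
{
    glm::quat rotation = glm::angleAxis(yaw,   glm::vec3(0.0f, 1.0f, 0.0f))
                       * glm::angleAxis(pitch, glm::vec3(1.0f, 0.0f, 0.0f));
    glm::mat4 cameraToWorld = glm::translate(glm::mat4(1.0f), position)
                            * glm::mat4_cast(rotation);
    return glm::inverse(cameraToWorld);
}
```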
Quaternions allow efficient composition of arbitrary rotations and smooth interpolation. Some usage examples include the orientation of a plane in a flight simulator or a trackball in a modelling package.
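A short sketch of both uses, assuming GLM’s quaternion type; the angles and interpolation factor are illustrative only.

```cpp
// Sketch of quaternion composition and smooth interpolation, assuming GLM.
#include <glm/glm.hpp>
#include <glm/gtc/quaternion.hpp>

void quaternionExamples()
{
    glm::quat current = glm::angleAxis(glm::radians(20.0f),
                                       glm::vec3(0.0f, 1.0f, 0.0f));

    // Compose an arbitrary incremental rotation (e.g. from flight-stick
    // input) by multiplying quaternions; the axis need not be a world axis.
    glm::quat delta = glm::angleAxis(glm::radians(5.0f),
                                     glm::normalize(glm::vec3(1.0f, 0.0f, 1.0f)));
    current = delta * current;

    // Smoothly interpolate between two orientations (e.g. for a trackball
    // animation) with spherical linear interpolation.
    glm::quat target = glm::angleAxis(glm::radians(90.0f),
                                      glm::vec3(0.0f, 0.0f, 1.0f));
    glm::quat halfWay = glm::slerp(current, target, 0.5f);
    (void)halfWay; // result unused in this sketch
}
```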