Requirements of the physical setup

This document describes the physical setup and challenges for utilizing tag-grid-scanner in real-world applications.

Grid

The tags need to be aligned in a uniform grid, usually on a transparent surface, such that the camera can view the tag grid from below:

Rectangular, equidistant, uniform: All rows have the same number of tags. All columns have the same number of tags. The spacing between tags always the same (see Gaps). All tags have the same size.
Transparent: The camera needs to be able to see the tags. If you choose to use an infrared camera and infrared lighting, you may get away with a surface that is opaque to visible light, but this is untested.
Gaps: It is possible to add gaps between the tags. This usually makes it easier to insert and remove tags from the grid. Note that additional light might shine through the gaps and glare into the camera. Therefore, the gaps should either be kept small or covered by an opaque material.

Illuminating the tag grid

Uniform lighting: Try to achieve similar lighting conditions for all tags, at least when viewed from the camera position.
Noise reduction: If the lighting is not strong enough, the camera image will contain a lot of noise that can drastically reduce detection quality. It may also cause detection results to vary from one frame to the next, even though the tags did not change.
Glare: The light sources must not be visible to the camera. Add a lens hood if necessary. Consider that reflections of light sources off glossy surfaces (such as the transparent board holding the tags) may cause glare issues are well.

Camera choice

Choosing the right camera is important for the best detection results:

Wide angle lens: If you plan to place the camera below the tag grid, the camera probably needs a somewhat wide angle lens to capture the whole grid from a short distance. You might get away with a longer focal length if the grid and tag sizes are small or you capture the grid via a mirror, but care has to be taken not to distort the image by the additional reflection (e.g. due to a mirror that’s not perfectly flat). If you use a zoom lens, it should be possible to lock it such that it does not change accidentally.
Fixed or manual focus: Many lenses have different fields of view for different focus distances. Lens calibration and region of interest setting may also only be valid for a specific focus distance. Therefore, the lens should either be fixed focus or manual focus. Another possibility is to use a camera that allows locking the focus distance via a software setting.
Wide dynamic range: If there are additional light sources visible from the camera’s location, e.g. directly or via reflections on glossy surfaces, the camera image might get overexposed in certain areas and this may detection issues when these bright spots are close to one of the tags. We have made good experience with cameras that support a wide dynamic range and, therefore, attenuate this issue automatically. Another option is to tweak the exposure time, if your camera supports that, but this may not be possible depending on the dynamic range of your scene.
Lens distortion: Either choose a camera/lens that does not take any measures to correct lens distortion, or try to choose a camera/lens combination that corrects distortion very well. Certain types of lens distortion can be corrected by the tag-grid-scanner given proper calibration data. However, trying to correct distortion on an image that has been generated through in-camera distortion correction might not give satisfactory results.

We settled on the ELP 3MP WDR USB camera with 2.9mm lens (ELP USB3MP01H) and corrected lens distortion in software.

Camera placement

The positioning of the camera also affects the quality of the detection results.

Position the camera as orthogonal as possible to the tag grid to avoid excessive keystone distortion.
If something is projected onto the tag grid, the projection might shine through the grid (between the tags or if tags are removed) and glare into the camera lens. Depending on the brightness of the projection, this may result in overexposed areas in the camera image causing detection problems for adjacent tags.

Take both aspects into account and position the camera just outside the frustum created by the projector and the tag grid, but not too far away to keep keystone distortion low. Add a bit of margin to be on the safe side with respect to overexposure issues.

Camera mount

The camera mount needs to be very rigid. The camera should not move at all. Otherwise, re-calibration will be necessary.

Fiducial markers

The tag-grid-scanner can also detect fiducial markers, such as ArUco or AprilTag markers, that act as physical reference points for the region of interest. These markers are tracked from one frame to the next to compensate for camera movements after initial configuration. If all preconfigured markers are found in the distortion-corrected camera image, a perspective coordinate transformation (homography) is derived from the matched markers. This transformation is then used to map the originally configured region of interest into a valid region of interest with respect to the current image. In many cases, this can compensate for small to medium camera movements caused by vibrations or thermal expansion of the camera mount or furniture. It is highly recommended to include such fiducial markers in the physical setup.

The markers must be placed such that the polygon defined by the convex hull of all marker corners does not intersect the tag grid.
The markers must be fully visible in the camera image after the lens distortion correction has been applied.

The final setup may look like this: Example setup with fiducial markers