Managing Draw Calls For Zoomed Out Tilemap Views
Reducing Draw Calls from Distant Tiles
The core problem: Too many draw calls when rendering large tilemaps at a zoom level showing much of the map
A common optimization problem in 2D games that use tilemaps is managing the number of draw calls generated when a large portion of the map is viewed from a zoomed-out perspective. In a naive renderer, each visible tile generates at least one draw call, so a view that encompasses the entire tilemap can produce thousands of draw calls per frame. The CPU has to prepare and submit every one of those calls, and that overhead, more than the GPU work itself, is what usually causes the frame rate to drop.
Unity’s native tilemap system utilizes the TilemapRenderer component to handle rendering batched chunks of tiles in an efficient manner. However, when zooming out to view most or all of a large tilemap, the number of draw calls can still easily reach problematic levels. There are several techniques we can use to reduce draw calls in this situation, centered around culling off-screen tiles, utilizing Unity’s dynamic batching capabilities, and instancing tile chunks.
Setting up tilemap rendering in Unity
Unity’s Tilemap feature allows arranging tiles on a grid to build up 2D environments. A Grid GameObject acts as a container, with one or more Tilemap GameObjects as children that store the tile data. Painting tiles with the Tile Palette tool maps sprites onto the Tilemap’s grid. The Tilemap Renderer then generates meshes from this grid data and submits them to the graphics hardware.
The key components involved are:
- Grid – Defines the cell layout (size, gap, and shape) that tilemaps are built on
- Tilemap – Stores which tile occupies each grid cell
- Tile – Asset that references the sprite to draw in a cell
- Tile Palette – Tool to paint tiles onto the tilemap
- Tilemap Renderer – Handles batching and rendering
There are also optional components like Tilemap Collider 2D for collision detection. The Tilemap Renderer is the main focus here for optimization.
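To show how these components fit together, the same hierarchy can be built from a script. This is a minimal sketch rather than a recommended workflow; the class name `TilemapBootstrap`, the `groundSprite` field, and the map size are assumptions made for illustration.

```csharp
using UnityEngine;
using UnityEngine.Tilemaps;

// Hypothetical helper: builds a Grid > Tilemap hierarchy and fills it with one tile.
public class TilemapBootstrap : MonoBehaviour
{
    public Sprite groundSprite;   // assumed to be assigned in the Inspector
    public int width = 64;
    public int height = 64;

    void Start()
    {
        // Grid: defines the cell layout.
        var gridObject = new GameObject("Grid");
        gridObject.AddComponent<Grid>();

        // Tilemap + Tilemap Renderer: store the tiles and draw them.
        // The parent is set first so the Tilemap finds its Grid.
        var mapObject = new GameObject("Ground");
        mapObject.transform.SetParent(gridObject.transform, false);
        var tilemap = mapObject.AddComponent<Tilemap>();
        mapObject.AddComponent<TilemapRenderer>();

        // Tile: an asset that references a sprite.
        var tile = ScriptableObject.CreateInstance<Tile>();
        tile.sprite = groundSprite;

        // Fill a rectangle of cells, the scripted equivalent of painting.
        for (int x = 0; x < width; x++)
            for (int y = 0; y < height; y++)
                tilemap.SetTile(new Vector3Int(x, y, 0), tile);
    }
}
```

A map built this way uses a single sprite, which is the best case for batching because every tile shares one texture.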
Using the TilemapRenderer component efficiently
The TilemapRenderer batches compatible nearby chunks of tiles together into meshes for fewer draw calls. Tiles that don’t batch well with others such as animated tiles may still cause extra draw calls.
Here are some ways to configure the TilemapRenderer for optimal performance:
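- Set Mode to Chunk in the Tilemap Renderer settings – The renderer groups tiles by location and batches them; Individual mode renders every tile separately
- Pack the tile sprites into a single Sprite Atlas – Tiles can only batch together when they share a texture and material
- Leave Detect Chunk Culling Bounds on Auto – Unity works out the culling bounds of each chunk; switch to Manual only when sprites extend well beyond their cells
- Keep animated or frequently edited tiles on a separate Tilemap – Changes then affect only that smaller tilemap rather than the chunks of the main map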
These provide a good starting point, but more work is needed at extreme zoom levels.
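The renderer can also be configured from a script, which is handy when tilemaps are created at runtime. The sketch below assumes the `mode` and `detectChunkCullingBounds` properties of Unity's `TilemapRenderer`; the class name `TilemapRendererSetup` is hypothetical.

```csharp
using UnityEngine;
using UnityEngine.Tilemaps;

// Hypothetical setup component: place it next to a TilemapRenderer.
[RequireComponent(typeof(TilemapRenderer))]
public class TilemapRendererSetup : MonoBehaviour
{
    void Awake()
    {
        var tilemapRenderer = GetComponent<TilemapRenderer>();

        // Chunk mode batches tiles by location instead of drawing each one separately.
        tilemapRenderer.mode = TilemapRenderer.Mode.Chunk;

        // Let Unity derive the culling bounds of each chunk automatically.
        tilemapRenderer.detectChunkCullingBounds =
            TilemapRenderer.DetectChunkCullingBounds.Auto;
    }
}
```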
Implementing view frustum culling
Understanding the camera view frustum
The view frustum is the 3D volume visible to a camera. With the orthographic camera typically used in 2D, it is a rectangular box. Objects outside this box cannot appear on screen, so they can be culled (skipped during rendering). We can apply the same idea to tile chunks and avoid rendering tiles the camera can’t see.
Key concepts related to using view frustums for culling include:
- Planes – The 6 box planes (top, bottom, left, right, near, far) define the view volume edges
- Distance from plane – Can be checked per object to see if inside/outside the frustum
- World space vs. view space – Frustum planes defined relative to the camera viewing orientation
By transforming tile locations to view space and checking against the planes, tiles outside the frustum can be culled from rendering.
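The distance-from-plane test can be sketched with Unity's `GeometryUtility` and `Plane` types. One difference from the description above: `GeometryUtility.CalculateFrustumPlanes` returns its planes in world space, so this version tests world-space positions directly instead of transforming them to view space. The helper name `FrustumCheck` is an assumption.

```csharp
using UnityEngine;

// Hypothetical helper illustrating the per-plane distance test.
public static class FrustumCheck
{
    // Returns true when worldPoint lies on the inner side of all six planes.
    public static bool IsPointVisible(Camera camera, Vector3 worldPoint)
    {
        // Planes are in world space, with normals pointing into the frustum.
        Plane[] planes = GeometryUtility.CalculateFrustumPlanes(camera);
        foreach (Plane plane in planes)
        {
            // A negative signed distance means the point is behind this plane.
            if (plane.GetDistanceToPoint(worldPoint) < 0f)
                return false;
        }
        return true;
    }
}
```

This overload allocates a new plane array on every call, so a per-frame version would normally compute the planes once and reuse them for every tile or chunk tested that frame.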
Culling tiles outside the view frustum
View frustum culling works by taking some extra steps in the tile rendering process:
- On pre-render, get view frustum planes from the current camera
- Transform all potential tile locations to view coordinates
- Check transformed positions against planes, cull tiles outside
- Render only un-culled tiles for the frame
This avoids submitting tiles that won’t be visible for rendering, reducing GPU load. The tiles to check depend on factors like scroll position and zoom level, tracking which tile coordinates could be visible.
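For an orthographic camera, tracking which tile coordinates could be visible reduces to converting the corners of the view rectangle into cell coordinates. The following is a rough sketch under stated assumptions: the camera looks straight down the z axis, the tilemap is not rotated, and the helper name `VisibleCells` is made up for this example.

```csharp
using UnityEngine;
using UnityEngine.Tilemaps;

// Hypothetical helper: finds the range of cells an orthographic camera can see.
public static class VisibleCells
{
    public static BoundsInt GetVisibleCells(Camera camera, Tilemap tilemap)
    {
        // orthographicSize is half the view height in world units.
        float halfHeight = camera.orthographicSize;
        float halfWidth = halfHeight * camera.aspect;
        Vector3 center = camera.transform.position;
        float z = tilemap.transform.position.z;

        Vector3Int min = tilemap.WorldToCell(
            new Vector3(center.x - halfWidth, center.y - halfHeight, z));
        Vector3Int max = tilemap.WorldToCell(
            new Vector3(center.x + halfWidth, center.y + halfHeight, z));

        // BoundsInt excludes its max corner, so add one cell on each axis.
        var size = new Vector3Int(max.x - min.x + 1, max.y - min.y + 1, 1);
        return new BoundsInt(new Vector3Int(min.x, min.y, 0), size);
    }
}
```

Multiplying `size.x` by `size.y` of the result gives an upper bound on the number of tiles in view, which makes it easy to see how quickly the count grows as the camera zooms out.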
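A simple way to integrate frustum culling without modifying Unity’s TilemapRenderer is to selectively disable Tilemap GameObjects whose tiles are all out of view. Unity does not render disabled GameObjects, so those tilemaps generate no draw calls. Note that disabling the whole GameObject also disables its other components, such as a Tilemap Collider 2D; disabling only the Tilemap Renderer component avoids that.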
Dynamic batching to meshes
In addition to chunk batching within TilemapRenderers, enabling Unity’s dynamic batching feature can combine meshes from multiple objects including tile chunks. This further reduces draw calls by consolidating compatible meshes.
Dynamic batching can increase performance by:
- Reducing draw calls, which lowers the CPU cost of submitting work to the GPU
- Reducing render-state changes, since every object in a batch shares one material
- Submitting many small meshes as a single combined vertex buffer
The key considerations when enabling dynamic batching include:
- All objects in a batch must share the same material, and therefore the same shader and texture
- Only small meshes qualify: Unity limits dynamically batched meshes to 300 vertices and 900 vertex attributes
- Best applied selectively, where the benefits outweigh the CPU cost of combining the meshes every frame
Test whether dynamic batching actually reduces draw calls for your tile chunks at the zoom levels you care about.
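Before testing, a quick size estimate shows whether a chunk could qualify at all. The sketch below applies Unity's documented dynamic batching limits (300 vertices and 900 vertex attributes per mesh). It assumes each tile is an independent quad with four vertices, which may not match how the Tilemap Renderer builds its chunk meshes; the class and method names are hypothetical.

```csharp
// Hypothetical helper: estimates whether a chunk mesh is small enough to batch dynamically.
public static class DynamicBatchingLimits
{
    public const int MaxVertices = 300;
    public const int MaxVertexAttributes = 900;

    // vertexCount: vertices in the mesh.
    // attributesPerVertex: attributes the shader reads (position, UV, color, ...).
    public static bool FitsLimits(int vertexCount, int attributesPerVertex)
    {
        return vertexCount <= MaxVertices
            && vertexCount * attributesPerVertex <= MaxVertexAttributes;
    }

    // Assumes one quad per tile with no shared vertices: four vertices per tile.
    public static int ChunkVertexCount(int tilesWide, int tilesHigh)
    {
        return tilesWide * tilesHigh * 4;
    }
}
```

Under these assumptions an 8 x 8 chunk has 256 vertices and, with three attributes per vertex, 768 vertex attributes, so it fits. A 16 x 16 chunk has 1024 vertices and does not, which suggests that dynamic batching mainly helps with small chunks or individual sprites.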
Creating tilemap layers and sorting them back-to-front
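Splitting one large Tilemap into layers segmented by render order lets the tiles be sorted properly back to front. Back-to-front order keeps distant background tiles from being drawn on top of closer ones and lets transparent sprites blend correctly. It also makes overdraw easier to control, because a background layer that is completely covered by the foreground can be identified and skipped.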
Potential approaches include:
- Separate tilemaps per depth layer
- Divide tilemap into tile block chunks sorted by depth bucket
- Custom sort logic at render time
Automatically sorting small chunks works well. Testing different subdivision thresholds can find the right balance between sort overhead, overdraw reduction, and keeping enough tiles batched together.
Custom shader code can also optimize based on depth, avoiding pixel work for distant background layers obscured by foreground.
Optimizing draw calls with GPU instancing
GPU instancing is a rendering technique that draws multiple instances of the same mesh with one draw call. This provides major performance gains when many copies of the same object are visible, which is exactly the situation in a tilemap that repeats a small set of tiles.
The advantages of GPU instancing include:
- Drastically fewer draw calls compared to drawing tiles individually
- Better utilization of GPU hardware
- Scales efficiently to display hundreds of thousands of instances
Instancing works by passing an array of transformation matrices to the GPU, which positions each instance while referencing only one shared mesh. Per-instance properties such as color can be supplied alongside the matrices.
Applying GPU instancing to tilemaps would involve creating mesh instances for visible tile chunks. Compatible tiles get consolidated into pooled archetype meshes for efficient re-use in multiple areas.
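As a minimal sketch of the matrix-array idea, the component below draws a grid of identical quads with `Graphics.DrawMeshInstanced`. It is not a replacement for the Tilemap Renderer: the class name, the `tileQuad` and `tileMaterial` fields, and the grid size are assumptions, and the material's shader must support instancing.

```csharp
using UnityEngine;

// Hypothetical example: draws up to 1023 copies of one quad in a single call.
public class InstancedTileDrawer : MonoBehaviour
{
    public Mesh tileQuad;           // shared mesh, assumed to be a unit quad
    public Material tileMaterial;   // shader must support GPU instancing
    public int columns = 32;
    public int rows = 31;           // 32 * 31 = 992 instances

    private const int MaxInstancesPerCall = 1023;   // limit of DrawMeshInstanced
    private Matrix4x4[] matrices;

    void Start()
    {
        tileMaterial.enableInstancing = true;

        int count = Mathf.Min(columns * rows, MaxInstancesPerCall);
        matrices = new Matrix4x4[count];
        for (int i = 0; i < count; i++)
        {
            // One transformation matrix per instance places each tile on the grid.
            var position = new Vector3(i % columns, i / columns, 0f);
            matrices[i] = Matrix4x4.TRS(position, Quaternion.identity, Vector3.one);
        }
    }

    void Update()
    {
        // The call only queues the draw for the current frame, so repeat it every frame.
        Graphics.DrawMeshInstanced(tileQuad, 0, tileMaterial, matrices, matrices.Length);
    }
}
```

Larger maps would need several calls, one per group of at most 1023 instances, and ideally only for the groups that are currently in view.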
Example code showing implementation
This C# code demonstrates a simplified approach to apply some of the optimization techniques discussed on a component managing TilemapRenderer objects:
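    using System;
    using UnityEngine;
    using UnityEngine.Tilemaps;

    // Simplified sketch. Attach to the Camera: Unity calls OnPreCull only on
    // scripts that share a GameObject with a Camera (built-in render pipeline).
    public class TilemapOptimization : MonoBehaviour
    {
        public TilemapRenderer[] tilemapRenderers;

        private Camera cam;
        private readonly Plane[] frustumPlanes = new Plane[6];

        void Awake()
        {
            cam = GetComponent<Camera>();
            // Depth sorting only needs to run again if layers move.
            SortTilemapsByDepth();
        }

        void OnPreCull()
        {
            // Frustum culling: refresh the planes, then toggle the renderers.
            GeometryUtility.CalculateFrustumPlanes(cam, frustumPlanes);
            DisableOutOfViewTilemaps();
        }

        void DisableOutOfViewTilemaps()
        {
            foreach (TilemapRenderer tilemapRenderer in tilemapRenderers)
            {
                // Build world-space bounds from the Tilemap itself.
                // Assumes the tilemap is not rotated.
                Tilemap tilemap = tilemapRenderer.GetComponent<Tilemap>();
                Bounds local = tilemap.localBounds;
                var world = new Bounds(
                    tilemap.transform.TransformPoint(local.center),
                    Vector3.Scale(local.size, tilemap.transform.lossyScale));

                // Toggling the renderer, not the GameObject, keeps colliders active.
                tilemapRenderer.enabled =
                    GeometryUtility.TestPlanesAABB(frustumPlanes, world);
            }
        }

        void SortTilemapsByDepth()
        {
            // Farther tilemaps (larger z) get lower sorting orders and draw first.
            Array.Sort(tilemapRenderers,
                (a, b) => b.transform.position.z.CompareTo(a.transform.position.z));
            for (int i = 0; i < tilemapRenderers.Length; i++)
                tilemapRenderers[i].sortingOrder = i;
        }

        // Dynamic batching is a project-wide Player setting and GPU instancing is
        // enabled per material, so neither is toggled per frame from this script.
    }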
Some of these optimizations only require changing settings on the built-in Tilemap Renderer in the Inspector, while other techniques require custom handling in scripts.
Reviewing Optimization Results
Profile GPU usage with Unity profiler
The Unity profiler includes vital statistics for diagnosing graphics and CPU bottlenecks. Keep this open with detailed GPU profiling enabled when testing view zooms and scrolling tilemap areas. The key indicators are:
- Number of draw calls – Should decrease significantly after optimizations
- Batches – Should decrease as more tiles are combined, while the Saved by batching figure should increase
- GPU time – Look for reductions when applying optimizations
The profiler helps gauge exactly where time is spent – identify heavy tiles or areas to focus further improvements.
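The same counters can be read from a script, which helps when logging results from a build rather than the Editor. The sketch below assumes Unity 2020.2 or newer, where `ProfilerRecorder` is available, and assumes the counter names "Draw Calls Count" and "Batches Count" exposed by the Rendering Profiler module; the class name is hypothetical.

```csharp
using Unity.Profiling;
using UnityEngine;

// Hypothetical logger: prints draw call and batch counts about once per second.
public class RenderStatsLogger : MonoBehaviour
{
    private ProfilerRecorder drawCalls;
    private ProfilerRecorder batches;

    void OnEnable()
    {
        drawCalls = ProfilerRecorder.StartNew(ProfilerCategory.Render, "Draw Calls Count");
        batches = ProfilerRecorder.StartNew(ProfilerCategory.Render, "Batches Count");
    }

    void OnDisable()
    {
        // Recorders hold native resources and must be released.
        drawCalls.Dispose();
        batches.Dispose();
    }

    void Update()
    {
        // Log every 60th frame to keep the console readable.
        if (Time.frameCount % 60 == 0 && drawCalls.Valid && batches.Valid)
            Debug.Log($"Draw calls: {drawCalls.LastValue}, batches: {batches.LastValue}");
    }
}
```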
Check number of draw calls
Lowering the draw call count is the primary goal of these optimizations. At extreme zooms the original count can be near 10k or more. Combining the optimizations should cut this substantially, ideally to well under 1k.
If graphics performance remains low despite decent draw call count, investigating other bottlenecks like excessive GPU overdraw is worthwhile.
Compare before and after optimization
Quantifying by how much optimizations enhance performance provides essential feedback. Record key metrics like draw calls and frame time while scrolling around a map both before and after integrating changes.
Ideally, average frame times should stay consistently under 16.7 ms (1000 ms / 60) to sustain 60 FPS. Pay special attention to spikes above this budget caused by suboptimal regions of the map.
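A small tracker makes before-and-after comparisons repeatable. This is a sketch only; the class name `FrameTimeTracker` and the `targetFps` field are assumptions. It accumulates frame times while enabled and reports the average, the worst frame, and how many frames exceeded the budget.

```csharp
using UnityEngine;

// Hypothetical tracker: enable it for a test run, disable it to print a summary.
public class FrameTimeTracker : MonoBehaviour
{
    public float targetFps = 60f;

    private int frames;
    private float totalMs;
    private float worstMs;
    private int overBudget;

    void Update()
    {
        // unscaledDeltaTime ignores Time.timeScale, so pausing does not skew results.
        float ms = Time.unscaledDeltaTime * 1000f;
        float budgetMs = 1000f / targetFps;   // about 16.7 ms at 60 FPS

        frames++;
        totalMs += ms;
        worstMs = Mathf.Max(worstMs, ms);
        if (ms > budgetMs) overBudget++;
    }

    void OnDisable()
    {
        if (frames == 0) return;
        Debug.Log($"avg {totalMs / frames:F2} ms, worst {worstMs:F2} ms, " +
                  $"{overBudget}/{frames} frames over budget");
    }
}
```

Running the same camera path with and without an optimization and comparing the two summaries gives a more reliable picture than watching the frame rate by eye.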
Test game performance at various zoom levels
What ultimately matters is that players get smooth scrolling and responsive interaction at any zoom level. Thorough playthroughs after each optimization pass help confirm that frame rates stay consistent:
- Zoom way out and slowly pan around entire map
- Zoom in and rapidly scroll through local areas
- Check frame time fluctuations when quickly changing zoom levels
- Verify no lag in controls when interacting with tilemap
Further profiling and optimizations can target any remaining problem areas as needed.
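Part of this testing can be automated. As a rough sketch, the component below sweeps an orthographic camera between two zoom levels so that frame times can be recorded over the whole range; the class name and the size values are assumptions to adjust for a given map.

```csharp
using System.Collections;
using UnityEngine;

// Hypothetical test aid: continuously zooms the camera out and back in.
[RequireComponent(typeof(Camera))]
public class ZoomSweep : MonoBehaviour
{
    public float minSize = 5f;            // closest zoom (orthographic size)
    public float maxSize = 100f;          // farthest zoom
    public float secondsPerSweep = 10f;   // time to go from min to max

    IEnumerator Start()
    {
        var cam = GetComponent<Camera>();
        while (true)
        {
            // PingPong moves t from 0 to 1 and back again.
            float t = Mathf.PingPong(Time.time / secondsPerSweep, 1f);
            cam.orthographicSize = Mathf.Lerp(minSize, maxSize, t);
            yield return null;   // wait for the next frame
        }
    }
}
```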
Further Optimization Possibilities
There are even more options to explore for pushing tilemap performance and graphics quality further. Some additional optimization directions include:
- Utilize ECS and Jobs for more efficient multi-threaded updates
- Leverage Burst compilation for core physics/game code
- Offload work onto the GPU by implementing custom shaders
- Fake details with normal maps instead of dense geometry
- Experiment with Tongue twister textures for detailed environments
Optimization often requires rethinking core architecture and algorithms rather than applying quick fixes. The payoff is richer gameplay that runs without lag across target platforms.