240MB Per Frame: The saveLayer() Trap
The renderer dropped to 15fps at 3000×2000 resolution. Same scene ran smoothly at 1500×1000. Four times fewer pixels, four times faster.
That math didn't add up. GPU fill rate shouldn't be the bottleneck here.
The Problem
Shadows and blur effects need saveLayer()—Skia's way of rendering to an offscreen buffer so you can apply filters before compositing back. Standard approach:
canvas.saveLayer(nullptr, &paint); // Create offscreen buffer (bounds = nullptr, then paint)
// Draw content
canvas.restore(); // Apply filter and composite
That nullptr means "I don't know how big the content will be, so make the layer as big as the current clip." With no clip set, that is the whole canvas.
At 3000×2000, each saveLayer(nullptr) allocates:
- 3000 × 2000 × 4 bytes (RGBA) = 24MB
- 6 million pixels to clear, fill, composite
The scene had 10 effects: 5 shadows, 2 blend modes, 3 masks.
10 layers × 24MB = 240MB per frame
At 30fps: 7.2 GB/sec memory bandwidth
No wonder it was slow.
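The budget above is plain arithmetic, sketched here so it is easy to re-derive for other resolutions. It assumes 4 bytes per pixel (RGBA8888) and one full-canvas buffer per effect. It also counts each buffer only once, so it understates real traffic: every layer is also cleared, drawn into, and read back during compositing.

```typescript
// Bytes in one offscreen layer, assuming 4 bytes per pixel (RGBA8888).
function layerBytes(width: number, height: number): number {
  return width * height * 4;
}

// Scene from the article: 3000×2000 canvas, 10 unbounded layers, 30fps.
const bytesPerLayer = layerBytes(3000, 2000); // 24,000,000 bytes
const bytesPerFrame = bytesPerLayer * 10;      // 240,000,000 bytes
const bytesPerSecond = bytesPerFrame * 30;     // 7,200,000,000 bytes

console.log(`${bytesPerLayer / 1e6} MB per layer`);
console.log(`${bytesPerFrame / 1e6} MB per frame`);
console.log(`${bytesPerSecond / 1e9} GB/s at 30fps`);
```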
First Attempt: Ignore It
"Modern GPUs have plenty of memory bandwidth. This should be fine."
It wasn't fine. The allocations were killing performance, not because the GPU ran out of memory, but because touching 240MB of VRAM every frame (every 33 milliseconds at 30fps) destroyed cache locality and stalled the pipeline.
Second Attempt: Bounded Layers
Skia's saveLayer() takes an optional bounds rectangle:
SkRect bounds = SkRect::MakeXYWH(x, y, w, h); // left, top, width, height
canvas.saveLayer(&bounds, &paint);           // bounds first, then paint
Instead of allocating for the full 3000×2000 canvas, allocate only what's needed for this specific element and its effects.
A typical UI element:
- Element size: 400×300 pixels
- Shadow blur radius: ~50 pixels on each side
- Bounded layer size: (400 + 2×50) × (300 + 2×50) = 500×400 = 200,000 pixels
- Memory per layer: 200,000 × 4 bytes = 0.8MB, a 30× reduction from 24MB
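Sizing a bounded layer means growing the element's rectangle by the blur radius on every side. A minimal sketch, using a hypothetical `Rect` type in place of Skia's `SkRect` and the same 4-bytes-per-pixel assumption:

```typescript
// Hypothetical axis-aligned rectangle in the layer's coordinate space.
interface Rect {
  left: number;
  top: number;
  right: number;
  bottom: number;
}

// Grow a rectangle by `pad` on all four sides (the room a blur needs).
function outset(r: Rect, pad: number): Rect {
  return {
    left: r.left - pad,
    top: r.top - pad,
    right: r.right + pad,
    bottom: r.bottom + pad,
  };
}

// Bytes needed for a layer covering `r`, assuming RGBA8888.
function rectBytes(r: Rect): number {
  return (r.right - r.left) * (r.bottom - r.top) * 4;
}

const element: Rect = { left: 0, top: 0, right: 400, bottom: 300 };
const layer = outset(element, 50);          // 500×400
const boundedBytes = rectBytes(layer);      // 800,000 bytes
const fullBytes = 3000 * 2000 * 4;          // 24,000,000 bytes
const reduction = fullBytes / boundedBytes; // 30
```

Skia's `SkRect` has the same operation built in as `outset(dx, dy)`.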
Great! Ship it.
Except rotated elements got their shadows clipped. The bounds were wrong.
Why It Failed: Coordinate Space Confusion
When saveLayer() is called:
- The canvas already has the parent's transform applied
- saveLayer(bounds) needs bounds in parent space
- Then we call setTransform() to apply the node's transform
- Content gets drawn in local space
Understanding Transform Matrices
A 2D transform matrix looks like:
| a c tx |
| b d ty |
| 0 0 1 |
where a, b, c, d together encode scale, rotation, and skew (with no rotation or skew, a and d are the x and y scale factors), and tx, ty are the translation.
To transform a point:
new_x = a * x + c * y + tx
new_y = b * x + d * y + ty
Example - 45° rotation:
cos(45°) ≈ 0.707, sin(45°) ≈ 0.707
| 0.707 -0.707 0 |
| 0.707 0.707 0 |
| 0 0 1 |
Point (100, 0) transforms to:
new_x = 0.707 * 100 + (-0.707) * 0 + 0 = 70.7
new_y = 0.707 * 100 + 0.707 * 0 + 0 = 70.7
Result: (70.7, 70.7) ← 45° rotation!
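The transform rule above, written as code. The `Matrix` shape here (fields `a, b, c, d, tx, ty` in the layout shown) is an assumption for illustration, not the renderer's actual type:

```typescript
// 2D affine matrix in the layout above:
// | a c tx |
// | b d ty |
// | 0 0 1  |
interface Matrix { a: number; b: number; c: number; d: number; tx: number; ty: number; }
interface Point { x: number; y: number; }

// x' = a*x + c*y + tx,  y' = b*x + d*y + ty
function transformPoint(m: Matrix, p: Point): Point {
  return {
    x: m.a * p.x + m.c * p.y + m.tx,
    y: m.b * p.x + m.d * p.y + m.ty,
  };
}

// Rotation by `deg` degrees: a = d = cos, b = sin, c = -sin.
function rotation(deg: number): Matrix {
  const r = (deg * Math.PI) / 180;
  const cos = Math.cos(r);
  const sin = Math.sin(r);
  return { a: cos, b: sin, c: -sin, d: cos, tx: 0, ty: 0 };
}

const rotated = transformPoint(rotation(45), { x: 100, y: 0 });
console.log(rotated.x.toFixed(1), rotated.y.toFixed(1)); // prints "70.7 70.7"
```

With y pointing down, as on most screens, this matrix turns points clockwise.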
The Bug: Wrong Order of Operations
We were calculating bounds like this:
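// WRONG: Spread added in local space, before the transform
SkPoint corners[] = {
    {-spread, -spread},                 // top-left with spread
    {width + spread, -spread},          // top-right
    {width + spread, height + spread},  // bottom-right
    {-spread, height + spread},         // bottom-left
};
// Then transform each corner to parent space and take the min/max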
For a non-rotated 100×100 square with a 20px shadow, this works fine:
Local corners with spread: (-20, -20) to (120, 120)
After identity transform: (-20, -20) to (120, 120) ✓
Shadow extends 20px uniformly
For a 45° rotated 100×100 square, it breaks:
Local corners with spread: (-20, -20) to (120, 120)
After 45° rotation:
Top-left: (-20, -20) → (0, -28.3) ← rotated
Top-right: (120, -20) → (99, 70.7) ← extends too far!
Bottom-right: (120, 120) → (0, 169.7)
...
The rotated square itself spans x from -70.7 to 70.7 and y from 0 to 141.4, so the spread now reaches 28.3px (20√2) past it on every side instead of 20px. Wrong!
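The worked example is easy to check numerically. This sketch applies the spread in local space first (the buggy order), rotates the corners by 45°, and compares the resulting axis-aligned box with the box of the rotated square itself. The helpers are hypothetical and simply mirror the formulas above:

```typescript
interface Point { x: number; y: number; }

// 45° rotation using the matrix above (a = d = cos, b = sin, c = -sin).
function rotate45(p: Point): Point {
  const k = Math.SQRT1_2; // cos(45°) = sin(45°) ≈ 0.7071
  return { x: k * p.x - k * p.y, y: k * p.x + k * p.y };
}

// Axis-aligned min/max of a set of points.
function boxOf(points: Point[]) {
  const xs = points.map((p) => p.x);
  const ys = points.map((p) => p.y);
  return {
    left: Math.min(...xs), top: Math.min(...ys),
    right: Math.max(...xs), bottom: Math.max(...ys),
  };
}

// Corners of a w×h rectangle grown by `s` in LOCAL space.
function localCorners(w: number, h: number, s: number): Point[] {
  return [
    { x: -s, y: -s }, { x: w + s, y: -s },
    { x: w + s, y: h + s }, { x: -s, y: h + s },
  ];
}

const square = boxOf(localCorners(100, 100, 0).map(rotate45));
const wrong = boxOf(localCorners(100, 100, 20).map(rotate45));

// Margin the buggy order actually adds on the left side, in screen space.
const margin = square.left - wrong.left;
console.log(margin.toFixed(1)); // 28.3 (20 × √2), not 20
```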
What We Need
The shadow should extend 20px in screen space (what the user sees), not along the element's rotated axes.
Think about it physically: when you hold a square piece of paper at 45° under a light, the shadow doesn't suddenly become longer—it stays the same width in real space. That's screen space.
The Solution
Transform first, then add spread:
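// Get the COMPOSED transform (parent × node); assuming Matrix exposes a, b, c, d, tx, ty
const Matrix& m = composedMatrix;
// Step 1: Transform node corners to parent space and track min/max
SkPoint corners[4] = {
    {0, 0}, {width, 0}, {width, height}, {0, height}
};
float minX = INFINITY, minY = INFINITY;      // INFINITY from <cmath>
float maxX = -INFINITY, maxY = -INFINITY;
for (const SkPoint& corner : corners) {
    float parentX = m.a * corner.fX + m.c * corner.fY + m.tx;
    float parentY = m.b * corner.fX + m.d * corner.fY + m.ty;
    minX = std::min(minX, parentX);  maxX = std::max(maxX, parentX);  // <algorithm>
    minY = std::min(minY, parentY);  maxY = std::max(maxY, parentY);
}
// Step 2: Add spread in PARENT SPACE (screen space)
SkRect bounds = SkRect::MakeLTRB(minX - maxSpread, minY - maxSpread,
                                 maxX + maxSpread, maxY + maxSpread);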
Now a square rotated 45° gets a shadow margin of 20px on every side of its screen-space bounding box, not a margin measured along its rotated axes.
The trick: effects extend in screen space, not element-local space.
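The transform-then-spread order as a self-contained sketch. It reuses the hypothetical `Matrix` layout from the formulas above and takes whichever matrix maps the node's local coordinates into the space `saveLayer()` interprets its bounds in; `maxSpread` stands for the largest blur or shadow extent among the node's effects.

```typescript
interface Matrix { a: number; b: number; c: number; d: number; tx: number; ty: number; }
interface Rect { left: number; top: number; right: number; bottom: number; }

// Step 1: map the node's four local corners through `m` and take the
// axis-aligned box. Step 2: outset that box by the spread AFTER transforming.
function effectBounds(m: Matrix, width: number, height: number, maxSpread: number): Rect {
  const corners: Array<[number, number]> = [
    [0, 0], [width, 0], [width, height], [0, height],
  ];
  let minX = Infinity, minY = Infinity, maxX = -Infinity, maxY = -Infinity;
  for (const [x, y] of corners) {
    const px = m.a * x + m.c * y + m.tx;
    const py = m.b * x + m.d * y + m.ty;
    minX = Math.min(minX, px);
    minY = Math.min(minY, py);
    maxX = Math.max(maxX, px);
    maxY = Math.max(maxY, py);
  }
  return {
    left: minX - maxSpread,
    top: minY - maxSpread,
    right: maxX + maxSpread,
    bottom: maxY + maxSpread,
  };
}

const identity: Matrix = { a: 1, b: 0, c: 0, d: 1, tx: 0, ty: 0 };
const k = Math.SQRT1_2;
const rot45: Matrix = { a: k, b: k, c: -k, d: k, tx: 0, ty: 0 };

const flat = effectBounds(identity, 100, 100, 20);  // -20, -20, 120, 120
const tilted = effectBounds(rot45, 100, 100, 20);   // ≈ -90.7, -20, 90.7, 161.4
```

For Skia's own matrix type, `SkMatrix::mapRect()` computes the axis-aligned box of a transformed rectangle, which covers Step 1 in a single call.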
The Composed Matrix Mistake
Initially we used node._matrix, the node's own transform. That leaves out every parent transform (ancestor rotations, scales, and translations).
For a rotated element inside a rotated parent, we were only accounting for one of the rotations. Bounds were wrong by the accumulated parent transform.
Fixed by storing the composed matrix (parent × node) passed into effectsStart():
effectsStart(node: Node, matrix: Matrix) {
// matrix = full parent × node transform
node._composedMatrix = matrix; // Store for bounds calculation
buildEffect(node, ...);
}
Now bounds are calculated in the correct coordinate space: parent space where saveLayer() interprets them.
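Composing the matrices is a single affine multiply. A sketch with the same hypothetical `Matrix` layout, where `multiply(parent, node)` applies `node` first and `parent` second. Which matrix the bounds should use still depends on which transform the canvas holds when `saveLayer()` runs; this sketch only shows the composition itself.

```typescript
interface Matrix { a: number; b: number; c: number; d: number; tx: number; ty: number; }

// parent × node for affine matrices laid out as
// | a c tx |  | b d ty |  | 0 0 1 |.
// A point is transformed by `n` (node) first, then by `p` (parent).
function multiply(p: Matrix, n: Matrix): Matrix {
  return {
    a: p.a * n.a + p.c * n.b,
    b: p.b * n.a + p.d * n.b,
    c: p.a * n.c + p.c * n.d,
    d: p.b * n.c + p.d * n.d,
    tx: p.a * n.tx + p.c * n.ty + p.tx,
    ty: p.b * n.tx + p.d * n.ty + p.ty,
  };
}

function rotation(deg: number): Matrix {
  const r = (deg * Math.PI) / 180;
  return { a: Math.cos(r), b: Math.sin(r), c: -Math.sin(r), d: Math.cos(r), tx: 0, ty: 0 };
}

function translation(tx: number, ty: number): Matrix {
  return { a: 1, b: 0, c: 0, d: 1, tx, ty };
}

// A 45° node inside a 45° parent: the composed matrix is a 90° rotation.
// The node's own matrix alone would account for only 45° of it.
const composed = multiply(rotation(45), rotation(45));

// A 45° node inside a parent translated by (100, 0): the offset carries through.
const moved = multiply(translation(100, 0), rotation(45));
```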
Results
| Before | After | Improvement |
|---|---|---|
| 240MB/frame | 8MB/frame | 30× less memory |
| 60-100ms frames | 10-20ms frames | 6× faster |
| 10-16fps | 60fps | Target achieved |
The bounded layers worked. Rotated elements rendered correctly. Shadows extended properly in all directions.
The Irony
The code already had calculateWorldSpaceBounds()—the function that does the correct math. It just wasn't being called. Someone had written the solution, then used nullptr anyway with a comment:
// Bounded saveLayer is complex because effectsStart() is called BEFORE
// setTransform(). Canvas has parent's transform but content will be drawn
// after node's transform is applied. This creates a coordinate space
// mismatch that's difficult to resolve cleanly.
Translation: "I don't understand the coordinate spaces, so I'll allocate 24MB to be safe."
Thirty times more memory because coordinate math seemed hard.
What This Enabled
Bounded layers made large canvas rendering viable:
- 3000×2000 at 60fps: Previously impossible, now standard
- Effect stacking: Each extra shadow or blur now costs a layer sized to its element, not another full-canvas 24MB buffer
- Mobile support: 8MB per frame fits in mobile GPU budgets
The fix was 20 lines of math. The payoff was 30× less layer memory and frames roughly 6× faster.
Sometimes the biggest wins come from using the API correctly, not from clever optimization.