March 15, 2017

240MB Per Frame: The saveLayer() Trap

The renderer dropped to 15fps at 3000×2000 resolution. Same scene ran smoothly at 1500×1000. Four times fewer pixels, four times faster.

Frame time tracking pixel count that closely looked like a fill-rate limit, but GPU fill rate shouldn't be the bottleneck here.

The Problem

Shadows and blur effects need saveLayer()—Skia's way of rendering to an offscreen buffer so you can apply filters before compositing back. Standard approach:

canvas.saveLayer(paint, nullptr);  // Create offscreen buffer
// Draw content
canvas.restore();  // Apply filter and composite

That nullptr means "I don't know how big the content will be, allocate enough space for the whole canvas."

At 3000×2000, each saveLayer(nullptr) allocates:

3000 × 2000 × 4 bytes (RGBA) = 24MB
6 million pixels to clear, fill, composite

The scene had 10 effects: 5 shadows, 2 blend modes, 3 masks.

10 layers × 24MB = 240MB per frame

At 30fps: 7.2 GB/sec memory bandwidth

No wonder it was slow.
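The arithmetic above can be reproduced with a couple of small helpers. This is a sketch for checking the numbers, not code from the renderer: `layerBytes` and `bytesPerSecond` are hypothetical names, and it assumes 4 bytes per pixel and decimal megabytes (1MB = 1,000,000 bytes), as the figures above do.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; not part of Skia or the renderer.
// Bytes needed for one offscreen layer at 4 bytes per pixel (RGBA).
constexpr std::uint64_t layerBytes(std::uint64_t width, std::uint64_t height) {
  return width * height * 4;
}

// Bytes allocated per second if every frame allocates `layers` such buffers.
constexpr std::uint64_t bytesPerSecond(std::uint64_t bytesPerLayer,
                                       std::uint64_t layers,
                                       std::uint64_t fps) {
  return bytesPerLayer * layers * fps;
}

// 3000×2000 canvas, 10 unbounded layers, 30fps.
constexpr std::uint64_t kFullLayer = layerBytes(3000, 2000);              // 24,000,000 bytes
constexpr std::uint64_t kPerFrame = kFullLayer * 10;                      // 240,000,000 bytes
constexpr std::uint64_t kPerSecond = bytesPerSecond(kFullLayer, 10, 30);  // 7,200,000,000 bytes
```

These figures count each byte once. Since every layer is cleared, filled, and composited, the real memory traffic is likely higher.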

First Attempt: Ignore It

"Modern GPUs have plenty of memory bandwidth. This should be fine."

It wasn't fine. The allocations were killing performance, not because the GPU ran out of memory, but because touching 240MB of VRAM every frame destroyed cache locality and stalled the pipeline.

Second Attempt: Bounded Layers

Skia's saveLayer() takes an optional bounds rectangle:

SkRect bounds = SkRect::MakeXYWH(x, y, w, h);  // MakeLTRB would expect right/bottom, not width/height
canvas.saveLayer(paint, &bounds);

Instead of allocating for the full 3000×2000 canvas, allocate only what's needed for this specific element and its effects.

A typical UI element: 400×300 pixels
Shadow blur radius: ~50 pixels
Bounded layer size: 500×400 = 200,000 pixels

Memory per layer: 200,000 × 4 bytes = 0.8MB
30× reduction
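As a rough check of those numbers, here is a sketch that pads an element by its blur radius on every side and compares the result to a full-canvas layer. `boundedLayerBytes` is a hypothetical helper, and padding by exactly one radius per side is a simplifying assumption taken from the example above.

```cpp
#include <cstdint>

// Hypothetical helper: bytes for a layer sized to the element plus `pad`
// pixels on every side, at 4 bytes per pixel.
constexpr std::uint64_t boundedLayerBytes(std::uint64_t width,
                                          std::uint64_t height,
                                          std::uint64_t pad) {
  return (width + 2 * pad) * (height + 2 * pad) * 4;
}

constexpr std::uint64_t kBounded = boundedLayerBytes(400, 300, 50);  // 500 × 400 × 4 = 800,000 bytes
constexpr std::uint64_t kFullCanvas = 3000ULL * 2000ULL * 4;         // 24,000,000 bytes
constexpr std::uint64_t kReduction = kFullCanvas / kBounded;         // 30
```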

Great! Ship it.

Except rotated elements got their shadows clipped. The bounds were wrong.

Why It Failed: Coordinate Space Confusion

The root cause was a coordinate space mismatch. When saveLayer() is called:

Canvas has the parent's transform already applied
saveLayer(bounds) needs bounds in parent space
Then we call setTransform() to apply the node's transform
Content gets drawn in local space

Understanding Transform Matrices

A 2D transform matrix looks like:

| a  c  tx |    where: a, d = scale
| b  d  ty |           b, c = rotation/skew
| 0  0   1 |           tx, ty = translation

To transform a point:

new_x = a * x + c * y + tx
new_y = b * x + d * y + ty

Example - 45° rotation:

cos(45°) ≈ 0.707, sin(45°) ≈ 0.707

| 0.707  -0.707   0 |
| 0.707   0.707   0 |
| 0       0       1 |

Point (100, 0) transforms to:
new_x = 0.707 * 100 + (-0.707) * 0 + 0 = 70.7
new_y = 0.707 * 100 + 0.707 * 0 + 0 = 70.7
Result: (70.7, 70.7)  ← 45° rotation!
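
The same arithmetic as a small function, in case it helps to experiment with other angles. `Affine`, `Point`, `apply`, and `rotation` are illustrative names written for this post, not Skia types; the field layout follows the matrix above.

```cpp
#include <cmath>

// Illustrative 2D affine matrix using the a/b/c/d/tx/ty layout shown above.
struct Affine {
  float a, b, c, d, tx, ty;
};

struct Point {
  float x, y;
};

// new_x = a*x + c*y + tx, new_y = b*x + d*y + ty
inline Point apply(const Affine& m, Point p) {
  return {m.a * p.x + m.c * p.y + m.tx,
          m.b * p.x + m.d * p.y + m.ty};
}

// Rotation by `degrees`: a = cos, b = sin, c = -sin, d = cos.
inline Affine rotation(float degrees) {
  const float r = degrees * 3.14159265358979f / 180.0f;
  return {std::cos(r), std::sin(r), -std::sin(r), std::cos(r), 0.0f, 0.0f};
}
```

Calling `apply(rotation(45.0f), Point{100.0f, 0.0f})` yields approximately (70.7, 70.7), matching the hand calculation.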

The Bug: Wrong Order of Operations

We were calculating bounds like this:

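// WRONG: Spread added in local space
SkPoint corners[] = {
  {-spread, -spread},                 // Top-left with spread
  {width + spread, -spread},          // Top-right with spread
  {width + spread, height + spread},  // Bottom-right with spread
  {-spread, height + spread},         // Bottom-left with spread
};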
// Then transform to parent space

For a non-rotated 100×100 square with 20px shadow, this works fine:

Local corners with spread: (-20, -20) to (120, 120)
After identity transform: (-20, -20) to (120, 120) ✓
Shadow extends 20px uniformly

For a 45° rotated 100×100 square, it breaks:

Local corners with spread: (-20, -20) to (120, 120)

After 45° rotation:
Top-left: (-20, -20) → (0, -28.3)   ← rotated
Top-right: (120, -20) → (99.0, 70.7)  ← extends too far!
Bottom-right: (120, 120) → (0, 169.7)
...

The spread now reaches 28.3px (20√2) past the square's corners along the screen axes, not 20px uniformly. Wrong!

What We Need

The shadow should extend 20px in screen space (what the user sees), not along the element's rotated axes.

Think about it physically: when you hold a square piece of paper at 45° under a light, the shadow doesn't suddenly become longer—it stays the same width in real space. That's screen space.

The Solution

Transform first, then add spread:

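// Get the COMPOSED transform (parent × node)
Matrix worldMatrix = composedMatrix;
// a, b, c, d, tx, ty below are worldMatrix's entries, laid out as shown earlier

// Step 1: Transform node corners to parent space
SkPoint corners[4] = {
  {0, 0}, {width, 0}, {width, height}, {0, height}
};

float minX = INFINITY, minY = INFINITY;
float maxX = -INFINITY, maxY = -INFINITY;
for (const SkPoint& corner : corners) {
  float parentX = a * corner.fX + c * corner.fY + tx;
  float parentY = b * corner.fX + d * corner.fY + ty;
  // Track min/max
  minX = std::min(minX, parentX);
  maxX = std::max(maxX, parentX);
  minY = std::min(minY, parentY);
  maxY = std::max(maxY, parentY);
}

// Step 2: Add spread in PARENT SPACE (screen space)
SkRect bounds = SkRect::MakeLTRB(minX - maxSpread, minY - maxSpread,
                                 maxX + maxSpread, maxY + maxSpread);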

Now a 45° rotated square gets its shadow spreading 20px uniformly in screen coordinates, not along the rotated axes.

The trick: effects extend in screen space, not element-local space.
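
To see the two orderings side by side, here is a self-contained sketch that runs both on a rotated rectangle. `Affine`, `Box`, `transformedBounds`, `spreadThenTransform`, and `transformThenSpread` are hypothetical names written for this example; they are not the renderer's code and not Skia types.

```cpp
#include <algorithm>
#include <cmath>

struct Affine { float a, b, c, d, tx, ty; };  // layout as in the matrix section
struct Box { float left, top, right, bottom; };

// Axis-aligned box around the four corners of `local` after applying `m`.
inline Box transformedBounds(const Affine& m, const Box& local) {
  const float xs[4] = {local.left, local.right, local.right, local.left};
  const float ys[4] = {local.top, local.top, local.bottom, local.bottom};
  Box out = {INFINITY, INFINITY, -INFINITY, -INFINITY};
  for (int i = 0; i < 4; ++i) {
    const float x = m.a * xs[i] + m.c * ys[i] + m.tx;
    const float y = m.b * xs[i] + m.d * ys[i] + m.ty;
    out.left = std::min(out.left, x);
    out.right = std::max(out.right, x);
    out.top = std::min(out.top, y);
    out.bottom = std::max(out.bottom, y);
  }
  return out;
}

// The buggy order: grow the rect in local space, then transform it.
inline Box spreadThenTransform(const Affine& m, float w, float h, float spread) {
  return transformedBounds(m, {-spread, -spread, w + spread, h + spread});
}

// The fixed order: transform the bare rect, then grow it in parent space.
inline Box transformThenSpread(const Affine& m, float w, float h, float spread) {
  const Box b = transformedBounds(m, {0.0f, 0.0f, w, h});
  return {b.left - spread, b.top - spread, b.right + spread, b.bottom + spread};
}
```

For the 45° matrix and a 100×100 square with a 20px spread, the first ordering produces a box about 198px wide whose top edge sits at -28.3, while the second produces a box about 181.4px wide whose top edge sits at exactly -20.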

The Composed Matrix Mistake

Initially we used node._matrix—the node's own transform. Missing: all parent transforms (ancestor rotations, scales, translations).

For a rotated element inside a rotated parent, we were only accounting for one of the rotations. Bounds were wrong by the accumulated parent transform.

Fixed by storing the composed matrix (parent × node) passed into effectsStart():

effectsStart(node: Node, matrix: Matrix) {
  // matrix = full parent × node transform
  node._composedMatrix = matrix;  // Store for bounds calculation
  buildEffect(node, ...);
}

Now bounds are calculated in the correct coordinate space: parent space where saveLayer() interprets them.
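
For reference, composing two affine matrices in the a/b/c/d/tx/ty layout looks like the sketch below. `Affine` and `compose` are illustrative names, not the renderer's Matrix type. The point is that using the node's own matrix alone drops everything the parent contributes.

```cpp
// | a  c  tx |
// | b  d  ty |
// | 0  0   1 |
struct Affine { float a, b, c, d, tx, ty; };

// Returns parent × node: a point is mapped by `node` first, then by `parent`.
inline Affine compose(const Affine& parent, const Affine& node) {
  return {
    parent.a * node.a + parent.c * node.b,                 // a
    parent.b * node.a + parent.d * node.b,                 // b
    parent.a * node.c + parent.c * node.d,                 // c
    parent.b * node.c + parent.d * node.d,                 // d
    parent.a * node.tx + parent.c * node.ty + parent.tx,   // tx
    parent.b * node.tx + parent.d * node.ty + parent.ty    // ty
  };
}
```

As an example, a node translated by (10, 0) inside a parent rotated 90° composes to a matrix with tx = 0 and ty = 10: the parent's rotation redirects the node's translation, which the node's own matrix cannot express.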

Results

Before	After	Improvement
240MB/frame	8MB/frame	30× less memory
60-100ms frames	10-20ms frames	6× faster
10-16fps	60fps	Target achieved

The bounded layers worked. Rotated elements rendered correctly. Shadows extended properly in all directions.

The Irony

The code already had calculateWorldSpaceBounds()—the function that does the correct math. It just wasn't being called. Someone had written the solution, then used nullptr anyway with a comment:

// Bounded saveLayer is complex because effectsStart() is called BEFORE
// setTransform(). Canvas has parent's transform but content will be drawn
// after node's transform is applied. This creates a coordinate space
// mismatch that's difficult to resolve cleanly.

Translation: "I don't understand the coordinate spaces, so I'll allocate 24MB to be safe."

Thirty times more memory because coordinate math seemed hard.

What This Enabled

Bounded layers made large canvas rendering viable:

3000×2000 at 60fps: Previously impossible, now standard
Effect stacking: Each additional shadow or blur on a typical element now costs under 1MB instead of a full 24MB canvas buffer
Mobile support: 8MB per frame fits in mobile GPU budgets

The fix was 20 lines of math. Memory per frame dropped 30×, and frames rendered roughly 6× faster.

Sometimes the biggest wins come from using the API correctly, not from clever optimization.
