July 29, 2020

Face Colors Extraction

One project’s crucial feature was to suggest makeup based on photo of a person. There already was a code that supposed to do the job, but it was too inaccurate.

After testing different libraries that mark face parts I decided to use face-api.js because it was the fastest free accurate enough solution. In examples they used jQuery and a bunch of other libraries, so I’ve made a pure js MVP that takes a photo and draw dots and areas over it.

Correct data would probably give an adequate result. Face-api give 5 points for each brow. Previous solution just picked a center one and used it as a color source, but it is definitely a bad idea.

Extract data plan

My approach was to take face regions, get all points from them, remove side deviations, cluster the result and take the median of that clusters.

Mark out triangles on a face was the easiest part. Just some geometry calculations applied to points and I get a lot of triangle areas:

It was crucial to mark that triangles in such a way that they only contain one region of a face and do not overlap other parts.

After a lot of tests I’ve found out that on different face angles some areas can become irrelevant. For example, cheek can become hidden by the nose. As a solution I started to drop right part of face zones when left orange triangle is more than two times longer than the right one.

Extracting points from triangles

Now we have these triangles and want to get the color data. Usually developers solve the opposite problem: any 3D application fill triangles to display a 3D model.

Today this coloring is done by the GPU and developers actually do not calculate positions of all points in these triangles.

I used this approach of getting points:

Sort all points by Y axis and get P₁, P₂, P₃
Find the position of point P_Z on the P₁-P₃ edge that has the same Y as P₂
Get points between lines P₁-P₂ and P₁-P_Z by getting points in horizontal lines. It is easy to do by calculating the ratio of (P₂x-P₁x)/(P₂y-P₁y) and adding this.offset on each line.

var xStart = P1.x, xEnd = P1.x;
for(var y = P1.y; y <= P2.y; y++){
    for(var x = xStart; x <= xEnd; y++){
    getColor(x,y) // do something with it
    }
  xStart -= P1P2ratio;
  xEnd += P1PZratio;
}

Get points between lines P₂-P₃ and P_Z-P₃

Edge cases:

If P₂.y equals to P₁.y → P_Z = P₁ and we skip the third step
If P₂.y equals to P₃.y → P_Z = P₃ and we skip the fourth step

Here is a live demo:

After this part it was easy to extract all points of marked up regions.

Getting similar colors

After extracting all points from a triangle it is time for clustering and getting the median color. To get similar colors we should have a color difference formula.

Previous solution just used distance measure in RGB space:

This formula gives more difference between different grades of blue color than between blue and violet.

LAB space

In LAB space colors are made from Lumina, A (green-red) scale, and B (blue-yellow). Here is a corresponding wikipedia article. While I was reading that article I understood that I want some another approach.

HSV/HSB/HSL

This color space was developed by one of Pixar founders. Hue for color, S for saturation, and V\B\L for Value\Brightness\Lightness. These values are different for printers and monitors. For example, in photoshop you can alter hue of an image and recolorize everything:

Here is a great article that describes all the depths of this color models.

This model suits perfectly for the task, it would give a huge difference between different colors and lightness. Tiny hue changes greatly affects the color, so I’ve set a huge weight on that scale.

And here we go into the cloistering problem.

Clustering points

There are a lot of clustering algorithms. Simplest one can be made by anyone: just get distance between all data. The closest group of elements would be the solution. But here is a problem with this approach: we need to compare each point to all other points, it is a quadratic complexity. So for 1000 points we need make a million comparisons! But it would work.

Clever approach (k-mean implementation).

We take k random points from the data and call them “crystallization centers”.
Compare other points only with this k crystallization centers, and decide which cluster every point belongs.
For each cluster get the middle point and call it a new “crystallization center”.
Repeat steps 2-3 until centers are not settled.

This algorithm can go into the infinite loop, so I added a force stop after 500 iterations.

Result:

Finally, with this data I used CIELAB color distance measure to compare extracted colors with available store products.

Today I prefer oklch color space. Here is a good article from Evil Martians.