April 25, 2024

PoW Captcha

While I was thinking about protecting potion generator world from bots — my first idea was to make 5-10 requests between client and server to ensure that client receives some data, process it and return the result. It would already be a problem for stupid crawlers, because they would need keep the context, send-respond to ajax for a lot of time, so it would take more work than received profit. I've implemented this sequence, and then — idea of Proof Of Work captcha explodes right in my head.

This concept is from the crypto world. Mining new bitcoin block is a complicated process that can take a lot of time, but proofing that block was actually mined can be done instantly by anyone.

What would be the work?

I would use a mulberry32 algorithm. It is a seedable well-distributed random generator.

If we have some fixed string abcdef, and we seed the random with it — we would always generate the same random sequence. Here is my article about seeded random and mulberry32.

Let's try:

We have got the same sequence over and over, but where is the actual PoW?

As the bitcoin mining process — we would ask to find some easily verifiable result. We would look for code substring in the result sequence.

Steps of work:

Ask server for the work string
Bruteforce through other string that we would add to the work string and use as seed until base 36 random would give substring that we are searching.
Pass other string back to the server, so it can instantly verify that random seed: work string+other string would give the correct result.

Let's mine

First match happens fast, but then we can see that result strings repeats. This can be caused by my dummy implementation of seeding or due to mulberry32 function. It gives rather good pseudorandom, but not perfect. If you continue clicking Work button till three letters are used — next match would happen only when counter hits 1mzz.

In real implementation I use another counter that I add to the end of the bruteforce progress. Here we see 5 random sequences (actually each is combined from two). They are all produced with the same seed. So, after trying different techniques — the best one is to generate 36*36 random sequences for each bruteforce number. It completely eliminates bad seeding or mulberry collisions.

Full captcha algorithm

Request seed from the server
Server generates the seed and stores first random sequence (FRS) generated by this seed in the hash table. Also, there would be stored number of verification steps.
Receive seed and generate FRS. Start bruteforce and try to find some keyword in the result.
Send FRS, and bruteforce number to the server.
Server validates that FRS is correct, adds bruteforce to the stored seed and check if this attempt generates sequence that contains a keyword. If this is the case — decrease number of verification steps by one, return to step 2.. If number of verification steps is zero — return token that would be used to perform the real action.