Pipe 2 JS

One day I caught myself in the middle of a rather boring cycle: I had to extract and aggregate some statistics from logs on a remote server. It was my project, and I could have built this functionality into the API, but such requests came up only about twice a day, and each one was unique.

So I either gzipped these logs, copied them to a publicly served folder, downloaded the data to my local PC, and threw together a one-off script to extract what I needed, or I wrote the code in Vim and executed it right on the remote server. Parts of the code repeated every time, but everything such code ever did was (a minimal sketch follows the list):

  1. Get the data
  2. Convert it to an array
  3. Map-filter-reduce everything
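
Every such script looked roughly like this. A minimal sketch, not the real code: the app.log file name, the space-separated format, and the ERROR field are invented for illustration.

const fs = require('fs');

// 1. Get the data and 2. convert it to an array of lines
// (app.log and its format are hypothetical)
const lines = fs.readFileSync('app.log', 'utf8').split('\n');

// 3. Map-filter-reduce everything
const errors = lines
  .map((line) => line.split(' '))            // parse each line into fields
  .filter((parts) => parts[0] === 'ERROR')   // keep only the interesting events
  .reduce((count) => count + 1, 0);          // aggregate into a single number

console.log(errors);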

That is where the insight hit me: if you are a JavaScript developer, this is the tool you have been missing your whole life.

Use JavaScript to map, filter, and reduce standard input to standard output. pipe2js uses streams, so you can process really huge files.
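
The smallest possible taste of it is doubling every number that arrives on stdin (a sketch; exact output formatting may depend on your pipe2js version):

seq 1 3 | pipe2js -m 'Number(line) * 2'

This should print 2, 4, and 6, one per line.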

And on top of that, you can now dive into JSON files right on your servers.

*nix OSes have the concept of pipes, and pipes can process any amount of data.
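
For example, the classic utilities already chain this way, each program streaming its output into the next one:

# access.log here is just an arbitrary example file
cat access.log | grep "GET" | wc -l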

Have you ever needed to search through some huge file and extract information from it? Then this project will become a crucial quality-of-life improvement for you.

Pipes and duct tape

I do not know how to use many of the common Linux utilities. For example, I know how to grep for a substring, but I never remember how to match a pattern with find, and this is one of the cases where I use pipe2js on a daily basis:

find | pipe2js -x 
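
Since I can never recall find's pattern syntax, I just filter the paths with a plain JS regex instead. A sketch, assuming the -f filter from the JSON example below also accepts plain text lines:

# the .log pattern here is just an example
find . | pipe2js -f '/\.log$/.test(line)'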

/etc/passwd contains lines of fields separated by :. Here is an article with a detailed description of each field. We want to convert it to a list of users and print each user's home directory path.

cat /etc/passwd | pipe2js -m 'line.split(":")' \
-m '{login: line[0], UID: line[2], GID: line[3], home: line[5], shell: line[6]}' \
-m '`${line.login} home at ${line.home}`'

root home at /root
daemon home at /usr/sbin
bin home at /bin
sys home at /dev
sync home at /bin
games home at /usr/games
man home at /var/cache/man
mail home at /var/mail
.........
snow home at /home/snow

I did it in three mappings, but it can be done in other ways. This shorter version would also work:

cat /etc/passwd | pipe2js -m 'line.split(":")' \
-m '`${line[0]} home at ${line[5]}`'

And this inline function with array destructuring of its argument would also do the job:

cat /etc/passwd | pipe2js \
-m '(([login,pass,uid,gid,smth,home])=>({login,pass,uid,gid,smth,home}))(line.split(":"))' \
-m '`${line.login} home at ${line.home}`'

And the -k (keys) modifier can make this code even simpler:

cat /etc/passwd | pipe2js \
-m 'line.split(":")' \
-m -k 'login, , UID, GID,, home' \
-m '`${obj.login} home at ${obj.home}`'

What is obj in the last example? You can use line, obj, and a interchangeably: they all refer to the same value. I just prefer obj when I am operating on an object.
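
For example, these two commands do exactly the same thing:

echo hello | pipe2js -m 'line.toUpperCase()'
echo hello | pipe2js -m 'a.toUpperCase()'

Both print HELLO.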

For now, I will just add one more example of pipe2js usage:

# JSON output in a human-readable way:
curl --header "PRIVATE-TOKEN: __PRIVATE-TOKEN__" "https://gitlab.com/api/v4/groups/10500637/projects" | pipe2js -j --flat

In the next iteration, a playground will appear here.

On some projects we append JSON objects to log files, so our logs look like this:

{"type":"listFiles","date":1682014163241,"user":"ivan.dg@work.com","arn":"com.work.playbooks-test"}
{"type":"listFiles","date":1682014163247,"user":"ivan.dg@work.com","arn":"com.work.another-bucket"}
{"type":"listFiles","date":1682059112054,"user":"w.ks@work.com","arn":"com.work.another-bucket"}
{"type":"listFiles","date":1682059112107,"user":"w.ks@work.com","arn":"com.work.playbooks-test"}
{"type":"download","date":1682059143729,"user":"w.ks@work.com","fileName":"test.txt","arn":"s3:::com.work.playbooks-test"}
{"type":"listFiles","date":1682145233055,"user":"w.ks@work.com","arn":"com.work.another-bucket"}
{"type":"listFiles","date":1682145233105,"user":"w.ks@work.com","arn":"com.work.playbooks-test"}
{"type":"listFiles","date":1682516010732,"user":"i.kubota@work.com","arn":"com.work.another-bucket"}
{"type":"listFiles","date":1682516010916,"user":"i.kubota@work.com","arn":"com.work.playbooks-test"}
{"type":"listFiles","date":1683532058211,"user":"i.kubota@work.com","arn":"com.work.playbooks-test"}
{"type":"listFiles","date":1683532058217,"user":"i.kubota@work.com","arn":"com.work.another-bucket"}
{"type":"download","date":1683532065413,"user":"i.kubota@work.com","fileName":"f1/6ip1.jfif","arn":"s3:::com.work.another-bucket"}

When I want to filter for a particular event, I do this:

cat build-storage-log | pipe2js -j -f "obj.type==='download'"

This could easily be done with grep, but here I get an exact match on a property without complicated regexps. If I want to count such events, I just add --count.
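
So counting all downloads in that log becomes (assuming --count replaces the normal output with the number of matched events):

cat build-storage-log | pipe2js -j -f "obj.type==='download'" --count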

JSON is processed as a stream, so pipe2js can handle logs of any size. The parser is not very fast, but it is reliable; I have tested it with 1 GB logs.
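
If you want to check the streaming behaviour yourself, you can generate an arbitrarily large synthetic log with standard *nix tools and pipe it through (a sketch; the line count is arbitrary):

yes '{"type":"download","user":"test@work.com"}' | head -n 5000000 | pipe2js -j -f "obj.type==='download'" --count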