jq – manipulating JSON in shell

jq is amazing. It is a unique combination of JavaScript-like expressions and the Linux shell, which makes it an immensely powerful tool for working with JSON files (This post gives an introduction to the JSON format). It plays really well with the existing shell tools and has quickly become one of the most used tools in my data analysis/processing pipeline.

jq is like sed (stream editor). It takes an input stream, applies the expression to it and returns an output stream. It does not modify files directly. The syntax is,

 input stream | jq 'expression' | output stream

Input and Outputs

The input and output streams are just plain text streams. They can be a file, a program, an HTTP request etc. For example, consider the following commands,

curl "https://jsonplaceholder.typicode.com/posts/" | jq '.[0:5]'  > posts.json

cat posts.json | jq '.[].id' > post_ids.json

cat post_ids.json | jq '.' | curl -X POST -d "$(</dev/stdin)" "http://ptsv2.com/t/5jo6w-1522072388/post"

The first one gets JSON data from the URL, keeps the first 5 elements and puts them in the posts.json file. The second one takes this posts.json file, picks just the id of each element and puts those in the post_ids.json file. The third one takes this post_ids.json file and sends all of it to an HTTP API as a POST request (the results are here). In all these examples, jq does nothing but transform the input text stream and send it to the output text stream. This makes it extremely efficient and versatile.


The expression part of jq is essentially a tiny language of its own, with a JavaScript-like syntax, which is used to manipulate the JSON. This is really powerful. A full list of things that can be done is available in the manual. I’ll just outline some basic selection and filtering.

selection expressions
. - Shows the original object
.keyname - selects the specific field in the object
.[] - selects all elements (if the object is an array)
.[index] - selects the element at the given index of the array
.[start:end] - selects a slice of the array from index start up to, but not including, index end

function expressions (in addition to basic arithmetic)
length - returns length of the array
keys - returns fields in an object
map - applies a function to all the elements in an array
del - deletes a field of an object or an element of an array
select - returns an element if the condition is met
test - regex like pattern matching

All these can be combined, nested and piped to each other (yes, these are pipes within pipes) indefinitely to manipulate JSON. For example consider the following JSON file named data.json

[
   {
      "id": 1,
      "title": "sunt aut facere",
      "body": "quia et t architecto"
   },
   {
      "id": 2,
      "title": "qui est esse",
      "body": "est rerum tempore"
   },
   {
      "id": 3,
      "title": "ea molestias quasi",
      "body": "et iusto sed quo"
   },
   {
      "id": 4,
      "title": "eum et est occaecati",
      "body": "ullam et saepe"
   },
   {
      "id": 5,
      "title": "nesciunt quas odio",
      "body": "repudiandae veniam quaerat"
   }
]

This can be filtered in the following ways,

'.' - all the data.

'.[0]' - first element of the data

'.[1:3]' - elements from index 1 up to, but not including, index 3 (i.e. the second and third elements)

'.[0].title' - title of the first element

'.[].id' - ids of all elements, printed one per line (1 2 3 4 5)

'[.[].id]' - ids of all elements as an array. "[1,2,3,4,5]"

'. | length' - number of elements (5)

'.[] | length' - number of fields in each object of the array, printed one per line (3 for every element)

'.[0] | keys' - the fields/keys in the first element

'.[] | select(.id==3)' - the element with id as 3

'. | del(.[2])' - everything but third element

'. | del((.[] | select(.id==3)))' - everything but the element with id as 3

'. | map(.id = .id+1)' - increase the id variable for all elements by 1

'. | map(del(.id))' - remove the field id from all elements

'.[] | select(.body | test("et"))' - elements with 'et' in the body fields
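For readers who think in JavaScript, here is a rough sketch of what some of these filters do, written as plain array operations on the same five records. It is only an analogy and not how jq works internally; the variable names are made up for illustration, and where jq prints a stream of results the sketch collects them into an array.

```javascript
// The five records from data.json, inlined so the sketch is self-contained.
const data = [
  { id: 1, title: "sunt aut facere", body: "quia et t architecto" },
  { id: 2, title: "qui est esse", body: "est rerum tempore" },
  { id: 3, title: "ea molestias quasi", body: "et iusto sed quo" },
  { id: 4, title: "eum et est occaecati", body: "ullam et saepe" },
  { id: 5, title: "nesciunt quas odio", body: "repudiandae veniam quaerat" },
];

const first = data[0];                                       // jq '.[0]'
const slice = data.slice(1, 3);                              // jq '.[1:3]' (end index excluded)
const ids = data.map((d) => d.id);                           // jq '[.[].id]'
const fieldCounts = data.map((d) => Object.keys(d).length);  // jq '.[] | length'
const idThree = data.filter((d) => d.id === 3);              // jq '.[] | select(.id==3)'
const withoutThird = data.filter((_, i) => i !== 2);         // jq 'del(.[2])'
const bumped = data.map((d) => ({ ...d, id: d.id + 1 }));    // jq 'map(.id = .id+1)'
const withEt = data.filter((d) => /et/.test(d.body));        // jq '.[] | select(.body | test("et"))'

console.log(ids, slice.map((d) => d.id), withEt.map((d) => d.id));
```

Like jq, none of these operations changes the original data; each one produces a new value.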

Combining all these, we can easily explore and process JSON files right from the Linux terminal. Finally, the data can be organised as an array and exported as CSV using the @csv formatter. For example,

cat data.json | jq -r '.[] | [.id, .title, .body] | @csv' > data.csv

The -r is important since it makes jq output raw CSV text instead of JSON-encoded strings.
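To make the CSV step concrete, here is a minimal JavaScript sketch of roughly what @csv does to one array: strings are wrapped in double quotes, with any double quote inside doubled, and numbers are written as they are. The function name toCsvRow and the second sample row are made up for illustration, and the sketch only handles strings and numbers.

```javascript
// Rough equivalent of jq's @csv for a single array of strings and numbers.
function toCsvRow(values) {
  return values
    .map((v) => (typeof v === "string" ? '"' + v.replace(/"/g, '""') + '"' : String(v)))
    .join(",");
}

// First row is from data.json; the second is hypothetical and shows quote escaping.
const rows = [
  { id: 1, title: "sunt aut facere", body: "quia et t architecto" },
  { id: 2, title: 'say "hi"', body: "x" },
];

const csv = rows.map((r) => toCsvRow([r.id, r.title, r.body])).join("\n");
console.log(csv);
```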


Importance of transitions in interactive visualisations.

I am a very big fan of transitions when presenting complex data with interactive visualisations. I believe that transitions play an important role in building a continuous narrative in the audience’s mind when they are trying to understand a connected series of information. To be clear, I am talking specifically about the transition of data between the stages of a visualisation and not the transition of the canvas of the visualisation. Today let’s look at one of my favourite visualisations of all time – this brilliant interactive one by the NY Times, showcasing the budget proposal made by the US government in 2012.

The story here starts by showing the whole budget along with the scale of the proposed 900bn budget deficit. We then see that most of the spending is mandatory, over which the govt. has no control, and that the smaller discretionary spending is what the discussion is really about. Next, the discretionary spending categories are compared with each other to show the gainers and losers from last year, and finally we see them put together with the mandatory spending to complete the picture.

The amazing thing about this graphic is the way the circles (departments / budget categories), which are the basic unit of the analysis, move between the stages of the story, especially from step 2 to step 3. Not only does it reinforce the concept of mandatory spending, it also prepares us for what we are going to see in the next stage. This half-second transition fundamentally changes the clarity and legibility of the whole story for a casual audience.

I am currently working on a medium-sized dataset, trying to filter it with various rules to produce a final number, and this kind of visualisation seems perfect for showing how the results change as I move through the different stages of filtering. I will be posting on the progress, so stay tuned for updates.


Understanding Javascript Objects and JSON Data.

The first time I heard of JSON (JavaScript Object Notation) was when I was trying to get data out of Twitter. At that time I was new to JavaScript and was figuring out a lot of stuff at once, so the whole thing was very confusing and incredibly hard to grasp. Now when I think back, it would have saved me a lot of time if someone had just given a clear overview of how objects work in JavaScript and how the same pattern (jargon alert!), when used to transfer data, becomes JSON. This is the reason why I am writing this post. As a disclaimer, I am not a programmer but an urban planner and my understanding of this subject is purely based on my practical experience trying to build things relevant to my field, so bear with me if any of this is inaccurate or wrong. Please feel free to point out the mistakes.

To start, let us set up the environment for learning. Since objects in JavaScript and JSON data are abstract concepts, it is very hard to understand them without actually seeing them in action. So it is time to open the console in your browser (preferably Chrome) and start typing. The console in Chrome can be opened by pressing cmd+alt+j (Mac) or ctrl+shift+j, and you can just type in the commands one by one. It is as simple as that.

Chrome JS Console

By default the console opens in the global namespace of the tab which is open. In plain English, the console is like a virtual space where you create, destroy and modify objects, and these objects get displayed in the browser window (rendering) based on their properties. For example, the tab has an object associated with it, and the HTML document, the body and every element (image, text etc.) each have an object associated with them. This is called the DOM – Document Object Model, which is used to manipulate HTML elements with JavaScript. The point is that this is a virtual space to contain objects and this is where we will be working. This space already has a lot of objects and one shouldn’t confuse them with our own. To check our understanding, let’s type ‘window’ in the console and press return. It returns some text; this is the object which denotes the Chrome window. When you click the triangle on the left side (to expand it) you can see all the contents of the object.


Now typing window.innerHeight or window.innerWidth will give you the height and width of the window. You can resize the window and type it again to see it change. The important things here are, 1) we can create objects in this virtual space with properties to represent stuff we see (a window in this case) and 2) the “.” accesses a property of an object (I’ll elaborate on this later).

Now it is time for us to create our own object. Let’s say this object is a digital representation of yourself. Let’s start by creating the object by typing me = {}. This literally translates to English as ‘me’ is an empty object. The curly brackets denote that ‘me’ is an object, and it is empty since it has nothing in it. Now typing ‘me’ again and pressing enter will return an empty object. Now let’s go one step ahead and ask for my name. Since it is a property, you use the “.”. So type me.name and see the result. It says it is “undefined” because we have created the object but never defined any properties. Let’s set one by typing me.name = 'bala' (note the single quotes, I’ll explain later). Now asking for me.name will return the name 'bala' and asking for the whole object ‘me’ will now give the properties as well.
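The same steps, written as a small script. In the console you can type me = {} directly; the sketch uses const so that it also runs as a standalone file, and the variable before is only there to capture the value for printing.

```javascript
const me = {};              // 'me' is an empty object
const before = me.name;     // no such property yet, so this is undefined

me.name = 'bala';           // define the property with the "."

console.log(before);        // undefined
console.log(me.name);       // bala
console.log(me);            // the whole object, now with its name property
```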


If you have come this far then you understand objects, their properties, and how to create and return them. Now it is time to understand some data structures (this sounds more complex than it is). For practical purposes we just have to know 4 major data structures – Number, String (of characters), Array (of elements) and Boolean. Most of the data we use will fall under these data structures. Numbers are stored as is, without any special way to mark them. For example, typing me.age = 25 will set the property ‘age’ in me as a number which is 25. Strings are denoted by double/single quotes around them; if the quotes are not present then it is considered the name of an object (variable) and will return an error if it cannot find one. For example, me.city = london will return an error while me.city = 'london' will be OK. Arrays are collections of data in a specific order and are denoted by square brackets. For example, me.address = [221,'b','Baker Street'] will set the address as an array of 3 elements. Typing me.address will return an array, while typing me.address[0] will return 221. The square brackets with a number are the way of accessing an element within an array when you know its place (the counting starts at 0). Try to get the street name out of the ‘me’ object.
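A short sketch of the number, string and array examples, including the error you get when the quotes are forgotten. The try/catch is only there so that the script keeps running after the error; in the console you would simply see the error message.

```javascript
const me = { name: 'bala' };

me.age = 25;                               // a number, no special marking
me.city = 'london';                        // a string, in quotes
me.address = [221, 'b', 'Baker Street'];   // an array of 3 elements

// Without quotes, london is treated as a variable name that does not exist.
let error = null;
try {
  me.city = london;
} catch (e) {
  error = e;                               // a ReferenceError
}

console.log(typeof me.age, typeof me.city, Array.isArray(me.address));
console.log(me.address[0]);                // first element, counting from 0
console.log(me.address[2]);                // the street name
console.log(error.name);
```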


A string is like a special case of array (with only single characters) so you can do the array type of queries on strings as well. For example, me.city[0] will return 'l'. A Boolean is either true or false (without quotes). For example, you can do me.graduate = true. This helps because you can use the value directly in a condition (like an if else statement) instead of comparing it against something. With these 4 basic data types you can create a model for almost every kind of object we usually encounter. To sum up, we create objects with curly brackets, an object is something which has properties, properties can be of various types of data (number, string, array, boolean), we set and access properties of objects by using the ‘.’, arrays are created by using square brackets and we set and access the contents of an array by using square brackets.
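A quick sketch of the string and Boolean points. The labels 'graduate' and 'student' are made up for illustration.

```javascript
const me = { city: 'london', graduate: true };

const firstLetter = me.city[0];      // index into the string like an array
const letters = me.city.length;      // strings have a length, like arrays

// A Boolean can be used directly as the condition.
const label = me.graduate ? 'graduate' : 'student';

console.log(firstLetter, letters, label);
```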

There is one more thing we need to know, which is that there is a simpler way to create objects than setting properties one by one. This brings us closer to JSON. Instead of using the ‘.’ we can directly write the contents of the object using ‘,’ and ‘:’. So, combining all the steps above, the me object can be created in one go, e.g. me = { name: 'bala', age: 25, city: 'london' }.
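Written out in full, with all the properties set so far, the object looks something like this (const is used only so that the sketch runs as a standalone script):

```javascript
// Property names on the left of ':', values on the right, ',' between properties.
const me = {
  name: 'bala',
  age: 25,
  city: 'london',
  address: [221, 'b', 'Baker Street'],
  graduate: true,
};

console.log(me.name, me.address[2], me.graduate);
```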


With these basics we can now move to interesting stuff – Nesting and References.


When we talked about data structures, though we talked about 4, we actually learned 5. The fifth one is the object. This means that a property of an object can itself be an object. This introduces amazing capabilities to JavaScript objects. For example, me.education = {} will create an empty object for education, and me.education.school = 'kvp' will set a property of me.education. This process can theoretically be repeated forever (if the memory permits).
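A small sketch of nesting. The university entry is a made-up example to show a second level of nesting.

```javascript
const me = { name: 'bala' };

me.education = {};                 // a property that is itself an object
me.education.school = 'kvp';       // a property of that inner object

// Hypothetical example of going one level deeper.
me.education.university = { name: 'example university', year: 2014 };

console.log(me.education.school);
console.log(me.education.university.year);
```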


This is significant in two ways. 1) While modelling sparse data, this makes our object memory efficient (this is one reason it is used in data transfer). For example, a table which shows the scores of 5 students in 10 different courses, where every student attends only 3 courses, can be modelled as a JavaScript object that holds, for each student, only the three courses they attended.


You essentially don’t need to have null/empty spaces. If something is null then it is just not there. 2) You can model infinitely rich objects by nesting different types of objects together. E.g. in the above example, if some course has two markers and two scores, you just introduce an object instead of the number, with no need to change the structure of your data (schema).
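A minimal sketch of both points, with two students instead of five to keep it short. The student names, course names and scores are all made up for illustration.

```javascript
// Hypothetical students and courses; each student lists only the courses they took.
const scores = {
  asha: { maths: 72, physics: 65, history: 80 },
  ravi: { maths: 58, chemistry: 91, geography: 77 },
};

// A course the student never took is simply absent, not stored as null.
const tookChemistry = 'chemistry' in scores.asha;
const missing = scores.asha.chemistry;

// Two markers for one course: swap the number for an object, nothing else changes.
scores.ravi.chemistry = { firstMarker: 91, secondMarker: 87 };

console.log(tookChemistry, missing, scores.ravi.chemistry.secondMarker);
```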



References are another way JavaScript optimises memory usage. Every object’s association with its name is just a reference. For example, create a variable a = 20, create another variable using this variable, b = a, and check both variables. Now do b = 40 and check the value of a. It is still 20. This is logical and is what you expect to happen.
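The number example as a script (let is used only so that it runs as a standalone file):

```javascript
let a = 20;
let b = a;    // b gets its own copy of the value 20

b = 40;       // changing b does not touch a

console.log(a, b);   // 20 40
```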


Now try and do the same with objects, a = { name: 'bala', age: 25 } ; b = a ; b.age = 27. Now check the contents of a!


This behaviour is because JavaScript does not duplicate objects in memory when they are assigned to variables but just points the variable name at the same object. This allows us to create infinitely rich data structures with finite memory. A funny example is below,
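A sketch of the object case. The self-reference at the end is one possible example of an "infinitely deep" structure in finite memory; it is an illustration and not necessarily the example from the original screenshot.

```javascript
const a = { name: 'bala', age: 25 };
const b = a;              // b points at the same object, nothing is copied

b.age = 27;
console.log(a.age);       // 27, because a and b are the same object
console.log(a === b);     // true

// To get an independent (shallow) copy you have to ask for one explicitly.
const copy = { ...a };
copy.age = 30;
console.log(a.age);       // still 27

// An object can even hold a reference to itself.
a.self = a;
console.log(a.self.self.self.name);   // bala
```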


That concludes our overall introduction to understanding objects in javascript. With this foundation, let us move to JSON.


To be brief, JSON is just a notation for writing down JavaScript objects. It is almost exactly the same as what we discussed above. The only new thing is that the property names must be enclosed in double quotes (since JSON has no concept of variables), and strings must use double quotes as well. You parse a JSON string (with JSON.parse) and it gives you a JavaScript object, that’s it. Nothing more. While thinking about JSON, as an alternative you can think of it as a collection of key value pairs (this kind of thinking helps while working with PHP). Keys are always strings and values can be a string, a number, a boolean, null, an array or even another collection of key value pairs.
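A minimal sketch of going from JSON text to an object and back, using the built-in JSON.parse and JSON.stringify. The last part shows that the JavaScript-style shorthand (unquoted keys, single quotes) is not valid JSON.

```javascript
// JSON is text: note the double quotes around every key and every string.
const text = '{"name": "bala", "age": 25, "address": [221, "b", "Baker Street"], "education": {"school": "kvp"}}';

const me = JSON.parse(text);         // text -> JavaScript object
console.log(me.education.school);
console.log(me.address[2]);

const back = JSON.stringify(me);     // JavaScript object -> text
console.log(typeof back);

// Unquoted keys and single-quoted strings are fine in JavaScript but not in JSON.
let error = null;
try {
  JSON.parse("{name: 'bala'}");
} catch (e) {
  error = e;                         // a SyntaxError
}
console.log(error.name);
```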


The difference between JavaScript objects and JSON is that JSON is a notation, i.e. it is plain text that lives outside JavaScript and is not executed. It is a normal data / text file like CSV. You can essentially read a CSV as a string, split it into parts using the newline and comma characters and create an object/array in JavaScript, but with JSON it is much simpler (just call JSON.parse) and more powerful (nesting).
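To illustrate the comparison, here is a rough sketch that reads the same two records from CSV text and from JSON text. The sample data is made up, and the CSV part is deliberately naive: it assumes no commas or quotes inside the values.

```javascript
// CSV: split by newline and comma, then rebuild the objects by hand.
const csvText = 'id,title\n1,sunt aut facere\n2,qui est esse';
const [header, ...rows] = csvText.split('\n');
const keys = header.split(',');
const fromCsv = rows.map((row) => {
  const cells = row.split(',');
  return Object.fromEntries(keys.map((k, i) => [k, cells[i]]));
});

// JSON: one call, and the types (number vs string) come through as well.
const jsonText = '[{"id": 1, "title": "sunt aut facere"}, {"id": 2, "title": "qui est esse"}]';
const fromJson = JSON.parse(jsonText);

console.log(typeof fromCsv[0].id, typeof fromJson[0].id);
console.log(fromCsv[1].title === fromJson[1].title);
```

Note that the CSV route gives every value back as a string, so the id would still need converting to a number.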

That’s it. I hope this helps absolute beginners like myself to understand these concepts faster.

P.S. If you have come this far, congratulations! You now know how data is modelled and stored in mongodb! All you have to do to use mongodb is learn commands.

Complete graph creator using Raphael.js

update 27 Dec 2017: fixed mistake in code and fixed link.

Complete graph creator

This one is very similar to A Simple Gravity Model except that this one is made with JavaScript (Raphael.js) and does not have the gravity model for the width of the links. I made this as a demonstration of how easy it is to make interactive graphics with JavaScript. With less than 40 lines of code, using Raphael and JavaScript, we can create this complete graph generator where you can click the canvas to create nodes and click and drag the nodes to move them. The links are generated and updated based on the position of the nodes. I am planning to create a full suite of tools for making and analysing networks online, for which this is the first step. [link]


<!DOCTYPE html>
	<script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/raphael/2.2.7/raphael.min.js"></script>
<body>
	<script type="text/javascript">
		// The click and drag handler bodies did not survive in this listing; the ones below are a reconstruction.
		var paper = Raphael(0,0,'100%','100%');
		var background = paper.rect(0,0,'100%','100%').attr({'fill':'#ddd','stroke-width':'0'});
		var circles = [];
		var lines = [];
		// Clicking the background adds a node and redraws the links.
		background.click(function (event) {
			circles.push(circle(event.clientX,event.clientY));
			refreshLines(circles);
		});
		// Remove the old links, then draw one link for every pair of nodes.
		function refreshLines(arr) {
			lines.forEach(function (l) { l.remove(); });
			lines = [];
			for (var i = 0; i < arr.length; i++) {
				for (var j = i + 1; j < arr.length; j++) {
					lines.push(line(arr[i].attrs.cx,arr[i].attrs.cy,arr[j].attrs.cx,arr[j].attrs.cy));
				}
			}
		}
		// Toggle a node between grey and red (used when a drag starts and when it ends).
		function toggle() {
			this.attr({fill: this.attrs.fill=="#444" ? '#f00' : '#444'});
		}
		function circle (mx,my) {
			return paper.circle(mx,my,10).attr({fill:'#444',stroke:'#444'}).drag(
				function (dx,dy,x,y) { this.attr({cx:x,cy:y}); refreshLines(circles); },
				toggle, toggle);
		}
		function line (sx,sy,ex,ey) {
			return  paper.path( "M"+sx+","+sy+" "+"L"+ex+","+ey ).attr({stroke:'#444'});
		}
	</script>
</body>
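The core of the idea is independent of Raphael: a complete graph links every node to every other node, so n nodes give n(n-1)/2 links. Here is a minimal sketch of just that pairing step, which runs without a browser. The function names and the sample coordinates are made up for illustration; the path string uses the same "M…L…" format as the line function above.

```javascript
// Return one [from, to] pair for every unordered pair of nodes.
function completeGraphLinks(nodes) {
  const links = [];
  for (let i = 0; i < nodes.length; i++) {
    for (let j = i + 1; j < nodes.length; j++) {
      links.push([nodes[i], nodes[j]]);
    }
  }
  return links;
}

// Build the SVG path string for a straight line between two nodes.
function linkPath(from, to) {
  return "M" + from.cx + "," + from.cy + " L" + to.cx + "," + to.cy;
}

// Hypothetical node positions.
const nodes = [
  { cx: 10, cy: 10 },
  { cx: 100, cy: 20 },
  { cx: 50, cy: 90 },
  { cx: 120, cy: 120 },
];

const links = completeGraphLinks(nodes);
console.log(links.length);                  // 4 nodes give 4 * 3 / 2 = 6 links
console.log(linkPath(links[0][0], links[0][1]));
```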

New Project – RefNet

After trying to organise the reading for my research last week, I realised that the research process in my mind is not organised as a list or a checklist but as a network of interconnected ideas from various sources. This is where I felt the reference managers I was using were failing miserably. Though they did a good job of organising the metadata on the papers, books and articles I was reading and including them as references in my write-up, they did not help me in the research process. My research still remained an exercise where I go through search engines and the lists of references in other papers manually and try to put all the stuff together in my mind by myself. This is when I decided that if I cannot find the tool I want, I would rather build one myself, and also that all the things I learned about networks and web development in the past year have to be put to use somewhere.

So here it is, RefNet – a reference manager which organises the references/bibliography as a network of objects rather than a list. The idea is to build a tool where you can drag and drop papers and books as objects, and based on the citations in them they are organised as a network of interconnected ideas. I started a GitHub repository and, using the vivagraph library (inspired from here), put together a very preliminary working concept and added some data on the things I have been reading in the past week. The result is as below, (click the image for interactive version)


The plan forward is to make the tool more dynamic with a drag and drop option, automatic citation importing from a database such as Web of Knowledge, possibly a suggestion tool to say which papers to read further based on the network properties and finally a plugin to integrate this with Google Docs / MS Word. As mentioned earlier, the project and code as of now are up on GitHub (here) and I would be really happy to collaborate with interested people on building this.

First Crowdsourced Project.


The image above is a static screenshot of a dynamic, interactive and crowdsourced map I created to map people from the School of Planning and Architecture, Delhi and see how they are distributed all over the world. I initially circulated it within my batch and later in a broader group, and the response has been really good so far, with the counter crossing 100 as of yesterday. Though it is not something really advanced or jaw dropping, I am really excited to see how easy it is to collect and visualize data (especially geographic) if one knows the right tools. The tools used are MySQL server, Apache (PHP) server, JavaScript (with jQuery), Google Maps API v3, Chrome and Sublime Text.

The visualisation is similar to what I did for the IRIS competition earlier but the difference is in the backend. Instead of reading a preset data file and displaying it, this map has a MySQL database in the backend, queries it through PHP and visualises the result. It also has a PHP-based POST mechanism to send data from the user to the database. The best part is that none of the data in the image above was collected or entered by me (except for my two data points). It was generated by the people who individually entered their own locations.

Webkit Speech API and Google Maps

For the past couple of weeks, in addition to working on the dissertation project, I was working sporadically on a demo of a speech-integrated map using the x-webkit-speech API provided by Google Chrome and the Google Maps JavaScript API. Now that the dissertation is over, here is the final polished version of the demo, (http://goo.gl/pqYUsN)

Demo for mixing -x-webkit-speech and google maps API

The current functions available with this map are,

1) Speak to navigate – Click on the microphone button in the text box (or press “ctrl+alt+.”) and start speaking. The field will recognise when you stop speaking, interpret the sound as text and, if it is a place, take you right to the place you just spoke.

2) Zoom and Pan – you can use the same trick with some preliminary commands as well. The system as of now understands,

“east direction”, “west direction”, “south direction” and “north direction” pan the map in the corresponding direction.
“zoom in” / “zoom out” zooms the map.

3) Other commands,

satellite – switches the map to a satellite map
simple – switches the map to the simplified default look and feel shown above.
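As a rough sketch of how recognised text could be turned into one of the commands listed above: the function below is hypothetical and is not the demo’s actual code. It only decides what the text means and returns a plain object; calling the Google Maps API with the result is left out.

```javascript
// Hypothetical helper: map recognised speech to an action description.
function interpretCommand(spoken) {
  const text = spoken.trim().toLowerCase();

  const pan = text.match(/^(east|west|south|north) direction$/);
  if (pan) return { action: 'pan', direction: pan[1] };

  if (text === 'zoom in') return { action: 'zoom', delta: 1 };
  if (text === 'zoom out') return { action: 'zoom', delta: -1 };

  if (text === 'satellite' || text === 'simple') return { action: 'mapType', type: text };

  // Anything else is treated as a place name to search for.
  return { action: 'search', query: spoken.trim() };
}

console.log(interpretCommand('North direction'));
console.log(interpretCommand('zoom out'));
console.log(interpretCommand('Baker Street, London'));
```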

As usual, I request the readers to give it a try and share the results & problems in the comments section below. Also feel free to put in your suggestions and point out any mistakes.