jq – manipulating JSON in shell

jq is amazing. It is an unique combination of javascript and linux shell which gives an immensely powerful tool to work with JSON files (This post gives a introduction to JSON format) . It plays really well with the existing shell tools and has quickly become one of the most used tools in my data analysis/ processing pipeline.

jq is like sed (streaming editor). It is takes an input stream, applies the expression on it and returns an output stream. It does not modifies files directly. The syntax is,

 input stream | jq 'expression' | output stream

Input and Outputs

The input and output streams are just plain text streams. They can be a file, program, http request etc. etc. For example consider the following commands,

curl "https://jsonplaceholder.typicode.com/posts/" | jq '.[0:5]'  > posts.json

cat posts.json | jq '.[].id' > post_ids.json

cat post_ids.json | jq '.' | curl -X POST -d "$(</dev/stdin)" "http://ptsv2.com/t/5jo6w-1522072388/post"

The first one gets json data from the url, filters the first 5 elements and puts that in posts.json file. The second one takes this posts.json file, filters just the ids from each element and puts that in post_ids.json file. The third one takes this post_ids.json file and posts all of it to a http api as a post request (the results are here). In all these examples, jq does nothing but change input stream and send it to output text stream. This makes it extremely efficient and versatile.


The expression part in jq is essentially a tiny javascript engine which is used to manipulate the JSON. This is really powerful. A full list things than can be done is available in the manual. I’ll just outline some basic selection and filtering

selection expressions
. - Shows the original object
.keyname - selects the specific field in the object
.[] - selects all elements (if the object is an array)
.[index (:no of elements)] - selects the specified index from the array

function expressions (in addition to basic arithmetic)
length - returns length of the array
keys - returns fields in an object
map - applies a function to all the elements in an array
del - deletes an object
sel - returns an object in an array if the condition is met
test - regex like pattern matching

All these can be combined, nested and piped to each other (yes, these are pipes within pipes) indefinitely to manipulate JSON. For example consider the following JSON file named data.json

   "id": 1,
   "title": "sunt aut facere",
   "body": "quia et t architecto"
   "id": 2,
   "title": "qui est esse",
   "body": "est rerum tempore"
   "id": 3,
   "title": "ea molestias quasi",
   "body": "et iusto sed quo"
   "id": 4,
   "title": "eum et est occaecati",
   "body": "ullam et saepe"
   "id": 5,
   "title": "nesciunt quas odio",
   "body": "repudiandae veniam quaerat"

This can be filtered in the following ways,

'.' - all the data.

'.[0]' - first element of the data

'.[1:3]' - three elements from index 1 (ie, second, third and fourth elements.)

'.[0].title' - title of the first element

'.[].id' - ids of all elements "1,2,3,4,5"

'[.[].id]' - ids of all elements as an array. "[1,2,3,4,5]"

'. | length' - number of elements (5)

'.[] | length' - number of elements in each object of the array [3,3,3,3,3]

'.[0] | keys' - the fields/keys in the first element

'.[] | select(.id==3)' - the element with id as 3

'. | del(.[2])' - everything but third element

'. | del((.[] | select(.id==3)))' - everything but the element with id as 3

'. | map(.id = .id+1)' - increase the id variable for all elements by 1

'. | map(del(.id))' - remove the field id from all elements

'.[] | select(.body | test("et"))' - elements with 'et' in the body fields

Combining all these we can easily explore and process, json files right from linux terminal and finally the data can be organised in an array and exported as a csv using the @csv function. For example,

cat data.json | jq -r '.[] | [.id, .title, .body] | @csv' > data.csv

the -r is important since that makes jq to output raw csv text.


Understanding Javascript Objects and JSON Data.

The first time I heard of JSON (JavaScript Object Notation) when was trying to get data out of twitter. At that time I was new to javascript and was figuring out a lot of stuff at once so the whole thing was very confusing and incredible hard to grasp. Now when I think back, It would have saved me a lot of time if someone just gave a clear overview of how objects worked in javascript and how the same pattern (jargon alert!) when used to transfer data becomes JSON. This is the reason why I am writing this post. As a disclaimer, I am not a programmer but an urban planner and my understanding of this subject is purely based on my practical experience trying to build things relevant in my field so bear with me if any of this is inaccurate or wrong. Please feel free to point out the mistakes.

To start, let us set up the environment for learning. Since objects in JavaScript and JSON data are abstract concepts there is no way for a normal person to understand these without actually seeing them in action. So it is time to open the console in your browser (preferably chrome) and start typing. The console in chrome can be started by pressing (cmd+alt+j or ctrl+shift+j) and you can just type in the commands one by one. It is as simple as that.

Chrome JS Console

By default the console opens in the global namespace of the tab which is open. In plain english the console is like a virtual space where you create, destroy, modify objects and these objects gets displayed in the browser window (rendering) based on their properties. For example the tab has an object associated with it, the html document, the body and every element (image, text etc) has an object associated with it. This is called DOM – Document Object Model which is used to manipulate HTML elements with JavaScript. The point is this is a virtual space to contain objects and this is where we would be working. This space already has a lot of objects and one shouldn’t confuse them with our own. To check our understanding lets type ‘window’ in the console and press return. It returns some text, this is the object which denotes the chrome window. When you click the triangle in the left side (expand it) you can see all the contents of the object.

Screen Shot 2014-12-03 at 16.01.29

Now typing window.innerHeight or window.innerWidth will give you the height and width of the window. You can resize the window and type it again to see it changing. Here the important things are, 1) We can create objects in this virtual space with properties to represent stuff we see (a window in this case) 2) the “.” denotes the property of an object (i’ll elaborate this later).

Now it is time for us to create our own object. Lets say this object is a digital representation of yourself. Lets start by creating the object by typing me = {}. This literally translates to english as ‘me’ is an empty object. The curly brackets denote that ‘me’ is an object and it is empty since it has nothing in it. Now typing ‘me’ again and pressing enter will return an empty object. Now lets go one step ahead and ask my name. Since it is a property, you use the “.”. So type “me.name” and see the result. It says it is “undefined” because we have created the object but never defined any properties.  Lets set one by typing me.name = ‘bala’ (note the single quotes. i’ll explain later) . Now asking me.name will return the name “bala” and asking for the whole object ‘me’ will now give the properties as well.

Screen Shot 2014-12-03 at 16.16.35

If you have come this far then you understand object, its properties, how to create and return them. Now it is time to understand some data structures (this sounds way too complex than necessary). For practical purposes we just have to know 4 major data structures – Number, String (of characters), Array (of elements) and Boolean. Most of the data we use will fall under these data structure. Numbers are stored as is without any special way to mark them. for example, Typing in me.age = 25 will set the property ‘age’ in me as a number which is 25. Strings are denoted by double/single quotes around them if the quotes are not present then it is considered as a name of an object (variable) and will return an error if it cannot find one. for example, me.city = london will return an error while me.city = ‘london’ will be OK. Arrays are collection of data in a specific order and is denoted by square brackets. for example, me.address = [221,’b’,’Baker Street’] will set the address as an array of 3 elements. typing me.address will return and array, while typing me.address[0] will return 221. the square brackets with a number is the way of accessing an element within an array when you know its place (the counting starts at 0). Try to get the street name out of the ‘me’ object.

Screen Shot 2014-12-03 at 16.48.55

A string is like a special case of array (with only single characters) so you can do the array type of queries to strings as well. for example, me.city[0] will return ‘l’. Boolean is either ‘true’ or ‘false’ (without quotes). for example, you can do me.graduate=true. This helps where you can directly use this instead checking for conditions (like if else statement). With this 4 basic data type you can create a model for almost every kind of objects we usually encounter. To sum up, we create objects with curly brackets, object is something which has properties, properties can be of various types of data (number,string, array, boolean), we set and access properties of objects by using the ‘.’, arrays are created by using square brackets and we set and access contents of array by using square brackets.

There is one more thing we need to know which is that there is a simpler way to create objects than setting properties one by one. which brings us closer to the JSON. instead of using the ‘.’ we can directly write the contents of the object using ‘,’ and ‘:’. So combining all the steps above in creating the me object, we just do,

Screen Shot 2014-12-04 at 18.23.37

With these basics we can now move to interesting stuff – Nesting and References.


When we talked about data structures though we talked about 4, we actually learned 5. The fifth one is objects. This means that a property of an object can be an object. This introduces amazing capabilities to javascript objects. for example, me.education = {} will create an empty object for education. and me.education.school = ‘kvp’ will set the property for me.education. This process can be theoretically repeated forever (if the memory permits).

Screen Shot 2014-12-04 at 17.56.38

This has two major significance. 1) While modelling sparse data, this makes our object memory efficient (this is the reason it is used in data transfer). for example, a table which shows scores of 5 students in 10 different courses and every student attends 3 courses can be modelled as a javascript object as shown below,

Screen Shot 2014-12-04 at 18.13.21

Screen Shot 2014-12-04 at 18.12.52

You essentially don’t need to have null/empty spaces. If something is null then it is just not there. 2) you can model infinitely rich objects by nesting different types of objects together. e.g. in the above example if some course has two markers and two scores you just introduce and object instead of the number, no need to change your structure of the data ( schema ).

Screen Shot 2014-12-04 at 18.28.44


This is another way javascript optimises memory usage. Every object’s association with its name is just a reference. for example create an variable a = 20, create another variable using this variable b = a , check both variables. now do b = 40 and check the value of a. The results are shown below. This is logical and is what you expect to happen.

Screen Shot 2014-12-04 at 18.34.36

Now try and do the same with objects, a = { name:’bala’, age:25 } ; b = a ; b.age = 27. Now check the contents of a!

Screen Shot 2014-12-04 at 18.42.44

This behaviour is because JavaScript does not duplicates objects in memory when assigned to variables but just references the object to the variable name. This allows us to create infinitely rich data structures with finite memory. Funny example is below,

Screen Shot 2014-12-04 at 18.43.53

That concludes our overall introduction to understanding objects in javascript. With this foundation, let us move to JSON.


To be brief, JSON is just a notation to create javascript objects. It is exactly the same as what we discussed above. The only thing which is new is the property names should be enclosed in quotes (since JSON has no concept of variables). You evaluate a JSON string and it gives you a JavaScript object thats it. Nothing more. While thinking about JSON, as an alternative you can think of it as a collection of key value pairs (this kind of thinking helps while working with php). Keys are always strings and values can be anything (even another collection of key value pairs).

Screen Shot 2014-12-04 at 18.51.11

The difference between JavaScript objects and JSON is that JSON is a notation i.e. it is a text file written outside javascript and cannot be executed. It is a normal data / text file like csv. You can essentially read a csv as a string, split it into parts using the newline and comma characters and create an object/array in javascript but with JSON it is much simpler (just do eval) and more powerful (nesting).

Thats it. Hope this helps absolute beginners as myself in understanding these concepts faster.

P.S. If you have come this far, congratulations! You now know how data is modelled and stored in mongodb! All you have to do to use mongodb is learn commands.