Running Windows inside Arch Linux with VirtualBox

Even though I moved over to Linux completely quite some time ago, every now and then I encounter situations in which I really have to use Windows. The last time was because of a form: a Word document set up in such a way that I had to use MS Word to fill it in. Initially I planned never to go back to Windows and, in such situations, to just borrow a Windows computer for that purpose. Then I realised it is better to have a Windows installation loaded with commonly used software, ready to go whenever I need it, rather than depending on someone else. So I installed Windows on my desktop using VirtualBox. The only thing you need to source is the Windows installation disk image (.iso); either someone can lend you a copy or you can buy one. I used my university’s license for this one.

The steps are straightforward on Arch. Install virtualbox, virtualbox-host-modules-arch and virtualbox-ext-oracle (the last one is from the AUR). Open VirtualBox, create your virtual machine by following the step-by-step GUI, and start the machine. That’s it: we have a working Windows installation. The first thing to do in the guest system is to install the “Guest Additions”, which can be inserted as a disc from Devices > Insert Guest Additions CD Image. To make the virtual machine look seamless with the host OS, set the same wallpaper on both, set the guest to auto-resize with the host window, and hide the menu bar and status bar.
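If you would rather script the VM creation than click through the GUI wizard, VirtualBox ships a VBoxManage command-line tool. The Python sketch below only builds the VBoxManage calls and, by default, prints them as a dry run. The VM name, file names, memory and disk sizes, the SATA controller name and the Windows10_64 OS type ID are all assumptions to adjust for your setup, not what I used.

```python
import shlex
import subprocess

def vboxmanage_commands(name, iso, disk, memory_mb=4096, cpus=2, disk_mb=51200):
    """Build VBoxManage calls roughly mirroring the GUI wizard (a sketch, not a recipe)."""
    vbm = "VBoxManage"
    return [
        # Create and register a VM; "Windows10_64" is an assumed OS type ID
        [vbm, "createvm", "--name", name, "--ostype", "Windows10_64", "--register"],
        # Allocate memory (MB) and CPUs
        [vbm, "modifyvm", name, "--memory", str(memory_mb), "--cpus", str(cpus)],
        # Create a virtual hard disk (size in MB)
        [vbm, "createmedium", "disk", "--filename", disk, "--size", str(disk_mb)],
        # Add a SATA controller, then attach the disk and the Windows .iso to it
        [vbm, "storagectl", name, "--name", "SATA", "--add", "sata"],
        [vbm, "storageattach", name, "--storagectl", "SATA",
         "--port", "0", "--device", "0", "--type", "hdd", "--medium", disk],
        [vbm, "storageattach", name, "--storagectl", "SATA",
         "--port", "1", "--device", "0", "--type", "dvddrive", "--medium", iso],
        # Boot the VM so the Windows installer starts
        [vbm, "startvm", name],
    ]

def run(commands, dry_run=True):
    """Print the commands (default) or execute them, stopping at the first failure."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Hypothetical names: use your own VM name, .iso and disk paths
    run(vboxmanage_commands("Win10", "Win10.iso", "Win10.vdi"))
```

Once the printed commands look right, calling run(..., dry_run=False) executes them; the Guest Additions and display tweaks above still apply afterwards.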

My configuration for i3 is available at


Tunnelling internet traffic through an SSH server on Mac OS X

This is a neat trick I use to tunnel the internet traffic on my MacBook through an SSH server. It involves opening an SSH connection that acts as a SOCKS proxy and then pointing the Mac’s network settings at that proxy. There are two steps, which you can turn into aliases in your .bashrc (or .zshrc) file and run from the terminal.

alias mac_sst_start='ssh -D 8080 -f -q -C -N username@serveraddress'
alias mac_proxy_on="sudo networksetup -setsocksfirewallproxy Wi-Fi localhost 8080"
alias mac_proxy_off="sudo networksetup -setsocksfirewallproxystate Wi-Fi off"

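The first command, mac_sst_start, opens an SSH connection to the server and starts a local SOCKS proxy on port 8080 (-D 8080); any traffic sent to that port is forwarded through the SSH connection. The other flags send ssh to the background (-f), suppress most messages (-q), enable compression (-C) and skip running a remote command (-N). When you run it, you will be prompted for the password of your SSH account on the server.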

The second command, mac_proxy_on, changes the Wi-Fi network settings on the MacBook to use localhost:8080 as a SOCKS proxy, so traffic from applications that follow the system proxy settings goes through the tunnel. This also asks for a password, but this time it is your local MacBook password (for sudo). Once both have run, your traffic is tunnelled through the server, so if you check your IP address it will show up as the server’s IP. The third command, mac_proxy_off, switches the proxy off when you want to return to your normal connection.
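Before switching the system proxy on, it can help to check that the tunnel is actually listening. The Python sketch below performs the first step of the SOCKS5 handshake (RFC 1928): it offers the "no authentication" method and checks the server accepts it, which OpenSSH’s -D proxy should do. The function name and the default host/port (matching the aliases above) are my own choices.

```python
import socket

def socks5_is_up(host="127.0.0.1", port=8080, timeout=2.0):
    """Return True if a SOCKS5 proxy at host:port accepts the no-auth method."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            # Greeting: SOCKS version 5, 1 method offered, method 0x00 = no authentication
            s.sendall(b"\x05\x01\x00")
            reply = b""
            # The server answers with 2 bytes: version and chosen method
            while len(reply) < 2:
                chunk = s.recv(2 - len(reply))
                if not chunk:
                    break
                reply += chunk
            return reply == b"\x05\x00"
    except OSError:
        # Connection refused or timed out: nothing usable is listening
        return False

if __name__ == "__main__":
    print("tunnel up" if socks5_is_up() else "tunnel down")
```

Running this after mac_sst_start and before mac_proxy_on avoids pointing the Wi-Fi settings at a proxy that isn’t there, which would otherwise look like a dead connection.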

I use this with my university’s servers, which gives me access to university resources from anywhere in the world. I can access the library, journal articles, internal servers and so on as if I were connected to the university network (much like a VPN).



Books compiled from StackOverflow documentation!

This is an absolutely brilliant collection of books compiled from the Stack Overflow Documentation project. The experiment has been shut down, but the outcome is a collection of books. I quickly looked through the algorithms and R books and they seem really good!

The books are available at

The Stack Overflow Documentation archive is available to download at

and the Reddit discussion about it is at


Sending mail from command line using mutt

Sometimes you just don’t have the patience to open a GUI. Imagine you are working on a remote terminal over a very feeble internet connection and, after hours of data wrangling, you have your results in one small package. Now all you want is to email this 200kB document (about the size of a 20k-word .txt file). You can either load a GUI, open a browser, open Gmail (the login page alone is 2MB), attach the file and send the email, or just execute a one-line command which does everything for you. With some minimal setup you can do the latter: send email from the CLI just like any other shell command. You can even include this in your scripts (for example, to send a mail when a script finishes running).

We will do this using a terminal program called “mutt”, which also has a brilliant text-based interface, and configure it to use Gmail over IMAP (for the mailbox) and SMTP (for sending). The first step is to install mutt using a package manager (apt/yum/pacman on Linux, brew on macOS). I am doing this on Arch with pacman, installing mutt and smtp-forwarder and then creating the necessary folders and files for mutt.

sudo pacman -S mutt smtp-forwarder
mkdir -p ~/.mutt/cache/headers
mkdir ~/.mutt/cache/bodies
touch ~/.mutt/certificates
touch ~/.mutt/muttrc

Edit the muttrc file with your favourite text editor and add the configuration below. Make sure to change the username and password to your own; if you are using two-factor authentication with Gmail, the password has to be an app password generated from App passwords.

set ssl_starttls=yes
set ssl_force_tls=yes
set imap_user = ''
set imap_pass = 'yourpassword'
set from= ''
set realname='yourname'
set folder = imaps://
set spoolfile = imaps://
set postponed="imaps://[Gmail]/Drafts"
set header_cache = "~/.mutt/cache/headers"
set message_cachedir = "~/.mutt/cache/bodies"
set certificate_file = "~/.mutt/certificates"
set smtp_url = 'smtps://'
set smtp_pass = 'yourpassword'
set move = no
set imap_keepalive = 900
set editor = vim
bind pager j next-line
bind pager k previous-line
set sort = threads
set sort_aux = reverse-date-sent
unset imap_passive
set imap_check_subscribed
set mail_check=60
set timeout=10

That is it! Now we can send mail from the terminal by piping some text, or redirecting a file containing the text, into the mutt command (the recipient’s address goes after the --):

echo "email body"  | mutt -s "email-subject" --
mutt -s "email-subject" -- <  file_with_body_text.txt

We can even attach files like this (-a accepts one or more files, so the -- is required to separate them from the recipient):

echo "please find attached"  | mutt -s "email-subject" -a "attachment.pdf" --
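If mutt isn’t available on a machine but Python is, the same one-shot "mail me the result" can be done with the standard library’s smtplib and email modules. This is a swapped-in alternative to mutt, not part of the setup above: it assumes Gmail’s SMTP server with implicit TLS on port 465 (the smtps:// scheme used in the muttrc), an app password if two-factor authentication is on, and placeholder addresses.

```python
import mimetypes
import smtplib
from email.message import EmailMessage
from pathlib import Path

def build_message(sender, recipient, subject, body, attachments=()):
    """Build a plain-text email, optionally with file attachments."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    for path in map(Path, attachments):
        # Guess the MIME type from the file extension, defaulting to binary
        ctype, _ = mimetypes.guess_type(path.name)
        maintype, subtype = (ctype or "application/octet-stream").split("/", 1)
        msg.add_attachment(path.read_bytes(), maintype=maintype,
                           subtype=subtype, filename=path.name)
    return msg

def send(msg, user, app_password, host="smtp.gmail.com", port=465):
    """Send over implicit TLS, like an smtps:// URL in muttrc."""
    with smtplib.SMTP_SSL(host, port) as smtp:
        smtp.login(user, app_password)
        smtp.send_message(msg)
```

For example, build_message("me@example.com", "you@example.com", "email-subject", "please find attached", ["attachment.pdf"]) followed by send(...) mirrors the mutt attachment command; the addresses here are placeholders.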


Running a simple static HTTP server

I have been really busy for the past three days, so there were no posts. To make up for it, there are going to be three small, unrelated posts on utilities I use. The first is an HTTP server. Because web browsers are locked down these days, it is not easy to read files off the local machine when you are testing even a simple website. For example, if I have an HTML file that loads a CSV, parses it and displays it, serving the HTML from an HTTP server is the most practical way to let Chrome or Firefox read the file. At the same time, I really don’t want to install a full Apache web server just to serve two HTML files.

The solution is a Node package, ‘http-server’. It is a tiny HTTP server which, when run from a folder on the command line, serves that folder’s contents over HTTP on localhost. All we need to do is:

# Install nodejs and node package manager (npm)
sudo pacman -S nodejs npm
# Install http-server package through npm globally
npm install -g http-server
# start the server from the folder you want to serve
http-server

That is it! Whichever folder you ran http-server from is accessible at the IP address and port it prints. We can combine this with forever (another Node package) or run it inside a GNU Screen session to keep it running in the background.
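If Node isn’t installed, Python’s standard library can do the same job; this is an alternative to http-server, not what I use. From the shell, python -m http.server 8000 serves the current folder. The sketch below does the same from code, which is handy inside a test script; the function name, default host and port are my own choices.

```python
import functools
import http.server

def serve_folder(folder=".", host="127.0.0.1", port=8000):
    """Return an HTTP server that serves `folder`, like running http-server inside it."""
    # SimpleHTTPRequestHandler serves static files; `directory` picks the root folder
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=folder)
    # One thread per request, so a page and its CSV can load concurrently
    return http.server.ThreadingHTTPServer((host, port), handler)

if __name__ == "__main__":
    server = serve_folder()
    host, port = server.server_address[:2]
    print(f"Serving on http://{host}:{port}")
    server.serve_forever()
```

Binding to 127.0.0.1 keeps the files visible only on the local machine; use 0.0.0.0 if other devices on the network need to reach it.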

Extracting multiple linear sequences out of two dimensional data

This is a problem I am currently working on. I don’t know how to explain it properly, but I’ll try. There is a single time series of collected data in which multiple increasing linear sequences are interleaved. For example, consider this series of numbers (y) collected at times (x), generated with three linear equations (y = 12x + 1, y = 4x + 6 and y = x + 10):

x = 2, 2, 3, 4, 4, 5, 5, 6, 6, 7, 8, 8, 8, 9, 9, 10, 10, 11, 11, 11, 12, 12, 13, 14, 14, 15, 16, 18, 18, 19

y = 14, 12, 37, 22, 14, 26, 15, 73, 30, 34, 97, 38, 18, 109, 19, 121, 20, 133, 50, 21, 145, 54, 23, 169, 62, 181, 26, 217, 78, 29

When we plot this data on a chart, we can see the three sequences in it.


The problem is to isolate these three clusters. Since I had no idea how to do this, I first tried k-means clustering with 3 clusters, which gave me this:


This is clearly wrong: each sequence moves forward on both the x and y axes, so the points in the blue cluster cannot possibly form a linear sequence. This is when I thought a graph-based clustering algorithm might help. I can encode all my rules when building the graph, so that only points that could belong to the same linear sequence are connected, and then partition the graph. If it is too dense, I might be able to run a community detection algorithm to get the clusters out of it.

As an initial experiment, I made a graph over all these points (nodes), using the Euclidean distance between two points as the edge distance. Then I applied rules so that, for two nodes (points) a and b, a link can exist from a to b only if

  1. b(x) > a(x),
  2. b(y) > a(y) and
  3. b(x) – a(x) is not more than 5

The resulting graph looks like this,


This seems like good progress, since I seem to have two connected components (ignoring the lone node), one of which is a clear linear sequence. Then, when I ran a random-walk based community detection on the graph, I got three clusters:


We seem to be able to cluster linear sequences out of the data, except where the sequences are really close to each other. This looks very promising for what I am working on! I will see how this works with real data and post an update.
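The graph-building step described above can be sketched in a few lines of Python using the example data. This covers only rules 1 to 3 and the Euclidean edge weights; it does not reproduce the plots or the community-detection step, and the function name and the max_dx parameter are my own.

```python
import math
from itertools import permutations

x = [2, 2, 3, 4, 4, 5, 5, 6, 6, 7, 8, 8, 8, 9, 9, 10, 10, 11, 11, 11,
     12, 12, 13, 14, 14, 15, 16, 18, 18, 19]
y = [14, 12, 37, 22, 14, 26, 15, 73, 30, 34, 97, 38, 18, 109, 19, 121, 20,
     133, 50, 21, 145, 54, 23, 169, 62, 181, 26, 217, 78, 29]

def build_edges(x, y, max_dx=5):
    """Directed edges (a, b, distance) allowed by the three rules."""
    edges = []
    # Consider every ordered pair of distinct points (a -> b)
    for a, b in permutations(range(len(x)), 2):
        dx, dy = x[b] - x[a], y[b] - y[a]
        # Rule 1: b later in x; rule 2: b higher in y; rule 3: at most max_dx apart in x
        if dx > 0 and dy > 0 and dx <= max_dx:
            edges.append((a, b, math.hypot(dx, dy)))
    return edges

if __name__ == "__main__":
    edges = build_edges(x, y)
    print(len(edges), "edges")
```

The edge list can then be loaded into a graph library for the partitioning step (igraph, for instance, provides a random-walk based walktrap method). Note that community-detection routines usually treat weights as similarities, so something like 1 / distance may be a better weight than the raw distance.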

PS: I would really like to know if there is already a method that can extract multiple linear sequences out of data like this. Please mention it in the comments if you think anything is relevant.

Importance of transitions in interactive visualisations

I am a very big fan of transitions when presenting complex data with interactive visualisations. I believe transitions play an important role in building a continuous narrative in the audience’s mind when they are trying to understand a connected series of information. To be clear, I am talking specifically about the transition of data between the stages of a visualisation, not the transition of the visualisation’s canvas. Today let’s look at one of my favourite visualisations of all time: this brilliant interactive piece by The New York Times showcasing the budget proposal made by the US government in 2012.

The story starts by showing the whole budget along with the scale of the proposed 900bn deficit. Then we see that most of the spending is mandatory, which the government has no control over, and that the smaller discretionary spending is what the debate is actually about. Next, the discretionary spending categories are compared with each other to show the gainers and losers from last year. Finally, they are put back together with the mandatory spending to complete the picture.

The amazing thing about this graphic is the way the circles (departments / budget categories), which are the basic unit of the analysis, move between the stages of the story, especially from step 2 to step 3. Not only does this reinforce the concept of mandatory spending, it also prepares us for what we are going to see in the next stage. This half-second transition fundamentally improves the clarity and legibility of the whole story for a casual audience.
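The core of such a data transition is small: match each datum between two stages by a stable key, then interpolate its position with an easing curve over the transition. The Python sketch below is only an illustration of that idea, not how the NYT graphic was built (libraries such as D3 provide keyed data joins and transitions for this); the function names and the category keys in the example are hypothetical.

```python
def ease_in_out(t):
    """Cubic ease-in-out: slow start, fast middle, slow finish (t in [0, 1])."""
    return 4 * t ** 3 if t < 0.5 else 1 - (-2 * t + 2) ** 3 / 2

def interpolate(start, end, t):
    """Positions of every datum at fraction t of the transition.

    start/end map a datum key (e.g. a budget category) to its (x, y) in each stage.
    Matching by key is what lets the same circle visibly move instead of being redrawn.
    """
    e = ease_in_out(t)
    return {key: (sx + (end[key][0] - sx) * e, sy + (end[key][1] - sy) * e)
            for key, (sx, sy) in start.items() if key in end}

if __name__ == "__main__":
    stage2 = {"defense": (0.0, 0.0), "education": (10.0, 0.0)}
    stage3 = {"defense": (100.0, 50.0), "education": (10.0, 20.0)}
    # Sample five frames of a transition between two layouts
    for frame in range(5):
        print(interpolate(stage2, stage3, frame / 4))
```

Rendering these frames over roughly half a second is what turns two separate charts into one continuous story: the reader sees where each circle came from rather than having to re-find it.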

I am currently working on a medium-sized dataset, filtering it with various rules to produce a final number, and this kind of visualisation seems perfect for showing how the results change as I move through the stages of filtering. I will be posting on my progress, so stay tuned for updates.