Sending mail from command line using mutt

Sometimes you just don’t have the patience to open a GUI. Imagine you are working on a remote terminal over a very feeble internet connection, and after hours of data wrangling you have your results in one small package. Now all you want is to email this 200kb document (the average size of a 20k-word .txt file). You can either load a GUI, open a browser, open Gmail (the login page alone is 2MB), attach the file and send the email, or just execute a one-line command which does everything for you. With some minimal setup you can do the latter: send email from the CLI just like any other shell command. You can even include this in your scripts (send mail when a script finishes running, etc.).

We will do this using a terminal program called “mutt”, which also has a brilliant interactive interface, and configure it to use Gmail over IMAP and SMTP. The first step is to install mutt using a package manager (apt/yum/pacman on Linux, brew on macOS). I am doing this on Arch with pacman, installing mutt and smtp-forwarder, and then creating the necessary folders and files for mutt.

sudo pacman -S mutt smtp-forwarder
mkdir -p ~/.mutt/cache/headers
mkdir ~/.mutt/cache/bodies
touch ~/.mutt/certificates
touch ~/.mutt/muttrc

Edit the muttrc file with your favourite text editor and add these configurations (make sure to change the username to your username; if you are using two-factor authentication with Gmail, the password has to be generated from App passwords).

# force TLS on all connections
set ssl_starttls=yes
set ssl_force_tls=yes

# IMAP account details for reading mail
set imap_user = 'username@gmail.com'
set imap_pass = 'yourpassword'
set from = 'username@gmail.com'
set realname = 'yourname'
set folder = imaps://imap.gmail.com/
set spoolfile = imaps://imap.gmail.com/INBOX
set postponed = "imaps://imap.gmail.com/[Gmail]/Drafts"

# local caches and certificate store
set header_cache = "~/.mutt/cache/headers"
set message_cachedir = "~/.mutt/cache/bodies"
set certificate_file = "~/.mutt/certificates"

# SMTP details for sending mail
set smtp_url = 'smtps://username@smtp.gmail.com:465/'
set smtp_pass = 'yourpassword'

# general behaviour
set move = no
set imap_keepalive = 900
set editor = vim
bind pager j next-line
bind pager k previous-line
set sort = threads
set sort_aux = reverse-date-sent
unset imap_passive
set imap_check_subscribed
set mail_check = 60
set timeout = 10

That is it! Now we can send mail from the terminal by passing some text, or a file containing the text, to the mutt command,

echo "email body"  | mutt -s "email-subject" -- recipient@gmail.com
mutt -s "email-subject" -- recipient@gmail.com <  file_with_body_text.txt

We can even attach files like this,

echo "please find attached"  | mutt -s "email-subject" -a "attachment.pdf" -- recipient@gmail.com
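
Since mutt now behaves like any other shell command, it also slots straight into scripts, as mentioned at the top of the post. Here is a minimal sketch of the notify-on-completion idea (the script name and address are placeholders):

# run a long job, then mail myself when it finishes
./long_running_job.sh && \
    echo "job finished on $(hostname)" | mutt -s "job done" -- username@gmail.com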


Running a simple static HTTP server

I have been really busy the past 3 days so there were no posts. To make up for it, here are three unrelated small posts on small utilities I use. First up is an HTTP server. Since web browsers are locked down these days, it is not easy to read files off the local machine even when testing a simple website. For example, if I have an HTML file where I want to load a CSV, parse it and display it, serving the HTML from an HTTP server is the only way to let Chrome or Firefox read the file. At the same time, I really don’t want to install a full Apache web server just to serve two HTML files.

The solution to this is a node package, ‘http-server’. It is a tiny HTTP server which, when run from a folder in the CLI, serves the folder contents over HTTP at localhost. All we need to do is,

# Install nodejs and node package manager (npm)
sudo pacman -S node npm
# Install http-server package through npm globally
npm install -g http-server
# start the server (serves at http://localhost:8080 by default)
http-server

That is it! Whichever folder you ran http-server from will be accessible at the IP/port shown. We can combine this with forever (another node package) or run it under a GNU Screen session to keep it going in the background.
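
For instance, here is one way to keep it running detached under GNU Screen, assuming Screen is installed (the session name and port are arbitrary):

# serve the current folder on port 8000 inside a detached screen session
screen -dmS static-server http-server -p 8000
# reattach later to check on it
screen -r static-server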

Extracting multiple linear sequences out of two dimensional data

This is a problem I am currently working on. I don’t know how to explain it properly, but I’ll try. There is a single time series of data in which multiple increasing sequences are present. For example, consider this series of numbers (y) collected at times (x), generated with three linear equations.

x = 2, 2, 3, 4, 4, 5, 5, 6, 6, 7, 8, 8, 8, 9, 9, 10, 10, 11, 11, 11, 12, 12, 13, 14, 14, 15, 16, 18, 18, 19

y = 14, 12, 37, 22, 14, 26, 15, 73, 30, 34, 97, 38, 18, 109, 19, 121, 20, 133, 50, 21, 145, 54, 23, 169, 62, 181, 26, 217, 78, 29

When we plot this data on a chart we can see that there are three sequences in them.

[Chart: the points plotted as x vs y, showing three increasing sequences]

The problem is to isolate these three clusters. Since I had no idea how to do this, I first went for a k-means clustering algorithm with 3 clusters, which gave me this,

[Chart: k-means result with 3 clusters]

This is clearly wrong: the series moves forward on both the x and y axes, so the blue cluster cannot possibly occur along a single line. This is when I thought a graph-based clustering algorithm might help. I can put all my rules into the construction of the graph, so that only points which could plausibly lie on the same line are connected, and then just partition the graph. If it is too dense, I might be able to run a community detection algorithm to get the clusters out of it.

As an initial experiment, I made a graph between all these points (nodes), weighted by the Euclidean distance between them. Then I applied the rule that, for two nodes a, b (points), a link can exist from a to b only if (a short R sketch implementing this follows the list):

  1. b(x) > a(x),
  2. b(y) > a(y) and
  3. b(x) – a(x) is not more than 5
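
For reference, here is a minimal sketch of this construction in R. It assumes the igraph package, leaves out the Euclidean edge weights for brevity, and uses igraph’s walktrap algorithm as the random-walk based clustering referred to below:

# the sample series from above
x <- c(2, 2, 3, 4, 4, 5, 5, 6, 6, 7, 8, 8, 8, 9, 9,
       10, 10, 11, 11, 11, 12, 12, 13, 14, 14, 15, 16, 18, 18, 19)
y <- c(14, 12, 37, 22, 14, 26, 15, 73, 30, 34, 97, 38, 18, 109, 19,
       121, 20, 133, 50, 21, 145, 54, 23, 169, 62, 181, 26, 217, 78, 29)

library(igraph)

# link point a to point b only when all three rules hold
n <- length(x)
edges <- c()
for (a in 1:(n - 1)) {
  for (b in (a + 1):n) {
    if (x[b] > x[a] && y[b] > y[a] && x[b] - x[a] <= 5) {
      edges <- c(edges, a, b)
    }
  }
}

g <- make_graph(edges, n = n, directed = FALSE)
# random-walk based community detection
# (isolated points may need dropping first on some igraph versions)
membership(cluster_walktrap(g))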

The resulting graph looks like this,

[Figure: the resulting graph of linked points]

This seems like good progress, since I now have 2 connected components (ignoring the lone node), one of which is a clear linear sequence. Then, when I ran a random-walk based clustering on the graph, I got three clusters,

[Figure: the three clusters found by the random walk]

We seem to be able to cluster linear sequences out of the data, except when the sequences are really close to each other. This looks very promising for the stuff I am working on! I will see how this works with real data and post an update.

PS: I would really like to know if there is already a method which can extract multiple linear sequences out of data, similar to what I am trying here. Please mention it in the comments if you think anything is relevant.

Importance of transitions in interactive visualisations

I am a very big fan of transitions when presenting complex data with interactive visualisations. I believe that transitions play an important role in building a continuous narrative in the audience’s mind as they try to understand a connected series of information. To be clear, I am talking specifically about the transition of data between the stages of a visualisation, not the transition of the canvas of the visualisation. Today let’s look at one of my favourite visualisations of all time: the brilliant interactive one by the NYTimes showcasing the budget proposal made by the US government in 2012.

The story here starts by showing the whole budget along with the scale of the proposed 900bn budget deficit. Then we see that most of the spending is mandatory, over which the govt. has no control, and that the smaller discretionary spending is the part actually under discussion. Then we see the discretionary spending categories compared with each other, as the gainers and losers from last year, and finally we see them put together along with the mandatory spending to complete the picture.

The amazing thing about this graphic is the way the circles (departments / budget categories), which are the basic units of the analysis, move between the stages of the story, especially from step 2 to step 3. Not only does it reinforce the concept of mandatory spending, it also prepares us for what we are going to see in the next stage. This half-second transition fundamentally changes the clarity and legibility of the whole story for a casual audience.

I am currently working on a medium-sized dataset, filtering it with various rules to produce a final number, and this kind of visualisation seems perfect to show how the results change as I move through the stages of filtering. I will be posting on the progress, so stay tuned for updates.


Installing Arch Linux (and R) on an Android device

This is a really recent development and I am very excited about it. I finally found a way to have a somewhat proper Linux installation on my phone. Though a phone might not be the best place for a CLI, it is really promising, and I can rely on it to do some small stuff on the go. As the tools I use get simpler (Photoshop vs Imagemagick) and the hardware of the phones I own gets better, it should be possible for my phone to do the things my 5-year-old laptop could handle, given the right environment.

This is done by putting a full Arch installation on an Android phone under the termux environment, using the installer from TermuxArch. The installation here is actually way easier than installing Arch on a normal desktop. We start by installing the Termux Android app. When we open Termux we get a bash shell. From here we install wget by running pkg install wget. When this is complete, we download and run the Arch installation script by,

# Download the script
wget https://raw.githubusercontent.com/sdrausty/TermuxArch/master/setupTermuxArch.sh 
# Adding execute permissions
chmod a+x setupTermuxArch.sh
# Run the script
./setupTermuxArch.sh

Now we can just follow the instructions in the script, which will download and unpack a base Arch Linux system and ask you to edit the mirror list. At this point, just un-comment (remove the #) the closest mirrors, then save and exit the file. When the installation is complete you have a vanilla Arch system on your mobile! Now I can theoretically install and use any program I have on my work desktop on my phone, including ssh, vim, git, latex, R, node, postgres, mongodb, etc. I can even ssh into my work desktop straight from here. Below are some screenshots of the system (the chart is done entirely on the phone!).
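
For example, once inside the Arch environment, packages install with pacman as usual. A quick sketch (in the Arch repositories R is packaged simply as r):

# sync the package database and upgrade the base system
pacman -Syu
# install R, vim and git
pacman -S r vim git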

[Screenshots: the Arch system running under Termux, including a chart made in R on the phone]

Mapping distribution of National Institutions in Higher Education in India [R + tidyverse + tmap]

Since I started learning R and moved away from proprietary data analysis and GIS packages, I have been amazed by the availability of free datasets and tools enabling people to make awesome analyses and visualisations. Today we’ll look into a simple exercise of taking open data from different sources and combining them using open-source, free tools to produce maps which help us understand the data better.

We’ll use data from two sources: a tabular file on the distribution of national institutes of higher education in India from here, and a shapefile containing geographic information on the boundaries of Indian states from here. I have cleaned and prepared both datasets, which can be downloaded here. First we need to download the archive and extract/unzip it to our working directory. Once that is done we can start combining and plotting the data.
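
If you prefer to do the extraction step from the shell, something like the following works (the archive name here is hypothetical; use the actual file the link gives you):

# extract the prepared data into the current working directory
unzip institutions_data.zip -d .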

# Loading the data
data <- read.csv("institutions.csv")
library(rgdal) # library for reading shape files
states <- readOGR(".", "india_state")

Note that, in readOGR, the first parameter is the folder in which the shapefile is kept (it should be “.” if we unzipped the files directly into the working directory) and the second one is the name of the shapefile.

# calculating the total number of institutions and
# the number of institutions per 10 million people
library(dplyr) # for the pipe and data manipulation verbs
data <- data %>%
    mutate(Total = rowSums(.[, 3:10])) %>%
    mutate(Totalppm = Total / Population)

# merging the data into the shapefile
states <- states %>%
    merge(data, by = "state")

Now that we have successfully merged the data into the shapefile, we can check this by inspecting states@data at the console. Next we need to map the data using tmap. First of all we load the library by running library(tmap). This is a complex package with a lot of dependencies, so it might take a while to download and install. As a first step we plot just the borders of the states by running,

tm_shape(states) +
    tm_borders(col = "grey")

[Map: state borders]

We can add labels to the map by adding a tm_text layer to it. Notice that the order of the layers is important, since later layers are drawn on top of earlier ones.

tm_shape(states) +
    tm_borders(col = "grey") +
    tm_text("state", root = 10, size = "AREA")

[Map: state borders with labels]

Now we can plot a single variable on the map as the fill colour of the states. For example, if we want to highlight all the states with an IIT, we do,

tm_shape(states) +
    tm_fill("IIT") +
    tm_borders(col = "grey") +
    tm_text("state", root = 10, size = "AREA")

[Map: states with an IIT highlighted]

We can plot multiple variables side by side by just passing a vector of variables to compare. Notice that we switched off the size legend on the text layer here by setting legend.size.show = FALSE.

tm_shape(states) +
    tm_fill(c("IIT", "IIM")) +
    tm_borders(col = "grey") +
    tm_text("state", root = 10, size = "AREA",
            legend.size.show = FALSE)

[Map: IIT and IIM states compared side by side]

Finally we can plot the number of institutions per population by,

tm_shape(states) +
    tm_fill("Totalppm",
            title = "Institutions per 10m people") +
    tm_borders(col = 'grey') +
    tm_text("state", root = 10, size = "AREA",
            legend.size.show = FALSE)

[Map: institutions per 10 million people]

This map is very uninformative because the small Himalayan states, with their low populations, skew the whole distribution. So we have to classify the data ourselves by giving a custom breaks parameter,

tm_shape(states) +
    tm_fill("Totalppm", 
            breaks = (1:10) / 2,
            title = "Institutions per 10m people") +
    tm_borders(col = 'grey') +
    tm_text("state", root = 10, size = "AREA",
            legend.size.show = FALSE)

[Map: institutions per 10 million people, with custom breaks]

Now this shows a possible north–south divide in the distribution of institutions per person. This may be because most of the national institutions in the North are located in Delhi, while in the South, Bangalore, Mumbai and Chennai compete for them.

That completes the post for today. To summarise, we took tabular data, joined and plotted it with geographic data, and uncovered new information that is not present in either source individually!

Visualising flows as Sankey diagrams with R

This one is on making quick and easy Sankey diagrams with R (and the networkD3 package) for exploring data. All we need to do is understand how to convert the data into a network; the rest is really easy. We’ll create a random sample dataset which shows the room each person was in at three instances – morning, afternoon and evening – and go on to visualise how people flow between rooms over time. We’ll use the tidyverse stuff which I mentioned in this and this post.

First we need to create a random set of data. We do this by generating 100 random names and assigning them to 5 rooms randomly at each of the three instances.

# load required libraries
library(randomNames)
library(tidyverse)

# generate people names
people <- randomNames(100, which.names = 'first')
# generate a set of rooms
rooms <- paste(rep("Room", 5), 1:5)
# populate data-set by combining both
morning <- sample(rooms, 100, replace=TRUE)
afternoon <- sample(rooms, 100, replace=TRUE)
evening <- sample(rooms, 100, replace=TRUE)
data <- data.frame( people, morning, afternoon, evening)

head(data) #gives us
  people   morning afternoon evening
1 Symone    Room 3  Room 3    Room 4
2 Adrian    Room 5  Room 1    Room 2
3 Orlando   Room 3  Room 4    Room 2
4 Cristal   Room 5  Room 4    Room 2
5 Emily     Room 4  Room 1    Room 4
6 Elizabeth Room 4  Room 2    Room 4

Now that we have the data, we will calculate how people move between rooms from morning to evening. We’ll create a network of rooms at each time period, with the number of people moving between them as links.

# first we calculate number of people moving 
# between morning to afternoon for each room
# we label the rooms uniquely for morning and
# afternoon by adding "m_" and "a_"
mor_to_aft <- data %>% 
    mutate(
          from = paste0("m_", morning),
          to = paste0("a_", afternoon)) %>% 
    group_by(from, to) %>% 
    summarise(people = length(people))

# we do the same for afternoon to evening
aft_to_eve <- data %>% 
    mutate(
          from = paste0("a_", afternoon),
          to = paste0("e_", evening)) %>% 
    group_by(from, to) %>% 
    summarise(people = length(people))

# and we combine both to create links data
links <- bind_rows(mor_to_aft, aft_to_eve)
links # gives us
      from       to   people
1 m_Room 1 a_Room 1      6
2 m_Room 1 a_Room 2      2
3 m_Room 1 a_Room 3      1
4 m_Room 1 a_Room 4      6
5 m_Room 1 a_Room 5      2
6 m_Room 2 a_Room 1      3

Now we need to make the nodes. We do that by finding all unique instances of rooms in the links and indexing them from 0 (networkD3 hands the data to d3, and JavaScript arrays are zero-indexed).

nodes <- c(links$from, links$to) %>% 
    unique() %>% 
    data.frame(name = ., stringsAsFactors = FALSE) %>% 
    mutate(id = row_number() - 1)

Now we have to join these indices into the links so that the networkD3 package understands the relationship between the two objects.

links <- links %>%
    left_join(nodes,by=c("from"="name")) %>%
    left_join(nodes,by=c("to"="name")) %>%
    ungroup() %>%
    select(from=id.x,to=id.y,people)

That completes the data preparation. Now we have a network of time-room nodes linked by the people moving between them. This can be plotted by,

library(networkD3)
sankeyNetwork(links, nodes, "from", "to", "people", NodeID = "name")

which produces,

[Sankey diagram: people flowing between rooms from morning to afternoon to evening]

Here we can clearly see which rooms had the most people at a given time, where those people came from, and where they went in the next session. We can use the same technique to produce amazing, complex diagrams visualising interactions at multiple levels, like these ones: 1, 2, 3, 4.