Aryan Singh's World

Data Science, Statistics, ML, Deep Learning

What do NaMo’s speeches convey?

This weekend, while wandering around the labyrinths of the internet, I stumbled upon a corpus of Indian Prime Minister Mr. Narendra Modi’s speeches. I thought it would be interesting to analyse the speeches to see which issues he speaks about most and what their overall connotation is. In this blog, I present my analysis of the speeches along with visualizations in the form of graphs and plots.

Unigram and Bigram Frequency

I used CountVectorizer from sklearn's feature extraction module to vectorise the text into frequency vectors, and then summed the vectors across all speeches to find the frequency of each word in the overall corpus. I then plotted the top 30 words by frequency on a bar plot to analyse them. Following is the result I got:

wordfreq
Since Mann Ki Baat is a programme aimed at listening to and addressing people's problems, the PM's main focus is on issues relating to poverty and water. He also talks about taking action, using words like time, make and great.
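
The counting step can be sketched without any dependencies. This is only an illustration of the idea, not the notebook's code: it swaps CountVectorizer for collections.Counter, and the stop-word list, the function names and the two sample sentences are all made up for the example.

```python
import re
from collections import Counter

# Tiny hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "for", "we", "our"}

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def ngram_counts(speeches, n=1):
    """Count n-grams across all speeches (n=1 for unigrams, n=2 for bigrams)."""
    counts = Counter()
    for speech in speeches:
        tokens = tokenize(speech)
        counts.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts

# Made-up sentences standing in for the speech corpus.
speeches = [
    "Water is precious and we must save water.",
    "Farmers need water for the fields.",
]
unigrams = ngram_counts(speeches, n=1)
bigrams = ngram_counts(speeches, n=2)
print(unigrams.most_common(3))
print(bigrams.most_common(3))
```

Summing per-speech counts into one Counter plays the same role as summing the frequency vectors over the rows; most_common(30) would then give the words for the bar plot.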

Most Frequent Nouns, Adjectives and Verbs

Next, I thought it would be interesting to POS-tag each speech to see which issues the PM stresses and how positive/willing he is to solve them. For this I POS-tagged the whole corpus using nltk and then found the most common nouns, verbs and adjectives. I plotted the 16 most common nouns, adjectives and verbs as word clouds to visualise them and draw inferences. Here is what I got:

Nouns Cloud:

nouncloud

It is clear from the word cloud that the main issues being highlighted relate to basic amenities and rural life: water, villages, farmers. Interestingly enough, yoga is a frequent part of the conversation. The PM has also addressed black money, but its frequency is on the lower side.

Verbs Cloud:

verb

The verbs mostly have a positive connotation. Words like think, make, started and done indicate an action-oriented approach.

Adjectives Cloud:

adjectives

The adjectives reveal the essence of the major fields/issues that the PM is targeting. Youth, the poor and Digital India initiatives are some of the most frequent areas touched upon.
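
The grouping behind these three clouds can be sketched as follows. It assumes the tokens are already tagged in the (word, tag) shape that nltk.pos_tag returns, with Penn Treebank tags; the function name and the hand-tagged toy input are hypothetical, and the notebook may do this differently.

```python
from collections import Counter

# Penn Treebank tag prefixes, the tag set that nltk.pos_tag emits by default.
POS_PREFIXES = {"nouns": "NN", "verbs": "VB", "adjectives": "JJ"}

def top_words_by_pos(tagged_tokens, top_n=16):
    """Return the top_n most common words for each coarse part of speech."""
    counters = {name: Counter() for name in POS_PREFIXES}
    for word, tag in tagged_tokens:
        for name, prefix in POS_PREFIXES.items():
            if tag.startswith(prefix):
                counters[name][word.lower()] += 1
    return {name: c.most_common(top_n) for name, c in counters.items()}

# Hand-tagged toy input, standing in for the tagged corpus.
tagged = [
    ("Farmers", "NNS"), ("need", "VBP"), ("clean", "JJ"), ("water", "NN"),
    ("Water", "NN"), ("is", "VBZ"), ("precious", "JJ"),
    ("Poor", "JJ"), ("villages", "NNS"), ("need", "VBP"), ("water", "NN"),
]
top = top_words_by_pos(tagged, top_n=2)
print(top)
```

Matching on the tag prefix folds singular and plural nouns (NN, NNS) and all verb forms (VB, VBD, VBZ and so on) into one bucket each, which is what a per-category word cloud needs.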

Sentiment Analysis Of The Speeches

Next, I analysed each speech for its sentiment score to understand whether the connotation is positive or negative and how it has changed over time. I used the TextBlob library to score the sentiment of each speech. Here is what the sentiment score time series looks like:

sentiment

Looking at the overall analysis, the speeches don’t seem to be that positive. This might be because they are aimed at addressing the issues people face on a daily basis. Overall, the years 2015 and 2016 are more positive than the other years.
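
A year-by-year comparison like this one can be sketched by averaging the per-speech scores within each year. The snippet assumes each speech has already been scored, with polarity in [-1.0, 1.0] as TextBlob reports it; the function name is hypothetical and the scores are made up for illustration, they are not the results plotted above.

```python
from collections import defaultdict
from statistics import mean

def yearly_mean_polarity(scored_speeches):
    """Average per-speech polarity scores by year.

    scored_speeches: iterable of (year, polarity) pairs, with polarity in
    [-1.0, 1.0] as TextBlob's sentiment.polarity reports it.
    """
    by_year = defaultdict(list)
    for year, polarity in scored_speeches:
        by_year[year].append(polarity)
    return {year: mean(scores) for year, scores in sorted(by_year.items())}

# Made-up scores purely for illustration; they are not the post's results.
scores = [(2014, 0.10), (2014, 0.14), (2015, 0.20), (2015, 0.24), (2016, 0.22)]
averages = yearly_mean_polarity(scores)
for year, avg in averages.items():
    print(year, round(avg, 2))
```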

 

Code: https://github.com/aryancodify/NaturalLanguageProcessing/blob/master/modi_speeches.ipynb

Dataset: https://www.kaggle.com/shankarpandala/mann-ki-baat-speech-corpus

Training Machine Learning Models in Cloud: FloydHub – Part 2

In the last tutorial we saw how to train a Deep Learning model on a Floydhub GPU via Jupyter notebooks. In this week’s blog I am going to demonstrate how to run a local Python script on the Floydhub GPU.

Up to step 4 the process remains the same as in the previous post: we create a Floydhub project and initialize it locally. Our directory contains the following files:

train.py
train_and_eval.py
eval.py
floyd.yml

Once initialized, we need to issue the following command to run the script remotely on Floydhub GPUs:

floyd run --gpu --env tensorflow-1.3 "python train_and_eval.py"

Here train_and_eval.py is the file that trains and runs a feed-forward net on the dataset. Here is what this command does internally:

  • Syncs the local code to FloydHub’s servers.
  • Provisions a GPU instance on the cloud with TensorFlow 1.3 installed.
  • Executes the command python train_and_eval.py on the GPU server.
  • Stores the output logs and generated output data.
  • Terminates the GPU instance once the command finishes execution.
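
If the job is launched from a script rather than typed by hand, the command can be assembled in Python. This is a small sketch: floyd_run_command is a hypothetical helper, not part of floyd-cli, and it only uses the flags that appear in this tutorial series (--gpu, --env and --mode).

```python
import shlex

def floyd_run_command(script=None, env=None, gpu=True, mode=None):
    """Build the argument list for a `floyd run` invocation.

    Only flags that appear in this tutorial series are used:
    --gpu, --env and --mode.
    """
    args = ["floyd", "run"]
    if gpu:
        args.append("--gpu")
    if env:
        args += ["--env", env]
    if mode:
        args += ["--mode", mode]
    if script:
        # The remote command is passed as one quoted argument.
        args.append(f"python {script}")
    return args

cmd = floyd_run_command("train_and_eval.py", env="tensorflow-1.3")
print(shlex.join(cmd))  # shell-quoted form of the command (Python 3.8+)
```

Passing the list to subprocess.run(cmd, check=True) from inside the initialized project directory would then start the job, assuming floyd-cli is installed and logged in.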

Here is the project state after the execution of the command:

 

floyd_project

Following is the output page with metrics:

 

train1

Following is the command line output:

 

train2

With that we come to the end of this tutorial series. In the next series I will explore Google Colab as a way to procure a GPU for training Deep Learning models. Till then:

Happy Deep Learning !!

Training Machine Learning Models In Cloud: FloydHub

Training machine learning models can be time-consuming, taking anywhere from several hours to days, especially if the model is a dense deep neural network. Recently I faced this issue while trying to run a feedforward net with GridSearch for a classification problem. Since my laptop lacks a GPU, it took over 8 hours to train the model, and the considerable memory and CPU consumption rendered my laptop useless for that duration.

This experience, along with several others, motivated me to look for a way to train models remotely without tying up my laptop's time and resources. After doing some research I stumbled upon Floydhub, a cloud-based solution for training Machine Learning models.

Following are the advantages of Floydhub:

  • Easy to use and intuitive UI
  • Great documentation
  • Interactive environment through Jupyter notebook on cloud
  • Seamless integration with Python
  • Pre-configured GPU with libraries like Keras and Tensorflow pre-installed

Here I will demonstrate a binary classification problem on the Floydhub platform: predicting whether a bank customer will churn or not on the basis of the customer's characteristics. We will train a feed-forward neural net with backpropagation and stochastic gradient descent using the Keras library. I will show two methods of doing this: one with a Jupyter notebook and another using remote execution of a Python script. The dataset for this exercise can be downloaded from here:

https://www.kaggle.com/hj5992/bank-churn-modelling
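
For readers who want to see what "feed-forward net with backpropagation and stochastic gradient descent" means before reaching for Keras, here is a dependency-free sketch. It is not the notebook's model: the layer size, learning rate, function names and the six two-feature samples are all made up for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w1, b1, w2, b2):
    """Forward pass: input -> sigmoid hidden layer -> sigmoid output."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    p = sigmoid(sum(w * hi for w, hi in zip(w2, h)) + b2)
    return h, p

def train(samples, hidden=4, epochs=500, lr=0.5, seed=0):
    """Train a one-hidden-layer net with per-sample SGD; return params and losses."""
    rng = random.Random(seed)
    data = list(samples)  # copy so shuffling does not reorder the caller's list
    n_in = len(data[0][0])
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        rng.shuffle(data)
        total = 0.0
        for x, y in data:
            h, p = forward(x, w1, b1, w2, b2)
            p_safe = min(max(p, 1e-12), 1 - 1e-12)  # keep log() finite
            total -= y * math.log(p_safe) + (1 - y) * math.log(1 - p_safe)
            # Backpropagation: with sigmoid output and cross-entropy loss,
            # the gradient at the output pre-activation is simply p - y.
            d_out = p - y
            for j in range(hidden):
                d_hidden = d_out * w2[j] * h[j] * (1 - h[j])
                w2[j] -= lr * d_out * h[j]
                b1[j] -= lr * d_hidden
                for i in range(n_in):
                    w1[j][i] -= lr * d_hidden * x[i]
            b2 -= lr * d_out
        losses.append(total / len(data))
    return (w1, b1, w2, b2), losses

# Made-up, already-scaled features; label 1 stands for "churned".
samples = [
    ([0.0, 0.0], 0), ([0.2, 0.1], 0), ([0.1, 0.4], 0),
    ([0.9, 0.8], 1), ([0.7, 0.9], 1), ([1.0, 0.6], 1),
]
params, losses = train(samples)
print(f"first-epoch loss {losses[0]:.3f}, last-epoch loss {losses[-1]:.3f}")
```

Keras performs the same forward pass, gradient computation and weight update internally, only vectorised and on the GPU.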

Let’s go over both the methods one by one.

Part 1 : Training ANN using Jupyter Notebook in Floydhub

Preconditions:

Account on Floydhub.

Step 1: Install the floyd-cli for python.

The first step is to install floyd-cli for Python. This can be achieved by issuing the following command:

pip install -U floyd-cli

Step 2: Log in to the Floydhub account

Once we have floyd-cli installed, we need to log in to our Floydhub account from the command line so that the floyd CLI can run our jobs. This can be achieved as follows:

floyd login
Login with your FloydHub username and password to run jobs.
Username [aryan]: aryan
Password: 
Login Successful as aryan

Step 3: Create a Floydhub project on cloud

Now we need to create a Floydhub project on the Floydhub portal. We can do so by hovering on the + sign on the top right corner.

cretae_floyd_project

Once we click on create project we need to add details for the project:

create_project2

Now the project is ready and we can see the following project page:

project_page

Step 4: Initialize the Floydhub project locally

As highlighted in the previous image we need to issue the following command in the command prompt to initialize the project locally:

floyd init aryancodify/bank-churn

Before issuing the command, we change to the working directory where we want the project to be initialized. Let’s issue the command and look at the output:

floydInit

Step 5: Sync the Jupyter notebook and datasets from local to Floydhub and run them on the cloud

Once the project is initialized locally, we can sync our Jupyter notebooks and dataset to the Floydhub project on the cloud and run the notebook interactively using the GPU. We need to issue the following command for this:

floyd run --gpu --mode jupyter

Once we run the command we will see the following output:

gpuInit

We had the following files which will be synced to cloud:

files

This can take some time, as the Floydhub runtime executes the following tasks in the backend:

  • Sync your local code to FloydHub’s server
  • Provision a GPU instance on the cloud (if you want CPU, drop the --gpu flag)
  • Set up a deep learning environment with Tensorflow and Keras installed
  • Start a Jupyter server on the cloud, and open the url in your browser.

Once this command is executed a new window will open up in the browser with the view of the Jupyter notebook:

jupyter_floyd

Here we can see our notebook that we synced from local file system to floydhub. Then we can open the Jupyter notebook and start interacting with it.

Step 6: Looking at the Job status and metrics:

We can check the status of our job by going into the jobs page of the web dashboard. This will list all our jobs:

job_status

We can click on a particular project to check on its jobs.

For example, clicking on the bank-churn project shows the following dashboard:

metrics

Here we can see that our GPU utilisation was almost 95.2%, i.e. 11 GB.

Looking at the notebook:

We use the Keras library to build a feedforward net to perform our binary classification. The project and notebook can be accessed here:

https://www.floydhub.com/aryancodify/projects/bank-churn/1/files/ChurnRateClassification.ipynb

runJupyter

We can clearly see that we trained the model for 100 epochs and it took roughly 1-2 mins to train the model.

Since building the network is out of scope for this post, I will not discuss the details here, though appropriate Markdown cells and comments have been added to the notebook for better understanding.

After training the model we get an accuracy of around 86% on the test dataset, which is pretty good.
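
For reference, accuracy for a binary classifier with a sigmoid output is usually computed by thresholding the predicted probabilities and comparing them with the true labels. The sketch below assumes a 0.5 threshold; the function name is hypothetical and the probabilities and labels are made up, not taken from the notebook.

```python
def accuracy(probabilities, labels, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    predictions = [1 if p > threshold else 0 for p in probabilities]
    correct = sum(int(pred == y) for pred, y in zip(predictions, labels))
    return correct / len(labels)

# Made-up sigmoid outputs and true labels (1 = churned), for illustration only.
probs = [0.91, 0.12, 0.40, 0.75, 0.05, 0.60, 0.30, 0.20]
labels = [1, 0, 0, 1, 0, 0, 1, 0]
acc = accuracy(probs, labels)
print(f"accuracy = {acc:.2%}")
```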

To check whether Keras is running with the GPU we use the following commands in the Jupyter notebook:

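# Works with the Keras 2.x TensorFlow backend; _get_available_gpus is a private helper.
# It returns the names of the GPU devices Keras can see; an empty list means no GPU.
from keras import backend as K
K.tensorflow_backend._get_available_gpus()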

We will look at achieving the same using the remote python script in part 2 of this tutorial.

Code Review: Thankless but kindly don’t think less

Finally, we are in an era which acknowledges anti-heroes like Deadpool and Batman along with our all-powerful, infallible Superman. It's time that we also acknowledge the dark knight behind the optimized code delivered by any team: the code reviewer. Code review is one Dev function that has always taken the back seat while coding takes centre stage.

Initially, when I was fresh out of college, I used to find the concept of code review a hindrance to productivity. To be honest, I used to love my code, and every time somebody pointed out the chinks in it, I would do whatever I could to avoid making the change.

Well, when mentors can’t teach you something, Karma finds its own peculiar ways to make you realize your faults. I still remember pulling an all-nighter at the office to hotfix an out-of-memory error, just because I had ignored some review comments meant to prevent heap space overflow. It was with time and experience that I realized how damaging it is to take code reviews personally.

Now that we have established the importance of a good review, I want to share a couple of things that I have found lacking in coding teams since I started doing frequent code reviews.

  • First is the tendency of a reviewer to go light on certain code reviews due to their own work pressures or constrained timelines. I have always advised my team to play devil’s advocate when it comes to code reviews. This practice has not only helped us optimize our code by laying a strict emphasis on TDD and minimal run-time complexity, but has also minimized known nemeses like null pointer exceptions.
  • Second is the tendency of a developer to dote on their own code and love each and every line of it. This kind of attachment to one's code is the beginning of what could be a really bad habit for a developer’s career. While it's important to love your work and write code which reads like poetry and achieves its stated objectives, it is equally important to develop a sense of detachment towards one's own code when it's ready for review. It's extremely pertinent to base the counterpoints to suggested changes on well-reasoned logic rather than baseless speculation.

While these two are the standout observations from our code reviews, another really important thing is to think ahead while doing the review. Reviewers need to make sure that the code not only meets its stated objectives but does so in a way which is optimised and causes minimal collateral damage.

I hope you will ponder over these things when you review your next piece of code or submit your next pull request for review.

And pardon me if I was overly philosophical !!