Analyzing data in Python – Bar Charts

There are many good books on Data Analytics. Recently I borrowed a book from the office library titled “Even You Can Learn Statistics and Analytics: An Easy to Understand Guide to Statistics and Analytics”, authored by David M. Levine and David F. Stephan. I feel this is a good one for beginners in Data Analytics.

There are also other good books like ‘Python for Data Analysis’, ‘Python: Data Analytics and Visualization’, ‘Python for Finance’ etc.

One important way of presenting data is in graph format (a visual format, also known as Data Visualization).

A bar chart is useful for presenting categorical data. It has rectangular bars whose length is proportional to the values of the categories we want to present.

For example, suppose we want to represent the marks of a student in several subjects:

Subject      Score (out of 100)
Maths        94
Physics      85
Chemistry    66
French       55
Computers    89
English      64


This data can be shown as a vertical bar graph with a small piece of Python code:

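import matplotlib.pyplot as plot
import numpy as np

subjects = ['Maths', 'Physics', 'Chemistry', 'French', 'Computers', 'English']
marks = [94, 85, 66, 55, 89, 64]

# One x position per subject: 0, 1, ..., 5
m = np.arange(len(subjects))
plot.bar(m, marks)          # draw the vertical bars
plot.xticks(m, subjects)    # label each bar with its subject name
plot.title('Marks obtained out of 100')
plot.show()                 # display the chart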

Matplotlib, a Python 2D plotting library, and NumPy, the fundamental package for scientific computing with Python, are two important packages in Python.

Here the subjects and scores (marks) are held in Python lists. len(subjects) returns the length of subjects, in this case 6.

numpy.arange generates the positions 0 to 5, which are used to arrange the subjects on the graph in order. On the x-axis we have the subject names (xticks) and on the y-axis we have the marks. plot.bar(m, marks) plots the vertical bar chart.
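To make the role of np.arange a little more concrete, here is a minimal sketch (not part of the original script) that prints the positions it generates and pairs each one with a subject and a mark. The variable name positions is only for illustration.

```python
import numpy as np

subjects = ['Maths', 'Physics', 'Chemistry', 'French', 'Computers', 'English']
marks = [94, 85, 66, 55, 89, 64]

# np.arange(n) returns the integers 0 .. n-1; these become the bar positions.
positions = np.arange(len(subjects))
print(positions.tolist())  # [0, 1, 2, 3, 4, 5]

# Each position is paired with one subject label and one bar height.
for pos, subject, mark in zip(positions, subjects, marks):
    print(pos, subject, mark)
```

The bars are drawn at these numeric positions, and xticks then replaces the numbers on the axis with the subject names.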

We can also define other parameters for the graph such as font size, font weight, rotation and color:

plot.bar(m, marks, color='indigo')
plot.xlabel('Subject',fontsize=15, fontweight='bold', color='blue')
plot.ylabel('Marks',fontsize=15, fontweight='bold', color='blue')
plot.xticks(m, subjects, fontsize=10, fontweight='bold', rotation=35, color='blue')
plot.title('Marks obtained out of 100',fontsize=15, fontweight='bold', color='blue')


The same data can be represented as a horizontal bar graph. Instead of the bar function, we use the barh function.

import matplotlib.pyplot as plot
import numpy as np
subjects = ['Maths', 'Physics', 'Chemistry', 'French', 'Computers', 'English']
marks = [94,85,66,55,89,64]

m = np.arange(len(subjects))
plot.barh(m, marks,color='indigo')
plot.ylabel('Subject',fontsize=15, fontweight='bold', color='blue')
plot.xlabel('Marks',fontsize=15, fontweight='bold', color='blue')
plot.yticks(m, subjects, fontsize=10, fontweight='bold', rotation=35, color='blue')
plot.title('Marks obtained out of 100',fontsize=15, fontweight='bold', color='blue')


In order to add the data values on the graph, we need to call plot.text once for each bar, passing the x position, the y position and the text to display:

for i, value in enumerate(marks):
    plot.text(value, i, str(value), color='indigo', fontweight='bold')
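Putting the pieces together, the sketch below draws the horizontal chart with a value label on every bar. It is only one possible arrangement: the function name labelled_barh, the Agg backend (chosen so the code also runs without a display) and the small offset of 1 added to the x position are my assumptions, not part of the code above.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: render off-screen so no display is needed
import matplotlib.pyplot as plot
import numpy as np

subjects = ['Maths', 'Physics', 'Chemistry', 'French', 'Computers', 'English']
marks = [94, 85, 66, 55, 89, 64]


def labelled_barh(subjects, marks):
    """Draw a horizontal bar chart and write each value next to its bar."""
    fig, ax = plot.subplots()
    m = np.arange(len(subjects))
    ax.barh(m, marks, color='indigo')
    ax.set_yticks(m)
    ax.set_yticklabels(subjects)
    ax.set_xlabel('Marks')
    ax.set_title('Marks obtained out of 100')

    labels = []
    for i, value in enumerate(marks):
        # x = value + 1 nudges the text just past the end of the bar;
        # va='center' lines the text up with the middle of the bar.
        labels.append(ax.text(value + 1, i, str(value), va='center',
                              color='indigo', fontweight='bold'))
    return fig, labels


fig, labels = labelled_barh(subjects, marks)
```

Calling fig.savefig with a file name of your choice writes the chart to an image file. In Spyder you can drop the matplotlib.use line and call plot.show() instead.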


I am using the Spyder IDE (from Anaconda Navigator) to run this code. It has a handy feature, a variable explorer, that shows the details of the variables used in the code.

Screenshot from 2018-02-03 12-57-00.png


Setting up Python Environment-Anaconda Installation

For a newbie like me, it is difficult to keep upgrading Python and associated packages while resolving package dependencies. This is where Anaconda comes to my rescue. Anaconda is a free Python distribution and package manager. It comes with a lot of pre-installed packages (primarily for data science).

It can be downloaded for Linux from Continuum’s site. The installation instructions for Linux are available on the same site. I downloaded and installed the 64-bit Python 3.6 version on my Linux Mint.

In order to update Anaconda and Python to the latest version, you need to run the command below in the terminal.

Screenshot from 2017-06-06 20-39-16

However, I continue to have older versions of Python. As the screenshot below shows, Python 3.5.2, which I installed manually, and Python 2.7.12, which was pre-installed on Linux Mint, are still available.

Screenshot from 2017-06-06 20-47-40

conda update anaconda

Screenshot from 2017-06-06 20-50-45.png

On my Linux Mint, I have already updated to Anaconda version 4.4.0 (the latest available at the time of writing). This way, it is easy to keep upgrading Python and the required packages.

In PyCharm, I can choose Python 3.6 (installed through Anaconda / conda update) as the project interpreter.
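With several Python versions installed side by side, it helps to confirm which interpreter is actually running a script. The snippet below is a small sketch using only the standard library. The path it prints depends on your own installation.

```python
import sys

# Path of the interpreter that is running this script, for example one
# inside an anaconda3 directory if the Anaconda interpreter is selected.
print(sys.executable)

# Major and minor version numbers of the running interpreter.
major, minor = sys.version_info[:2]
print('Python {}.{}'.format(major, minor))

if major < 3:
    print('This is a Python 2 interpreter; select a Python 3 one instead.')
```

Running it from PyCharm and again from the terminal quickly shows whether both are using the same Python.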

Screenshot from 2017-06-06 20-57-17.png

Anaconda also comes with Anaconda Navigator, a GUI useful for launching applications, managing packages, learning Python etc.

To add a shortcut to anaconda-navigator on the desktop, I created the following desktop entry file in the /usr/share/applications folder:

[Desktop Entry]
Comment=Scientific PYthon Development EnviRonment - Python3
Exec=bash -c 'export PATH="/home/srinivas/anaconda3/bin:$PATH" && /home/srinivas/anaconda3/bin/anaconda-navigator'

Screenshot from 2017-06-06 21-08-23

Spyder is an open source cross platform IDE for scientific programming in Python. Spyder integrates NumPy, SciPy, Matplotlib and IPython, as well as other open source software.

Screenshot from 2017-06-06 21-10-01

To conclude, Anaconda is a Python distribution with a lot of useful features and learning opportunities in one place.



Setting up Python Environment-Installing Packages

In order to build useful applications, we need Python libraries or packages. The majority of such useful packages can be downloaded from PyPI, the Python Package Index.

The best way to install packages is by using a tool called pip. pip can be downloaded separately; however, on Linux Mint, pip is already installed along with Python 2.7.12. Similarly, when I installed Python 3.5.2, the pip3 tool was installed. To upgrade to the latest pip, run the command below in the terminal:

pip install -U pip

Now let us look at some of the useful packages for analyzing data.

NumPy is useful for processing numbers, strings, records, and objects.

pip install numpy

Pandas (Python Data Analysis Library) provides various data analysis tools for Python.

pip install pandas

Matplotlib is a Python 2D plotting library to produce publication quality graphs and figures.

pip install matplotlib

OpenPyXL is a Python library for reading and writing Excel 2010 xlsx/xlsm/xltx/xltm files.

pip install openpyxl

Once these packages are installed, they can be imported and used in your application. For example:

import numpy
import pandas
import openpyxl
import matplotlib

from pandas import DataFrame
from pandas import *
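Before relying on these imports, it can be handy to check which of the packages are actually available to the current interpreter. The following is a minimal sketch using the standard library's importlib. The helper name missing_packages is my own and not part of any library.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


wanted = ['numpy', 'pandas', 'matplotlib', 'openpyxl']
missing = missing_packages(wanted)
if missing:
    print('Not installed:', ', '.join(missing))
else:
    print('All packages are available.')
```

Any package reported as not installed can then be added with pip install followed by the package name.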



Setting up Python Environment-SQLite

For applications involving data storage and usage, we need a database. SQLite is a simple yet very useful SQL database engine. It can be downloaded from the SQLite website.

A very nice description of when to consider using an SQLite database and when to consider client-server databases like MySQL and PostgreSQL is available online.

I installed SQLite on my Linux Mint using the commands below in the terminal:

sudo apt-get update
sudo apt-get install sqlite

In order to create databases, tables, views etc., you may use a DB client for SQLite called SQLiteStudio.

All you need to do is download, unpack and run the app.

As you can see from the screenshots below, there are many good features available in SQLite, such as constraints, indexes, triggers and views.

Data can be inserted using the user interface.

Screenshot from 2017-05-20 20-28-43

Screenshot from 2017-05-20 20-31-17

Screenshot from 2017-05-20 20-33-46

Screenshot from 2017-05-20 20-40-49.png
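The same features can also be tried from Python itself, since the standard library ships with the sqlite3 module. The sketch below is only an illustration: the marks table, the index and the view are hypothetical names that I made up, reusing the marks data from the bar chart post, and the database lives in memory rather than in a file.

```python
import sqlite3

# ':memory:' keeps the database in RAM; pass a file name to persist it.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# A table with a primary key and a CHECK constraint (hypothetical schema).
cur.execute("""
    CREATE TABLE marks (
        subject TEXT PRIMARY KEY,
        score   INTEGER NOT NULL CHECK (score BETWEEN 0 AND 100)
    )
""")
cur.execute("CREATE INDEX idx_marks_score ON marks (score)")
cur.execute("""
    CREATE VIEW high_scores AS
    SELECT subject, score FROM marks WHERE score >= 85
""")

rows = [('Maths', 94), ('Physics', 85), ('Chemistry', 66),
        ('French', 55), ('Computers', 89), ('English', 64)]
cur.executemany("INSERT INTO marks (subject, score) VALUES (?, ?)", rows)
conn.commit()

# Query the view like an ordinary table.
high = cur.execute(
    "SELECT subject, score FROM high_scores ORDER BY score DESC").fetchall()
print(high)  # [('Maths', 94), ('Computers', 89), ('Physics', 85)]

# The CHECK constraint rejects out-of-range values.
try:
    cur.execute("INSERT INTO marks (subject, score) VALUES (?, ?)", ('Art', 120))
except sqlite3.IntegrityError as exc:
    print('Rejected:', exc)

conn.close()
```

A database file created this way can afterwards be opened in SQLiteStudio to inspect the table, index and view.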

Setting up Python Environment-PyCharm

Python can be installed on Windows, Mac as well as Linux. Since I plan to learn programming on Linux, I was searching for an IDE for Python on Linux.

For beginners, IPython, IDLE and PyCharm are good to get started.

IPython can be accessed from the Linux terminal. Here, I printed Hello word! (Word as in WordPress 🙂)

Screenshot from 2017-04-30 16-51-28

IDLE is Python’s Integrated Development and Learning Environment. It is coded in Python using the tkinter GUI toolkit. It is simple but has quite a number of useful features. Please refer to the official documentation for more details.

PyCharm is an IDE from JetBrains. It is easy to install on Linux Mint. There are two editions of the PyCharm IDE: Professional, which is a licensed edition, and a free Community Edition, which can be downloaded. I downloaded the Community Edition.

Screenshot from 2017-04-30 16-20-28

Installation instructions are available on the download site. All you need to do is unpack the PyCharm tar.gz and run it from the bin subdirectory.

After installation is completed, I added PyCharm Community Edition to my Dock (Plank) for easier access.

Once the IDE is opened, the Python interpreter can be selected from File –> Settings –> Project Interpreter.

I am learning Python 3.x and hence selected Python 3.5.2.

Screenshot from 2017-04-30 17-17-21.png

All the additional packages installed (like numpy, pandas, matplotlib, PyQt5 etc.) are displayed here.

Below is a screenshot of a sample program (which I took from the internet) executed in the PyCharm IDE to display the combinations of numbers whose sum adds up to a given number.

Screenshot from 2017-04-30 17-16-01
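The program in the screenshot is not reproduced here. As a rough idea of what such a program can look like, below is my own minimal sketch using itertools.combinations. The function name combinations_with_sum and the sample numbers are assumptions for illustration only.

```python
from itertools import combinations


def combinations_with_sum(numbers, target):
    """Return every combination of `numbers` whose elements add up to `target`."""
    found = []
    # Try combinations of every size, from single numbers up to the full list.
    for size in range(1, len(numbers) + 1):
        for combo in combinations(numbers, size):
            if sum(combo) == target:
                found.append(combo)
    return found


print(combinations_with_sum([1, 2, 3, 4, 5], 6))
# [(1, 5), (2, 4), (1, 2, 3)]
```

This brute-force approach checks every possible combination, so it is only suitable for short lists of numbers.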

Overall, PyCharm Community Edition is a nice IDE with powerful features.

How I got bitten by Python programming

Many years ago, I used to be a Java programmer. In fact, I started my information technology career in the year 1999 as a software developer in a small company which focussed on application software development. During my engineering days, I learnt Fortran and C programming. When I completed my engineering, there was the Y2K problem, which helped many job aspirants to jump into the IT industry irrespective of their educational background.

During the same time, Java was one of the bleeding edge technologies. There was a saying: ‘To get into an IT job, all you need to know is the spelling of Java’.

After a few years of programming (mainly in Java, Web development, SQL, Database design), like many others, I moved on to project management. With more focus on day to day operations, I gradually lost hold of coding but not the zeal.

Several years later, in the current digital world, data analytics caught my attention. I am interested in learning data analytics and visualization. For the past few years, I have been using the Linux Mint Cinnamon OS more frequently on my personal laptop as it is free and open source (FOSS). I was fascinated by the Cinnamon Desktop Environment. One website claims that most of Linux Mint is developed in the Python language. I was aware that the majority of Unix/Linux development happens in C, so I was surprised when I saw Python. This was my first encounter with Python. This is when I started gathering my understanding of Python from the internet.

Why learn Python?

  • It is free and open source (FOSS)
  • Already available in several Linux distributions
  • Easy to learn for beginners (minimal coding is required)
  • One of the languages widely used for Data Analytics
  • Popular, with good community support
  • Availability of code libraries / packages
    • Many Web development frameworks – Django, Bottle, Flask etc
    • Scientific and numeric computing – Numpy, Matplotlib, Pandas etc
    • Rich GUI development – PyQt, wxPython

Python 2.x or 3.x?

Several books and websites debate whether to use Python 2 or Python 3. I noticed that Python 2.7 was installed by default on Linux Mint 18 (Sarah). When I started learning, I felt that going forward the focus would be on developing Python 3.x, as it is the present and future. Hence I started with the Python 3.5 interpreter. Fortunately, Python 3.5 is also pre-installed on the latest Linux Mint 18.1 (Serena).

My favourite books for learning Python / Data Analytics

There are several online books and tutorials available. One of my favourites is Tutorials Point.

I follow the Google Plus Python community frequently.

Also, Stack Overflow comes to my rescue whenever I encounter some hurdles.

Screenshot from 2016-12-31 20-44-32.png

Disclaimer: The opinions and experiences listed on this site are my own. In some cases, my understanding could be incorrect as I am a beginner to intermediate programmer. Please point out if any correction is required so that I can consider editing the blog.