Analyzing data in Python – Pareto Charts

As per Wikipedia, a Pareto chart, named after Vilfredo Pareto, is a type of chart that contains both bars and a line graph, where individual values are represented in descending order by bars, and the cumulative total is represented by the line.

Below is a simple example of a Pareto chart in Python:

from matplotlib import pyplot as plot
import numpy as np
preference = {'Comedy': 1500, 'Science Fiction': 670, 'Action': 950, 'Drama': 450, 'Romance': 50}

# sort preference in descending order
weights, labels = zip(*sorted(((pref,genre) for genre,pref in preference.items()), reverse=True))

# running (cumulative) total of the sorted weights
cumu_weights = np.cumsum(weights).tolist()

print(cumu_weights)

# x position of each bar
left = np.arange(len(weights))
fig, ax = plot.subplots(1, 1)
ax.bar(left, weights, 1)
ax.set_xticks(left)
ax.set_xticklabels(labels, fontsize=10, fontweight='bold', rotation=35, color='darkblue')
# cumulative total drawn as a line over the bars
ax.plot(left, cumu_weights)
plot.show()

Here we sort the preferences in descending order and draw a bar chart with the weight of each preference on the y axis. We also compute the cumulative total of these sorted weights and plot it as a line graph.
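Pareto charts often show the line as a cumulative percentage of the total rather than a raw running total. As a minimal sketch, assuming the same preference data as in the example above, the percentages could be worked out with the standard library alone. The helper name pareto_table is hypothetical and only used here for illustration.

```python
from itertools import accumulate

# Same sample data as in the example above
preference = {'Comedy': 1500, 'Science Fiction': 670, 'Action': 950,
              'Drama': 450, 'Romance': 50}


def pareto_table(counts):
    """Return (label, weight, cumulative percent) rows, largest weight first.

    Hypothetical helper, for illustration only.
    """
    # sort genres by weight in descending order
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    total = sum(weight for _, weight in ordered)
    # running total of the sorted weights
    running = accumulate(weight for _, weight in ordered)
    return [(label, weight, 100.0 * cum / total)
            for (label, weight), cum in zip(ordered, running)]


for label, weight, percent in pareto_table(preference):
    print('{:<16}{:>6}{:>8.1f}%'.format(label, weight, percent))
```

The percentages could then be drawn on a secondary y axis, for instance one created with matplotlib's ax.twinx(), so that the bars keep their own scale while the line runs from 0 to 100.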


How I got bitten by Python programming

Many years ago, I used to be a Java programmer. In fact, I started my information technology career in 1999 as a software developer in a small company which focussed on application software development. During my engineering days, I learnt Fortran and C programming. When I completed my engineering degree, the Y2K problem (https://en.wikipedia.org/wiki/Year_2000_problem) was helping many job aspirants jump into the IT industry irrespective of their educational background.

At that time, Java was one of the bleeding-edge technologies. There was a saying: ‘To get an IT job, all you need to know is the spelling of Java’.

After a few years of programming (mainly Java, web development, SQL and database design), like many others, I moved on to project management. With more focus on day-to-day operations, I gradually lost my hold on coding, but not the zeal.

Several years later, in today's digital world, data analytics caught my attention. I am interested in learning data analytics and visualization. For the past few years, I have been using Linux Mint Cinnamon more frequently on my personal laptop, as it is free and open source software (FOSS). I was fascinated by the Cinnamon desktop environment. The Wikipedia article on Linux Mint (https://en.wikipedia.org/wiki/Linux_Mint) states that most of Linux Mint is developed in Python (https://www.python.org/). I knew that the majority of Unix/Linux development happens in C, so I was surprised to see Python. This was my first encounter with Python, and it is when I started learning about it from the internet.

Why learn Python?

  • It is free and open source software (FOSS)
  • Already available in several Linux distributions
  • Easy to learn for beginners (relatively little code is needed)
  • One of the languages widely used for data analytics
  • Popular (http://www.tiobe.com/tiobe-index/), with good community support
  • Availability of code libraries / packages
    • Many web development frameworks – Django, Bottle, Flask etc
    • Scientific and numeric computing – NumPy, Matplotlib, pandas etc
    • Rich GUI development – PyQt, wxPython

Python 2.x or 3.x?

Several books and websites debate whether to use Python 2 or Python 3. I noticed that Python 2.7 was installed by default on Linux Mint 18 (Sarah). When I started learning, I felt that future development would focus on Python 3.x, as it is the present and future of the language. Hence I started with the Python 3.5 interpreter. Fortunately, Python 3.5 is also pre-installed on the latest Linux Mint 18.1 (Serena).
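When both versions are installed side by side, it can help to confirm which interpreter a script is actually running under. Here is a minimal sketch using only the standard library; the function name running_python3 is made up for illustration.

```python
import sys


def running_python3():
    """Return True when the current interpreter is Python 3 or newer."""
    return sys.version_info[0] >= 3


# print the major.minor version of the interpreter running this script
print('Python {}.{}'.format(sys.version_info[0], sys.version_info[1]))
print('Python 3?', running_python3())
```

The same check works unchanged under Python 2.7, so it can sit at the top of a script that is meant to run on Python 3 only.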

My favourite books for learning Python / Data Analytics

There are several online books and tutorials available. One of my favourites is Tutorials Point – https://www.tutorialspoint.com/python/

I frequently follow the Python community on Google+ – https://plus.google.com/u/0/communities/103393744324769547228

Also, Stack Overflow (http://stackoverflow.com/questions/tagged/python) comes to my rescue whenever I run into a hurdle.


Disclaimer: The opinions and experiences listed on this site are my own. In some cases, my understanding could be incorrect, as I am a beginner to intermediate programmer. Please point out any correction that is required so that I can consider editing the blog.