Analyzing data in Python – Scatter (xy) Plot

A scatter plot (also known as an XY plot) uses points to show the relationship between two variables.

e.g., a plot of a person's height vs. weight.

import matplotlib.pyplot as plt
import pandas

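# names= supplies the column labels, which assumes the CSV has no header row
colnames = ['Height', 'Weight']
data = pandas.read_csv('ShortListOfHeightWeight.csv', names=colnames)

# convert each column to a plain Python list for plotting
heights = data.Height.tolist()
weights = data.Weight.tolist()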

plt.scatter(heights, weights)
plt.title('Scatter plot of height and corresponding weight', fontsize=15)
plt.xlabel('height', fontsize=15)
plt.ylabel('weight', fontsize=15)
plt.show()

Here we have data for 250 people (height in inches and the corresponding weight in lbs).
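The names argument is what lets read_csv label the columns of a file that has no header row. The sketch below shows this on a tiny in-memory CSV; the three rows are made-up values for illustration and are not taken from ShortListOfHeightWeight.csv.

```python
import io
import pandas

# Hypothetical sample rows (height in inches, weight in lbs), for illustration only.
sample_csv = io.StringIO("66,113\n72,137\n69,153\n")

# names= supplies the column labels; every row of the file is then read as data.
data = pandas.read_csv(sample_csv, names=['Height', 'Weight'])

heights = data.Height.tolist()
weights = data.Weight.tolist()
print(heights)
print(weights)
```

These two lists are in the same form that plt.scatter receives in the example above.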

Scatter

 


Analyzing data in Python – Time Series Plot

A time series graph is a plot of data points at successive intervals of time. It can be drawn using the pandas Series.plot method.

e.g., a plot of the closing values of the S&P BSE Sensex stock market index on the y-axis vs. time on the x-axis (from 2000 to 2018).

The data was downloaded as a CSV file from https://www.bseindia.com/indices/IndexArchiveData.aspx

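import pandas as pd
from matplotlib import pyplot as plt

# Series.from_csv was removed in pandas 1.0; read_csv with the same
# defaults (first column as a parsed date index) replaces it.
data = pd.read_csv('SENSEX.csv', header=0, index_col=0, parse_dates=True)
series = data.iloc[:, 0]  # first data column, as from_csv returned
series.plot()
plt.ylabel('Sensex')
plt.show()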
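Series.plot puts the index on the x-axis, so the dates need to end up in the index rather than in an ordinary column. Here is a minimal sketch of how pandas.read_csv can build such a date-indexed Series, assuming a two-column file; the column names and values are hypothetical and do not reflect the real SENSEX.csv layout.

```python
import io
import pandas as pd

# Hypothetical two-column CSV (date, closing value), for illustration only.
sample_csv = io.StringIO(
    "Date,Close\n"
    "2000-01-03,5375\n"
    "2000-01-04,5491\n"
    "2000-01-05,5357\n"
)

# index_col=0 makes the first column the index; parse_dates=True converts it to dates.
frame = pd.read_csv(sample_csv, header=0, index_col=0, parse_dates=True)

# Take the single remaining column as a Series indexed by date.
series = frame['Close']
print(type(series.index).__name__)
print(series.tolist())
```

Calling series.plot() on the result would then draw the closing values against the dates.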

TimeSeries1

Analyzing data in Python – Pareto Charts

As per Wikipedia, a Pareto chart, named after Vilfredo Pareto, is a type of chart that contains both bars and a line graph, where individual values are represented in descending order by bars, and the cumulative total is represented by the line.

Below is a simple example of a Pareto chart in Python:

from matplotlib import pyplot as plot
import numpy as np
preference = ({'Comedy':1500,'Science Fiction':670,'Action':950,'Drama':450,'Romance':50})

# sort preference in descending order
weights, labels = zip(*sorted(((pref,genre) for genre,pref in preference.items()), reverse=True))

# running total of the sorted weights, used for the cumulative line
cumu_weights = np.cumsum(weights).tolist()

print(cumu_weights)

# x position of each bar
left = np.arange(len(weights))
fig, ax = plot.subplots(1, 1)
ax.bar(left, weights, 1)
ax.set_xticks(left)
ax.set_xticklabels(labels, fontsize=10, fontweight='bold', rotation=35, color='darkblue')
# cumulative totals drawn as a line over the bars
ax.plot(cumu_weights)
plot.show()

Here we sort the preferences in descending order and draw a bar chart with the weight of each preference on the y-axis. We also compute the cumulative totals of these weights, in the same order, and plot them as a line graph.
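The sort-then-accumulate step can also be written on its own, without the plotting code. This is a minimal sketch using itertools.accumulate from the standard library. The helper name pareto_table is made up for illustration, and the cumulative percentage column is an addition that the example above does not plot (Pareto charts often show the line as a percentage of the total).

```python
from itertools import accumulate

def pareto_table(preference):
    """Return (label, value, cumulative value, cumulative percent) rows, largest value first."""
    ordered = sorted(preference.items(), key=lambda item: item[1], reverse=True)
    values = [value for _, value in ordered]
    running = list(accumulate(values))  # running totals in sorted order
    total = running[-1]                 # assumes a non-empty dictionary
    return [
        (label, value, cum, round(100.0 * cum / total, 1))
        for (label, value), cum in zip(ordered, running)
    ]

preference = {'Comedy': 1500, 'Science Fiction': 670, 'Action': 950, 'Drama': 450, 'Romance': 50}
rows = pareto_table(preference)
for row in rows:
    print(row)
```

The third column holds the same running totals that cumu_weights holds in the chart code.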

Pareto

 

Analyzing data in Python – Bar Charts

There are many good books on data analytics. Recently I borrowed a book from the office library titled “Even You Can Learn Statistics and Analytics: An Easy to Understand Guide to Statistics and Analytics”, authored by David M. Levine and David F. Stephan. I feel this is a good one for beginners in data analytics.

There are also other good books, such as ‘Python for Data Analysis’, ‘Python: Data Analytics and Visualization’, ‘Python for Finance’, etc.

One important way of presenting data is in graph format (a visual format, also known as data visualization).

A bar chart is useful for presenting categorical data. It has rectangular bars whose lengths are proportional to the values we want to present for each category.

E.g., we want to represent the marks of a student in several subjects:

Subject      Score (out of 100)
Maths        94
Physics      85
Chemistry    66
French       55
Computers    89
English      64

Bar1

This is a vertical bar graph. It can be produced with a short piece of Python code:

import matplotlib.pyplot as plot
import numpy as np
subjects = ['Maths', 'Physics', 'Chemistry', 'French', 'Computers', 'English']
marks = [94,85,66,55,89,64]

m = np.arange(len(subjects))
plot.bar(m, marks)
plot.xlabel('Subject')
plot.ylabel('Marks')
plot.xticks(m, subjects)
plot.title('Marks obtained out of 100')
plot.show()

Matplotlib, a Python 2D plotting library, and NumPy, the fundamental package for scientific computing with Python, are two important Python packages.

Here the subjects and scores (marks) are stored in Python lists. len(subjects) returns the length of subjects, in this case 6.

numpy.arange generates the evenly spaced positions 0 to 5, one for each subject, which place the bars on the graph in order. On the x-axis we have the subject names (xticks) and on the y-axis we have the marks.
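A quick way to see what the bar positions are is to print them. The sketch below only inspects the values that np.arange produces for the subject list; the variable names positions and pairs are illustrative.

```python
import numpy as np

subjects = ['Maths', 'Physics', 'Chemistry', 'French', 'Computers', 'English']

# One integer position per subject: 0, 1, ..., len(subjects) - 1
positions = np.arange(len(subjects))
print(positions.tolist())  # [0, 1, 2, 3, 4, 5]

# Each position is paired with the tick label drawn under that bar.
pairs = list(zip(positions.tolist(), subjects))
print(pairs[0])  # (0, 'Maths')
```

This pairing of positions and labels is what plot.xticks(m, subjects) applies to the x-axis.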

plot.bar is plotting the vertical bar chart.

We can also define other parameters for the graph such as fontsize, weight, rotation:

plot.bar(m, marks,color='indigo')
plot.xlabel('Subject',fontsize=15, fontweight='bold', color='blue')
plot.ylabel('Marks',fontsize=15, fontweight='bold', color='blue')
plot.xticks(m, subjects, fontsize=10, fontweight='bold', rotation=35, color='blue')
plot.title('Marks obtained out of 100',fontsize=15, fontweight='bold', color='blue')

Bar2

The same data can be represented as a horizontal bar graph. Instead of the bar function, we use the barh function.

import matplotlib.pyplot as plot
import numpy as np
subjects = ['Maths', 'Physics', 'Chemistry', 'French', 'Computers', 'English']
marks = [94,85,66,55,89,64]

m = np.arange(len(subjects))
plot.barh(m, marks,color='indigo')
plot.ylabel('Subject',fontsize=15, fontweight='bold', color='blue')
plot.xlabel('Marks',fontsize=15, fontweight='bold', color='blue')
plot.yticks(m, subjects, fontsize=10, fontweight='bold', rotation=35, color='blue')
plot.title('Marks obtained out of 100',fontsize=15, fontweight='bold', color='blue')
plot.show()

Bar3

To add the data values to the graph, we call plot.text for each bar (before calling plot.show()):

for i, value in enumerate(marks):
 plot.text(value, i, str(value), color='indigo', fontweight='bold')

Bar4.png

I am using Spyder IDE (from Anaconda Navigator) in order to run this code. It has a handy feature, a variable explorer that shows the details of the variables used in code.

Screenshot from 2018-02-03 12-57-00.png

Setting up Python Environment-Anaconda Installation

For a newbie like me, it is difficult to keep upgrading Python and its associated packages while resolving package dependencies. This is where Anaconda comes to my rescue. Anaconda is a free Python distribution and package manager. It comes with a lot of pre-installed packages (primarily for data science).

It can be downloaded for Linux from Continuum's site https://www.continuum.io/downloads#linux . The installation instructions for Linux are available on the same site. I have downloaded and installed the 64-bit Python 3.6 version on my Linux Mint.

To update Anaconda and Python to the latest version, run the following command in the terminal:

conda update anaconda

Screenshot from 2017-06-06 20-39-16

However, I continue to have older versions of Python. As you can see in the screenshot below, Python 3.5.2, which I installed manually, and Python 2.7.12, which was pre-installed on Linux Mint, are still available.

Screenshot from 2017-06-06 20-47-40

Screenshot from 2017-06-06 20-50-45.png

On my Linux Mint, I have already updated to Anaconda version 4.4.0 (the latest available as of this writing). This way, it is easy to keep upgrading Python and the required packages.

In PyCharm, I can choose Python 3.6 (installed through Anaconda / conda update) as the project interpreter.

Screenshot from 2017-06-06 20-57-17.png
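With several Python versions installed side by side, it helps to confirm which one is actually running a script. Here is a small sketch using only the standard library; the function name describe_interpreter is made up for illustration.

```python
import sys

def describe_interpreter():
    """Return the running interpreter's path and its major.minor.micro version string."""
    version = "{}.{}.{}".format(*sys.version_info[:3])
    return sys.executable, version

path, version = describe_interpreter()
print("Interpreter:", path)
print("Version:", version)
```

If the Anaconda interpreter is the one selected, the printed path would be expected to point into the Anaconda installation folder rather than to the system Python.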

Anaconda also comes with Anaconda Navigator, a GUI for launching applications, managing packages, learning Python, etc.

To add a shortcut to anaconda-navigator to the desktop, I created the following desktop entry file in the /usr/share/applications folder:

[Desktop Entry]
Version=1.0
Type=Application
Name=Anaconda-Navigator
GenericName=Anaconda
Comment=Scientific PYthon Development EnviRonment - Python3
Exec=bash -c 'export PATH="/home/srinivas/anaconda3/bin:$PATH" && /home/srinivas/anaconda3/bin/anaconda-navigator'
Categories=Development;Science;IDE;Qt;Education;
Icon=/home/srinivas/anaconda3/Anaconda.png
Terminal=false
StartupNotify=true
Name[en_IN]=Anaconda

Screenshot from 2017-06-06 21-08-23

Spyder is an open source cross platform IDE for scientific programming in Python. Spyder integrates NumPy, SciPy, Matplotlib and IPython, as well as other open source software.

Screenshot from 2017-06-06 21-10-01

To conclude, Anaconda is a Python distribution with a lot of useful features and learning opportunities in one place.

 

 

Setting up Python Environment-Installing Packages

In order to build useful applications, we need Python libraries or packages. The majority of such packages can be downloaded from PyPI, the Python Package Index: https://pypi.python.org/pypi

The best way to install packages is with a tool called pip. We can get pip from https://pip.pypa.io/en/latest/installing.html. However, on Linux Mint, pip is already installed along with Python 2.7.12. Similarly, when I installed Python 3.5.2, the pip3 tool was installed with it. To upgrade to the latest pip, run the command below in the terminal:

pip install -U pip

Now let us look at some of the useful packages for analyzing data.

NumPy (http://www.numpy.org/): Useful for processing arrays of numbers, strings, records, and objects.

pip install numpy

Pandas (http://pandas.pydata.org/): Python Data Analysis Library provides various data analysis tools for Python.

pip install pandas

Matplotlib (https://matplotlib.org/): Matplotlib is a Python 2D plotting library to produce publication quality graphs and figures.

pip install matplotlib

OpenPyXL (https://openpyxl.readthedocs.io/en/default/): Openpyxl is a Python library for reading and writing Excel 2010 xlsx/xlsm/xltx/xltm files.

pip install openpyxl

Once these packages are installed, they can be imported and used in your application. E.g.,

import numpy
import pandas
import openpyxl
import matplotlib

from pandas import DataFrame
from pandas import *
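Before importing, it can be useful to check which of these packages are actually installed and at what version. Here is a minimal sketch using importlib.metadata from the standard library (available from Python 3.8, so newer than the Python versions mentioned in this post); the function name installed_versions is made up for illustration.

```python
from importlib import metadata

def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if it is not installed."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

report = installed_versions(['numpy', 'pandas', 'matplotlib', 'openpyxl'])
for name, version in report.items():
    print(name, version or 'not installed')
```

Any package reported as not installed can then be added with the corresponding pip install command.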

 

 

Setting up Python Environment-pyQt

In order to build a small desktop application, I need a framework for building the UI.

Qt is a C++ based framework of libraries and tools that enables development of cross-platform applications and devices. There are both commercial and open source versions of Qt available.

Here is the link to download the same – https://info.qt.io/download-qt-for-application-development

In my case (on Linux Mint), I initially downloaded the open source version of the online installer from http://download.qt.io/official_releases/online_installers/qt-unified-linux-x64-online.run and then did a custom install of the Qt Designer tool.

I found the designer executable in the /usr/lib/x86_64-linux-gnu/qt5/bin folder.

I created a shortcut in the Mint menu by creating a QtDesigner.desktop file in /usr/share/applications:

[Desktop Entry]
Name=QtDesigner
GenericName=QtDesigner
Exec=/your path/Qt/5.7/gcc_64/bin/designer
Icon=/your path/Qt/Examples/Qt-5.7/widgets/widgets/icons/images/designer.png
Type=Application
Categories=Development
Name[en_IN]=QtDesigner

Screenshot from 2017-05-21 11-53-56Screenshot from 2017-05-21 12-13-12

A sample screen designed above can be saved as a .ui file

MyFirstQtApp.ui
<?xml version="1.0" encoding="UTF-8"?>
<ui version="4.0">
 <class>Form</class>
 <widget class="QWidget" name="Form">
 <property name="geometry">
 <rect>
 <x>0</x>
 <y>0</y>
 <width>603</width>
 <height>604</height>
 </rect>
 </property>
 <property name="windowTitle">
 <string>Person</string>
 </property>
 <widget class="QLabel" name="MyFirstQtProject">
 <property name="geometry">
 <rect>
 <x>250</x>
 <y>30</y>
 <width>101</width>
 <height>20</height>
 </rect>
 </property>
 <property name="text">
 <string>Person Details</string>
 </property>
 </widget>
 <widget class="QGroupBox" name="groupBox">
 <property name="geometry">
 <rect>
 <x>80</x>
 <y>60</y>
 <width>431</width>
 <height>141</height>
 </rect>
 </property>
 <property name="autoFillBackground">
 <bool>true</bool>
 </property>
 <property name="title">
 <string>Person</string>
 </property>
 <property name="flat">
 <bool>false</bool>
 </property>
 <widget class="QLabel" name="MyFirstQtProject_4">
 <property name="geometry">
 <rect>
 <x>0</x>
 <y>110</y>
 <width>101</width>
 <height>20</height>
 </rect>
 </property>
 <property name="text">
 <string>Last Name</string>
 </property>
 </widget>
 <widget class="QLabel" name="MyFirstQtProject_3">
 <property name="geometry">
 <rect>
 <x>0</x>
 <y>70</y>
 <width>101</width>
 <height>20</height>
 </rect>
 </property>
 <property name="text">
 <string>Middle Name</string>
 </property>
 </widget>
 <widget class="QLabel" name="MyFirstQtProject_2">
 <property name="geometry">
 <rect>
 <x>0</x>
 <y>30</y>
 <width>101</width>
 <height>20</height>
 </rect>
 </property>
 <property name="text">
 <string>First Name</string>
 </property>
 </widget>
 <widget class="QTextEdit" name="textEdit_3">
 <property name="geometry">
 <rect>
 <x>90</x>
 <y>20</y>
 <width>211</width>
 <height>31</height>
 </rect>
 </property>
 </widget>
 <widget class="QTextEdit" name="textEdit_4">
 <property name="geometry">
 <rect>
 <x>90</x>
 <y>60</y>
 <width>211</width>
 <height>31</height>
 </rect>
 </property>
 </widget>
 <widget class="QTextEdit" name="textEdit_5">
 <property name="geometry">
 <rect>
 <x>90</x>
 <y>100</y>
 <width>211</width>
 <height>31</height>
 </rect>
 </property>
 </widget>
 </widget>
 <widget class="QGroupBox" name="groupBox_2">
 <property name="geometry">
 <rect>
 <x>80</x>
 <y>260</y>
 <width>341</width>
 <height>221</height>
 </rect>
 </property>
 <property name="autoFillBackground">
 <bool>true</bool>
 </property>
 <property name="title">
 <string>Address</string>
 </property>
 <property name="flat">
 <bool>false</bool>
 </property>
 <widget class="QLabel" name="MyFirstQtProject_5">
 <property name="geometry">
 <rect>
 <x>0</x>
 <y>110</y>
 <width>101</width>
 <height>20</height>
 </rect>
 </property>
 <property name="text">
 <string>City </string>
 </property>
 </widget>
 <widget class="QLabel" name="MyFirstQtProject_6">
 <property name="geometry">
 <rect>
 <x>0</x>
 <y>70</y>
 <width>101</width>
 <height>20</height>
 </rect>
 </property>
 <property name="text">
 <string>Line 2</string>
 </property>
 </widget>
 <widget class="QLabel" name="MyFirstQtProject_7">
 <property name="geometry">
 <rect>
 <x>0</x>
 <y>30</y>
 <width>101</width>
 <height>20</height>
 </rect>
 </property>
 <property name="text">
 <string>Line 1</string>
 </property>
 </widget>
 <widget class="QLabel" name="MyFirstQtProject_9">
 <property name="geometry">
 <rect>
 <x>0</x>
 <y>200</y>
 <width>101</width>
 <height>20</height>
 </rect>
 </property>
 <property name="text">
 <string>Country</string>
 </property>
 </widget>
 <widget class="QTextEdit" name="textEdit">
 <property name="geometry">
 <rect>
 <x>80</x>
 <y>30</y>
 <width>211</width>
 <height>31</height>
 </rect>
 </property>
 </widget>
 <widget class="QTextEdit" name="textEdit_2">
 <property name="geometry">
 <rect>
 <x>80</x>
 <y>70</y>
 <width>211</width>
 <height>31</height>
 </rect>
 </property>
 </widget>
 <widget class="QLabel" name="MyFirstQtProject_8">
 <property name="geometry">
 <rect>
 <x>0</x>
 <y>150</y>
 <width>101</width>
 <height>20</height>
 </rect>
 </property>
 <property name="text">
 <string>State</string>
 </property>
 </widget>
 <widget class="QLineEdit" name="lineEdit">
 <property name="geometry">
 <rect>
 <x>80</x>
 <y>110</y>
 <width>113</width>
 <height>27</height>
 </rect>
 </property>
 </widget>
 <widget class="QLineEdit" name="lineEdit_2">
 <property name="geometry">
 <rect>
 <x>80</x>
 <y>150</y>
 <width>113</width>
 <height>27</height>
 </rect>
 </property>
 </widget>
 <widget class="QComboBox" name="comboBox">
 <property name="geometry">
 <rect>
 <x>80</x>
 <y>190</y>
 <width>181</width>
 <height>31</height>
 </rect>
 </property>
 </widget>
 </widget>
 <widget class="QPushButton" name="pushButton">
 <property name="geometry">
 <rect>
 <x>190</x>
 <y>530</y>
 <width>85</width>
 <height>27</height>
 </rect>
 </property>
 <property name="text">
 <string>Add</string>
 </property>
 </widget>
 <widget class="QPushButton" name="pushButton_2">
 <property name="geometry">
 <rect>
 <x>320</x>
 <y>530</y>
 <width>85</width>
 <height>27</height>
 </rect>
 </property>
 <property name="text">
 <string>Reset</string>
 </property>
 </widget>
 </widget>
 <resources/>
 <connections/>
</ui> 
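Since a .ui file is plain XML, its widget tree can be inspected with the standard library before any conversion. Here is a minimal sketch using xml.etree.ElementTree; the function name list_widgets and the cut-down sample document are illustrative and not part of Qt or PyQt.

```python
import xml.etree.ElementTree as ET

# A cut-down .ui document in the same shape as the file above (illustrative only).
UI_XML = """<ui version="4.0">
 <class>Form</class>
 <widget class="QWidget" name="Form">
  <widget class="QLabel" name="MyFirstQtProject"/>
  <widget class="QPushButton" name="pushButton"/>
 </widget>
</ui>
"""

def list_widgets(ui_xml):
    """Return (class, name) for every <widget> element, in document order."""
    root = ET.fromstring(ui_xml)
    return [(w.get('class'), w.get('name')) for w in root.iter('widget')]

widgets = list_widgets(UI_XML)
for widget_class, widget_name in widgets:
    print(widget_class, widget_name)
```

To inspect the saved file instead of the sample string, ET.parse('MyFirstQtApp.ui').getroot() could be used in place of ET.fromstring.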
PyQt, from Riverbank Computing, is a set of Python bindings for the Qt application framework.

On Linux, I use the command below to install PyQt:

pip3 install PyQt5

To convert the MyFirstQtApp.ui file to the Python file MyFirstQtApp.py, we use a PyQt tool called pyuic from the terminal:

pyuic5 -x MyFirstQtApp.ui -o MyFirstQtApp.py
MyFirstQtApp.py
# -*- coding: utf-8 -*-

# Form implementation generated from reading ui file 'MyFirstQtApp.ui'
#
# Created by: PyQt5 UI code generator 5.6
#
# WARNING! All changes made in this file will be lost!

from PyQt5 import QtCore, QtGui, QtWidgets

class Ui_Form(object):
    def setupUi(self, Form):
        Form.setObjectName("Form")
        Form.resize(603, 604)
        self.MyFirstQtProject = QtWidgets.QLabel(Form)
        self.MyFirstQtProject.setGeometry(QtCore.QRect(250, 30, 101, 20))
        self.MyFirstQtProject.setObjectName("MyFirstQtProject")
        self.groupBox = QtWidgets.QGroupBox(Form)
        self.groupBox.setGeometry(QtCore.QRect(80, 60, 431, 141))
        self.groupBox.setAutoFillBackground(True)
        self.groupBox.setFlat(False)
        self.groupBox.setObjectName("groupBox")
        self.MyFirstQtProject_4 = QtWidgets.QLabel(self.groupBox)
        self.MyFirstQtProject_4.setGeometry(QtCore.QRect(0, 110, 101, 20))
        self.MyFirstQtProject_4.setObjectName("MyFirstQtProject_4")
        self.MyFirstQtProject_3 = QtWidgets.QLabel(self.groupBox)
        self.MyFirstQtProject_3.setGeometry(QtCore.QRect(0, 70, 101, 20))
        self.MyFirstQtProject_3.setObjectName("MyFirstQtProject_3")
        self.MyFirstQtProject_2 = QtWidgets.QLabel(self.groupBox)
        self.MyFirstQtProject_2.setGeometry(QtCore.QRect(0, 30, 101, 20))
        self.MyFirstQtProject_2.setObjectName("MyFirstQtProject_2")
        self.textEdit_3 = QtWidgets.QTextEdit(self.groupBox)
        self.textEdit_3.setGeometry(QtCore.QRect(90, 20, 211, 31))
        self.textEdit_3.setObjectName("textEdit_3")
        self.textEdit_4 = QtWidgets.QTextEdit(self.groupBox)
        self.textEdit_4.setGeometry(QtCore.QRect(90, 60, 211, 31))
        self.textEdit_4.setObjectName("textEdit_4")
        self.textEdit_5 = QtWidgets.QTextEdit(self.groupBox)
        self.textEdit_5.setGeometry(QtCore.QRect(90, 100, 211, 31))
        self.textEdit_5.setObjectName("textEdit_5")
        self.groupBox_2 = QtWidgets.QGroupBox(Form)
        self.groupBox_2.setGeometry(QtCore.QRect(80, 260, 341, 221))
        self.groupBox_2.setAutoFillBackground(True)
        self.groupBox_2.setFlat(False)
        self.groupBox_2.setObjectName("groupBox_2")
        self.MyFirstQtProject_5 = QtWidgets.QLabel(self.groupBox_2)
        self.MyFirstQtProject_5.setGeometry(QtCore.QRect(0, 110, 101, 20))
        self.MyFirstQtProject_5.setObjectName("MyFirstQtProject_5")
        self.MyFirstQtProject_6 = QtWidgets.QLabel(self.groupBox_2)
        self.MyFirstQtProject_6.setGeometry(QtCore.QRect(0, 70, 101, 20))
        self.MyFirstQtProject_6.setObjectName("MyFirstQtProject_6")
        self.MyFirstQtProject_7 = QtWidgets.QLabel(self.groupBox_2)
        self.MyFirstQtProject_7.setGeometry(QtCore.QRect(0, 30, 101, 20))
        self.MyFirstQtProject_7.setObjectName("MyFirstQtProject_7")
        self.MyFirstQtProject_9 = QtWidgets.QLabel(self.groupBox_2)
        self.MyFirstQtProject_9.setGeometry(QtCore.QRect(0, 200, 101, 20))
        self.MyFirstQtProject_9.setObjectName("MyFirstQtProject_9")
        self.textEdit = QtWidgets.QTextEdit(self.groupBox_2)
        self.textEdit.setGeometry(QtCore.QRect(80, 30, 211, 31))
        self.textEdit.setObjectName("textEdit")
        self.textEdit_2 = QtWidgets.QTextEdit(self.groupBox_2)
        self.textEdit_2.setGeometry(QtCore.QRect(80, 70, 211, 31))
        self.textEdit_2.setObjectName("textEdit_2")
        self.MyFirstQtProject_8 = QtWidgets.QLabel(self.groupBox_2)
        self.MyFirstQtProject_8.setGeometry(QtCore.QRect(0, 150, 101, 20))
        self.MyFirstQtProject_8.setObjectName("MyFirstQtProject_8")
        self.lineEdit = QtWidgets.QLineEdit(self.groupBox_2)
        self.lineEdit.setGeometry(QtCore.QRect(80, 110, 113, 27))
        self.lineEdit.setObjectName("lineEdit")
        self.lineEdit_2 = QtWidgets.QLineEdit(self.groupBox_2)
        self.lineEdit_2.setGeometry(QtCore.QRect(80, 150, 113, 27))
        self.lineEdit_2.setObjectName("lineEdit_2")
        self.comboBox = QtWidgets.QComboBox(self.groupBox_2)
        self.comboBox.setGeometry(QtCore.QRect(80, 190, 181, 31))
        self.comboBox.setObjectName("comboBox")
        self.pushButton = QtWidgets.QPushButton(Form)
        self.pushButton.setGeometry(QtCore.QRect(190, 530, 85, 27))
        self.pushButton.setObjectName("pushButton")
        self.pushButton_2 = QtWidgets.QPushButton(Form)
        self.pushButton_2.setGeometry(QtCore.QRect(320, 530, 85, 27))
        self.pushButton_2.setObjectName("pushButton_2")

        self.retranslateUi(Form)
        QtCore.QMetaObject.connectSlotsByName(Form)

    def retranslateUi(self, Form):
        _translate = QtCore.QCoreApplication.translate
        Form.setWindowTitle(_translate("Form", "Person"))
        self.MyFirstQtProject.setText(_translate("Form", "Person Details"))
        self.groupBox.setTitle(_translate("Form", "Person"))
        self.MyFirstQtProject_4.setText(_translate("Form", "Last Name"))
        self.MyFirstQtProject_3.setText(_translate("Form", "Middle Name"))
        self.MyFirstQtProject_2.setText(_translate("Form", "First Name"))
        self.groupBox_2.setTitle(_translate("Form", "Address"))
        self.MyFirstQtProject_5.setText(_translate("Form", "City "))
        self.MyFirstQtProject_6.setText(_translate("Form", "Line 2"))
        self.MyFirstQtProject_7.setText(_translate("Form", "Line 1"))
        self.MyFirstQtProject_9.setText(_translate("Form", "Country"))
        self.MyFirstQtProject_8.setText(_translate("Form", "State"))
        self.pushButton.setText(_translate("Form", "Add"))
        self.pushButton_2.setText(_translate("Form", "Reset"))


if __name__ == "__main__":
    import sys
    app = QtWidgets.QApplication(sys.argv)
    Form = QtWidgets.QWidget()
    ui = Ui_Form()
    ui.setupUi(Form)
    Form.show()
    sys.exit(app.exec_())

Once the Python UI file is available, it can be run. (At this point, as no functionality has been coded, only the UI screen is displayed.)

Screenshot from 2017-05-21 12-41-54