Combining Pandas and Plotly for Data Visualization

#1

We will walk you through an example of how to use Pandas and Plotly to visualize data. The aim of visualization is not the numbers themselves, but to gain insight into and a feel for the data. Pandas is a Python library that makes it easier to work with data: it offers functionality that makes data presentation, visualization, and analysis fairly simple. In this brief tutorial, we will use Plotly offline.

In our tutorial, we will work with the famous "Iris" dataset, which is hosted in the UCI Machine Learning Repository.

The Iris dataset contains measurements for 150 Iris flowers from three different species.

The three classes, each with 50 measurements, in the Iris dataset are:

Iris-setosa
Iris-versicolor
Iris-virginica

The four features measured for each flower in the Iris dataset are:

sepal length in cm
sepal width in cm
petal length in cm
petal width in cm

Loading the Iris dataset

We could download the Iris dataset from the UCI repository, save it (say, on the Desktop), and then pass the local path to load it. Here, however, we will use the pandas library to load the dataset directly from the UCI repository:

Code:

import pandas as pds
df = pds.read_csv(
    filepath_or_buffer='https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data',
    header=None, 
    sep=',')
 
df.columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
df.dropna(how="all", inplace=True) # This drops the empty line at the file-end
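
As a minimal sketch of the same loading step, the column names can also be passed to read_csv directly through its names parameter. The helper name load_iris and the six-row SAMPLE excerpt are assumptions introduced here so that the example runs without network access; the same function should accept the UCI URL or a local file path.

```python
import io
import pandas as pd

# Six-row excerpt in the same comma-separated, header-less layout as iris.data.
# (Assumption: a local stand-in so the sketch runs without network access.)
SAMPLE = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.9,3.0,5.1,1.8,Iris-virginica
"""

COLUMNS = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']


def load_iris(source):
    """Read header-less iris-style CSV data from a path, URL, or file-like object.

    load_iris is a hypothetical helper name, not part of pandas.
    """
    df = pd.read_csv(source, header=None, names=COLUMNS)
    # Drop rows in which every value is missing, as in the tutorial code
    return df.dropna(how="all")


df = load_iris(io.StringIO(SAMPLE))
print(df.shape)  # (6, 5)
```

Passing the URL from the code above instead of the StringIO object should give the full 150-row table.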

We can print the first and last few rows of the loaded dataset, respectively, as follows:

Code:

print(df.head())
   sepal_length  sepal_width  petal_length  petal_width        class
          5.1          3.5           1.4          0.2  Iris-setosa
          4.9          3.0           1.4          0.2  Iris-setosa
          4.7          3.2           1.3          0.2  Iris-setosa
          4.6          3.1           1.5          0.2  Iris-setosa
          5.0          3.6           1.4          0.2  Iris-setosa
 
print(df.tail())
   sepal_length  sepal_width  petal_length  petal_width           class
          6.7          3.0           5.2          2.3  Iris-virginica
          6.3          2.5           5.0          1.9  Iris-virginica
          6.5          3.0           5.2          2.0  Iris-virginica
          6.2          3.4           5.4          2.3  Iris-virginica
          5.9          3.0           5.1          1.8  Iris-virginica
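
Beyond head() and tail(), a few one-line summaries give a quick feel for the table. The sketch below is illustrative only: it uses a six-row excerpt (an assumption made so that it runs offline), so the counts are 2 per species here, whereas the full dataset has 50 per species.

```python
import io
import pandas as pd

# Six-row excerpt of iris.data (assumption: stand-in for the full download)
SAMPLE = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.9,3.0,5.1,1.8,Iris-virginica
"""

df = pd.read_csv(
    io.StringIO(SAMPLE),
    header=None,
    names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class'])

# Number of rows and columns
print(df.shape)  # (6, 5)

# Number of samples per species
counts = df['class'].value_counts()
print(counts)

# Mean of every feature within each species
means = df.groupby('class').mean()
print(means)
```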

Let's split the data table into data values X and class/species labels Y:

Code:

X = df.iloc[:,0:4].values  # .ix has been removed from recent pandas versions; .iloc selects by position
Y = df.iloc[:,4].values

The Iris dataset is now formatted as a $150 \times 4$ matrix in which the columns are the different features and every row represents a separate flower sample. Each sample row $x$ can be viewed as a 4-dimensional vector:

$x^{T} = \begin{bmatrix} x_{1} \\ x_{2} \\ x_{3} \\ x_{4} \end{bmatrix} = \begin{bmatrix} \text{sepal length} \\ \text{sepal width} \\ \text{petal length} \\ \text{petal width} \end{bmatrix}$
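
To make the matrix view concrete, here is a small sketch of how one feature of one species can be picked out of X with a boolean mask built from Y. The four-row arrays are a hand-typed stand-in (an assumption) for the full 150-row X and Y.

```python
import numpy as np

# Four sample rows; columns are sepal length, sepal width,
# petal length, petal width (all in cm)
X = np.array([[5.1, 3.5, 1.4, 0.2],
              [4.9, 3.0, 1.4, 0.2],
              [7.0, 3.2, 4.7, 1.4],
              [5.9, 3.0, 5.1, 1.8]])
Y = np.array(['Iris-setosa', 'Iris-setosa',
              'Iris-versicolor', 'Iris-virginica'])

# One True/False entry per row: True where the label matches
mask = (Y == 'Iris-setosa')

# Rows selected by the mask, column 2 (petal length)
setosa_petal_length = X[mask, 2]

print(X.shape)              # (4, 4)
print(setosa_petal_length)  # [1.4 1.4]
```

Selecting rows with Y == key and a single column index in this way yields all values of one feature for one species.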

Finally, let's visualize our dataset using histograms:

Code:

from plotly.graph_objs import *
 
traces = []
 
legend = {0:False, 1:False, 2:False, 3:True} 
 
#You can choose other colors
colors = {'Iris-setosa': 'rgb(31, 119, 180)', 
          'Iris-versicolor': 'rgb(255, 127, 14)', 
          'Iris-virginica': 'rgb(44, 160, 44)'}
 
for col in range(4):
    for key in colors:
        traces.append(Histogram(x=X[Y==key, col], 
                        opacity=0.75,
                        xaxis='x%s' %(col+1),
                        marker=Marker(color=colors[key]),
                        name=key,
                        showlegend=legend[col]))
 
data = Data(traces)
 
layout = Layout(barmode='overlay',
                xaxis=XAxis(domain=[0, 0.25], title='sepal length (cm)'),
                xaxis2=XAxis(domain=[0.3, 0.5], title='sepal width (cm)'),
                xaxis3=XAxis(domain=[0.55, 0.75], title='petal length (cm)'),
                xaxis4=XAxis(domain=[0.8, 1], title='petal width (cm)'),
                yaxis=YAxis(title='count'),
                title='Different Iris flower features distribution')
 
#Use Plotly offline
from plotly.offline import plot 
 
fig = Figure(data=data, layout=layout)
plot(fig)

The output is:

#2

Here is the full code. Note that ix is deprecated and has been replaced by iloc, which is used below:

Code:

import pandas as pds
from plotly.offline import plot
from plotly.graph_objs import *
import chart_studio.plotly as py
py.sign_in('TSSFL', 'YOUR_API_KEY') #Sign into TSSFL Plotly online; replace YOUR_API_KEY with your own key
df = pds.read_csv(
    filepath_or_buffer='https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data',
    header=None,
    sep=',')
 
df.columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
df.dropna(how="all", inplace=True) # This drops the empty line at the file-end
 
X = df.iloc[:,0:4].values
Y = df.iloc[:,4].values
 
traces = []
 
legend = {0:False, 1:False, 2:False, 3:True}
 
#You can choose other colors
colors = {'Iris-setosa': 'rgb(31, 119, 180)',
      'Iris-versicolor': 'rgb(255, 127, 14)',
      'Iris-virginica': 'rgb(44, 160, 44)'}
 
for col in range(4):
    for key in colors:
        traces.append(Histogram(x=X[Y==key, col],
                    opacity=0.75,
                    xaxis='x%s' %(col+1),
                    marker=Marker(color=colors[key]),
                    name=key,
                    showlegend=legend[col]))
data = Data(traces)
 
layout = Layout(barmode='overlay',
            xaxis=XAxis(domain=[0, 0.25], title='sepal length (cm)'),
            xaxis2=XAxis(domain=[0.3, 0.5], title='sepal width (cm)'),
            xaxis3=XAxis(domain=[0.55, 0.75], title='petal length (cm)'),
            xaxis4=XAxis(domain=[0.8, 1], title='petal width (cm)'),
            yaxis=YAxis(title='count'),
            title='Different Iris flower features distribution')
 
#Use Plotly offline
fig = Figure(data=data, layout=layout)
plot(fig)
 
#Upload the plot into Plotly Online
plot_url = py.plot(fig, filename='Iris_data_plot.png')

The HTML-based output will look like this:
