The Multivariate Gaussian Distributed Data Visualization

Post Reply
User avatar
Eli
Senior Expert Member
Reactions: 183
Posts: 5334
Joined: 9 years ago
Location: Tanzania
Has thanked: 75 times
Been thanked: 88 times
Contact:

#1

The multivariate normal, multinormal or multivariate Gaussian distribution is a generalization of the one-dimensional normal distribution to higher dimensions. Such a distribution is specified by its mean and covariance matrix. These parameters are analogous to the mean (average or “center”) and variance (standard deviation, or “width,” squared) of the one-dimensional normal/Gaussian distribution.

The mean is a coordinate in N-dimensional space, which represents the location where samples are most likely to be generated. This is analogous to the peak of the bell curve for the one-dimensional or univariate normal distribution, see visualizing-random-samples-from-a-norma ... ution-5017

Covariance indicates the level to which two variables vary together (covary). From the multivariate normal distribution, we draw N-dimensional samples, \(X = [x_1, x_2, \dots, x_N]\). The covariance matrix element \(C_{ij}\) is the covariance of \(x_i\) and \(x_j\). The element \(C_{ii}\) is the variance of \(x_i\) (i.e. its “spread”).

Instead of specifying the full covariance matrix, popular approximations include:

  • Spherical covariance (covariance is a multiple of the identity matrix)
  • Diagonal covariance (covariance has non-negative elements, and only on the diagonal)
Note that the covariance matrix must be positive semidefinite (a.k.a. nonnegative-definite). Otherwise, the behavior of the method under consideration is undefined and backwards compatibility is not guaranteed. See Scipy documentation: https://docs.scipy.org/doc/numpy-1.13.0 ... ormal.html

The problem of multi-dimensional data is its visualization, it would be extremely difficult to get an insight from the large sample of data and analyse it without at least visualizing it. To illustrate how we can deal and visualize the problem of multidimensional data, we will generate two \(3 \times 30\)-dimensional samples randomly drawn from a multivariate Gaussian distribution. We will assume that the samples come from two different classes, where one half (i.e., 30) samples of our dataset are labelled, Class 1 and the other half, Class 2.

We will create two \(3 \times 30\) datasets - one dataset for each sample, where each column can be viewed as a \(3 \times 1\) vector,


\begin{align}x = \begin{bmatrix}
x_{1} \\
x_{2} \\
x_{3}
\end{bmatrix}\end{align}

so that our dataset will have the form

\begin{align}X = \begin{bmatrix}
x1_{1} & x1_{2} & x1_{3} & \dots & x1_{30} \\
x2_{1} & x2_{2} & x2_{3} & \dots & x2_{30} \\
x3_{1} & x3_{2} & x3_{3} & \dots & x3_{30}
\end{bmatrix}.\end{align}

We will assume that the sample means for our two datasets (Class 1 and Class 2) are given by

\begin{align}\mu_{1} = \begin{bmatrix}
0 \\
0 \\
0
\end{bmatrix}, \\ \end{align}

\begin{align} \mu_{2} = \begin{bmatrix}
1 \\
1 \\
1
\end{bmatrix}\end{align}

and the covariance matrices are

\begin{align}\Sigma_{1} = \Sigma_{2} = \begin{bmatrix}
1 & 0 & 0\\
0 & 1 & 0\\
0 & 0 & 1
\end{bmatrix}. \end{align}

Now, let's use the code below to create two \( 3 \times 30\) datasets for Class 1 and Class 2:

  1. import numpy as np
  2.  
  3. np.random.seed(4294967294) #Used the random seed for consistency
  4.  
  5. mu_vec_1 = np.array([0,0,0])
  6. cov_mat_1 = np.array([[1,0,0],[0,1,0],[0,0,1]])
  7. class_1_sample = np.random.multivariate_normal(mu_vec_1, cov_mat_1, 30).T
  8. assert class_1_sample.shape == (3,30), "The matrix dimensions is not 3x30"
  9.  
  10. mu_vec_2 = np.array([1,1,1])
  11. cov_mat_2 = np.array([[1,0,0],[0,1,0],[0,0,1]])
  12. class_2_sample = np.random.multivariate_normal(mu_vec_2, cov_mat_2, 30).T
  13. assert class_2_sample.shape == (3,30), "The matrix dimensions is not 3x30"


We can then get a rough idea on how the samples of our two classes distributed by plotting and visualizing them in a 3D scatter plot using the code below:

  1. import matplotlib.pyplot as plt
  2. from mpl_toolkits.mplot3d import Axes3D
  3. from mpl_toolkits.mplot3d import proj3d
  4.  
  5. fig = plt.figure(figsize=(10,10)) #Define figure size
  6.  
  7. ax = fig.add_subplot(111, projection='3d')
  8. plt.rcParams['legend.fontsize'] = 20  
  9. ax.plot(class_1_sample[0,:], class_1_sample[1,:], class_1_sample[2,:], 'o', markersize=10, color='green', alpha=1.0, label='Class 1')
  10. ax.plot(class_2_sample[0,:], class_2_sample[1,:], class_2_sample[2,:], 'o', markersize=10, alpha=1.0, color='red', label='Class 2')
  11.  
  12. plt.title('Data Samples for Classes 1 & 2', y=1.04)
  13. ax.legend(loc='upper right')
  14. plt.savefig('Multivariate_distr.png', bbox_inches='tight')
  15. plt.show()


The output is the attached 3-D scatter plot:

Image

Have a Nice Easter!
0
TSSFL -- A Creative Journey Towards Infinite Possibilities!
User avatar
Eli
Senior Expert Member
Reactions: 183
Posts: 5334
Joined: 9 years ago
Location: Tanzania
Has thanked: 75 times
Been thanked: 88 times
Contact:

#2

We can pruduce 3D plots with different looks:


Image
Attachments
20200830_185716.jpg
(90.46 KiB) Not downloaded yet
20200830_185716.jpg
(90.46 KiB) Not downloaded yet
0
TSSFL -- A Creative Journey Towards Infinite Possibilities!
Nyanga Honda
Member
Reactions: 0
Posts: 1
Joined: 9 months ago

#3

It's good but they need to be improved.
0
User avatar
Eli
Senior Expert Member
Reactions: 183
Posts: 5334
Joined: 9 years ago
Location: Tanzania
Has thanked: 75 times
Been thanked: 88 times
Contact:

#4

Hello @Nyanga Honda

Welcome to TSSFL Technology Stack.

What specifically needs improvement ? We are glad to hear from you.
0
TSSFL -- A Creative Journey Towards Infinite Possibilities!
Post Reply
  • Similar Topics
    Replies
    Views
    Last post

Return to “Statistics and Probability”

  • Information
  • Who is online

    Users browsing this forum: No registered users and 3 guests