TSSFL TECHNOLOGY STACK

Posted: **Mon Jul 05, 2021 8:12 pm**

TSSFL ODF Live ShowCase During Sabasaba International Trade Fair In Dar Es Salaam, Tanzania, From 28th June - 13th July 2021

Visualizing the Global 2019 GDP Per Capita, Life Expectancy, and other Social Factors Dataset

Below we show how to use data science tools, in particular the Python programming language, to visualize the relationship between economic and social factors. We use a dataset that covers GDP per capita, social support, healthy life expectancy, freedom to make choices, generosity, and related indicators for countries all over the world. Run the code below:

Code:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
 
#We use the dataset called "2019.csv" found at https://github.com/fati8999-tech/Data-visualization-with-Python-Using-Seaborn-and-Plotly_-GDP-per-Capita-Life-Expectency-Dataset/blob/master/2019.csv
#Pull the "raw" GitHub content
df = pd.read_csv('https://raw.githubusercontent.com/fati8999-tech/Data-visualization-with-Python-Using-Seaborn-and-Plotly_-GDP-per-Capita-Life-Expectency-Dataset/master/2019.csv')
print(df.head(5))
 
#Configure plotting parameters (seaborn is already imported above)
#plt.style.use('ggplot')
sns.set_style('darkgrid') # options: darkgrid, whitegrid, dark, white, ticks
plt.rc('axes', titlesize=18)     # fontsize of the axes title
plt.rc('axes', labelsize=14)    # fontsize of the x and y labels
plt.rc('xtick', labelsize=13)    # fontsize of the tick labels
plt.rc('ytick', labelsize=13)    # fontsize of the tick labels
plt.rc('legend', fontsize=13)    # legend fontsize
plt.rc('font', size=13)
 
colors1 = sns.color_palette('pastel')
colors2 = sns.color_palette('deep')
#colors = sns.color_palette("Set2")
#Let's plot the distribution of a single column in the dataframe (GDP per capita).
#sns.distplot() has been deprecated since seaborn 0.11, so we use sns.histplot() instead
 
sns.histplot(df['GDP per capita'], bins=10, kde=True, color="magenta") #Use 10 bins and overlay a KDE curve
plt.show()
plt.clf()
 
#Let's use 25 bins and remove the KDE curve
sns.histplot(df['GDP per capita'], kde=False, bins=25, color="magenta")
plt.show()
 
#Jointplot
#Let's visualize the relationship between two variables: a scatter plot
#in the center, with a histogram of each variable in the margins
 
sns.jointplot(x='GDP per capita', y='Healthy life expectancy', data=df, color="green")
plt.show()
plt.clf()
 
#kind='reg' adds a linear regression fit to the scatter plot
sns.jointplot(x='GDP per capita', y='Healthy life expectancy', data=df, kind='reg', color=colors2[6])
plt.show()
plt.clf()
 
#kind='resid' plots the residuals of that linear regression
sns.jointplot(x='GDP per capita', y='Healthy life expectancy', data=df, kind='resid', color=colors1[5])
plt.show()
plt.clf()
 
#kind='kde' draws kernel density estimates
sns.jointplot(x='GDP per capita', y='Healthy life expectancy', data=df, kind='kde', color="purple")
plt.show()
plt.clf()
 
#kind='hist' draws a two-dimensional histogram
sns.jointplot(x='GDP per capita', y='Healthy life expectancy', data=df, kind='hist', color="darkblue")
plt.show()
plt.clf()
 
#kind='hex' bins the data into hexagons, with histograms in the margins
sns.jointplot(x='GDP per capita', y='Healthy life expectancy', data=df, kind='hex', color="red")
plt.show()
plt.clf()
 
#Results show that GDP per capita and Healthy life expectancy are positively linearly correlated
 
df_sorted = df.sort_values('GDP per capita', ascending=False)
top10 = df_sorted.head(10) #plot and label exactly the same ten rows
#Let's plot GDP per capita for the ten countries with the highest values (horizontal bars)
plt.figure(figsize=(10, 6), tight_layout=True)
sns.barplot(x='GDP per capita', y='Country or region', data=top10, color="darkcyan")
plt.xticks(rotation=90)
plt.title("Top 10 Countries with Highest GDP per Capita")
for i, v in enumerate(top10['GDP per capita']):
    plt.text(v+0.01, i, str(round(v, 4)), color='steelblue', va="center") #value label
    plt.text(v+0.3, i, str(i+1), color='black', va="center") #rank label
 
#plt.subplots_adjust(left=0.3)    
textstr = 'Created at \nwww.tssfl.com'
#plt.text(0.02, 0.5, textstr, fontsize=14, transform=plt.gcf().transFigure)
plt.gcf().text(0.02, 0.9, textstr, fontsize=14, color='green') # (0,0) is bottom left, (1,1) is top right
plt.show()
plt.clf()
 
df_sorted = df.sort_values('GDP per capita', ascending=False)
top10 = df_sorted.head(10)
#The same top ten countries, this time as vertical bars
plt.figure(figsize=(8,6), tight_layout=True)
sns.barplot(x='Country or region', y='GDP per capita', data=top10, color="darkcyan")
plt.xticks(rotation=90)
plt.title("Top 10 Countries with Highest GDP per Capita")
xlocs, xlabs = plt.xticks()
for i, v in enumerate(top10['GDP per capita']):
    plt.text(xlocs[i] - 0.25, v + 0.05, str(v), color='steelblue', va="center")
plt.gcf().text(0.02, 0.1, textstr, fontsize=14, color='green')
plt.show()
plt.clf()
 
#Let's plot GDP per capita for the ten countries with the lowest values (horizontal bars)
df_sorted = df.sort_values('GDP per capita', ascending=True)
bottom10 = df_sorted.head(10)
plt.figure(figsize=(8,8), tight_layout=True)
sns.barplot(x='GDP per capita', y='Country or region', data=bottom10, color="darkmagenta")
plt.xticks(rotation=90)
plt.title("Countries with Lowest GDP per Capita")
for i, v in enumerate(bottom10['GDP per capita']):
    plt.text(v+0.01, i, str(round(v, 4)), color='teal', va="center")
plt.gcf().text(0.7, 0.85, textstr, fontsize=14, color='green')
plt.show()
plt.clf()
 
df_sorted = df.sort_values('GDP per capita', ascending=True)
bottom10 = df_sorted.head(10)
#The same ten lowest countries, this time as vertical bars
plt.figure(figsize=(8,8), tight_layout=True)
sns.barplot(x='Country or region', y='GDP per capita', data=bottom10, color="darkmagenta")
plt.xticks(rotation=90)
plt.title("Countries with Lowest GDP per Capita")
xlocs, xlabs = plt.xticks()
for i, v in enumerate(bottom10['GDP per capita']):
    plt.text(xlocs[i] - 0.25, v + 0.01, str(v), color='teal', va="center")
plt.gcf().text(0.2, 0.85, textstr, fontsize=14, color='green')
plt.show()
plt.clf()
 
df_sorted = df.sort_values('GDP per capita', ascending=True)
#Let's plot GDP per capita for all countries, lowest first
plt.figure(figsize=(12,40), tight_layout=True)
sns.barplot(x='GDP per capita', y='Country or region', data=df_sorted, color="lightblue")
plt.xticks(rotation=90)
plt.title("GDP per Capita")
n = len(df_sorted) #number of countries, used instead of a hard-coded count
for i, v in enumerate(df_sorted['GDP per capita']):
    plt.text(v+0.01, i, str(round(v, 4)), color='teal', va="center")
    plt.text(v+0.15, i, str(n - i), color='black', va="center") #rank, 1 = highest GDP per capita
plt.gcf().text(0.55, 0.96, textstr, fontsize=14, color='green')
plt.show()
plt.clf()
 
df_sorted = df.sort_values('GDP per capita',ascending=False)
#Let's plot GDP per capita for all countries, highest first
plt.figure(figsize=(12,40), tight_layout=True)
sns.barplot(x=df_sorted['GDP per capita'],y=df_sorted['Country or region'],data=df_sorted, color="lightblue")
plt.xticks(rotation=90)
plt.title("GDP per Capita")
for i, v in enumerate(df_sorted['GDP per capita']):
    plt.text(v+0.01, i, str(round(v, 4)), color='teal', va="center")
    plt.text(v+0.15, i, str(i+1), color='black', va="center")
plt.gcf().text(0.02, 0.99, textstr, fontsize=14, color='green')
plt.show()
plt.clf()
#End
 
#Let's plot GDP per capita for the last ten rows of the dataset
last10 = df.tail(10)
plt.figure(figsize=(8,5), tight_layout=True)
sns.barplot(x='Country or region', y='GDP per capita', data=last10, color="olive")
plt.xticks(rotation=90)
plt.show()
plt.clf()
 
#Matrix plot visualizing the correlation between the selected columns
data_select = df[['GDP per capita','Social support','Healthy life expectancy','Perceptions of corruption']]
print("Correlation between Data:")
print(data_select.corr())
 
#Visualize
#Change color as you want https://matplotlib.org/tutorials/colors/colormaps.html
plt.figure(figsize=(8,6), tight_layout=True)
sns.heatmap(data_select.corr(), cmap='coolwarm')
plt.title("Matrix Plot")
plt.show()
plt.clf()
 
#Let's get various relationships for the entire dataset
#Get the distribution of a single variable by hist and of two variables by scatter
plt.style.use('ggplot')
sns.pairplot(df)
plt.show()
plt.clf()
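
The comment in the code above says that GDP per capita and healthy life expectancy are positively linearly correlated. One way to put a number on that is the Pearson correlation coefficient. Below is a minimal sketch in plain Python; the function name `pearson_r` and the sample values are made up for illustration and are not taken from 2019.csv.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalized covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative values, NOT taken from 2019.csv
gdp_scores = [0.3, 0.6, 0.9, 1.2, 1.5]
life_scores = [0.35, 0.5, 0.7, 0.85, 1.0]
r = pearson_r(gdp_scores, life_scores)
print(round(r, 3))  # close to +1 for this nearly linear toy data
```

On the actual dataframe, pandas should give the same quantity directly with df['GDP per capita'].corr(df['Healthy life expectancy']), and data_select.corr() in the code above computes it for every pair of selected columns.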

Posted: **Mon Jul 05, 2021 9:18 pm**

Here is the output:

Posted: **Tue Jul 20, 2021 6:32 am**

Related news:

https://udsm.ac.tz/web/index.php/colleg ... dift)-2021

Posted: **Fri Aug 13, 2021 7:48 am**

Here is a related analysis:

Code:

import pandas as pd
import matplotlib.pyplot as plt
 
data = pd.read_csv('https://raw.githubusercontent.com/mpicbg-scicomp/dlbc17-python-intro/master/data/gapminder_gdp_oceania.csv', index_col='country')
 
# Extract the year from each column name.
# The column names have the form 'gdpPercap_(year)', so we keep only the
# (year) part for clarity when plotting GDP against years.
# strip() removes any of the characters given in its argument from both ends
# of the string (it does not match the argument as a literal prefix); this
# works here because the years consist of digits only.
# strip() is a string method, so we reach it through the .str accessor.
 
years = data.columns.str.strip('gdpPercap_')
 
# Convert year values to integers, saving results back to dataframe
 
data.columns = years.astype(int)
 
data.loc['Australia'].plot()
plt.show()
plt.clf()
 
#Transpose the data so that the years run along the x-axis, then plot it
data.T.plot()
plt.ylabel('GDP per capita')
plt.show()
 
#Use Matplotlib's ggplot style
 
plt.style.use('ggplot')
data.T.plot(kind='bar')
plt.ylabel('GDP per capita')
plt.show()
plt.clf()
 
#Use Matplotlib
years = data.columns
gdp_australia = data.loc['Australia']
plt.plot(years, gdp_australia, 'g--')
plt.show()
 
#Plot several datasets
# Select two countries' worth of data.
gdp_australia = data.loc['Australia']
gdp_nz = data.loc['New Zealand']
 
# Plot with differently-colored lines.
plt.plot(years, gdp_australia, 'b-', label='Australia')
plt.plot(years, gdp_nz, 'g-', label='New Zealand')
 
# Create legend.
plt.legend(loc='upper left')
plt.xlabel('Year')
plt.ylabel('GDP per capita ($)')
plt.show()
plt.clf()
 
#Plot a scatter plot correlating the GDP of Australia and New Zealand
plt.scatter(gdp_australia, gdp_nz)
plt.show()
plt.clf()
data.T.plot.scatter(x = 'Australia', y = 'New Zealand')
plt.show()
plt.clf()
 
#Plot the minimum and the maximum GDP per capita over time across all the countries in Europe
data_europe = pd.read_csv('https://raw.githubusercontent.com/alistairwalsh/2016-07-13-SUT/master/data/gapminder_gdp_europe.csv', index_col='country')
data_europe.min().plot(label='min')
data_europe.max().plot(label='max')
plt.title("Min and Max GDP per capita: European countries")
plt.legend(loc='best')
plt.xticks(rotation=90)
plt.show()
plt.clf()
 
#Scatter plot showing the relationship between the minimum and maximum GDP per capita among the countries in Asia 
#for each year in the data set
data_asia = pd.read_csv('https://raw.githubusercontent.com/vanzaj/2016-06-11-ntu/gh-pages/data/gapminder_gdp_asia.csv', index_col='country')
data_asia.describe().T.plot(kind='scatter', x='min', y='max')
plt.title("Min and Max GDP per capita: Asian countries")
plt.show()
plt.clf()
#No particular correlation can be seen between the minimum and maximum GDP values year on year
 
#The variability in the maximum is much higher than that of the minimum
data_asia.max().plot()
print(data_asia.idxmax())
print(data_asia.idxmin())
plt.title("Variability in max and min GDP per capita: Asia")
plt.xticks(rotation=90)
plt.show()
#Myanmar consistently has the lowest GDP per capita; the country with the highest GDP per capita varies more from year to year
plt.clf()
 
#The correlation between GDP and life expectancy for 2007, normalizing marker size by population:
data_all = pd.read_csv('https://raw.githubusercontent.com/mpicbg-scicomp/dlbc17-python-intro/master/data/gapminder_all.csv', index_col='country')
data_all.plot(kind='scatter', x='gdpPercap_2007', y='lifeExp_2007',
              s=data_all['pop_2007']/1e6)
plt.title("Correlation between GDP and life expectancy for 2007")
plt.show()
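
The last plot sets the marker size to population divided by one million, so very small countries may get markers that are hard to see while the largest ones dominate. As one possible alternative, here is a minimal sketch that linearly maps values onto a fixed size range. The helper name `scale_marker_sizes`, its default limits, and the sample populations are assumptions made for illustration.

```python
def scale_marker_sizes(values, min_size=10.0, max_size=500.0):
    """Linearly map values onto [min_size, max_size].

    Matplotlib interprets the scatter argument `s` as marker area in points**2.
    """
    values = list(values)
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal: give every marker the same mid-range size
        return [(min_size + max_size) / 2.0] * len(values)
    span = hi - lo
    return [min_size + (v - lo) / span * (max_size - min_size) for v in values]


# Hypothetical populations, for illustration only
populations = [2.0e5, 5.0e7, 1.3e9]
sizes = scale_marker_sizes(populations)
print(sizes)
```

In the scatter plot above, the result could be passed as s=scale_marker_sizes(data_all['pop_2007']) in place of the division by 1e6.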

Ref: [1]
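
A note on the column renaming step: str.strip() treats its argument as a set of characters to remove from both ends, not as a literal prefix. It gives the expected result for names like 'gdpPercap_1952' only because the remainder consists of digits. The sketch below shows the difference and a stricter alternative; the helper name `year_from_column` and the column name 'gdpPercap_peak' are hypothetical and used only to illustrate the point.

```python
def year_from_column(name, prefix="gdpPercap_"):
    """Return the integer year from a column name such as 'gdpPercap_1952'."""
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} does not start with {prefix!r}")
    # Slice off the literal prefix and convert the rest to an integer
    return int(name[len(prefix):])


# strip() removes any of the listed characters from both ends of the string
print("gdpPercap_1952".strip("gdpPercap_"))  # prints 1952
print("gdpPercap_peak".strip("gdpPercap_"))  # prints k ('p', 'e', 'a' are stripped too)

# Removing the literal prefix is explicit and fails loudly on unexpected names
print(year_from_column("gdpPercap_1952"))    # prints 1952
```

With pandas, the same idea could be applied to every column via data.columns.map(year_from_column).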

Posted: **Fri Aug 13, 2021 8:16 am**

We can do a similar analysis with R. The snippet below reads the data from a Google Sheet:

Code:

#googlesheets4 auth
#library(googlesheets4)
#read_sheet("https://docs.google.com/spreadsheets/d/1BC48PKPZW71AC6hOn1SNesvZk1PoBjE4wXwl1YlWFpY/edit#gid=0")
#read_sheet("https://docs.google.com/spreadsheets/d/1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY/edit#gid=780868077")
#read_sheet("1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY")
 
library(googlesheets4)
#gs4_deauth() switches off authentication, which is enough for Sheets that anyone with the link can read
gs4_deauth()
#This is the URL of a Sheet readable by anyone with the link
ss <- "https://docs.google.com/spreadsheets/d/1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY/edit#gid=780868077"
dat <- read_sheet(ss)
 
#By sheet ID
ss2 <- "1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY"
dat2 <- read_sheet(ss2)

See documentation: Ref1, Ref2

See basic usage:

https://cran.r-project.org/web/packages ... usage.html
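
For readers who prefer to stay in Python, a link-readable Sheet can usually also be loaded as CSV. The sketch below only builds the download URL. It assumes the commonly used `export?format=csv` endpoint of Google Sheets and that the Sheet is shared with anyone who has the link; the helper name `sheet_csv_url` is hypothetical.

```python
import re


def sheet_csv_url(sheet_url_or_id):
    """Build a CSV export URL from a Google Sheets URL or a bare sheet ID.

    Assumption: the Sheet is link-readable and the export?format=csv
    endpoint is available for it.
    """
    # The sheet ID is the path segment that follows '/d/' in a Sheets URL
    id_match = re.search(r"/d/([A-Za-z0-9_-]+)", sheet_url_or_id)
    sheet_id = id_match.group(1) if id_match else sheet_url_or_id
    url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?format=csv"
    # 'gid' selects a particular tab; keep it when the input URL carries one
    gid_match = re.search(r"[#&?]gid=(\d+)", sheet_url_or_id)
    if gid_match:
        url += f"&gid={gid_match.group(1)}"
    return url


ss = "https://docs.google.com/spreadsheets/d/1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY/edit#gid=780868077"
print(sheet_csv_url(ss))
```

Under those assumptions, pd.read_csv(sheet_csv_url(ss)) would play the role of read_sheet(ss) in the R code above.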
