Organizer: Dar Es Salaam University College of Education (www.duce.ac.tz)
Presenter: Dr Yohana Elimboto
Moderator: Dr Sixbert Mlowe
Date: 30th June 2021
Time: 2:00 PM - 2:40 PM
Platform: Zoom
Link:
https://us04web.zoom.us/j/76187059507?p ... rVVdFUWNKW
HUrdz09
Meeting ID: 761 8705 9507
Passcode: 4NhpNX
If you do not have Zoom on your device, you can join the meeting viaTSSFL ODF: integrated-zoom-video-conferencing-for-meetings-6420
Abstract: TSSFL Open Discussion Forums (TSSFL ODF) is an open-source knowledge base set of interactive forums that facilitate communications and collaborations in research to solve a wide range of real-life problems. TSSFL ODF is integrated with numerous
research tools, making it a very robust cloud platform that can be instantly deployed to provide inspiring and unprecedented cost-effective solutions in research. During this talk, Dr. Yohana will showcase how to use the TSSFL ODF integrated and the associated data science tools and data repositories, specifically Python programming language, Open Data Kit (ODK) Collect, Google drive, Google spreadsheets, Excel worksheets, DropBox, and GitHub to automate research processes from data collection to data storage, data management, data retrieval, data cleaning, data processing, data analysis, and finally, research results and reports.
Please find more information from the attached document.
-
- Active Topics
-
-
- by Eli 4 days ago Re: What is in Your Mind? View the latest post Replies 793 Views 548439
- by Eli 5 days ago Israel Retaliates by Striking Military Targets in Iran View the latest post Replies 4 Views 770
- by Eli 1 week ago Mosab Hassan Yousef on Palestine and PLO View the latest post Replies 1 Views 776
- by Eli 1 week ago The Stranger on the Road to Emmaus View the latest post Replies 2 Views 2954
- by Eli 1 week ago All in One: YouTube, TED, X, Facebook, Instagram, and TikTok Reels, Videos, Images and Text Posts View the latest post Replies 367 Views 229294
- by Eli 1 week ago Russia Invades Ukraine View the latest post Replies 695 Views 435326
- by Eli 1 week ago IDF Lebanon Ground Invasion View the latest post Replies 2 Views 8509
- by Eli 1 week ago How to be Successful View the latest post Replies 4 Views 71974
- by Eli 1 week ago Heavy Consumption of Social Media is now A Serious Disease View the latest post Replies 2 Views 4715
- by Eli 2 weeks ago TSSFL Participates in TAIC 2024 View the latest post Replies 1 Views 2444
-
First Public Service Webinar: Accelerated Research Solutions with TSSFL Technology Stack Digital Technologies
- Admin
- Site Admin
- Senior Expert Member
- Reactions: 56
- Posts: 384
- Joined: 10 years ago
- Has thanked: 38 times
- Been thanked: 32 times
- Contact:
0
TSSFL Stack is dedicated to empowering and accelerating teaching and learning, fostering scientific research, and promoting rapid software development and digital technologies
- Eli
- Senior Expert Member
- Reactions: 187
- Posts: 5679
- Joined: 9 years ago
- Location: Tanzania
- Has thanked: 75 times
- Been thanked: 88 times
- Contact:
Live Webinar: Accelerated Research Solutions with Digital Technologies
Name: Elimboto Yohana
Institution: Dar Es Salaam University College of Education
Introduction
Name: Elimboto Yohana
Institution: Dar Es Salaam University College of Education
Introduction
- TSSFL ODF is a currently a cloud computing platform apart from being an interactive forum.
- It is interoperable with a number of robust cloud platforms, e.g., Google.
- TSSFL ODF eliminates time, spatial and technological constraints in research, teaching/learning and problem-solving, that's,
- Use any internet connected device from anywhere at anytime.
- Today we are focusing on data collection and research to see how it works!
TSSFL -- A Creative Journey Towards Infinite Possibilities!
- Eli
- Senior Expert Member
- Reactions: 187
- Posts: 5679
- Joined: 9 years ago
- Location: Tanzania
- Has thanked: 75 times
- Been thanked: 88 times
- Contact:
Let's start by running few codes:
First code:
Second code:
Third code:
Fourth code:
First code:
- import numpy as np
- import matplotlib.pyplot as plt
- from mpl_toolkits.mplot3d import Axes3D
- from mpl_toolkits.mplot3d import proj3d
- fig = plt.figure(figsize=(10,10)) #Define figure size
- ax = fig.add_subplot(111, projection='3d')
- plt.rcParams['legend.fontsize'] = 20
- np.random.seed(4294967294) #Used the random seed for consistency
- mu_vec_1 = np.array([0,0,0])
- cov_mat_1 = np.array([[1,0,0],[0,1,0],[0,0,1]])
- class_1_sample = np.random.multivariate_normal(mu_vec_1, cov_mat_1, 30).T
- assert class_1_sample.shape == (3,30), "The matrix dimensions is not 3x30"
- mu_vec_2 = np.array([1,1,1])
- cov_mat_2 = np.array([[1,0,0],[0,1,0],[0,0,1]])
- class_2_sample = np.random.multivariate_normal(mu_vec_2, cov_mat_2, 30).T
- assert class_2_sample.shape == (3,30), "The matrix dimensions is not 3x30"
- ax.plot(class_1_sample[0,:], class_1_sample[1,:], class_1_sample[2,:], 'o', markersize=10, color='green', alpha=1.0, label='Class 1')
- ax.plot(class_2_sample[0,:], class_2_sample[1,:], class_2_sample[2,:], 'o', markersize=10, alpha=1.0, color='red', label='Class 2')
- plt.title('Data Samples for Classes 1 & 2', y=1.04)
- ax.legend(loc='upper right')
- plt.savefig('Multivariate_distr.png', bbox_inches='tight')
- plt.show()
Second code:
- import matplotlib.pyplot as plt
- import seaborn as sns
- # Load Dataset
- df = sns.load_dataset('iris')
- # Plot
- plt.figure(figsize=(10,8), dpi= 80)
- sns.pairplot(df, kind="reg", hue="species")
- plt.show()
Third code:
- import seaborn as sns
- import matplotlib.pyplot as plt
- import pandas as pd
- sns.set()
- iris = sns.load_dataset("iris")
- sns.pairplot(iris, hue='species', height=2.5)
- plt.show()
Fourth code:
- import matplotlib.pyplot as plt
- import pandas as pd
- import seaborn as sns
- # Import Dataset
- df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
- # Plot
- plt.figure(figsize=(12,10), dpi= 80)
- sns.heatmap(df.corr(), xticklabels=df.corr().columns, yticklabels=df.corr().columns, cmap='RdYlGn', center=0, annot=True)
- # Decorations
- plt.title('Correlogram of mtcars', fontsize=22)
- plt.xticks(fontsize=12)
- plt.yticks(fontsize=12)
- plt.show()
TSSFL -- A Creative Journey Towards Infinite Possibilities!
- Eli
- Senior Expert Member
- Reactions: 187
- Posts: 5679
- Joined: 9 years ago
- Location: Tanzania
- Has thanked: 75 times
- Been thanked: 88 times
- Contact:
Data Collection
Data can be Collected using:
Data Collection with ODK Collect
Further TSSFL ODF - ODK Collect Integration for Data collection:
Data Collection with Google Survey forms:
Data Generated by Hand
Data can be Collected using:
- ODK Collect;
- Google survey forms;
- Any other means.
Data Collection with ODK Collect
Further TSSFL ODF - ODK Collect Integration for Data collection:
Data Collection with Google Survey forms:
Data Generated by Hand
TSSFL -- A Creative Journey Towards Infinite Possibilities!
- Eli
- Senior Expert Member
- Reactions: 187
- Posts: 5679
- Joined: 9 years ago
- Location: Tanzania
- Has thanked: 75 times
- Been thanked: 88 times
- Contact:
Data Retrieval, Processing, Visualization, and Statistics
Let's manipulate the data with the following codes:
ODK Test Spreadsheet data:
K43 Data
Elimboto_DSK Data - Categorical
TSSFL ODF Google Survey Form responses Data:
Let's manipulate the data with the following codes:
ODK Test Spreadsheet data:
- #Plot some graph
- #Import required libraries
- import gspread
- import urllib.request
- import numpy as np
- import matplotlib.pyplot as plt
- import seaborn as sns
- import pandas as pd
- """
- urllib.request.urlretrieve("https://www.dropbox.com/s/mqsyfuetv8potvd/credentials.json?dl=1", "credentials.json")
- gc = gspread.service_account(filename="credentials.json")
- sh = gc.open_by_key("1BC48PKPZW71AC6hOn1SNesvZk1PoBjE4wXwl1YlWFpY") #Open spreadsheet,
- #the spreadsheet ID starts with 1019ke.... between "" in the line above
- """
- #Alternative
- #If your file only has one sheet, replace sheet_url
- sheet_url = "https://docs.google.com/spreadsheets/d/1BC48PKPZW71AC6hOn1SNesvZk1PoBjE4wXwl1YlWFpY/edit#gid=0"
- url_1 = sheet_url.replace('/edit#gid=', '/export?format=csv&gid=')
- data = pd.read_csv(url_1)
- #worksheet = sh.sheet1
- #Define variables
- var1 = "Name"
- var2 = "Age"
- var3 = "Height"
- #age = worksheet.col_values(3)[1:]
- age = data[var2]
- print("Ages:", age)
- #height = worksheet.col_values(4)[1:]
- height = data[var3]
- print("Heights:", height)
- #Pandas is extremely very useful for Google spreadsheets
- #Convert the json to Pandas dataframe
- #Get all data records as dictionary
- #data = worksheet.get_all_records()
- #df = pd.DataFrame.from_dict(data)
- #Let's get some statistics
- #age_arr = np.array(age)
- #age_array = age_arr.astype(float)
- #h_arr = np.array(height)
- #h_array = h_arr.astype(float)
- print("Average Age:", np.mean(age))
- print("Mean Height:", np.mean(height))
- print("Minimum and Maximum Age:", np.min(age), np.max(age))
- print("Minimum and Maximum Height:", np.min(height), np.max(height))
- #Let's visualize
- #Graph styles and font size
- sns.set_style('darkgrid') # darkgrid, white grid, dark, white and ticks
- plt.rc('axes', titlesize=18) # fontsize of the axes title
- plt.rc('axes', labelsize=14) # fontsize of the x and y labels
- plt.rc('xtick', labelsize=13) # fontsize of the tick labels
- plt.rc('ytick', labelsize=13) # fontsize of the tick labels
- plt.rc('legend', fontsize=13) # legend fontsize
- plt.rc('font', size=13) # controls default text sizes
- #sns list of color plettes
- #print(sns.color_palette('deep'), sns.color_palette("pastel"), sns.color_palette("Set2"))
- #Let's Read Data from Google Sheets into Pandas without the Google Sheets API
- #Useful for multiple sheets
- sheet_id = "1BC48PKPZW71AC6hOn1SNesvZk1PoBjE4wXwl1YlWFpY"
- sheet_name = "Sheet1"
- url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
- #If your file only has one sheet, replace sheet_url
- #sheet_url = “https://docs.google.com/spreadsheets/d/1XqOtPkiE_Q0dfGSoyxrH730RkwrTczcRbDeJJpqRByQ/edit#gid=0"
- #url_1 = sheet_url.replace(‘/edit#gid=’, ‘/export?format=csv&gid=’)
- #Get Pandas dataframe
- dataset = pd.read_csv(url)
- #print(dataset)
- #Names = worksheet.col_values(2)[1:]
- #Names = data[var1]
- #print(Names)
- df_names = dataset[var1]
- df_ages = dataset[var2]
- df_heights = dataset[var3]
- print(df_names)
- #Preprocessing
- plots = dataset.groupby([var1], as_index=False).mean()
- #print(plots)
- #Bar Plot in MatplotLib with plt.bar()
- #Names vs Age
- plt.figure(figsize=(10,5), tight_layout=True)
- colors = sns.color_palette('pastel')
- plt.bar(dataset[var1], dataset[var2], color=colors[:5])
- plt.xlabel(var1)
- plt.xticks(rotation=90)
- plt.ylabel('Age')
- plt.title('Barplot')
- plt.show()
- #Name Vs Height
- plt.figure()
- plt.figure(figsize=(10,5), tight_layout=True)
- colors = sns.color_palette('deep')
- plt.bar(dataset[var1], dataset[var3], color=colors[:6])
- plt.xlabel(var1)
- plt.xticks(rotation=90)
- plt.ylabel('Height')
- plt.title('Barplot')
- plt.show()
- #Bar Plot in Seaborn with sns.barplot()
- plt.figure(figsize=(10,5), tight_layout=True)
- ax = sns.barplot(x=dataset[var1], y=dataset[var2], palette='pastel', ci=None)
- ax.set(title='Barplot with Seaborn', xlabel='Names', ylabel='Age')
- plt.xticks(rotation=90)
- plt.show()
- #Barplot grouped data by "n" variables
- plt.figure(figsize=(12, 6), tight_layout=True)
- ax = sns.barplot(x=dataset[var2], y=dataset[var3], hue=dataset[var1], palette='pastel')
- ax.set(title='Age vs Height' ,xlabel='Age', ylabel='Height')
- ax.legend(title='Names', title_fontsize='13', loc='upper right')
- plt.show()
- #Histograms with plt.hist() or sns.histplot()
- plt.figure(figsize=(10,6), tight_layout=True)
- bins = [160, 165, 170, 175, 180, 185, 190, 195, 200]
- # matplotlib
- plt.hist(dataset[var3], bins=bins, color=sns.color_palette('Set2')[2], linewidth=2)
- plt.title('Histogram')
- plt.xlabel('Height (cm)')
- plt.ylabel('Count')
- # seaborn
- ax = sns.histplot(data=dataset, x=var3, bins=bins, color=sns.color_palette('Set2')[2], linewidth=2)
- ax.set(title='Histogram', xlabel='Height (cm)', ylabel='Count')
- plt.show()
- #Boxplot
- plt.figure(figsize=(10,6), tight_layout=True)
- ax = sns.boxplot(data=dataset, x=var1, y=var2, palette='Set2', linewidth=2.5)
- ax.set(title='Boxplot', xlabel='Names', ylabel='Age (Years)')
- plt.xticks(rotation=90)
- plt.show()
- #Scatter plot
- plt.figure(figsize=(10,6), tight_layout=True)
- ax = sns.scatterplot(data=dataset, x=var2, y=var3, hue=var1, palette='Set2', s=60)
- ax.set(xlabel='Age (Years)', ylabel='Height (cm)')
- ax.legend(title='People', title_fontsize = 12)
- plt.show()
- #Something else
- pivot = dataset.groupby([var1], as_index=False).mean()
- relationship = pivot.loc[:,var2:var3]
- print(relationship)
- #Plot some graph
- charts = ["bar", "line", "barh", "hist", "box", "kde", "density", "area"]
- for chart_type in charts:
- relationship.plot(kind="%s" % chart_type) #Replace bar with line, barh, hist, box, kde, density, area
- plt.title("%s plot" % chart_type)
- plt.show()
- #Seaborn
- plt.figure()
- sns.set_style("darkgrid")
- sns.lineplot(data = dataset, x = var2, y = var3)
- plt.show()
- plt.figure()
- sns.set_style("whitegrid")
- sns.lineplot(data = dataset, x = var2, y = var3)
- plt.show()
- #Hexbin
- #Split the plotting window into 20 hexbins
- plt.figure()
- nbins = 20
- plt.title('Hexbin')
- plt.hexbin(dataset[var2], dataset[var3], gridsize=nbins, color=colors[:6])
- plt.show()
- #2-D Hist
- plt.figure()
- plt.title('2-D Histogram')
- plt.hist2d(dataset[var2], dataset[var3], bins=nbins, color=colors[:5])
- plt.show()
- #Set variables
- x = dataset[var2]
- y = dataset[var3]
- z = dataset[var1]
- #Linear Regression
- plt.figure()
- sns.regplot(x = x, y = y, data=dataset);
- plt.show()
- plt.figure()
- sns.jointplot(x=x, y=y, data=dataset, kind="reg");
- plt.show()
- #Set seaborn style
- sns.set_style("white")
- # Basic 2D density plot
- plt.figure()
- sns.kdeplot(x=x, y=y)
- plt.show()
- # Custom the color, add shade and bandwidth
- plt.figure()
- sns.kdeplot(x=x, y=y, cmap="Reds", shade=True, bw_adjust=.5)
- plt.show()
- # Add thresh parameter
- plt.figure()
- sns.kdeplot(x=x, y=y, cmap="Blues", shade=True, thresh=0)
- plt.show()
- #Joint plot
- plt.figure()
- sns.jointplot(x = x,y = y,data = dataset,kind = 'hex')
- plt.show()
K43 Data
- #Plot some graph
- #Import required libraries
- import gspread
- import urllib.request
- import numpy as np
- import matplotlib.pyplot as plt
- import seaborn as sns
- import pandas as pd
- #Alternative
- #If your file only has one sheet, replace sheet_url
- sheet_url = "https://docs.google.com/spreadsheets/d/1426rTslBl2mgggHIQnLR7WivW4xor6cp5H1Su-1SrdI/edit#gid=0"
- url_1 = sheet_url.replace('/edit#gid=', '/export?format=csv&gid=')
- data = pd.read_csv(url_1)
- #worksheet = sh.sheet1
- #Define variables
- var1 = "data-Name"
- var2 = "data-Age"
- var3 = "data-Weight"
- #age = worksheet.col_values(3)[1:]
- age = data[var2]
- print("Ages:", age)
- #weight = worksheet.col_values(4)[1:]
- weight = data[var3]
- print("Weights:", weight)
- #Pandas is extremely very useful for Google spreadsheets
- #Convert the json to Pandas dataframe
- #Get all data records as dictionary
- #data = worksheet.get_all_records()
- #df = pd.DataFrame.from_dict(data)
- #Let's get some statistics
- #age_arr = np.array(age)
- #age_array = age_arr.astype(float)
- #h_arr = np.array(weight)
- #h_array = h_arr.astype(float)
- print("Average Age:", np.mean(age))
- print("Mean Weight:", np.mean(weight))
- print("Minimum and Maximum Age:", np.min(age), np.max(age))
- print("Minimum and Maximum Weight:", np.min(weight), np.max(weight))
- #Let's visualize
- #Graph styles and font size
- sns.set_style('darkgrid') # darkgrid, white grid, dark, white and ticks
- plt.rc('axes', titlesize=18) # fontsize of the axes title
- plt.rc('axes', labelsize=14) # fontsize of the x and y labels
- plt.rc('xtick', labelsize=13) # fontsize of the tick labels
- plt.rc('ytick', labelsize=13) # fontsize of the tick labels
- plt.rc('legend', fontsize=13) # legend fontsize
- plt.rc('font', size=13) # controls default text sizes
- #sns list of color plettes
- #print(sns.color_palette('deep'), sns.color_palette("pastel"), sns.color_palette("Set2"))
- #Let's Read Data from Google Sheets into Pandas without the Google Sheets API
- #Useful for multiple sheets
- sheet_id = "1426rTslBl2mgggHIQnLR7WivW4xor6cp5H1Su-1SrdI"
- sheet_name = "Sheet1"
- url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
- #If your file only has one sheet, replace sheet_url
- #sheet_url = “https://docs.google.com/spreadsheets/d/1XqOtPkiE_Q0dfGSoyxrH730RkwrTczcRbDeJJpqRByQ/edit#gid=0"
- #url_1 = sheet_url.replace(‘/edit#gid=’, ‘/export?format=csv&gid=’)
- #Get Pandas dataframe
- dataset = pd.read_csv(url)
- #print(dataset)
- #Names = worksheet.col_values(2)[1:]
- #Names = data[var1]
- #print(Names)
- df_names = dataset[var1]
- df_ages = dataset[var2]
- df_weights = dataset[var3]
- print(df_names)
- #Preprocessing
- plots = dataset.groupby([var1], as_index=False).mean()
- #print(plots)
- #Bar Plot in MatplotLib with plt.bar()
- #Names vs Age
- plt.figure(figsize=(10,5), tight_layout=True)
- colors = sns.color_palette('pastel')
- plt.bar(dataset[var1], dataset[var2], color=colors[:5])
- plt.xlabel('Name')
- plt.xticks(rotation=90)
- plt.ylabel('Age')
- plt.title('Barplot')
- plt.show()
- #Name Vs Weight
- plt.figure()
- plt.figure(figsize=(10,5), tight_layout=True)
- colors = sns.color_palette('deep')
- plt.bar(dataset[var1], dataset[var3], color=colors[:6])
- plt.xlabel('Name')
- plt.xticks(rotation=90)
- plt.ylabel('Weight')
- plt.title('Barplot')
- plt.show()
- #Bar Plot in Seaborn with sns.barplot()
- plt.figure(figsize=(10,5), tight_layout=True)
- ax = sns.barplot(x=dataset[var1], y=dataset[var2], palette='pastel', ci=None)
- ax.set(title='Barplot with Seaborn', xlabel='Names', ylabel='Age')
- plt.xticks(rotation=90)
- plt.show()
- #Barplot grouped data by "n" variables
- plt.figure(figsize=(12, 6), tight_layout=True)
- ax = sns.barplot(x=dataset[var2], y=dataset[var3], hue=dataset[var1], palette='pastel')
- ax.set(title='Age vs Weight' ,xlabel='Age', ylabel='Weight')
- ax.legend(title='Names', title_fontsize='13', loc='upper right')
- plt.show()
- #Histograms with plt.hist() or sns.histplot()
- plt.figure(figsize=(10,6), tight_layout=True)
- bins = [40, 50, 60, 70, 80]
- # matplotlib
- plt.hist(dataset[var3], bins=bins, color=sns.color_palette('Set2')[2], linewidth=2)
- plt.title('Histogram')
- plt.xlabel('Weight (cm)')
- plt.ylabel('Count')
- # seaborn
- ax = sns.histplot(data=dataset, x=var3, bins=bins, color=sns.color_palette('Set2')[2], linewidth=2)
- ax.set(title='Histogram', xlabel='Weight (cm)', ylabel='Count')
- plt.show()
- #Boxplot
- plt.figure(figsize=(10,6), tight_layout=True)
- ax = sns.boxplot(data=dataset, x=var1, y=var2, palette='Set2', linewidth=2.5)
- ax.set(title='Boxplot', xlabel='Names', ylabel='Age (Years)')
- plt.xticks(rotation=90)
- plt.show()
- #Scatter plot
- plt.figure(figsize=(10,6), tight_layout=True)
- ax = sns.scatterplot(data=dataset, x=var2, y=var3, hue=var1, palette='Set2', s=60)
- ax.set(xlabel='Age (Years)', ylabel='Weight (kgs)')
- ax.legend(title='People', title_fontsize = 12)
- plt.show()
- #Something else
- pivot = dataset.groupby([var1], as_index=False).mean()
- relationship = pivot.loc[:,var2:var3]
- print(relationship)
- #Plot some graph
- charts = ["bar", "line", "barh", "hist", "box", "kde", "density", "area"]
- for chart_type in charts:
- relationship.plot(kind="%s" % chart_type) #Replace bar with line, barh, hist, box, kde, density, area
- plt.title("%s plot" % chart_type)
- plt.show()
- #Seaborn
- plt.figure()
- sns.set_style("darkgrid")
- sns.lineplot(data = dataset, x = var2, y = var3)
- plt.show()
- plt.figure()
- sns.set_style("darkgrid")
- sns.lineplot(data = dataset, x = var3, y = var2)
- plt.show()
- #replot
- plt.figure()
- sns.set_theme(style="darkgrid")
- sns.relplot(x=var2, y=var3, hue=var1, data=data);
- plt.show()
- #Hexbin
- #Split the plotting window into 20 hexbins
- plt.figure()
- nbins = 15
- plt.title('Hexbin')
- plt.hexbin(dataset[var2], dataset[var3], gridsize=nbins, color=colors[:5])
- plt.show()
- #Hexbin 2
- #Split the plotting window into 20 hexbins
- plt.figure()
- nbins = 15
- plt.title('Hexbin')
- plt.hexbin(dataset[var2], dataset[var3], gridsize=nbins, cmap=plt.cm.BuGn_r)
- plt.show()
- #2-D Hist
- plt.figure()
- plt.title('2-D Histogram')
- plt.hist2d(dataset[var2], dataset[var3], bins=nbins, color=colors[:6])
- plt.show()
- #2-D Hist 2
- plt.figure()
- plt.title('2-D Histogram')
- plt.hist2d(dataset[var2], dataset[var3], bins=nbins, cmap=plt.cm.BuGn_r)
- plt.show()
- #Set variables
- x = dataset[var2]
- y = dataset[var3]
- z = dataset[var1]
- #Linear Regression
- plt.figure()
- sns.regplot(x = x, y = y, data=dataset);
- plt.show()
- plt.figure()
- sns.jointplot(x=x, y=y, data=dataset, kind="reg");
- plt.show()
- #Set seaborn style
- sns.set_style("white")
- # Basic 2D density plot
- plt.figure()
- sns.kdeplot(x=x, y=y)
- plt.show()
- # Custom the color, add shade and bandwidth
- plt.figure()
- sns.kdeplot(x=x, y=y, cmap="Reds", shade=True, bw_adjust=.5)
- plt.show()
- # Add thresh parameter
- plt.figure()
- sns.kdeplot(x=x, y=y, cmap="Blues", shade=True, thresh=0)
- plt.show()
- #Joint plot
- plt.figure()
- sns.jointplot(x = x,y = y,data = dataset,kind = 'hex')
- plt.show()
Elimboto_DSK Data - Categorical
- #Plot some graph
- #Import required libraries
- import gspread
- import urllib.request
- import numpy as np
- import matplotlib.pyplot as plt
- import seaborn as sns
- import pandas as pd
- from statsmodels.graphics.mosaicplot import mosaic
- #REDCap
- #Let's visualize
- #Graph styles and font size
- sns.set_style('darkgrid') # darkgrid, white grid, dark, white and ticks
- plt.rc('axes', titlesize=18) # fontsize of the axes title
- plt.rc('axes', labelsize=14) # fontsize of the x and y labels
- plt.rc('xtick', labelsize=13) # fontsize of the tick labels
- plt.rc('ytick', labelsize=13) # fontsize of the tick labels
- plt.rc('legend', fontsize=13) # legend fontsize
- plt.rc('font', size=13) # controls default text sizes
- sheet_id = "1pm1mGdRgpitrYQiGqUNSHPdR43e-ZSXCavYr-TcqtwU"
- sheet_name = "Sheet1"
- url_1 = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
- data = pd.read_csv(url_1)
- #print(data)
- #Drop first row
- #df = data.drop(labels=0, axis=0)
- #df = data.drop(data.index[0])
- df = data[~data['Ailment cured'].isin(['HIV/AIDS'])]
- #df['Ailment cured'] = df['Ailment cured'].replace({'Gonorrhoea, syphilis':'Gonorrhoea & Syphilis'})
- df["Ailment cured"] = df['Ailment cured'].replace('Gonorrhoea, syphilis', 'Gonorrhoea & Syphilis')
- #print(df)
- #Growth form vs Citation
- plt.figure(figsize=(8,5))
- sns.boxplot(x='Growth form',y='Citation',data=data, palette='rainbow')
- plt.show()
- #Citation vs Growth form
- plt.figure(figsize=(8,5))
- sns.boxplot(x='Citation',y='Growth form',data=data, palette='rainbow')
- plt.show()
- #Citation vs Growth form
- plt.figure(figsize=(8,5))
- sns.boxplot(x='Citation',y='Part used',data=data, palette='rainbow')
- plt.tight_layout() #figure.savefig('myplot.png', bbox_inches='tight')
- plt.show()
- #Citation vs Ailment cured
- plt.figure(figsize=(10,5))
- sns.boxplot(x=df["Ailment cured"],y=df['Citation'],data=df, palette='rainbow')
- plt.xlabel("Ailment cured", labelpad=15)
- plt.tight_layout()
- plt.show()
- #Swarm plot
- fig = plt.gcf()
- fig.set_size_inches(30, 30)
- sns.catplot(x="Citation", y="Scientific name", hue="Ailment cured", kind="swarm", data=df)
- plt.tight_layout()
- plt.show()
- #Adding hue
- #Citation vs Growth form
- plt.figure(figsize=(8,5))
- sns.boxplot(x='Citation',y='Growth form',data=data, hue ='Part used', palette='rainbow')
- plt.tight_layout()
- plt.show()
- plt.figure(figsize=(8,5))
- sns.boxplot(x='Citation',y='Growth form',data=data, hue ='Ailment cured', palette='rainbow')
- plt.tight_layout()
- plt.show()
- #Violin plots
- plt.figure(figsize=(8,6))
- sns.violinplot(x='Citation',y='Growth form',data=data, hue ='Part used', palette='rainbow')
- plt.show()
- #Violin plots
- plt.figure(figsize=(8,6))
- sns.violinplot(x='Citation',y='Growth form',data=data, hue ='Ailment cured',palette='rainbow')
- plt.show()
- #Boxen plots
- plt.figure(figsize=(8,6))
- sns.boxenplot(x='Citation',y='Growth form',data=data, hue ='Part used', palette='rainbow')
- plt.show()
- plt.figure(figsize=(8,6))
- sns.boxenplot(x='Citation',y='Part used',data=data, hue ='Ailment cured', palette='rainbow')
- plt.tight_layout()
- plt.show()
- #Bar plots
- plt.figure(figsize=(12,6))
- sns.barplot(x='Growth form',y='Citation',data=data, palette='rainbow', hue='Part used')
- plt.tight_layout()
- plt.show()
- plt.figure(figsize=(12,6))
- sns.barplot(x='Ailment cured',y='Citation',data=data, palette='rainbow', hue='Part used')
- plt.tight_layout()
- plt.legend(loc=1)
- plt.show()
- #Point plot
- plt.figure(figsize=(10,6))
- sns.pointplot(x='Citation',y='Growth form',data=data)
- plt.show()
- plt.figure(figsize=(10,6))
- sns.pointplot(x='Citation',y='Growth form',data=data, hue='Part used')
- plt.show()
- plt.figure(figsize=(10,6))
- sns.pointplot(x='Citation',y='Growth form',data=data, hue='Part used')
- plt.show()
- plt.figure(figsize=(10,6))
- sns.pointplot(x='Citation',y='Growth form',data=data, hue='Ailment cured')
- plt.show()
- #Count plot
- plt.figure(figsize=(10,6))
- sns.countplot(x='Growth form',data=data, palette='rainbow')
- plt.show()
- plt.figure(figsize=(10,6))
- sns.countplot(x='Growth form',data=data, hue='Part used', palette='rainbow')
- plt.legend(loc=1)
- plt.show()
- plt.figure(figsize=(10,6))
- sns.countplot(x='Growth form',data=data, hue='Ailment cured', palette='rainbow')
- plt.legend(loc=2)
- plt.show()
- #Strip plot - Categorical Scatter Plots
- plt.figure(figsize=(12,8))
- sns.stripplot(x='Citation', y='Growth form', data=data, jitter=True, hue= 'Part used', dodge=True, palette='viridis')
- plt.show()
- #Swarm plots
- plt.figure(figsize=(10,6))
- sns.swarmplot(x='Citation', y='Ailment cured', data=data, hue='Growth form', dodge=True, palette='viridis')
- plt.tight_layout()
- plt.show()
- """
- #Combining plots
- plt.figure(figsize=(12,8))
- sns.violinplot(x='Citation',y="Growth form", data=data, hue='Part used', dodge='True', palette='rainbow')
- sns.swarmplot(x='Citation',y="Growth form", data=data, hue='Part used', dodge='True', color='grey', alpha=.8, s=4)
- plt.show()
- #Plot 2
- plt.figure(figsize=(12,8))
- sns.boxplot(x='Citation',y='Part used',hue='Growth form',data=data, palette='rainbow')
- sns.swarmplot(x='Citation',y='Part used',hue='Growth form', dodge=True,data=data, alpha=.8,color='grey',s=4)
- #Plot 3
- plt.figure(figsize=(12,7))
- sns.barplot(x='Growth form',y='Citation',data=data, palette='rainbow', hue='Part used')
- sns.stripplot(x='Growth form',y="Citation",data=data, hue='Citation', dodge='True', color='grey', alpha=.8, s=2)
- plt.show()
- #Faceting Data with Catplot
- #https://towardsdatascience.com/a-complete-guide-to-plotting-categorical-variables-with-seaborn-bfe54db66bec
- g = sns.catplot(x='Citation',y='Growth form', col = 'Local name', data=data,
- kind='bar', aspect=.6, palette='Set2')
- (g.set_axis_labels("Class", "Survival Rate")
- .set_titles("{col_name}")
- .set(ylim=(0,1)))
- plt.tight_layout()
- plt.savefig('seaborn_catplot.png', dpi=1000)
- """
- categorical_features = ["Growth form", "Part used", "Ailment cured", "Citation"]
- fig, ax = plt.subplots(1, len(categorical_features), figsize=(16,8))
- for i, categorical_feature in enumerate(data[categorical_features]):
- data[categorical_feature].value_counts().plot(kind="bar", ax=ax[i]).set_title(categorical_feature)
- plt.tight_layout()
- plt.show()
- """
- #print(data)
- #print(data['Local Name'])
- data['Growth form'].value_counts().plot(kind='bar')
- plt.show()
- #data['Growth form'].value_counts().plot(kind='hist')
- plt.figure()
- from statsmodels.graphics.mosaicplot import mosaic
- plt.rcParams['font.size'] = 16.0
- mosaic(data, ['Growth form', 'Part used']);
- plt.show()
- """
- plt.figure()
- sns.barplot(x=df['Growth form'].head(3),y=df['Citation'],data=df)
- plt.show()
TSSFL ODF Google Survey Form responses Data:
- #Plot some graph
- #Import required libraries
- import gspread
- import urllib.request
- import numpy as np
- import matplotlib.pyplot as plt
- import seaborn as sns
- import pandas as pd
- #Let's Read Data from Google Sheets into Pandas without the Google Sheets API
- #Useful for multiple sheets
- sheet_id = "1UnkRYcOhLFMgyT_CzByvupvdaD5cL5b_nCcOeoy1uy8"
- sheet_name = "Sheet1"
- url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
- #Get Pandas dataframe
- dataset = pd.read_csv(url)
- #worksheet = sh.sheet1
- #age = worksheet.col_values(3)[1:]
- age = dataset["Age"]
- print("Ages:", age)
- #Height = worksheet.col_values(4)[1:]
- Height = dataset["Height"]
- print("Heights:", Height)
- #Pandas is extremely very useful for Google spreadsheets
- #Convert the json to Pandas dataframe
- #Get all data records as dictionary
- #data = worksheet.get_all_records()
- #df = pd.DataFrame.from_dict(data)
- #Let's get some statistics
- #age_arr = np.array(age)
- #age_array = age_arr.astype(float)
- #h_arr = np.array(Height)
- #h_array = h_arr.astype(float)
- print("Average Age:", np.mean(age))
- print("Mean Height:", np.mean(Height))
- print("Minimum and Maximum Age:", np.min(age), np.max(age))
- print("Minimum and Maximum Height:", np.min(Height), np.max(Height))
- #Let's visualize
- #Graph styles and font size
- sns.set_style('darkgrid') # darkgrid, white grid, dark, white and ticks
- plt.rc('axes', titlesize=18) # fontsize of the axes title
- plt.rc('axes', labelsize=14) # fontsize of the x and y labels
- plt.rc('xtick', labelsize=13) # fontsize of the tick labels
- plt.rc('ytick', labelsize=13) # fontsize of the tick labels
- plt.rc('legend', fontsize=13) # legend fontsize
- plt.rc('font', size=13) # controls default text sizes
- #sns list of color plettes
- #print(sns.color_palette('deep'), sns.color_palette("pastel"), sns.color_palette("Set2"))
- #If your file only has one sheet, replace sheet_url
- #sheet_url = “https://docs.google.com/spreadsheets/d/1XqOtPkiE_Q0dfGSoyxrH730RkwrTczcRbDeJJpqRByQ/edit#gid=0"
- #url_1 = sheet_url.replace(‘/edit#gid=’, ‘/export?format=csv&gid=’)
- #print(dataset)
- #Names = worksheet.col_values(2)[1:]
- #Names = data["Name"]
- #print(Names)
- df_names = dataset["Name"]
- df_ages = dataset["Age"]
- df_Heights = dataset["Height"]
- print(df_names)
- #Preprocessing
- plots = dataset.groupby(['Name'], as_index=False).mean()
- #print(plots)
- #Bar Plot in MatplotLib with plt.bar()
- #Names vs Age
- plt.figure(figsize=(10,5), tight_layout=True)
- colors = sns.color_palette('pastel')
- plt.bar(dataset['Name'], dataset['Age'], color=colors[:5])
- plt.xlabel('Name')
- plt.xticks(rotation=90)
- plt.ylabel('Age')
- plt.title('Barplot')
- plt.show()
- #Name Vs Height
- plt.figure()
- plt.figure(figsize=(10,5), tight_layout=True)
- colors = sns.color_palette('deep')
- plt.bar(dataset['Name'], dataset['Height'], color=colors[:6])
- plt.xlabel('Name')
- plt.xticks(rotation=90)
- plt.ylabel('Height')
- plt.title('Barplot')
- plt.show()
- #Bar Plot in Seaborn with sns.barplot()
- plt.figure(figsize=(10,5), tight_layout=True)
- ax = sns.barplot(x=dataset['Name'], y=dataset['Age'], palette='pastel', ci=None)
- ax.set(title='Barplot with Seaborn', xlabel='Names', ylabel='Age')
- plt.xticks(rotation=90)
- plt.show()
- #Barplot grouped data by "n" variables
- plt.figure(figsize=(12, 6), tight_layout=True)
- ax = sns.barplot(x=dataset['Age'], y=dataset['Height'], hue=dataset['Name'], palette='pastel')
- ax.set(title='Age vs Height' ,xlabel='Age', ylabel='Height')
- ax.legend(title='Names', title_fontsize='13', loc='upper right')
- plt.show()
- #Histograms with plt.hist() or sns.histplot()
- plt.figure(figsize=(10,6), tight_layout=True)
- bins = [40, 50, 60, 70, 80]
- # matplotlib
- plt.hist(dataset['Height'], bins=bins, color=sns.color_palette('Set2')[2], linewidth=2)
- plt.title('Histogram')
- plt.xlabel('Height (cm)')
- plt.ylabel('Count')
- # seaborn
- ax = sns.histplot(data=dataset, x='Height', bins=bins, color=sns.color_palette('Set2')[2], linewidth=2)
- ax.set(title='Histogram', xlabel='Height (cm)', ylabel='Count')
- plt.show()
- #Boxplot
- plt.figure(figsize=(10,6), tight_layout=True)
- ax = sns.boxplot(data=dataset, x='Name', y='Age', palette='Set2', linewidth=2.5)
- ax.set(title='Boxplot', xlabel='Names', ylabel='Age (Years)')
- plt.xticks(rotation=90)
- plt.show()
- #Scatter plot
- plt.figure(figsize=(10,6), tight_layout=True)
- ax = sns.scatterplot(data=dataset, x='Age', y='Height', hue='Name', palette='Set2', s=60)
- ax.set(xlabel='Age (Years)', ylabel='Height (cms)')
- ax.legend(title='People', title_fontsize = 12)
- plt.show()
- #Something else
- pivot = dataset.groupby(['Name'], as_index=False).mean()
- relationship = pivot.loc[:,"Age":"Height"]
- print(relationship)
- #Plot some graph
- charts = ["bar", "line", "barh", "hist", "box", "kde", "density", "area"]
- for chart_type in charts:
- relationship.plot(kind="%s" % chart_type) #Replace bar with line, barh, hist, box, kde, density, area
- plt.title("%s plot" % chart_type)
- plt.show()
- #Seaborn
- plt.figure()
- sns.set_style("darkgrid")
- sns.lineplot(data = dataset, x = "Age", y = "Height")
- plt.show()
- plt.figure()
- sns.set_style("darkgrid")
- sns.lineplot(data = dataset, x = "Height", y = "Age")
- plt.show()
- #replot
- plt.figure()
- sns.set_theme(style="darkgrid")
- sns.relplot(x="Age", y="Height", hue="Name", data=dataset);
- plt.show()
- #Hexbin
- #Split the plotting window into 20 hexbins
- plt.figure()
- nbins = 15
- plt.title('Hexbin')
- plt.hexbin(dataset["Age"], dataset["Height"], gridsize=nbins, color=colors[:5])
- plt.show()
- #Hexbin 2
- #Split the plotting window into 20 hexbins
- plt.figure()
- nbins = 15
- plt.title('Hexbin')
- plt.hexbin(dataset["Age"], dataset["Height"], gridsize=nbins, cmap=plt.cm.BuGn_r)
- plt.show()
- #2-D Hist
- plt.figure()
- plt.title('2-D Histogram')
- plt.hist2d(dataset["Age"], dataset["Height"], bins=nbins, color=colors[:6])
- plt.show()
- #2-D Hist 2
- plt.figure()
- plt.title('2-D Histogram')
- plt.hist2d(dataset["Age"], dataset["Height"], bins=nbins, cmap=plt.cm.BuGn_r)
- plt.show()
- #Set variables
- x = dataset["Age"]
- y = dataset["Height"]
- z = dataset["Name"]
- #Linear Regression
- plt.figure()
- sns.regplot(x = x, y = y, data=dataset);
- plt.show()
- plt.figure()
- sns.jointplot(x=x, y=y, data=dataset, kind="reg");
- plt.show()
- #Set seaborn style
- sns.set_style("white")
- # Basic 2D density plot
- plt.figure()
- sns.kdeplot(x=x, y=y)
- plt.show()
- # Custom the color, add shade and bandwidth
- plt.figure()
- sns.kdeplot(x=x, y=y, cmap="Reds", shade=True, bw_adjust=.5)
- plt.show()
- # Add thresh parameter
- plt.figure()
- sns.kdeplot(x=x, y=y, cmap="Blues", shade=True, thresh=0)
- plt.show()
- #Joint plot
- plt.figure()
- sns.jointplot(x = x,y = y,data = dataset,kind = 'hex')
- plt.show()
TSSFL -- A Creative Journey Towards Infinite Possibilities!
- Eli
- Senior Expert Member
- Reactions: 187
- Posts: 5679
- Joined: 9 years ago
- Location: Tanzania
- Has thanked: 75 times
- Been thanked: 88 times
- Contact:
Combining Data Collected in Various Ways
Combine Data into one Master_Sheet Spreadsheet:
Data extracted from paper Questionnaire
Data collected via Google Survey form
Data Collected using ODK Collect
The combined dataset is contained in the Master_Sheet of the spreadsheet above.
Visualize the Combined Data:
Combine Data into one Master_Sheet Spreadsheet:
- import gspread
- import urllib.request
- urllib.request.urlretrieve("https://www.dropbox.com/s/m728v370159b2xm/credentials.json?dl=1", "credentials.json")
- gc = gspread.service_account(filename="credentials.json")
- #We can open spreadsheets by Name, or by url or by key (.open(), open_by_url(), open_by_key())
- #Let's open our spreadsheets by their keys
- wb1 = gc.open_by_key("1BC48PKPZW71AC6hOn1SNesvZk1PoBjE4wXwl1YlWFpY")
- wb2 = gc.open_by_key("1UnkRYcOhLFMgyT_CzByvupvdaD5cL5b_nCcOeoy1uy8")
- wb3 = gc.open_by_key("1E2WYRF9nAxtmUW57Ng05ZsH_0R5UZ0BLLaLcAy9tJzM")
- #Let's rename some the headers in workbook 1
- ws1 = wb1.sheet1
- import numpy as np
- array = np.array([["Name", "Age", "Height", "Date"]])
- # Write the array to worksheet starting from the cell B1
- ws1.update('B1', array.tolist())
- #Let's create a new sheet named Master_sheet under the spreadsheet ODK Test Spreadsheet
- #newsheet = wb1.add_worksheet(title="Master_Sheet", rows="1024", cols="20")
- #We can create under the ODK Test Spreadsheet Google sheet copies of the Google form and manually collected data
- #newsheet = wb1.add_worksheet(title="From_Gsurvey_form", rows="1024", cols="20")
- #newsheet = wb1.add_worksheet(title="Collected_Manually", rows="1024", cols="20")
- #Let's copy data from the two sheets in different spreadsheets to the two newly created sheets under ODK Test Spreadsheet
- ws2 = wb2.sheet1
- ws3 = wb3.sheet1
- sheet1 = ws1.get_all_values()
- sheet2 = ws2.get_all_values()
- sheet3 = ws3.get_all_values()
- #Dataframes
- import pandas as pd
- df1 = pd.DataFrame.from_records(sheet1[1:], columns=sheet1[0])
- df2 = pd.DataFrame.from_records(sheet2[1:], columns=sheet2[0])
- df3 = pd.DataFrame.from_records(sheet3[1:], columns=sheet3[0])
- all_sheets = pd.concat([df1, df2, df3])
- #Drop all NANs
- clean_df = all_sheets.fillna("")
- #Let's create a list of lists for the dataframes including columns
- combined_Data = [clean_df.columns.tolist()] + clean_df.to_numpy().tolist()
- mastersheet = wb1.worksheet("Master_Sheet")
- #Let's update the Master Sheet, start pasting from cell A1
- mastersheet.update("A1", combined_Data, value_input_option="USER_ENTERED")
- import os
- os.remove("credentials.json")
Data extracted from paper Questionnaire
Data collected via Google Survey form
Data Collected using ODK Collect
The combined dataset is contained in the Master_Sheet of the spreadsheet above.
Visualize the Combined Data:
- #Plot some graph
- #Import required libraries
- import gspread
- import urllib.request
- import numpy as np
- import matplotlib.pyplot as plt
- import seaborn as sns
- import pandas as pd
- #Let's Read Data from Google Sheets into Pandas without the Google Sheets API
- #Useful for multiple sheets
- sheet_id = "1BC48PKPZW71AC6hOn1SNesvZk1PoBjE4wXwl1YlWFpY"
- sheet_name = "Master_Sheet"
- url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
- #Get Pandas dataframe
- dataset = pd.read_csv(url)
- #worksheet = sh.sheet1
- #age = worksheet.col_values(3)[1:]
- age = dataset["Age"]
- print("Ages:", age)
- #Height = worksheet.col_values(4)[1:]
- Height = dataset["Height"]
- print("Heights:", Height)
- #Pandas is extremely very useful for Google spreadsheets
- #Convert the json to Pandas dataframe
- #Get all data records as dictionary
- #data = worksheet.get_all_records()
- #df = pd.DataFrame.from_dict(data)
- #Let's get some statistics
- #age_arr = np.array(age)
- #age_array = age_arr.astype(float)
- #h_arr = np.array(Height)
- #h_array = h_arr.astype(float)
- print("Average Age:", np.mean(age))
- print("Mean Height:", np.mean(Height))
- print("Minimum and Maximum Age:", np.min(age), np.max(age))
- print("Minimum and Maximum Height:", np.min(Height), np.max(Height))
- #Let's visualize
- #Graph styles and font size
- sns.set_style('darkgrid') # darkgrid, white grid, dark, white and ticks
- plt.rc('axes', titlesize=18) # fontsize of the axes title
- plt.rc('axes', labelsize=14) # fontsize of the x and y labels
- plt.rc('xtick', labelsize=13) # fontsize of the tick labels
- plt.rc('ytick', labelsize=13) # fontsize of the tick labels
- plt.rc('legend', fontsize=13) # legend fontsize
- plt.rc('font', size=13) # controls default text sizes
- #sns list of color plettes
- #print(sns.color_palette('deep'), sns.color_palette("pastel"), sns.color_palette("Set2"))
- #If your file only has one sheet, replace sheet_url
- #sheet_url = “https://docs.google.com/spreadsheets/d/1XqOtPkiE_Q0dfGSoyxrH730RkwrTczcRbDeJJpqRByQ/edit#gid=0"
- #url_1 = sheet_url.replace(‘/edit#gid=’, ‘/export?format=csv&gid=’)
- #print(dataset)
- #Names = worksheet.col_values(2)[1:]
- #Names = data["Name"]
- #print(Names)
- df_names = dataset["Name"]
- df_ages = dataset["Age"]
- df_Heights = dataset["Height"]
- print(df_names)
- #Preprocessing
- plots = dataset.groupby(['Name'], as_index=False).mean()
- #print(plots)
- #Bar Plot in MatplotLib with plt.bar()
- #Names vs Age
- plt.figure(figsize=(10,5), tight_layout=True)
- colors = sns.color_palette('pastel')
- plt.bar(dataset['Name'], dataset['Age'], color=colors[:5])
- plt.xlabel('Name')
- plt.xticks(rotation=90)
- plt.ylabel('Age')
- plt.title('Barplot')
- plt.show()
- #Name Vs Height
- plt.figure()
- plt.figure(figsize=(10,5), tight_layout=True)
- colors = sns.color_palette('deep')
- plt.bar(dataset['Name'], dataset['Height'], color=colors[:6])
- plt.xlabel('Name')
- plt.xticks(rotation=90)
- plt.ylabel('Height')
- plt.title('Barplot')
- plt.show()
- #Bar Plot in Seaborn with sns.barplot()
- plt.figure(figsize=(10,5), tight_layout=True)
- ax = sns.barplot(x=dataset['Name'], y=dataset['Age'], palette='pastel', ci=None)
- ax.set(title='Barplot with Seaborn', xlabel='Names', ylabel='Age')
- plt.xticks(rotation=90)
- plt.show()
- #Barplot grouped data by "n" variables
- plt.figure(figsize=(12, 6), tight_layout=True)
- ax = sns.barplot(x=dataset['Age'], y=dataset['Height'], hue=dataset['Name'], palette='pastel')
- ax.set(title='Age vs Height' ,xlabel='Age', ylabel='Height')
- ax.legend(title='Names', title_fontsize='13', loc='upper right')
- plt.show()
- #Histograms with plt.hist() or sns.histplot()
- plt.figure(figsize=(10,6), tight_layout=True)
- bins = [40, 50, 60, 70, 80]
- # matplotlib
- plt.hist(dataset['Height'], bins=bins, color=sns.color_palette('Set2')[2], linewidth=2)
- plt.title('Histogram')
- plt.xlabel('Height (cm)')
- plt.ylabel('Count')
- # seaborn
- ax = sns.histplot(data=dataset, x='Height', bins=bins, color=sns.color_palette('Set2')[2], linewidth=2)
- ax.set(title='Histogram', xlabel='Height (cm)', ylabel='Count')
- plt.show()
- #Boxplot
- plt.figure(figsize=(10,6), tight_layout=True)
- ax = sns.boxplot(data=dataset, x='Name', y='Age', palette='Set2', linewidth=2.5)
- ax.set(title='Boxplot', xlabel='Names', ylabel='Age (Years)')
- plt.xticks(rotation=90)
- plt.show()
- #Scatter plot
- plt.figure(figsize=(10,6), tight_layout=True)
- ax = sns.scatterplot(data=dataset, x='Age', y='Height', hue='Name', palette='Set2', s=60)
- ax.set(xlabel='Age (Years)', ylabel='Height (cms)')
- ax.legend(title='People', title_fontsize = 12)
- plt.show()
- #Something else
- pivot = dataset.groupby(['Name'], as_index=False).mean()
- relationship = pivot.loc[:,"Age":"Height"]
- print(relationship)
- #Plot some graph
- charts = ["bar", "line", "barh", "hist", "box", "kde", "density", "area"]
- for chart_type in charts:
- relationship.plot(kind="%s" % chart_type) #Replace bar with line, barh, hist, box, kde, density, area
- plt.title("%s plot" % chart_type)
- plt.show()
- #Seaborn
- plt.figure()
- sns.set_style("darkgrid")
- sns.lineplot(data = dataset, x = "Age", y = "Height")
- plt.show()
- plt.figure()
- sns.set_style("darkgrid")
- sns.lineplot(data = dataset, x = "Height", y = "Age")
- plt.show()
- #replot
- plt.figure()
- sns.set_theme(style="darkgrid")
- sns.relplot(x="Age", y="Height", hue="Name", data=dataset);
- plt.show()
- #Hexbin
- #Split the plotting window into 20 hexbins
- plt.figure()
- nbins = 15
- plt.title('Hexbin')
- plt.hexbin(dataset["Age"], dataset["Height"], gridsize=nbins, color=colors[:5])
- plt.show()
- #Hexbin 2
- #Split the plotting window into 20 hexbins
- plt.figure()
- nbins = 15
- plt.title('Hexbin')
- plt.hexbin(dataset["Age"], dataset["Height"], gridsize=nbins, cmap=plt.cm.BuGn_r)
- plt.show()
- #2-D Hist
- plt.figure()
- plt.title('2-D Histogram')
- plt.hist2d(dataset["Age"], dataset["Height"], bins=nbins, color=colors[:6])
- plt.show()
- #2-D Hist 2
- plt.figure()
- plt.title('2-D Histogram')
- plt.hist2d(dataset["Age"], dataset["Height"], bins=nbins, cmap=plt.cm.BuGn_r)
- plt.show()
- #Set variables
- x = dataset["Age"]
- y = dataset["Height"]
- z = dataset["Name"]
- #Linear Regression
- plt.figure()
- sns.regplot(x = x, y = y, data=dataset);
- plt.show()
- plt.figure()
- sns.jointplot(x=x, y=y, data=dataset, kind="reg");
- plt.show()
- #Set seaborn style
- sns.set_style("white")
- # Basic 2D density plot
- plt.figure()
- sns.kdeplot(x=x, y=y)
- plt.show()
- # Custom the color, add shade and bandwidth
- plt.figure()
- sns.kdeplot(x=x, y=y, cmap="Reds", shade=True, bw_adjust=.5)
- plt.show()
- # Add thresh parameter
- plt.figure()
- sns.kdeplot(x=x, y=y, cmap="Blues", shade=True, thresh=0)
- plt.show()
- #Joint plot
- plt.figure()
- sns.jointplot(x = x,y = y,data = dataset,kind = 'hex')
- plt.show()
TSSFL -- A Creative Journey Towards Infinite Possibilities!
- Eli
- Senior Expert Member
- Reactions: 187
- Posts: 5679
- Joined: 9 years ago
- Location: Tanzania
- Has thanked: 75 times
- Been thanked: 88 times
- Contact:
Further: Here the data in Sheet1 and Sheet2 have been combined by column names to a data in the Master_Sheet:
- Read data from several Google sheets/worksheets and create Pandas dataframe for each.
- Combine the dataframes created in the previous step.
- Clean data to remove unrequired values.
- Store the data from the combined dataframe into a new Google sheet -- Master Sheet.
- import gspread
- import urllib.request
- urllib.request.urlretrieve("https://www.dropbox.com/s/m728v370159b2xm/credentials.json?dl=1", "credentials.json")
- gc = gspread.service_account(filename="credentials.json")
- wb = gc.open_by_url("https://docs.google.com/spreadsheets/d/1O19ketp1DpZwUytV_WqAlxafOBfwiCHiR-0F5eTir6k/edit#gid=0") #Open spreadsheet,
- #print(wb.worksheets()) #Try wb.worksheets
- #print(wb.title)
- import pandas as pd
- list_of_dataframes = []
- for ws in wb.worksheets():
- if ws.title != "Master_Sheet":
- rows = ws.get_all_values()
- #Create data frames
- df = pd.DataFrame.from_records(rows[1:], columns=rows[0]) #Omit the first/header row, pass columns
- list_of_dataframes.append(df)
- #Print the first dataframe
- #print(list_of_dataframes[0])
- combined_dataframes = pd.concat(list_of_dataframes)
- print(combined_dataframes)
- #Do some cleaning, replace NAN with blank string
- clean_df = combined_dataframes.fillna("") #Try combined_dataframes.dropna("")
- #Let's create a list of lists for the dataframes including columns
- combined_Data = [clean_df.columns.tolist()] + clean_df.to_numpy().tolist()
- mastersheet = wb.worksheet("Master_Sheet")
- #Let's update the Master Sheet, start pasting from cell A1
- mastersheet.update("A1", combined_Data, value_input_option="USER_ENTERED")
- import os
- os.remove("credentials.json")
TSSFL -- A Creative Journey Towards Infinite Possibilities!
- Eli
- Senior Expert Member
- Reactions: 187
- Posts: 5679
- Joined: 9 years ago
- Location: Tanzania
- Has thanked: 75 times
- Been thanked: 88 times
- Contact:
Data Cleaning
The Python snippet below performs the following tasks:
The Python snippet below performs the following tasks:
- Read data from the spreadsheet by using spreadsheet ID, create a worksheet, and then create a Pandas DataFrame (df1) from the worksheet.
- Preprocesses data by creating the second Pandas DataFrame (df2) for which all strings named HIV/AIDS are removed from the column named "Ailment cured", similarly, it renames all strings named "Gonorrhoea, syphilis" to "Gonorrhoea & Syphilis". Note that Python is case-sensitive.
- It creates a new spreadsheet and names it "A New Test Spreadsheet". It also creates a worksheet (worksheet2) within the newly created spreadsheet.
- It updates worksheet2 by copying the Pandas DataFrame df2 into it.
- Finally, it sends an updated spreadsheet to an email.
- import gspread
- import urllib.request
- urllib.request.urlretrieve("https://www.dropbox.com/s/m728v370159b2xm/credentials.json?dl=1", "credentials.json")
- gc = gspread.service_account(filename="credentials.json")
- sh = gc.open_by_key("1pm1mGdRgpitrYQiGqUNSHPdR43e-ZSXCavYr-TcqtwU") #Open spreadsheet by ID (please change ID)
- worksheet = sh.sheet1
- import pandas as pd
- df1 = pd.DataFrame(worksheet.get_all_records())
- #Preprocess Data
- df2 = df1[~df1['Ailment cured'].isin(['HIV/AIDS'])]
- #df['Ailment cured'] = df['Ailment cured'].replace({'Gonorrhoea, syphilis':'Gonorrhoea & Syphilis'})
- df2["Ailment cured"] = df2['Ailment cured'].replace('Gonorrhoea, syphilis', 'Gonorrhoea & Syphilis')
- print(df2)
- #Let's create a new blank spreadsheet:
- sh2 = gc.create('A New Test Spreadsheet Created During Webinar')
- worksheet2 = sh2.sheet1
- #Let's write df2 to a new worksheet
- worksheet2.update([df2.columns.values.tolist()] + df2.values.tolist())
- #Share the new worksheet to an email:
- sh2.share('ey@tssfl.co', perm_type='user', role='writer')
- #Finally,
- import os
- os.remove("credentials.json")
- #We can also combine/concatenate sheets
TSSFL -- A Creative Journey Towards Infinite Possibilities!
- Eli
- Senior Expert Member
- Reactions: 187
- Posts: 5679
- Joined: 9 years ago
- Location: Tanzania
- Has thanked: 75 times
- Been thanked: 88 times
- Contact:
Report Generation - An example of Sales Data
- #Import the required libraries
- from __future__ import print_function
- import pandas as pd
- import numpy as np
- import urllib.request
- #print("This is Pandas version:", pd.__version__)
- #Download and read the data into DataFrame
- df = urllib.request.urlretrieve("https://www.tssfl.com/download/file.php?id=1188", "sales-sheet.xlsx")
- df = pd.read_excel("./sales-sheet.xlsx")
- #print(df)
- #print(df.head())
- #Pivot the data to summarize.
- sales_report = pd.pivot_table(df, index=["Manager", "Rep", "Product"], values=["Price", "Quantity"],
- aggfunc=[np.sum, np.mean], fill_value=0)
- #print(sales_report.head())
- #Show some statistics, for example, the average quantity and price for CPU and Software sales
- print(df[df["Product"]=="CPU"]["Quantity"].mean())
- print(df[df["Product"]=="CPU"]["Price"].mean())
- print(df[df["Product"]=="Software"]["Quantity"].mean())
- print(df[df["Product"]=="Software"]["Price"].mean())
- #Let's create a Jinja environment and get the report.html template
- from jinja2 import Environment, FileSystemLoader
- env = Environment(loader=FileSystemLoader('.'))
- urllib.request.urlretrieve("https://www.dropbox.com/s/djpii5trdesfb4x/report.html?dl=1", "report.html")
- template = env.get_template("./report.html")
- """env variable above shows how we pass content to the template
- We create a dictionary called temp_variables that
- contains all the variable we want to pass to the template"""
- temp_variables = {"title" : "Sales Report",
- "pivot_table": sales_report.to_html()}
- #Finally, let's render the HTML with the variables included in the output
- #This will create a string that we will eventually pass to our PDF creation engine - WeasyPrint
- html_output = template.render(temp_variables)
- #Generate PDF
- #We create a pdf by passing string to the PDF generator
- from weasyprint import HTML
- HTML(string=html_output).write_pdf("report1.pdf")
- #Apply stylesheet
- ss = urllib.request.urlretrieve("https://www.dropbox.com/s/xd7kk9t17sjfwrr/style.css?dl=1", "style.css")
- HTML(string=html_output).write_pdf("report2.pdf", stylesheets=["./style.css"])
- import os
- os.remove("sales-sheet.xlsx")
- os.remove("report.html")
- os.remove("style.css")
TSSFL -- A Creative Journey Towards Infinite Possibilities!
-
- Similar Topics
- Replies
- Views
- Last post
-
- Information
-
Who is online
Users browsing this forum: No registered users and 0 guests