First Public Service Webinar: Accelerated Research Solutions with Digital Technologies

Post general but valuable information and news and information: tech and/or education-related news and announcements. Posts under this forum must strictly adhere to the Forum Rules. If you are not sure about what to post, please ask for help from forums administration. Any violation will not be tolerated!
Post Reply
User avatar
Admin
Site Admin
Senior Expert Member
Reactions: 56
Posts: 383
Joined: 10 years ago
Has thanked: 38 times
Been thanked: 32 times
Contact:

#1

Organizer: Dar Es Salaam University College of Education (www.duce.ac.tz)
Presenter: Dr Yohana Elimboto
Moderator: Dr Sixbert Mlowe
Date: 30th June 2021
Time: 2:00 PM - 2:40 PM
Platform: Zoom

Link:

https://us04web.zoom.us/j/76187059507?p ... rVVdFUWNKW
HUrdz09
Meeting ID: 761 8705 9507
Passcode: 4NhpNX

If you do not have Zoom on your device, you can join the meeting viaTSSFL ODF: integrated-zoom-video-conferencing-for-meetings-6420

Abstract: TSSFL Open Discussion Forums (TSSFL ODF) is an open-source knowledge base set of interactive forums that facilitate communications and collaborations in research to solve a wide range of real-life problems. TSSFL ODF is integrated with numerous
research tools, making it a very robust cloud platform that can be instantly deployed to provide inspiring and unprecedented cost-effective solutions in research. During this talk, Dr. Yohana will showcase how to use the TSSFL ODF integrated and the associated data science tools and data repositories, specifically Python programming language, Open Data Kit (ODK) Collect, Google drive, Google spreadsheets, Excel worksheets, DropBox, and GitHub to automate research processes from data collection to data storage, data management, data retrieval, data cleaning, data processing, data analysis, and finally, research results and reports.

Please find more information from the attached document.
0
TSSFL Stack is dedicated to empowering and accelerating teaching and learning, fostering scientific research, and promoting rapid software development and digital technologies
User avatar
Eli
Senior Expert Member
Reactions: 183
Posts: 5334
Joined: 9 years ago
Location: Tanzania
Has thanked: 75 times
Been thanked: 88 times
Contact:

#2

Live Webinar: Accelerated Research Solutions with Digital Technologies

Name: Elimboto Yohana
Institution: Dar Es Salaam University College of Education

Introduction

  • TSSFL ODF is a currently a cloud computing platform apart from being an interactive forum.
  • It is interoperable with a number of robust cloud platforms, e.g., Google.
  • TSSFL ODF eliminates time, spatial and technological constraints in research, teaching/learning and problem-solving, that's,
  • Use any internet connected device from anywhere at anytime.
  • Today we are focusing on data collection and research to see how it works!
TSSFL -- A Creative Journey Towards Infinite Possibilities!
User avatar
Eli
Senior Expert Member
Reactions: 183
Posts: 5334
Joined: 9 years ago
Location: Tanzania
Has thanked: 75 times
Been thanked: 88 times
Contact:

#3

Let's start by running few codes:


First code:

  1. import numpy as np
  2. import matplotlib.pyplot as plt
  3. from mpl_toolkits.mplot3d import Axes3D
  4. from mpl_toolkits.mplot3d import proj3d
  5.  
  6. fig = plt.figure(figsize=(10,10)) #Define figure size
  7. ax = fig.add_subplot(111, projection='3d')
  8. plt.rcParams['legend.fontsize'] = 20
  9.  
  10. np.random.seed(4294967294) #Used the random seed for consistency
  11.  
  12. mu_vec_1 = np.array([0,0,0])
  13. cov_mat_1 = np.array([[1,0,0],[0,1,0],[0,0,1]])
  14. class_1_sample = np.random.multivariate_normal(mu_vec_1, cov_mat_1, 30).T
  15. assert class_1_sample.shape == (3,30), "The matrix dimensions is not 3x30"
  16.  
  17. mu_vec_2 = np.array([1,1,1])
  18. cov_mat_2 = np.array([[1,0,0],[0,1,0],[0,0,1]])
  19. class_2_sample = np.random.multivariate_normal(mu_vec_2, cov_mat_2, 30).T
  20. assert class_2_sample.shape == (3,30), "The matrix dimensions is not 3x30"
  21.  
  22. ax.plot(class_1_sample[0,:], class_1_sample[1,:], class_1_sample[2,:], 'o', markersize=10, color='green', alpha=1.0, label='Class 1')
  23. ax.plot(class_2_sample[0,:], class_2_sample[1,:], class_2_sample[2,:], 'o', markersize=10, alpha=1.0, color='red', label='Class 2')
  24.  
  25. plt.title('Data Samples for Classes 1 & 2', y=1.04)
  26. ax.legend(loc='upper right')
  27. plt.savefig('Multivariate_distr.png', bbox_inches='tight')
  28. plt.show()


Second code:

  1. import matplotlib.pyplot as plt
  2. import seaborn as sns
  3. # Load Dataset
  4. df = sns.load_dataset('iris')
  5. # Plot
  6. plt.figure(figsize=(10,8), dpi= 80)
  7. sns.pairplot(df, kind="reg", hue="species")
  8. plt.show()


Third code:

  1. import seaborn as sns
  2. import matplotlib.pyplot as plt
  3. import pandas as pd
  4. sns.set()
  5. iris = sns.load_dataset("iris")
  6. sns.pairplot(iris, hue='species', height=2.5)
  7. plt.show()


Fourth code:

  1. import matplotlib.pyplot as plt
  2. import pandas as pd
  3. import seaborn as sns
  4. # Import Dataset
  5.  
  6. df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
  7.  
  8. # Plot
  9. plt.figure(figsize=(12,10), dpi= 80)
  10. sns.heatmap(df.corr(), xticklabels=df.corr().columns, yticklabels=df.corr().columns, cmap='RdYlGn', center=0, annot=True)
  11.  
  12. # Decorations
  13. plt.title('Correlogram of mtcars', fontsize=22)
  14. plt.xticks(fontsize=12)
  15. plt.yticks(fontsize=12)
  16. plt.show()

TSSFL -- A Creative Journey Towards Infinite Possibilities!
User avatar
Eli
Senior Expert Member
Reactions: 183
Posts: 5334
Joined: 9 years ago
Location: Tanzania
Has thanked: 75 times
Been thanked: 88 times
Contact:

#4

Data Collection

Data can be Collected using:
  • ODK Collect;
  • Google survey forms;
  • Any other means.

Data Collection with ODK Collect




Further TSSFL ODF - ODK Collect Integration for Data collection:






Data Collection with Google Survey forms:








Data Generated by Hand

TSSFL -- A Creative Journey Towards Infinite Possibilities!
User avatar
Eli
Senior Expert Member
Reactions: 183
Posts: 5334
Joined: 9 years ago
Location: Tanzania
Has thanked: 75 times
Been thanked: 88 times
Contact:

#5

Data Retrieval, Processing, Visualization, and Statistics

Let's manipulate the data with the following codes:


ODK Test Spreadsheet data:

  1. #Plot some graph
  2. #Import required libraries
  3. import gspread
  4. import urllib.request
  5. import numpy as np
  6. import matplotlib.pyplot as plt
  7. import seaborn as sns
  8. import pandas as pd
  9.  
  10. """
  11. urllib.request.urlretrieve("https://www.dropbox.com/s/mqsyfuetv8potvd/credentials.json?dl=1", "credentials.json")
  12.  
  13. gc = gspread.service_account(filename="credentials.json")
  14. sh = gc.open_by_key("1BC48PKPZW71AC6hOn1SNesvZk1PoBjE4wXwl1YlWFpY") #Open spreadsheet,
  15. #the spreadsheet ID starts with 1019ke.... between "" in the line above
  16. """
  17. #Alternative
  18. #If your file only has one sheet, replace sheet_url
  19. sheet_url = "https://docs.google.com/spreadsheets/d/1BC48PKPZW71AC6hOn1SNesvZk1PoBjE4wXwl1YlWFpY/edit#gid=0"
  20. url_1 = sheet_url.replace('/edit#gid=', '/export?format=csv&gid=')
  21.  
  22. data = pd.read_csv(url_1)
  23.  
  24. #worksheet = sh.sheet1
  25.  
  26. #Define variables
  27. var1 = "Name"
  28. var2 = "Age"
  29. var3 = "Height"
  30.  
  31. #age = worksheet.col_values(3)[1:]
  32. age = data[var2]
  33. print("Ages:", age)
  34. #height = worksheet.col_values(4)[1:]
  35. height = data[var3]
  36. print("Heights:", height)
  37.  
  38. #Pandas is extremely very useful for Google spreadsheets
  39. #Convert the json to Pandas dataframe
  40. #Get all data records as dictionary
  41. #data = worksheet.get_all_records()
  42. #df = pd.DataFrame.from_dict(data)
  43.  
  44. #Let's get some statistics
  45. #age_arr = np.array(age)
  46. #age_array = age_arr.astype(float)
  47. #h_arr = np.array(height)
  48. #h_array = h_arr.astype(float)
  49.  
  50. print("Average Age:", np.mean(age))
  51. print("Mean Height:", np.mean(height))
  52. print("Minimum and Maximum Age:", np.min(age), np.max(age))
  53. print("Minimum and Maximum Height:", np.min(height), np.max(height))
  54.  
  55. #Let's visualize
  56. #Graph styles and font size
  57. sns.set_style('darkgrid') # darkgrid, white grid, dark, white and ticks
  58. plt.rc('axes', titlesize=18)     # fontsize of the axes title
  59. plt.rc('axes', labelsize=14)    # fontsize of the x and y labels
  60. plt.rc('xtick', labelsize=13)    # fontsize of the tick labels
  61. plt.rc('ytick', labelsize=13)    # fontsize of the tick labels
  62. plt.rc('legend', fontsize=13)    # legend fontsize
  63. plt.rc('font', size=13)          # controls default text sizes
  64.  
  65. #sns list of color plettes
  66. #print(sns.color_palette('deep'), sns.color_palette("pastel"), sns.color_palette("Set2"))
  67.  
  68. #Let's Read Data from Google Sheets into Pandas without the Google Sheets API
  69. #Useful for multiple sheets
  70. sheet_id = "1BC48PKPZW71AC6hOn1SNesvZk1PoBjE4wXwl1YlWFpY"
  71. sheet_name = "Sheet1"
  72. url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
  73.  
  74. #If your file only has one sheet, replace sheet_url
  75. #sheet_url = “https://docs.google.com/spreadsheets/d/1XqOtPkiE_Q0dfGSoyxrH730RkwrTczcRbDeJJpqRByQ/edit#gid=0"
  76. #url_1 = sheet_url.replace(‘/edit#gid=’, ‘/export?format=csv&gid=’)
  77.  
  78. #Get Pandas dataframe
  79. dataset = pd.read_csv(url)
  80. #print(dataset)
  81.  
  82. #Names = worksheet.col_values(2)[1:]
  83. #Names = data[var1]
  84. #print(Names)
  85.  
  86. df_names = dataset[var1]
  87. df_ages = dataset[var2]
  88. df_heights = dataset[var3]
  89. print(df_names)
  90.  
  91. #Preprocessing
  92. plots = dataset.groupby([var1], as_index=False).mean()
  93. #print(plots)
  94.  
  95. #Bar Plot in MatplotLib with plt.bar()
  96. #Names vs Age
  97. plt.figure(figsize=(10,5), tight_layout=True)
  98. colors = sns.color_palette('pastel')
  99. plt.bar(dataset[var1], dataset[var2], color=colors[:5])
  100. plt.xlabel(var1)
  101. plt.xticks(rotation=90)
  102. plt.ylabel('Age')
  103. plt.title('Barplot')
  104. plt.show()
  105.  
  106. #Name Vs Height
  107. plt.figure()
  108. plt.figure(figsize=(10,5), tight_layout=True)
  109. colors = sns.color_palette('deep')
  110. plt.bar(dataset[var1], dataset[var3], color=colors[:6])
  111. plt.xlabel(var1)
  112. plt.xticks(rotation=90)
  113. plt.ylabel('Height')
  114. plt.title('Barplot')
  115. plt.show()
  116.  
  117. #Bar Plot in Seaborn with sns.barplot()
  118. plt.figure(figsize=(10,5), tight_layout=True)
  119. ax = sns.barplot(x=dataset[var1], y=dataset[var2], palette='pastel', ci=None)
  120. ax.set(title='Barplot with Seaborn', xlabel='Names', ylabel='Age')
  121. plt.xticks(rotation=90)
  122. plt.show()
  123.  
  124. #Barplot grouped data by "n" variables
  125. plt.figure(figsize=(12, 6), tight_layout=True)
  126. ax = sns.barplot(x=dataset[var2], y=dataset[var3], hue=dataset[var1], palette='pastel')
  127. ax.set(title='Age vs Height' ,xlabel='Age', ylabel='Height')
  128. ax.legend(title='Names', title_fontsize='13', loc='upper right')
  129. plt.show()
  130.  
  131. #Histograms with plt.hist() or sns.histplot()
  132. plt.figure(figsize=(10,6), tight_layout=True)
  133. bins = [160, 165, 170, 175, 180, 185, 190, 195, 200]
  134. # matplotlib
  135. plt.hist(dataset[var3], bins=bins, color=sns.color_palette('Set2')[2], linewidth=2)
  136. plt.title('Histogram')
  137. plt.xlabel('Height (cm)')
  138. plt.ylabel('Count')
  139. # seaborn
  140. ax = sns.histplot(data=dataset, x=var3, bins=bins, color=sns.color_palette('Set2')[2], linewidth=2)
  141. ax.set(title='Histogram', xlabel='Height (cm)', ylabel='Count')
  142. plt.show()
  143.  
  144. #Boxplot
  145. plt.figure(figsize=(10,6), tight_layout=True)
  146. ax = sns.boxplot(data=dataset, x=var1, y=var2, palette='Set2', linewidth=2.5)
  147. ax.set(title='Boxplot', xlabel='Names', ylabel='Age (Years)')
  148. plt.xticks(rotation=90)
  149. plt.show()
  150.  
  151. #Scatter plot
  152. plt.figure(figsize=(10,6), tight_layout=True)
  153. ax = sns.scatterplot(data=dataset, x=var2, y=var3,   hue=var1, palette='Set2', s=60)
  154. ax.set(xlabel='Age (Years)', ylabel='Height (cm)')
  155. ax.legend(title='People', title_fontsize = 12)
  156. plt.show()
  157.  
  158. #Something else
  159. pivot = dataset.groupby([var1], as_index=False).mean()
  160. relationship = pivot.loc[:,var2:var3]
  161. print(relationship)
  162.  
  163. #Plot some graph
  164. charts = ["bar", "line", "barh", "hist", "box", "kde", "density", "area"]
  165. for chart_type in charts:
  166.     relationship.plot(kind="%s" % chart_type) #Replace bar with line, barh, hist, box, kde, density, area
  167.     plt.title("%s plot" % chart_type)
  168.     plt.show()
  169.  
  170. #Seaborn
  171. plt.figure()
  172. sns.set_style("darkgrid")
  173. sns.lineplot(data = dataset, x = var2, y = var3)
  174. plt.show()
  175.  
  176. plt.figure()
  177. sns.set_style("whitegrid")
  178. sns.lineplot(data = dataset, x = var2, y = var3)
  179. plt.show()
  180.  
  181. #Hexbin
  182. #Split the plotting window into 20 hexbins
  183. plt.figure()
  184. nbins = 20
  185. plt.title('Hexbin')
  186. plt.hexbin(dataset[var2], dataset[var3], gridsize=nbins, color=colors[:6])
  187. plt.show()
  188.  
  189. #2-D Hist
  190. plt.figure()
  191. plt.title('2-D Histogram')
  192. plt.hist2d(dataset[var2], dataset[var3], bins=nbins, color=colors[:5])
  193. plt.show()
  194.  
  195. #Set variables
  196. x = dataset[var2]
  197. y = dataset[var3]
  198. z = dataset[var1]
  199.  
  200. #Linear Regression
  201. plt.figure()
  202. sns.regplot(x = x, y = y, data=dataset);
  203. plt.show()
  204.  
  205. plt.figure()
  206. sns.jointplot(x=x, y=y, data=dataset, kind="reg");
  207. plt.show()
  208.  
  209. #Set seaborn style
  210. sns.set_style("white")
  211.  
  212. # Basic 2D density plot
  213. plt.figure()
  214. sns.kdeplot(x=x, y=y)
  215. plt.show()
  216.  
  217. # Custom the color, add shade and bandwidth
  218. plt.figure()
  219. sns.kdeplot(x=x, y=y, cmap="Reds", shade=True, bw_adjust=.5)
  220. plt.show()
  221.  
  222. # Add thresh parameter
  223. plt.figure()
  224. sns.kdeplot(x=x, y=y, cmap="Blues", shade=True, thresh=0)
  225. plt.show()
  226.  
  227. #Joint plot
  228. plt.figure()
  229. sns.jointplot(x = x,y = y,data = dataset,kind = 'hex')
  230. plt.show()



K43 Data


  1. #Plot some graph
  2. #Import required libraries
  3. import gspread
  4. import urllib.request
  5. import numpy as np
  6. import matplotlib.pyplot as plt
  7. import seaborn as sns
  8. import pandas as pd
  9.  
  10. #Alternative
  11. #If your file only has one sheet, replace sheet_url
  12. sheet_url = "https://docs.google.com/spreadsheets/d/1426rTslBl2mgggHIQnLR7WivW4xor6cp5H1Su-1SrdI/edit#gid=0"
  13. url_1 = sheet_url.replace('/edit#gid=', '/export?format=csv&gid=')
  14.  
  15. data = pd.read_csv(url_1)
  16.  
  17. #worksheet = sh.sheet1
  18.  
  19. #Define variables
  20. var1 = "data-Name"
  21. var2 = "data-Age"
  22. var3 = "data-Weight"
  23.  
  24. #age = worksheet.col_values(3)[1:]
  25. age = data[var2]
  26. print("Ages:", age)
  27. #weight = worksheet.col_values(4)[1:]
  28. weight = data[var3]
  29. print("Weights:", weight)
  30.  
  31. #Pandas is extremely very useful for Google spreadsheets
  32. #Convert the json to Pandas dataframe
  33. #Get all data records as dictionary
  34. #data = worksheet.get_all_records()
  35. #df = pd.DataFrame.from_dict(data)
  36.  
  37. #Let's get some statistics
  38. #age_arr = np.array(age)
  39. #age_array = age_arr.astype(float)
  40. #h_arr = np.array(weight)
  41. #h_array = h_arr.astype(float)
  42.  
  43. print("Average Age:", np.mean(age))
  44. print("Mean Weight:", np.mean(weight))
  45. print("Minimum and Maximum Age:", np.min(age), np.max(age))
  46. print("Minimum and Maximum Weight:", np.min(weight), np.max(weight))
  47.  
  48. #Let's visualize
  49. #Graph styles and font size
  50. sns.set_style('darkgrid') # darkgrid, white grid, dark, white and ticks
  51. plt.rc('axes', titlesize=18)     # fontsize of the axes title
  52. plt.rc('axes', labelsize=14)    # fontsize of the x and y labels
  53. plt.rc('xtick', labelsize=13)    # fontsize of the tick labels
  54. plt.rc('ytick', labelsize=13)    # fontsize of the tick labels
  55. plt.rc('legend', fontsize=13)    # legend fontsize
  56. plt.rc('font', size=13)          # controls default text sizes
  57.  
  58. #sns list of color plettes
  59. #print(sns.color_palette('deep'), sns.color_palette("pastel"), sns.color_palette("Set2"))
  60.  
  61. #Let's Read Data from Google Sheets into Pandas without the Google Sheets API
  62. #Useful for multiple sheets
  63. sheet_id = "1426rTslBl2mgggHIQnLR7WivW4xor6cp5H1Su-1SrdI"
  64. sheet_name = "Sheet1"
  65. url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
  66.  
  67. #If your file only has one sheet, replace sheet_url
  68. #sheet_url = “https://docs.google.com/spreadsheets/d/1XqOtPkiE_Q0dfGSoyxrH730RkwrTczcRbDeJJpqRByQ/edit#gid=0"
  69. #url_1 = sheet_url.replace(‘/edit#gid=’, ‘/export?format=csv&gid=’)
  70.  
  71. #Get Pandas dataframe
  72. dataset = pd.read_csv(url)
  73. #print(dataset)
  74.  
  75. #Names = worksheet.col_values(2)[1:]
  76. #Names = data[var1]
  77. #print(Names)
  78.  
  79. df_names = dataset[var1]
  80. df_ages = dataset[var2]
  81. df_weights = dataset[var3]
  82. print(df_names)
  83.  
  84. #Preprocessing
  85. plots = dataset.groupby([var1], as_index=False).mean()
  86. #print(plots)
  87.  
  88. #Bar Plot in MatplotLib with plt.bar()
  89. #Names vs Age
  90. plt.figure(figsize=(10,5), tight_layout=True)
  91. colors = sns.color_palette('pastel')
  92. plt.bar(dataset[var1], dataset[var2], color=colors[:5])
  93. plt.xlabel('Name')
  94. plt.xticks(rotation=90)
  95. plt.ylabel('Age')
  96. plt.title('Barplot')
  97. plt.show()
  98.  
  99. #Name Vs Weight
  100. plt.figure()
  101. plt.figure(figsize=(10,5), tight_layout=True)
  102. colors = sns.color_palette('deep')
  103. plt.bar(dataset[var1], dataset[var3], color=colors[:6])
  104. plt.xlabel('Name')
  105. plt.xticks(rotation=90)
  106. plt.ylabel('Weight')
  107. plt.title('Barplot')
  108. plt.show()
  109.  
  110. #Bar Plot in Seaborn with sns.barplot()
  111. plt.figure(figsize=(10,5), tight_layout=True)
  112. ax = sns.barplot(x=dataset[var1], y=dataset[var2], palette='pastel', ci=None)
  113. ax.set(title='Barplot with Seaborn', xlabel='Names', ylabel='Age')
  114. plt.xticks(rotation=90)
  115. plt.show()
  116.  
  117. #Barplot grouped data by "n" variables
  118. plt.figure(figsize=(12, 6), tight_layout=True)
  119. ax = sns.barplot(x=dataset[var2], y=dataset[var3], hue=dataset[var1], palette='pastel')
  120. ax.set(title='Age vs Weight' ,xlabel='Age', ylabel='Weight')
  121. ax.legend(title='Names', title_fontsize='13', loc='upper right')
  122. plt.show()
  123.  
  124. #Histograms with plt.hist() or sns.histplot()
  125. plt.figure(figsize=(10,6), tight_layout=True)
  126. bins = [40, 50, 60, 70, 80]
  127. # matplotlib
  128. plt.hist(dataset[var3], bins=bins, color=sns.color_palette('Set2')[2], linewidth=2)
  129. plt.title('Histogram')
  130. plt.xlabel('Weight (cm)')
  131. plt.ylabel('Count')
  132. # seaborn
  133. ax = sns.histplot(data=dataset, x=var3, bins=bins, color=sns.color_palette('Set2')[2], linewidth=2)
  134. ax.set(title='Histogram', xlabel='Weight (cm)', ylabel='Count')
  135. plt.show()
  136.  
  137. #Boxplot
  138. plt.figure(figsize=(10,6), tight_layout=True)
  139. ax = sns.boxplot(data=dataset, x=var1, y=var2, palette='Set2', linewidth=2.5)
  140. ax.set(title='Boxplot', xlabel='Names', ylabel='Age (Years)')
  141. plt.xticks(rotation=90)
  142. plt.show()
  143.  
  144. #Scatter plot
  145. plt.figure(figsize=(10,6), tight_layout=True)
  146. ax = sns.scatterplot(data=dataset, x=var2, y=var3,   hue=var1, palette='Set2', s=60)
  147. ax.set(xlabel='Age (Years)', ylabel='Weight (kgs)')
  148. ax.legend(title='People', title_fontsize = 12)
  149. plt.show()
  150.  
  151. #Something else
  152. pivot = dataset.groupby([var1], as_index=False).mean()
  153. relationship = pivot.loc[:,var2:var3]
  154. print(relationship)
  155.  
  156. #Plot some graph
  157. charts = ["bar", "line", "barh", "hist", "box", "kde", "density", "area"]
  158. for chart_type in charts:
  159.     relationship.plot(kind="%s" % chart_type) #Replace bar with line, barh, hist, box, kde, density, area
  160.     plt.title("%s plot" % chart_type)
  161.     plt.show()
  162.  
  163. #Seaborn
  164. plt.figure()
  165. sns.set_style("darkgrid")
  166. sns.lineplot(data = dataset, x = var2, y = var3)
  167. plt.show()
  168.  
  169. plt.figure()
  170. sns.set_style("darkgrid")
  171. sns.lineplot(data = dataset, x = var3, y = var2)
  172. plt.show()
  173.  
  174. #replot
  175. plt.figure()
  176. sns.set_theme(style="darkgrid")
  177. sns.relplot(x=var2, y=var3, hue=var1, data=data);
  178. plt.show()
  179.  
  180. #Hexbin
  181. #Split the plotting window into 20 hexbins
  182. plt.figure()
  183. nbins = 15
  184. plt.title('Hexbin')
  185. plt.hexbin(dataset[var2], dataset[var3], gridsize=nbins, color=colors[:5])
  186. plt.show()
  187.  
  188. #Hexbin 2
  189. #Split the plotting window into 20 hexbins
  190. plt.figure()
  191. nbins = 15
  192. plt.title('Hexbin')
  193. plt.hexbin(dataset[var2], dataset[var3], gridsize=nbins, cmap=plt.cm.BuGn_r)
  194. plt.show()
  195.  
  196. #2-D Hist
  197. plt.figure()
  198. plt.title('2-D Histogram')
  199. plt.hist2d(dataset[var2], dataset[var3], bins=nbins, color=colors[:6])
  200. plt.show()
  201.  
  202. #2-D Hist 2
  203. plt.figure()
  204. plt.title('2-D Histogram')
  205. plt.hist2d(dataset[var2], dataset[var3], bins=nbins, cmap=plt.cm.BuGn_r)
  206. plt.show()
  207.  
  208. #Set variables
  209. x = dataset[var2]
  210. y = dataset[var3]
  211. z = dataset[var1]
  212.  
  213. #Linear Regression
  214. plt.figure()
  215. sns.regplot(x = x, y = y, data=dataset);
  216. plt.show()
  217.  
  218. plt.figure()
  219. sns.jointplot(x=x, y=y, data=dataset, kind="reg");
  220. plt.show()
  221.  
  222. #Set seaborn style
  223. sns.set_style("white")
  224.  
  225. # Basic 2D density plot
  226. plt.figure()
  227. sns.kdeplot(x=x, y=y)
  228. plt.show()
  229.  
  230. # Custom the color, add shade and bandwidth
  231. plt.figure()
  232. sns.kdeplot(x=x, y=y, cmap="Reds", shade=True, bw_adjust=.5)
  233. plt.show()
  234.  
  235. # Add thresh parameter
  236. plt.figure()
  237. sns.kdeplot(x=x, y=y, cmap="Blues", shade=True, thresh=0)
  238. plt.show()
  239.  
  240. #Joint plot
  241. plt.figure()
  242. sns.jointplot(x = x,y = y,data = dataset,kind = 'hex')
  243. plt.show()



Elimboto_DSK Data - Categorical

  1. #Plot some graph
  2. #Import required libraries
  3. import gspread
  4. import urllib.request
  5. import numpy as np
  6. import matplotlib.pyplot as plt
  7. import seaborn as sns
  8. import pandas as pd
  9. from statsmodels.graphics.mosaicplot import mosaic
  10. #REDCap
  11.  
  12. #Let's visualize
  13. #Graph styles and font size
  14. sns.set_style('darkgrid') # darkgrid, white grid, dark, white and ticks
  15. plt.rc('axes', titlesize=18)     # fontsize of the axes title
  16. plt.rc('axes', labelsize=14)    # fontsize of the x and y labels
  17. plt.rc('xtick', labelsize=13)    # fontsize of the tick labels
  18. plt.rc('ytick', labelsize=13)    # fontsize of the tick labels
  19. plt.rc('legend', fontsize=13)    # legend fontsize
  20. plt.rc('font', size=13)          # controls default text sizes
  21.  
  22. sheet_id = "1pm1mGdRgpitrYQiGqUNSHPdR43e-ZSXCavYr-TcqtwU"
  23. sheet_name = "Sheet1"
  24. url_1 = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
  25.  
  26. data = pd.read_csv(url_1)
  27.  
  28. #print(data)
  29. #Drop first row
  30. #df = data.drop(labels=0, axis=0)
  31. #df = data.drop(data.index[0])
  32. df = data[~data['Ailment cured'].isin(['HIV/AIDS'])]
  33. #df['Ailment cured'] = df['Ailment cured'].replace({'Gonorrhoea, syphilis':'Gonorrhoea & Syphilis'})
  34. df["Ailment cured"] = df['Ailment cured'].replace('Gonorrhoea, syphilis', 'Gonorrhoea & Syphilis')
  35. #print(df)
  36.  
  37. #Growth form vs Citation
  38. plt.figure(figsize=(8,5))
  39. sns.boxplot(x='Growth form',y='Citation',data=data, palette='rainbow')
  40. plt.show()
  41.  
  42. #Citation vs Growth form
  43. plt.figure(figsize=(8,5))
  44. sns.boxplot(x='Citation',y='Growth form',data=data, palette='rainbow')
  45. plt.show()
  46.  
  47. #Citation vs Growth form
  48. plt.figure(figsize=(8,5))
  49. sns.boxplot(x='Citation',y='Part used',data=data, palette='rainbow')
  50. plt.tight_layout() #figure.savefig('myplot.png', bbox_inches='tight')
  51. plt.show()
  52.  
  53. #Citation vs Ailment cured
  54. plt.figure(figsize=(10,5))
  55. sns.boxplot(x=df["Ailment cured"],y=df['Citation'],data=df, palette='rainbow')
  56. plt.xlabel("Ailment cured", labelpad=15)
  57. plt.tight_layout()
  58. plt.show()
  59.  
  60. #Swarm plot
  61. fig = plt.gcf()
  62. fig.set_size_inches(30, 30)
  63. sns.catplot(x="Citation", y="Scientific name", hue="Ailment cured", kind="swarm", data=df)
  64. plt.tight_layout()
  65. plt.show()
  66.  
  67.  
  68. #Adding hue
  69. #Citation vs Growth form
  70. plt.figure(figsize=(8,5))
  71. sns.boxplot(x='Citation',y='Growth form',data=data, hue ='Part used', palette='rainbow')
  72. plt.tight_layout()
  73. plt.show()
  74.  
  75. plt.figure(figsize=(8,5))
  76. sns.boxplot(x='Citation',y='Growth form',data=data, hue ='Ailment cured', palette='rainbow')
  77. plt.tight_layout()
  78. plt.show()
  79.  
  80. #Violin plots
  81. plt.figure(figsize=(8,6))
  82. sns.violinplot(x='Citation',y='Growth form',data=data, hue ='Part used', palette='rainbow')
  83. plt.show()
  84.  
  85. #Violin plots
  86. plt.figure(figsize=(8,6))
  87. sns.violinplot(x='Citation',y='Growth form',data=data, hue ='Ailment cured',palette='rainbow')
  88. plt.show()
  89.  
  90. #Boxen plots
  91. plt.figure(figsize=(8,6))
  92. sns.boxenplot(x='Citation',y='Growth form',data=data, hue ='Part used', palette='rainbow')
  93. plt.show()
  94.  
  95. plt.figure(figsize=(8,6))
  96. sns.boxenplot(x='Citation',y='Part used',data=data, hue ='Ailment cured', palette='rainbow')
  97. plt.tight_layout()
  98. plt.show()
  99.  
  100. #Bar plots
  101. plt.figure(figsize=(12,6))
  102. sns.barplot(x='Growth form',y='Citation',data=data, palette='rainbow', hue='Part used')
  103. plt.tight_layout()
  104. plt.show()
  105.  
  106. plt.figure(figsize=(12,6))
  107. sns.barplot(x='Ailment cured',y='Citation',data=data, palette='rainbow', hue='Part used')
  108. plt.tight_layout()
  109. plt.legend(loc=1)
  110. plt.show()
  111.  
  112. #Point plot
  113. plt.figure(figsize=(10,6))
  114. sns.pointplot(x='Citation',y='Growth form',data=data)
  115. plt.show()
  116.  
  117. plt.figure(figsize=(10,6))
  118. sns.pointplot(x='Citation',y='Growth form',data=data, hue='Part used')
  119. plt.show()
  120.  
  121. plt.figure(figsize=(10,6))
  122. sns.pointplot(x='Citation',y='Growth form',data=data, hue='Part used')
  123. plt.show()
  124.  
  125. plt.figure(figsize=(10,6))
  126. sns.pointplot(x='Citation',y='Growth form',data=data, hue='Ailment cured')
  127. plt.show()
  128.  
  129. #Count plot
  130. plt.figure(figsize=(10,6))
  131. sns.countplot(x='Growth form',data=data, palette='rainbow')
  132. plt.show()
  133.  
  134. plt.figure(figsize=(10,6))
  135. sns.countplot(x='Growth form',data=data, hue='Part used', palette='rainbow')
  136. plt.legend(loc=1)
  137. plt.show()
  138.  
  139. plt.figure(figsize=(10,6))
  140. sns.countplot(x='Growth form',data=data, hue='Ailment cured', palette='rainbow')
  141. plt.legend(loc=2)
  142. plt.show()
  143.  
  144.  
  145. #Strip plot - Categorical Scatter Plots
  146. plt.figure(figsize=(12,8))
  147. sns.stripplot(x='Citation', y='Growth form', data=data, jitter=True, hue= 'Part used', dodge=True, palette='viridis')
  148. plt.show()
  149.  
  150. #Swarm plots
  151. plt.figure(figsize=(10,6))
  152. sns.swarmplot(x='Citation', y='Ailment cured', data=data, hue='Growth form', dodge=True, palette='viridis')
  153. plt.tight_layout()
  154. plt.show()
  155.  
  156. """
  157. #Combining plots
  158. plt.figure(figsize=(12,8))
  159. sns.violinplot(x='Citation',y="Growth form", data=data, hue='Part used', dodge='True', palette='rainbow')
  160. sns.swarmplot(x='Citation',y="Growth form", data=data, hue='Part used', dodge='True', color='grey', alpha=.8, s=4)
  161. plt.show()
  162.  
  163. #Plot 2
  164. plt.figure(figsize=(12,8))
  165. sns.boxplot(x='Citation',y='Part used',hue='Growth form',data=data, palette='rainbow')
  166. sns.swarmplot(x='Citation',y='Part used',hue='Growth form', dodge=True,data=data, alpha=.8,color='grey',s=4)
  167.  
  168. #Plot 3
  169. plt.figure(figsize=(12,7))
  170. sns.barplot(x='Growth form',y='Citation',data=data, palette='rainbow', hue='Part used')
  171. sns.stripplot(x='Growth form',y="Citation",data=data, hue='Citation', dodge='True', color='grey', alpha=.8, s=2)
  172. plt.show()
  173.  
  174. #Faceting Data with Catplot
  175. #https://towardsdatascience.com/a-complete-guide-to-plotting-categorical-variables-with-seaborn-bfe54db66bec
  176. g = sns.catplot(x='Citation',y='Growth form', col = 'Local name', data=data,
  177.              kind='bar', aspect=.6, palette='Set2')
  178. (g.set_axis_labels("Class", "Survival Rate")
  179. .set_titles("{col_name}")
  180. .set(ylim=(0,1)))
  181. plt.tight_layout()
  182. plt.savefig('seaborn_catplot.png', dpi=1000)
  183. """
  184.  
  185. categorical_features = ["Growth form", "Part used", "Ailment cured", "Citation"]
  186. fig, ax = plt.subplots(1, len(categorical_features), figsize=(16,8))
  187. for i, categorical_feature in enumerate(data[categorical_features]):
  188.     data[categorical_feature].value_counts().plot(kind="bar", ax=ax[i]).set_title(categorical_feature)
  189. plt.tight_layout()
  190. plt.show()
  191.  
  192. """
  193. #print(data)
  194. #print(data['Local Name'])
  195. data['Growth form'].value_counts().plot(kind='bar')
  196. plt.show()
  197. #data['Growth form'].value_counts().plot(kind='hist')
  198.  
  199. plt.figure()
  200. from statsmodels.graphics.mosaicplot import mosaic
  201. plt.rcParams['font.size'] = 16.0
  202. mosaic(data, ['Growth form', 'Part used']);
  203. plt.show()
  204. """
  205. plt.figure()
  206. sns.barplot(x=df['Growth form'].head(3),y=df['Citation'],data=df)
  207. plt.show()



TSSFL ODF Google Survey Form responses Data:

  1. #Plot some graph
  2. #Import required libraries
  3. import gspread
  4. import urllib.request
  5. import numpy as np
  6. import matplotlib.pyplot as plt
  7. import seaborn as sns
  8. import pandas as pd
  9.  
  10. #Let's Read Data from Google Sheets into Pandas without the Google Sheets API
  11. #Useful for multiple sheets
  12. sheet_id = "1UnkRYcOhLFMgyT_CzByvupvdaD5cL5b_nCcOeoy1uy8"
  13. sheet_name = "Sheet1"
  14. url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
  15.  
  16. #Get Pandas dataframe
  17. dataset = pd.read_csv(url)
  18.  
  19. #worksheet = sh.sheet1
  20. #age = worksheet.col_values(3)[1:]
  21. age = dataset["Age"]
  22. print("Ages:", age)
  23. #Height = worksheet.col_values(4)[1:]
  24. Height = dataset["Height"]
  25. print("Heights:", Height)
  26.  
  27. #Pandas is extremely very useful for Google spreadsheets
  28. #Convert the json to Pandas dataframe
  29. #Get all data records as dictionary
  30. #data = worksheet.get_all_records()
  31. #df = pd.DataFrame.from_dict(data)
  32.  
  33. #Let's get some statistics
  34. #age_arr = np.array(age)
  35. #age_array = age_arr.astype(float)
  36. #h_arr = np.array(Height)
  37. #h_array = h_arr.astype(float)
  38.  
  39. print("Average Age:", np.mean(age))
  40. print("Mean Height:", np.mean(Height))
  41. print("Minimum and Maximum Age:", np.min(age), np.max(age))
  42. print("Minimum and Maximum Height:", np.min(Height), np.max(Height))
  43.  
  44. #Let's visualize
  45. #Graph styles and font size
  46. sns.set_style('darkgrid') # darkgrid, white grid, dark, white and ticks
  47. plt.rc('axes', titlesize=18)     # fontsize of the axes title
  48. plt.rc('axes', labelsize=14)    # fontsize of the x and y labels
  49. plt.rc('xtick', labelsize=13)    # fontsize of the tick labels
  50. plt.rc('ytick', labelsize=13)    # fontsize of the tick labels
  51. plt.rc('legend', fontsize=13)    # legend fontsize
  52. plt.rc('font', size=13)          # controls default text sizes
  53.  
  54. #sns list of color plettes
  55. #print(sns.color_palette('deep'), sns.color_palette("pastel"), sns.color_palette("Set2"))
  56. #If your file only has one sheet, replace sheet_url
  57. #sheet_url = “https://docs.google.com/spreadsheets/d/1XqOtPkiE_Q0dfGSoyxrH730RkwrTczcRbDeJJpqRByQ/edit#gid=0"
  58. #url_1 = sheet_url.replace(‘/edit#gid=’, ‘/export?format=csv&gid=’)
  59.  
  60. #print(dataset)
  61.  
  62. #Names = worksheet.col_values(2)[1:]
  63. #Names = data["Name"]
  64. #print(Names)
  65.  
  66. df_names = dataset["Name"]
  67. df_ages = dataset["Age"]
  68. df_Heights = dataset["Height"]
  69. print(df_names)
  70.  
  71. #Preprocessing
  72. plots = dataset.groupby(['Name'], as_index=False).mean()
  73. #print(plots)
  74.  
  75. #Bar Plot in MatplotLib with plt.bar()
  76. #Names vs Age
  77. plt.figure(figsize=(10,5), tight_layout=True)
  78. colors = sns.color_palette('pastel')
  79. plt.bar(dataset['Name'], dataset['Age'], color=colors[:5])
  80. plt.xlabel('Name')
  81. plt.xticks(rotation=90)
  82. plt.ylabel('Age')
  83. plt.title('Barplot')
  84. plt.show()
  85.  
  86. #Name Vs Height
  87. plt.figure()
  88. plt.figure(figsize=(10,5), tight_layout=True)
  89. colors = sns.color_palette('deep')
  90. plt.bar(dataset['Name'], dataset['Height'], color=colors[:6])
  91. plt.xlabel('Name')
  92. plt.xticks(rotation=90)
  93. plt.ylabel('Height')
  94. plt.title('Barplot')
  95. plt.show()
  96.  
  97. #Bar Plot in Seaborn with sns.barplot()
  98. plt.figure(figsize=(10,5), tight_layout=True)
  99. ax = sns.barplot(x=dataset['Name'], y=dataset['Age'], palette='pastel', ci=None)
  100. ax.set(title='Barplot with Seaborn', xlabel='Names', ylabel='Age')
  101. plt.xticks(rotation=90)
  102. plt.show()
  103.  
  104. #Barplot grouped data by "n" variables
  105. plt.figure(figsize=(12, 6), tight_layout=True)
  106. ax = sns.barplot(x=dataset['Age'], y=dataset['Height'], hue=dataset['Name'], palette='pastel')
  107. ax.set(title='Age vs Height' ,xlabel='Age', ylabel='Height')
  108. ax.legend(title='Names', title_fontsize='13', loc='upper right')
  109. plt.show()
  110.  
  111. #Histograms with plt.hist() or sns.histplot()
  112. plt.figure(figsize=(10,6), tight_layout=True)
  113. bins = [40, 50, 60, 70, 80]
  114. # matplotlib
  115. plt.hist(dataset['Height'], bins=bins, color=sns.color_palette('Set2')[2], linewidth=2)
  116. plt.title('Histogram')
  117. plt.xlabel('Height (cm)')
  118. plt.ylabel('Count')
  119. # seaborn
  120. ax = sns.histplot(data=dataset, x='Height', bins=bins, color=sns.color_palette('Set2')[2], linewidth=2)
  121. ax.set(title='Histogram', xlabel='Height (cm)', ylabel='Count')
  122. plt.show()
  123.  
  124. #Boxplot
  125. plt.figure(figsize=(10,6), tight_layout=True)
  126. ax = sns.boxplot(data=dataset, x='Name', y='Age', palette='Set2', linewidth=2.5)
  127. ax.set(title='Boxplot', xlabel='Names', ylabel='Age (Years)')
  128. plt.xticks(rotation=90)
  129. plt.show()
  130.  
  131. #Scatter plot
  132. plt.figure(figsize=(10,6), tight_layout=True)
  133. ax = sns.scatterplot(data=dataset, x='Age', y='Height',   hue='Name', palette='Set2', s=60)
  134. ax.set(xlabel='Age (Years)', ylabel='Height (cms)')
  135. ax.legend(title='People', title_fontsize = 12)
  136. plt.show()
  137.  
  138. #Something else
  139. pivot = dataset.groupby(['Name'], as_index=False).mean()
  140. relationship = pivot.loc[:,"Age":"Height"]
  141. print(relationship)
  142.  
  143. #Plot some graph
  144. charts = ["bar", "line", "barh", "hist", "box", "kde", "density", "area"]
  145. for chart_type in charts:
  146.     relationship.plot(kind="%s" % chart_type) #Replace bar with line, barh, hist, box, kde, density, area
  147.     plt.title("%s plot" % chart_type)
  148.     plt.show()
  149.  
  150. #Seaborn
  151. plt.figure()
  152. sns.set_style("darkgrid")
  153. sns.lineplot(data = dataset, x = "Age", y = "Height")
  154. plt.show()
  155.  
  156. plt.figure()
  157. sns.set_style("darkgrid")
  158. sns.lineplot(data = dataset, x = "Height", y = "Age")
  159. plt.show()
  160.  
  161. #replot
  162. plt.figure()
  163. sns.set_theme(style="darkgrid")
  164. sns.relplot(x="Age", y="Height", hue="Name", data=dataset);
  165. plt.show()
  166.  
  167. #Hexbin
  168. #Split the plotting window into 20 hexbins
  169. plt.figure()
  170. nbins = 15
  171. plt.title('Hexbin')
  172. plt.hexbin(dataset["Age"], dataset["Height"], gridsize=nbins, color=colors[:5])
  173. plt.show()
  174.  
  175. #Hexbin 2
  176. #Split the plotting window into 20 hexbins
  177. plt.figure()
  178. nbins = 15
  179. plt.title('Hexbin')
  180. plt.hexbin(dataset["Age"], dataset["Height"], gridsize=nbins, cmap=plt.cm.BuGn_r)
  181. plt.show()
  182.  
  183. #2-D Hist
  184. plt.figure()
  185. plt.title('2-D Histogram')
  186. plt.hist2d(dataset["Age"], dataset["Height"], bins=nbins, color=colors[:6])
  187. plt.show()
  188.  
  189. #2-D Hist 2
  190. plt.figure()
  191. plt.title('2-D Histogram')
  192. plt.hist2d(dataset["Age"], dataset["Height"], bins=nbins, cmap=plt.cm.BuGn_r)
  193. plt.show()
  194.  
  195. #Set variables
  196. x = dataset["Age"]
  197. y = dataset["Height"]
  198. z = dataset["Name"]
  199.  
  200. #Linear Regression
  201. plt.figure()
  202. sns.regplot(x = x, y = y, data=dataset);
  203. plt.show()
  204.  
  205. plt.figure()
  206. sns.jointplot(x=x, y=y, data=dataset, kind="reg");
  207. plt.show()
  208.  
  209. #Set seaborn style
  210. sns.set_style("white")
  211.  
  212. # Basic 2D density plot
  213. plt.figure()
  214. sns.kdeplot(x=x, y=y)
  215. plt.show()
  216.  
  217. # Custom the color, add shade and bandwidth
  218. plt.figure()
  219. sns.kdeplot(x=x, y=y, cmap="Reds", shade=True, bw_adjust=.5)
  220. plt.show()
  221.  
  222. # Add thresh parameter
  223. plt.figure()
  224. sns.kdeplot(x=x, y=y, cmap="Blues", shade=True, thresh=0)
  225. plt.show()
  226.  
  227. #Joint plot
  228. plt.figure()
  229. sns.jointplot(x = x,y = y,data = dataset,kind = 'hex')
  230. plt.show()

TSSFL -- A Creative Journey Towards Infinite Possibilities!
User avatar
Eli
Senior Expert Member
Reactions: 183
Posts: 5334
Joined: 9 years ago
Location: Tanzania
Has thanked: 75 times
Been thanked: 88 times
Contact:

#6

Combining Data Collected in Various Ways


Combine Data into one Master_Sheet Spreadsheet:

  1. import gspread
  2. import urllib.request
  3. urllib.request.urlretrieve("https://www.dropbox.com/s/m728v370159b2xm/credentials.json?dl=1", "credentials.json")
  4.  
  5. gc = gspread.service_account(filename="credentials.json")
  6. #We can open spreadsheets by Name, or by url or by key (.open(), open_by_url(), open_by_key())
  7. #Let's open our spreadsheets by their keys
  8. wb1 = gc.open_by_key("1BC48PKPZW71AC6hOn1SNesvZk1PoBjE4wXwl1YlWFpY")
  9. wb2 = gc.open_by_key("1UnkRYcOhLFMgyT_CzByvupvdaD5cL5b_nCcOeoy1uy8")
  10. wb3 = gc.open_by_key("1E2WYRF9nAxtmUW57Ng05ZsH_0R5UZ0BLLaLcAy9tJzM")
  11.  
  12. #Let's rename some the headers in workbook 1
  13. ws1 = wb1.sheet1
  14. import numpy as np
  15. array = np.array([["Name", "Age", "Height", "Date"]])
  16. # Write the array to worksheet starting from the cell B1
  17. ws1.update('B1', array.tolist())
  18.  
  19. #Let's create a new sheet named Master_sheet under the spreadsheet ODK Test Spreadsheet
  20. #newsheet = wb1.add_worksheet(title="Master_Sheet", rows="1024", cols="20")
  21.  
  22. #We can create under the ODK Test Spreadsheet Google sheet copies of the Google form and manually collected data
  23. #newsheet = wb1.add_worksheet(title="From_Gsurvey_form", rows="1024", cols="20")
  24. #newsheet = wb1.add_worksheet(title="Collected_Manually", rows="1024", cols="20")
  25.  
  26. #Let's copy data from the two sheets in different spreadsheets to the two newly created sheets under ODK Test Spreadsheet
  27. ws2 = wb2.sheet1
  28. ws3 = wb3.sheet1
  29.  
  30. sheet1 = ws1.get_all_values()
  31. sheet2 = ws2.get_all_values()
  32. sheet3 = ws3.get_all_values()
  33.  
  34. #Dataframes
  35. import pandas as pd
  36. df1 = pd.DataFrame.from_records(sheet1[1:], columns=sheet1[0])
  37. df2 = pd.DataFrame.from_records(sheet2[1:], columns=sheet2[0])
  38. df3 = pd.DataFrame.from_records(sheet3[1:], columns=sheet3[0])
  39.  
  40. all_sheets = pd.concat([df1, df2, df3])
  41.  
  42. #Drop all NANs
  43. clean_df = all_sheets.fillna("")
  44. #Let's create a list of lists for the dataframes including columns
  45. combined_Data = [clean_df.columns.tolist()] + clean_df.to_numpy().tolist()
  46.  
  47. mastersheet = wb1.worksheet("Master_Sheet")
  48.  
  49. #Let's update the Master Sheet, start pasting from cell A1
  50. mastersheet.update("A1", combined_Data, value_input_option="USER_ENTERED")
  51.  
  52. import os
  53. os.remove("credentials.json")


Data extracted from paper Questionnaire



Data collected via Google Survey form



Data Collected using ODK Collect




The combined dataset is contained in the Master_Sheet of the spreadsheet above.


Visualize the Combined Data:

  1. #Plot some graph
  2. #Import required libraries
  3. import gspread
  4. import urllib.request
  5. import numpy as np
  6. import matplotlib.pyplot as plt
  7. import seaborn as sns
  8. import pandas as pd
  9.  
  10. #Let's Read Data from Google Sheets into Pandas without the Google Sheets API
  11. #Useful for multiple sheets
  12. sheet_id = "1BC48PKPZW71AC6hOn1SNesvZk1PoBjE4wXwl1YlWFpY"
  13. sheet_name = "Master_Sheet"
  14. url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
  15.  
  16. #Get Pandas dataframe
  17. dataset = pd.read_csv(url)
  18.  
  19. #worksheet = sh.sheet1
  20. #age = worksheet.col_values(3)[1:]
  21. age = dataset["Age"]
  22. print("Ages:", age)
  23. #Height = worksheet.col_values(4)[1:]
  24. Height = dataset["Height"]
  25. print("Heights:", Height)
  26.  
  27. #Pandas is extremely very useful for Google spreadsheets
  28. #Convert the json to Pandas dataframe
  29. #Get all data records as dictionary
  30. #data = worksheet.get_all_records()
  31. #df = pd.DataFrame.from_dict(data)
  32.  
  33. #Let's get some statistics
  34. #age_arr = np.array(age)
  35. #age_array = age_arr.astype(float)
  36. #h_arr = np.array(Height)
  37. #h_array = h_arr.astype(float)
  38.  
  39. print("Average Age:", np.mean(age))
  40. print("Mean Height:", np.mean(Height))
  41. print("Minimum and Maximum Age:", np.min(age), np.max(age))
  42. print("Minimum and Maximum Height:", np.min(Height), np.max(Height))
  43.  
  44. #Let's visualize
  45. #Graph styles and font size
  46. sns.set_style('darkgrid') # darkgrid, white grid, dark, white and ticks
  47. plt.rc('axes', titlesize=18)     # fontsize of the axes title
  48. plt.rc('axes', labelsize=14)    # fontsize of the x and y labels
  49. plt.rc('xtick', labelsize=13)    # fontsize of the tick labels
  50. plt.rc('ytick', labelsize=13)    # fontsize of the tick labels
  51. plt.rc('legend', fontsize=13)    # legend fontsize
  52. plt.rc('font', size=13)          # controls default text sizes
  53.  
  54. #sns list of color plettes
  55. #print(sns.color_palette('deep'), sns.color_palette("pastel"), sns.color_palette("Set2"))
  56. #If your file only has one sheet, replace sheet_url
  57. #sheet_url = “https://docs.google.com/spreadsheets/d/1XqOtPkiE_Q0dfGSoyxrH730RkwrTczcRbDeJJpqRByQ/edit#gid=0"
  58. #url_1 = sheet_url.replace(‘/edit#gid=’, ‘/export?format=csv&gid=’)
  59.  
  60. #print(dataset)
  61.  
  62. #Names = worksheet.col_values(2)[1:]
  63. #Names = data["Name"]
  64. #print(Names)
  65.  
  66. df_names = dataset["Name"]
  67. df_ages = dataset["Age"]
  68. df_Heights = dataset["Height"]
  69. print(df_names)
  70.  
  71. #Preprocessing
  72. plots = dataset.groupby(['Name'], as_index=False).mean()
  73. #print(plots)
  74.  
  75. #Bar Plot in MatplotLib with plt.bar()
  76. #Names vs Age
  77. plt.figure(figsize=(10,5), tight_layout=True)
  78. colors = sns.color_palette('pastel')
  79. plt.bar(dataset['Name'], dataset['Age'], color=colors[:5])
  80. plt.xlabel('Name')
  81. plt.xticks(rotation=90)
  82. plt.ylabel('Age')
  83. plt.title('Barplot')
  84. plt.show()
  85.  
  86. #Name Vs Height
  87. plt.figure()
  88. plt.figure(figsize=(10,5), tight_layout=True)
  89. colors = sns.color_palette('deep')
  90. plt.bar(dataset['Name'], dataset['Height'], color=colors[:6])
  91. plt.xlabel('Name')
  92. plt.xticks(rotation=90)
  93. plt.ylabel('Height')
  94. plt.title('Barplot')
  95. plt.show()
  96.  
  97. #Bar Plot in Seaborn with sns.barplot()
  98. plt.figure(figsize=(10,5), tight_layout=True)
  99. ax = sns.barplot(x=dataset['Name'], y=dataset['Age'], palette='pastel', ci=None)
  100. ax.set(title='Barplot with Seaborn', xlabel='Names', ylabel='Age')
  101. plt.xticks(rotation=90)
  102. plt.show()
  103.  
  104. #Barplot grouped data by "n" variables
  105. plt.figure(figsize=(12, 6), tight_layout=True)
  106. ax = sns.barplot(x=dataset['Age'], y=dataset['Height'], hue=dataset['Name'], palette='pastel')
  107. ax.set(title='Age vs Height' ,xlabel='Age', ylabel='Height')
  108. ax.legend(title='Names', title_fontsize='13', loc='upper right')
  109. plt.show()
  110.  
  111. #Histograms with plt.hist() or sns.histplot()
  112. plt.figure(figsize=(10,6), tight_layout=True)
  113. bins = [40, 50, 60, 70, 80]
  114. # matplotlib
  115. plt.hist(dataset['Height'], bins=bins, color=sns.color_palette('Set2')[2], linewidth=2)
  116. plt.title('Histogram')
  117. plt.xlabel('Height (cm)')
  118. plt.ylabel('Count')
  119. # seaborn
  120. ax = sns.histplot(data=dataset, x='Height', bins=bins, color=sns.color_palette('Set2')[2], linewidth=2)
  121. ax.set(title='Histogram', xlabel='Height (cm)', ylabel='Count')
  122. plt.show()
  123.  
  124. #Boxplot
  125. plt.figure(figsize=(10,6), tight_layout=True)
  126. ax = sns.boxplot(data=dataset, x='Name', y='Age', palette='Set2', linewidth=2.5)
  127. ax.set(title='Boxplot', xlabel='Names', ylabel='Age (Years)')
  128. plt.xticks(rotation=90)
  129. plt.show()
  130.  
  131. #Scatter plot
  132. plt.figure(figsize=(10,6), tight_layout=True)
  133. ax = sns.scatterplot(data=dataset, x='Age', y='Height',   hue='Name', palette='Set2', s=60)
  134. ax.set(xlabel='Age (Years)', ylabel='Height (cms)')
  135. ax.legend(title='People', title_fontsize = 12)
  136. plt.show()
  137.  
  138. #Something else
  139. pivot = dataset.groupby(['Name'], as_index=False).mean()
  140. relationship = pivot.loc[:,"Age":"Height"]
  141. print(relationship)
  142.  
  143. #Plot some graph
  144. charts = ["bar", "line", "barh", "hist", "box", "kde", "density", "area"]
  145. for chart_type in charts:
  146.     relationship.plot(kind="%s" % chart_type) #Replace bar with line, barh, hist, box, kde, density, area
  147.     plt.title("%s plot" % chart_type)
  148.     plt.show()
  149.  
  150. #Seaborn
  151. plt.figure()
  152. sns.set_style("darkgrid")
  153. sns.lineplot(data = dataset, x = "Age", y = "Height")
  154. plt.show()
  155.  
  156. plt.figure()
  157. sns.set_style("darkgrid")
  158. sns.lineplot(data = dataset, x = "Height", y = "Age")
  159. plt.show()
  160.  
  161. #replot
  162. plt.figure()
  163. sns.set_theme(style="darkgrid")
  164. sns.relplot(x="Age", y="Height", hue="Name", data=dataset);
  165. plt.show()
  166.  
  167. #Hexbin
  168. #Split the plotting window into 20 hexbins
  169. plt.figure()
  170. nbins = 15
  171. plt.title('Hexbin')
  172. plt.hexbin(dataset["Age"], dataset["Height"], gridsize=nbins, color=colors[:5])
  173. plt.show()
  174.  
  175. #Hexbin 2
  176. #Split the plotting window into 20 hexbins
  177. plt.figure()
  178. nbins = 15
  179. plt.title('Hexbin')
  180. plt.hexbin(dataset["Age"], dataset["Height"], gridsize=nbins, cmap=plt.cm.BuGn_r)
  181. plt.show()
  182.  
  183. #2-D Hist
  184. plt.figure()
  185. plt.title('2-D Histogram')
  186. plt.hist2d(dataset["Age"], dataset["Height"], bins=nbins, color=colors[:6])
  187. plt.show()
  188.  
  189. #2-D Hist 2
  190. plt.figure()
  191. plt.title('2-D Histogram')
  192. plt.hist2d(dataset["Age"], dataset["Height"], bins=nbins, cmap=plt.cm.BuGn_r)
  193. plt.show()
  194.  
  195. #Set variables
  196. x = dataset["Age"]
  197. y = dataset["Height"]
  198. z = dataset["Name"]
  199.  
  200. #Linear Regression
  201. plt.figure()
  202. sns.regplot(x = x, y = y, data=dataset);
  203. plt.show()
  204.  
  205. plt.figure()
  206. sns.jointplot(x=x, y=y, data=dataset, kind="reg");
  207. plt.show()
  208.  
  209. #Set seaborn style
  210. sns.set_style("white")
  211.  
  212. # Basic 2D density plot
  213. plt.figure()
  214. sns.kdeplot(x=x, y=y)
  215. plt.show()
  216.  
  217. # Custom the color, add shade and bandwidth
  218. plt.figure()
  219. sns.kdeplot(x=x, y=y, cmap="Reds", shade=True, bw_adjust=.5)
  220. plt.show()
  221.  
  222. # Add thresh parameter
  223. plt.figure()
  224. sns.kdeplot(x=x, y=y, cmap="Blues", shade=True, thresh=0)
  225. plt.show()
  226.  
  227. #Joint plot
  228. plt.figure()
  229. sns.jointplot(x = x,y = y,data = dataset,kind = 'hex')
  230. plt.show()

TSSFL -- A Creative Journey Towards Infinite Possibilities!
User avatar
Eli
Senior Expert Member
Reactions: 183
Posts: 5334
Joined: 9 years ago
Location: Tanzania
Has thanked: 75 times
Been thanked: 88 times
Contact:

#7

Further: Here the data in Sheet1 and Sheet2 have been combined by column names to a data in the Master_Sheet:
  • Read data from several Google sheets/worksheets and create Pandas dataframe for each.
  • Combine the dataframes created in the previous step.
  • Clean data to remove unrequired values.
  • Store the data from the combined dataframe into a new Google sheet -- Master Sheet.
In contrast to the previous topics, we open our spreadsheet by URL:


  1. import gspread
  2. import urllib.request
  3. urllib.request.urlretrieve("https://www.dropbox.com/s/m728v370159b2xm/credentials.json?dl=1", "credentials.json")
  4.  
  5. gc = gspread.service_account(filename="credentials.json")
  6. wb = gc.open_by_url("https://docs.google.com/spreadsheets/d/1O19ketp1DpZwUytV_WqAlxafOBfwiCHiR-0F5eTir6k/edit#gid=0") #Open spreadsheet,
  7.  
  8. #print(wb.worksheets()) #Try wb.worksheets
  9. #print(wb.title)
  10. import pandas as pd
  11. list_of_dataframes = []
  12. for ws in wb.worksheets():
  13.     if ws.title != "Master_Sheet":
  14.         rows = ws.get_all_values()
  15.         #Create data frames
  16.         df = pd.DataFrame.from_records(rows[1:], columns=rows[0]) #Omit the first/header row, pass columns
  17.         list_of_dataframes.append(df)
  18.  
  19. #Print the first dataframe
  20. #print(list_of_dataframes[0])
  21.  
  22. combined_dataframes = pd.concat(list_of_dataframes)
  23. print(combined_dataframes)
  24. #Do some cleaning, replace NAN with blank string
  25. clean_df = combined_dataframes.fillna("") #Try combined_dataframes.dropna("")
  26.  
  27. #Let's create a list of lists for the dataframes including columns
  28. combined_Data = [clean_df.columns.tolist()] + clean_df.to_numpy().tolist()
  29.  
  30. mastersheet = wb.worksheet("Master_Sheet")
  31.  
  32. #Let's update the Master Sheet, start pasting from cell A1
  33. mastersheet.update("A1", combined_Data, value_input_option="USER_ENTERED")
  34.  
  35. import os
  36. os.remove("credentials.json")



TSSFL -- A Creative Journey Towards Infinite Possibilities!
User avatar
Eli
Senior Expert Member
Reactions: 183
Posts: 5334
Joined: 9 years ago
Location: Tanzania
Has thanked: 75 times
Been thanked: 88 times
Contact:

#8

Data Cleaning


The Python snippet below performs the following tasks:
  1. Read data from the spreadsheet by using spreadsheet ID, create a worksheet, and then create a Pandas DataFrame (df1) from the worksheet.
  2. Preprocesses data by creating the second Pandas DataFrame (df2) for which all strings named HIV/AIDS are removed from the column named "Ailment cured", similarly, it renames all strings named "Gonorrhoea, syphilis" to "Gonorrhoea & Syphilis". Note that Python is case-sensitive.
  3. It creates a new spreadsheet and names it "A New Test Spreadsheet". It also creates a worksheet (worksheet2) within the newly created spreadsheet.
  4. It updates worksheet2 by copying the Pandas DataFrame df2 into it.
  5. Finally, it sends an updated spreadsheet to an email.
  1. import gspread
  2. import urllib.request
  3. urllib.request.urlretrieve("https://www.dropbox.com/s/m728v370159b2xm/credentials.json?dl=1", "credentials.json")
  4.  
  5. gc = gspread.service_account(filename="credentials.json")
  6. sh = gc.open_by_key("1pm1mGdRgpitrYQiGqUNSHPdR43e-ZSXCavYr-TcqtwU") #Open spreadsheet by ID (please change ID)
  7. worksheet = sh.sheet1
  8.  
  9. import pandas as pd
  10.  
  11. df1 = pd.DataFrame(worksheet.get_all_records())
  12. #Preprocess Data
  13. df2 = df1[~df1['Ailment cured'].isin(['HIV/AIDS'])]
  14. #df['Ailment cured'] = df['Ailment cured'].replace({'Gonorrhoea, syphilis':'Gonorrhoea & Syphilis'})
  15. df2["Ailment cured"] = df2['Ailment cured'].replace('Gonorrhoea, syphilis', 'Gonorrhoea & Syphilis')
  16. print(df2)
  17.  
  18. #Let's create a new blank spreadsheet:
  19.  
  20. sh2 = gc.create('A New Test Spreadsheet Created During Webinar')
  21. worksheet2 = sh2.sheet1
  22.  
  23. #Let's write df2 to a new worksheet
  24. worksheet2.update([df2.columns.values.tolist()] + df2.values.tolist())
  25. #Share the new worksheet to an email:
  26. sh2.share('ey@tssfl.co', perm_type='user', role='writer')
  27.  
  28. #Finally,
  29. import os
  30. os.remove("credentials.json")
  31. #We can also combine/concatenate sheets

TSSFL -- A Creative Journey Towards Infinite Possibilities!
User avatar
Eli
Senior Expert Member
Reactions: 183
Posts: 5334
Joined: 9 years ago
Location: Tanzania
Has thanked: 75 times
Been thanked: 88 times
Contact:

#9

Report Generation - An example of Sales Data

  1. #Import the required libraries
  2. from __future__ import print_function
  3. import pandas as pd
  4. import numpy as np
  5. import urllib.request
  6. #print("This is Pandas version:", pd.__version__)
  7.  
  8. #Download and read the data into DataFrame
  9. df = urllib.request.urlretrieve("https://www.tssfl.com/download/file.php?id=1188", "sales-sheet.xlsx")
  10. df = pd.read_excel("./sales-sheet.xlsx")
  11. #print(df)
  12. #print(df.head())
  13.  
  14. #Pivot the data to summarize.
  15. sales_report = pd.pivot_table(df, index=["Manager", "Rep", "Product"], values=["Price", "Quantity"],
  16.                            aggfunc=[np.sum, np.mean], fill_value=0)
  17. #print(sales_report.head())
  18.  
  19. #Show some statistics, for example, the average quantity and price for CPU and Software sales
  20. print(df[df["Product"]=="CPU"]["Quantity"].mean())
  21. print(df[df["Product"]=="CPU"]["Price"].mean())
  22. print(df[df["Product"]=="Software"]["Quantity"].mean())
  23. print(df[df["Product"]=="Software"]["Price"].mean())
  24.  
  25. #Let's create a Jinja environment and get the report.html template
  26.  
  27. from jinja2 import Environment, FileSystemLoader
  28. env = Environment(loader=FileSystemLoader('.'))
  29. urllib.request.urlretrieve("https://www.dropbox.com/s/djpii5trdesfb4x/report.html?dl=1", "report.html")
  30. template = env.get_template("./report.html")
  31.  
  32. """env variable above shows how we pass content to the template
  33. We create a dictionary called temp_variables that
  34. contains all the variable we want to pass to the template"""
  35.  
  36. temp_variables = {"title" : "Sales Report",
  37.                  "pivot_table": sales_report.to_html()}
  38.  
  39. #Finally, let's render the HTML with the variables included in the output
  40. #This will create a string that we will eventually pass to our PDF creation engine - WeasyPrint
  41.  
  42. html_output = template.render(temp_variables)
  43.  
  44. #Generate PDF
  45. #We create a pdf by passing string to the PDF generator
  46. from weasyprint import HTML
  47. HTML(string=html_output).write_pdf("report1.pdf")
  48.  
  49. #Apply stylesheet
  50. ss = urllib.request.urlretrieve("https://www.dropbox.com/s/xd7kk9t17sjfwrr/style.css?dl=1", "style.css")
  51. HTML(string=html_output).write_pdf("report2.pdf", stylesheets=["./style.css"])
  52.  
  53. import os
  54. os.remove("sales-sheet.xlsx")
  55. os.remove("report.html")
  56. os.remove("style.css")

TSSFL -- A Creative Journey Towards Infinite Possibilities!
User avatar
Eli
Senior Expert Member
Reactions: 183
Posts: 5334
Joined: 9 years ago
Location: Tanzania
Has thanked: 75 times
Been thanked: 88 times
Contact:

#10

Some data and codes that were used during webinar are not shown here.
TSSFL -- A Creative Journey Towards Infinite Possibilities!
Post Reply
  • Similar Topics
    Replies
    Views
    Last post

Return to “News Board”

  • Information
  • Who is online

    Users browsing this forum: No registered users and 3 guests