TSSFL Stack, ODK Collect, and Google Drive Integration to Collect, Store, Manage, Process and Analyze Data

User avatar
Eli
Senior Expert Member
Reactions: 183
Posts: 5330
Joined: 9 years ago
Location: Tanzania
Has thanked: 75 times
Been thanked: 88 times
Contact:

#1

The Open Data Kit (ODK) is a free, open-source suite of tools for collecting data on Android mobile devices and submitting it to an online server. Data can be collected even when there is no Internet connection or mobile service at the time of collection. ODK Collect replaces traditional paper forms with electronic forms that accept text, numeric data, GPS locations, photos, videos, barcodes, and audio, all of which can be uploaded to an online server. The Open Data Kit has become standard software that helps organizations, authors, and researchers build and manage mobile data collection solutions. Google mentions ODK among tools that create new knowledge, raise awareness, or enable people to take action to change the world.

This is a summary of how a versatile set of tools and systems, namely the TSSFL Stack, Open Data Kit (ODK) Collect, and Google Drive, can be integrated to collect, store, manage, process, and analyze data. All three are designed for teams and collaboration. ODK Collect lets multiple team members collect data with their Android phones at different times, paces, and locations, while all of the collected data is sent to the same Google spreadsheet for storage and management. From the Google spreadsheet, the data is then acquired programmatically by TSSFL ODF for processing and analysis, and TSSFL ODF offers the team various collaboration options while doing so. Traditionally, ODK mainly worked with KoBoToolbox, a suite of tools for field data collection in challenging environments. The main collection application used by KoBoToolbox is built on, and compatible with, the ODK ecosystem, so any form built for ODK Collect should also work with KoBoToolbox and vice versa.

The whole process is streamlined as follows:

1. Creating a form for data collection and submissions. The form is created using ODK Build at http://build.opendatakit.org/:

Image

2. Linking to Google Drive and hosting the survey form there. The form built with ODK Build is exported as XML and placed in Google Drive so that the project team can download it to their Android phones. This step includes creating the Google spreadsheet to which the team's completed survey responses will be sent and in which they will be stored:
  • Go to Edit -> Form Properties.
  • Fill in the Title on Device, Instance Name, and Public Key fields (all are optional).
  • Paste the URL of the Google spreadsheet that will collect the data into the Submission URL field.
  • Click Done.
  • Go to File -> Export to XML and download the form.
  • Place the form in the same Google Drive folder as the Google spreadsheet that will collect the data.
3. Installing & Configuring ODK Collect (from Google Play Store) or updating it to the latest version. Configuring includes uploading the form (stored in Google Drive) created using ODK Build into ODK Collect:

Image
Image

4. Collecting data using ODK Collect:

Image
Image

5. Sending the collected data to Google Spreadsheet in Google Drive:

Image
Image

6. Viewing the collected data stored in the spreadsheet (see below).

7. Integrating the spreadsheet into TSSFL Stack for Open Science and Collaborations between teams. This includes embedding the spreadsheet and enabling communication between TSSFL Stack and Google spreadsheets via Google Python APIs (Google Sheets API v4 and Google Drive API):



As the survey continues and more data is collected and submitted to the Google spreadsheet via ODK Collect, the spreadsheet updates automatically to contain the latest information.

8. Reading, processing, and analyzing the data over TSSFL ODF with Python, and automating various tasks. See:

- Automating the Google Spreadsheet Tasks with Python and TSSFL Stack

- Automate Multiple Excel Sheets and Produce Reports Using Python

- Automate Reports with Python and Pandas, Save the Output to HTML

- How to Use Python Pandas Pivot Table for Data Presentation and Analysis

- How to Generate PDF Reports with Pandas, Jinja and WeasyPrint
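The scripts in this thread read the spreadsheet as CSV through two URL patterns. As a minimal sketch, the helpers below build those URLs from a spreadsheet ID. The function names `gviz_csv_url` and `export_csv_url` are hypothetical, and the sketch assumes the spreadsheet is shared so that it can be read without signing in.

```python
from urllib.parse import quote

# URL patterns taken from the scripts in this thread
GVIZ_TEMPLATE = "https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
EXPORT_TEMPLATE = "https://docs.google.com/spreadsheets/export?id={sheet_id}&exportFormat=csv"


def gviz_csv_url(sheet_id, sheet_name="Sheet1"):
    """Return the CSV URL for one named tab (hypothetical helper).

    The tab name is percent-encoded so names with spaces still form a valid URL.
    """
    return GVIZ_TEMPLATE.format(sheet_id=sheet_id, sheet_name=quote(sheet_name, safe=""))


def export_csv_url(sheet_id):
    """Return the CSV export URL for the whole file (hypothetical helper)."""
    return EXPORT_TEMPLATE.format(sheet_id=sheet_id)


# Spreadsheet ID used in the sample code of this post
url = gviz_csv_url("1BC48PKPZW71AC6hOn1SNesvZk1PoBjE4wXwl1YlWFpY", "Sheet1")
print(url)
```

Passing the resulting URL to `pd.read_csv(url)` again at a later time fetches whatever rows have been submitted up to that point.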

The sample code below reads and prints the data submitted to the Google spreadsheet using ODK Collect. It can be extended to carry out far more useful analyses of this data:


#Plot some graphs
#Import required libraries
import gspread
import urllib.request
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

#Option 1 (disabled): read the sheet through the Google Sheets API with gspread
"""
urllib.request.urlretrieve("https://www.dropbox.com/s/mqsyfuetv8potvd/credentials.json?dl=1", "credentials.json")

gc = gspread.service_account(filename="credentials.json")
sh = gc.open_by_key("1BC48PKPZW71AC6hOn1SNesvZk1PoBjE4wXwl1YlWFpY") #Open spreadsheet,
#the spreadsheet ID is the string between "" in the line above
"""
#Alternative
#If your file only has one sheet, replace sheet_url
#sheet_url = "https://docs.google.com/spreadsheets/d/1BC48PKPZW71AC6hOn1SNesvZk1PoBjE4wXwl1YlWFpY/edit#gid=0"
#url_1 = sheet_url.replace('/edit#gid=', '/export?format=csv&gid=')

#For a single sheet, sheet_id plus the commented export URL is enough -- but that method is very slow on a slow connection
sheet_id = "1BC48PKPZW71AC6hOn1SNesvZk1PoBjE4wXwl1YlWFpY"
#url = "https://docs.google.com/spreadsheets/export?id={}&exportFormat=csv".format(sheet_id)
sheet_name = "Sheet1"
url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"

data = pd.read_csv(url)

#worksheet = sh.sheet1

#Define variables (column headers in the spreadsheet)
var1 = "Name"
var2 = "Age"
var3 = "Height"

#age = worksheet.col_values(3)[1:]
age = data[var2]
print("Ages:", age)
#height = worksheet.col_values(4)[1:]
height = data[var3]
print("Heights:", height)

#Pandas is extremely useful for Google spreadsheets
#Convert the json to Pandas dataframe
#Get all data records as dictionary
#data = worksheet.get_all_records()
#df = pd.DataFrame.from_dict(data)

#Let's get some statistics
#age_arr = np.array(age)
#age_array = age_arr.astype(float)
#h_arr = np.array(height)
#h_array = h_arr.astype(float)

print("Average Age:", np.mean(age))
print("Mean Height:", np.mean(height))
print("Minimum and Maximum Age:", np.min(age), np.max(age))
print("Minimum and Maximum Height:", np.min(height), np.max(height))

#Let's visualize
#Graph styles and font size
sns.set_style('darkgrid') # darkgrid, whitegrid, dark, white and ticks
plt.rc('axes', titlesize=18)     # fontsize of the axes title
plt.rc('axes', labelsize=14)    # fontsize of the x and y labels
plt.rc('xtick', labelsize=13)    # fontsize of the tick labels
plt.rc('ytick', labelsize=13)    # fontsize of the tick labels
plt.rc('legend', fontsize=13)    # legend fontsize
plt.rc('font', size=13)          # controls default text sizes

#sns list of color palettes
#print(sns.color_palette('deep'), sns.color_palette("pastel"), sns.color_palette("Set2"))

#Let's Read Data from Google Sheets into Pandas without the Google Sheets API
#Useful for multiple sheets
#sheet_id = "1BC48PKPZW71AC6hOn1SNesvZk1PoBjE4wXwl1YlWFpY"
#sheet_name = "Sheet1"
#url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"

#If your file only has one sheet, replace sheet_url
#sheet_url = "https://docs.google.com/spreadsheets/d/1XqOtPkiE_Q0dfGSoyxrH730RkwrTczcRbDeJJpqRByQ/edit#gid=0"
#url_1 = sheet_url.replace('/edit#gid=', '/export?format=csv&gid=')

#Get Pandas dataframe
#Reuse the data downloaded above instead of fetching the sheet a second time
dataset = data.copy()
#print(dataset)

#Names = worksheet.col_values(2)[1:]
#Names = data[var1]
#print(Names)

df_names = dataset[var1]
df_ages = dataset[var2]
df_heights = dataset[var3]
print(df_names)

#Preprocessing
#numeric_only=True skips text columns, which recent pandas versions refuse to average
plots = dataset.groupby([var1], as_index=False).mean(numeric_only=True)
#print(plots)

#Bar Plot in MatplotLib with plt.bar()
#Names vs Age
plt.figure(figsize=(10,5), tight_layout=True)
colors = sns.color_palette('pastel')
plt.bar(dataset[var1], dataset[var2], color=colors[:5])
plt.xlabel(var1)
plt.xticks(rotation=90)
plt.ylabel('Age')
plt.title('Barplot')
plt.show()

#Name Vs Height
plt.figure(figsize=(10,5), tight_layout=True)
colors = sns.color_palette('deep')
plt.bar(dataset[var1], dataset[var3], color=colors[:6])
plt.xlabel(var1)
plt.xticks(rotation=90)
plt.ylabel('Height')
plt.title('Barplot')
plt.show()

#Bar Plot in Seaborn with sns.barplot()
#On seaborn 0.12 or later, errorbar=None replaces the deprecated ci=None
plt.figure(figsize=(10,5), tight_layout=True)
ax = sns.barplot(x=dataset[var1], y=dataset[var2], palette='pastel', ci=None)
ax.set(title='Barplot with Seaborn', xlabel='Names', ylabel='Age')
plt.xticks(rotation=90)
plt.show()

#Barplot grouped data by "n" variables
plt.figure(figsize=(12, 6), tight_layout=True)
ax = sns.barplot(x=dataset[var2], y=dataset[var3], hue=dataset[var1], palette='pastel')
ax.set(title='Age vs Height', xlabel='Age', ylabel='Height')
ax.legend(title='Names', title_fontsize='13', loc='upper right')
plt.show()

#Histograms with plt.hist() or sns.histplot()
plt.figure(figsize=(10,6), tight_layout=True)
bins = [160, 165, 170, 175, 180, 185, 190, 195, 200]
# matplotlib
plt.hist(dataset[var3], bins=bins, color=sns.color_palette('Set2')[2], linewidth=2)
plt.title('Histogram')
plt.xlabel('Height (cm)')
plt.ylabel('Count')
# seaborn
ax = sns.histplot(data=dataset, x=var3, bins=bins, color=sns.color_palette('Set2')[2], linewidth=2)
ax.set(title='Histogram', xlabel='Height (cm)', ylabel='Count')
plt.show()

#Boxplot
plt.figure(figsize=(10,6), tight_layout=True)
ax = sns.boxplot(data=dataset, x=var1, y=var2, palette='Set2', linewidth=2.5)
ax.set(title='Boxplot', xlabel='Names', ylabel='Age (Years)')
plt.xticks(rotation=90)
plt.show()

#Scatter plot
plt.figure(figsize=(10,6), tight_layout=True)
ax = sns.scatterplot(data=dataset, x=var2, y=var3, hue=var1, palette='Set2', s=60)
ax.set(xlabel='Age (Years)', ylabel='Height (cm)')
ax.legend(title='People', title_fontsize=12)
plt.show()

#Mean age and height per name
pivot = dataset.groupby([var1], as_index=False).mean(numeric_only=True)
relationship = pivot.loc[:, var2:var3]
print(relationship)

#Plot the same table with several chart types, one figure per type
charts = ["bar", "line", "barh", "hist", "box", "kde", "density", "area"]
for chart_type in charts:
    relationship.plot(kind=chart_type)
    plt.title("%s plot" % chart_type)
    plt.show()

#Seaborn
plt.figure()
sns.set_style("darkgrid")
sns.lineplot(data=dataset, x=var2, y=var3)
plt.show()

plt.figure()
sns.set_style("whitegrid")
sns.lineplot(data=dataset, x=var2, y=var3)
plt.show()

#Hexbin
#Split the plotting window into 20 hexbins
plt.figure()
nbins = 20
plt.title('Hexbin')
plt.hexbin(dataset[var2], dataset[var3], gridsize=nbins, color=colors[:6])
plt.show()

#2-D Hist
plt.figure()
plt.title('2-D Histogram')
plt.hist2d(dataset[var2], dataset[var3], bins=nbins, color=colors[:5])
plt.show()

#Set variables
x = dataset[var2]
y = dataset[var3]
z = dataset[var1]

#Linear Regression
plt.figure()
sns.regplot(x=x, y=y, data=dataset)
plt.show()

#jointplot creates its own figure, so no plt.figure() call is needed before it
sns.jointplot(x=x, y=y, data=dataset, kind="reg")
plt.show()

#Set seaborn style
sns.set_style("white")

# Basic 2D density plot
plt.figure()
sns.kdeplot(x=x, y=y)
plt.show()

# Customize the color, add fill and bandwidth (fill=True replaces the deprecated shade=True)
plt.figure()
sns.kdeplot(x=x, y=y, cmap="Reds", fill=True, bw_adjust=.5)
plt.show()

# Add thresh parameter
plt.figure()
sns.kdeplot(x=x, y=y, cmap="Blues", fill=True, thresh=0)
plt.show()

#Joint plot
sns.jointplot(x=x, y=y, data=dataset, kind='hex')
plt.show()
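The script above calls `np.mean`, `np.min`, and `np.max` directly on the spreadsheet columns, which works as long as every submitted value is numeric. The following is a minimal sketch, under the assumption that some submissions may contain text or blanks in those columns. The sample rows and the helper name `numeric_summary` are hypothetical and only stand in for the real sheet.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV that the spreadsheet URL returns
sample_csv = io.StringIO(
    "Name,Age,Height\n"
    "Asha,31,165\n"
    "Juma,unknown,180\n"
    "Neema,27,\n"
)
dataset = pd.read_csv(sample_csv)


def numeric_summary(frame, columns):
    """Coerce the given columns to numbers and summarise them.

    Values that cannot be parsed become NaN and are ignored by the statistics.
    """
    numeric = frame[columns].apply(pd.to_numeric, errors="coerce")
    return numeric.agg(["count", "mean", "min", "max"])


summary = numeric_summary(dataset, ["Age", "Height"])
print(summary)
```

The `count` row shows how many usable values each column had, which makes it easy to spot submissions that need cleaning before plotting.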


TSSFL ODF is integrated with various Data Science Tools and Toolboxes for performing almost any data-related task.

Find detailed information regarding ODK Google Drive integration here.

ODK Central's real-time data feed for dashboards, integrations and more:

Attachments
Data_Collection_Form.png
ODK2.jpeg
ODK3.jpeg
ODK1.jpeg
ODK4.jpeg
ODK6.jpeg
ODK5.jpeg
TSSFL -- A Creative Journey Towards Infinite Possibilities!
#3

Further Tests with TSSFL ODF - ODK Integration


#4

Here is the code for the latter data:

#Plot some graphs
#Import required libraries
import gspread
import urllib.request
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

sheet_id = "1426rTslBl2mgggHIQnLR7WivW4xor6cp5H1Su-1SrdI"
sheet_name = "Sheet1"
url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"

#For a single sheet, use this
#sheet_id = "1426rTslBl2mgggHIQnLR7WivW4xor6cp5H1Su-1SrdI"
#url = "https://docs.google.com/spreadsheets/export?id={}&exportFormat=csv".format(sheet_id)

#Or
#If your file only has one sheet, replace sheet_url
#sheet_url = "https://docs.google.com/spreadsheets/d/1426rTslBl2mgggHIQnLR7WivW4xor6cp5H1Su-1SrdI/edit#gid=0"
#url = sheet_url.replace('/edit#gid=', '/export?format=csv&gid=')

data = pd.read_csv(url)

#worksheet = sh.sheet1

#Define variables (column headers in the spreadsheet)
var1 = "data-Name"
var2 = "data-Age"
var3 = "data-Weight"

#age = worksheet.col_values(3)[1:]
age = data[var2]
print("Ages:", age)
#weight = worksheet.col_values(4)[1:]
weight = data[var3]
print("Weights:", weight)

#Pandas is extremely useful for Google spreadsheets
#Convert the json to Pandas dataframe
#Get all data records as dictionary
#data = worksheet.get_all_records()
#df = pd.DataFrame.from_dict(data)

#Let's get some statistics
#age_arr = np.array(age)
#age_array = age_arr.astype(float)
#w_arr = np.array(weight)
#w_array = w_arr.astype(float)

print("Average Age:", np.mean(age))
print("Mean Weight:", np.mean(weight))
print("Minimum and Maximum Age:", np.min(age), np.max(age))
print("Minimum and Maximum Weight:", np.min(weight), np.max(weight))

#Let's visualize
#Graph styles and font size
sns.set_style('darkgrid') # darkgrid, whitegrid, dark, white and ticks
plt.rc('axes', titlesize=18)     # fontsize of the axes title
plt.rc('axes', labelsize=14)    # fontsize of the x and y labels
plt.rc('xtick', labelsize=13)    # fontsize of the tick labels
plt.rc('ytick', labelsize=13)    # fontsize of the tick labels
plt.rc('legend', fontsize=13)    # legend fontsize
plt.rc('font', size=13)          # controls default text sizes

#sns list of color palettes
#print(sns.color_palette('deep'), sns.color_palette("pastel"), sns.color_palette("Set2"))

#Let's Read Data from Google Sheets into Pandas without the Google Sheets API
#Useful for multiple sheets
#sheet_id = "1426rTslBl2mgggHIQnLR7WivW4xor6cp5H1Su-1SrdI"
#sheet_name = "Sheet1"
#url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"

#If your file only has one sheet, replace sheet_url
#sheet_url = "https://docs.google.com/spreadsheets/d/1XqOtPkiE_Q0dfGSoyxrH730RkwrTczcRbDeJJpqRByQ/edit#gid=0"
#url_1 = sheet_url.replace('/edit#gid=', '/export?format=csv&gid=')

#Get Pandas dataframe
#Reuse the data downloaded above instead of fetching the sheet a second time
dataset = data.copy()
#print(dataset)

#Names = worksheet.col_values(2)[1:]
#Names = data[var1]
#print(Names)

df_names = dataset[var1]
df_ages = dataset[var2]
df_weights = dataset[var3]
print(df_names)

#Preprocessing
plots = dataset.groupby([var1], as_index=False).mean(numeric_only=True)
#print(plots)

#Bar Plot in MatplotLib with plt.bar()
#Names vs Age
plt.figure(figsize=(10,5), tight_layout=True)
colors = sns.color_palette('pastel')
plt.bar(dataset[var1], dataset[var2], color=colors[:5])
plt.xlabel('Name')
plt.xticks(rotation=90)
plt.ylabel('Age')
plt.title('Barplot')
plt.show()

#Name Vs Weight
plt.figure(figsize=(10,5), tight_layout=True)
colors = sns.color_palette('deep')
plt.bar(dataset[var1], dataset[var3], color=colors[:6])
plt.xlabel('Name')
plt.xticks(rotation=90)
plt.ylabel('Weight')
plt.title('Barplot')
plt.show()

#Bar Plot in Seaborn with sns.barplot()
#On seaborn 0.12 or later, errorbar=None replaces the deprecated ci=None
plt.figure(figsize=(10,5), tight_layout=True)
ax = sns.barplot(x=dataset[var1], y=dataset[var2], palette='pastel', ci=None)
ax.set(title='Barplot with Seaborn', xlabel='Names', ylabel='Age')
plt.xticks(rotation=90)
plt.show()

#Barplot grouped data by "n" variables
plt.figure(figsize=(12, 6), tight_layout=True)
ax = sns.barplot(x=dataset[var2], y=dataset[var3], hue=dataset[var1], palette='pastel')
ax.set(title='Age vs Weight', xlabel='Age', ylabel='Weight')
ax.legend(title='Names', title_fontsize='13', loc='upper right')
plt.show()

#Histograms with plt.hist() or sns.histplot()
plt.figure(figsize=(10,6), tight_layout=True)
bins = [40, 50, 60, 70, 80]
# matplotlib
plt.hist(dataset[var3], bins=bins, color=sns.color_palette('Set2')[2], linewidth=2)
plt.title('Histogram')
plt.xlabel('Weight (kg)')
plt.ylabel('Count')
# seaborn
ax = sns.histplot(data=dataset, x=var3, bins=bins, color=sns.color_palette('Set2')[2], linewidth=2)
ax.set(title='Histogram', xlabel='Weight (kg)', ylabel='Count')
plt.show()

#Boxplot
plt.figure(figsize=(10,6), tight_layout=True)
ax = sns.boxplot(data=dataset, x=var1, y=var2, palette='Set2', linewidth=2.5)
ax.set(title='Boxplot', xlabel='Names', ylabel='Age (Years)')
plt.xticks(rotation=90)
plt.show()

#Scatter plot
plt.figure(figsize=(10,6), tight_layout=True)
ax = sns.scatterplot(data=dataset, x=var2, y=var3, hue=var1, palette='Set2', s=60)
ax.set(xlabel='Age (Years)', ylabel='Weight (kg)')
ax.legend(title='People', title_fontsize=12)
plt.show()

#Mean age and weight per name
pivot = dataset.groupby([var1], as_index=False).mean(numeric_only=True)
relationship = pivot.loc[:, var2:var3]
print(relationship)

#Plot the same table with several chart types, one figure per type
charts = ["bar", "line", "barh", "hist", "box", "kde", "density", "area"]
for chart_type in charts:
    relationship.plot(kind=chart_type)
    plt.title("%s plot" % chart_type)
    plt.show()

#Seaborn
plt.figure()
sns.set_style("darkgrid")
sns.lineplot(data=dataset, x=var2, y=var3)
plt.show()

#Same plot with the axes swapped
plt.figure()
sns.set_style("darkgrid")
sns.lineplot(data=dataset, x=var3, y=var2)
plt.show()

#relplot
#relplot creates its own figure, so no plt.figure() call is needed before it
sns.set_theme(style="darkgrid")
sns.relplot(x=var2, y=var3, hue=var1, data=data)
plt.show()

#Hexbin
#Use 15 hexagons along the x-axis
plt.figure()
nbins = 15
plt.title('Hexbin')
plt.hexbin(dataset[var2], dataset[var3], gridsize=nbins, color=colors[:5])
plt.show()

#Hexbin 2
#Same grid, colored with a colormap
plt.figure()
nbins = 15
plt.title('Hexbin')
plt.hexbin(dataset[var2], dataset[var3], gridsize=nbins, cmap=plt.cm.BuGn_r)
plt.show()

#2-D Hist
plt.figure()
plt.title('2-D Histogram')
plt.hist2d(dataset[var2], dataset[var3], bins=nbins, color=colors[:6])
plt.show()

#2-D Hist 2
plt.figure()
plt.title('2-D Histogram')
plt.hist2d(dataset[var2], dataset[var3], bins=nbins, cmap=plt.cm.BuGn_r)
plt.show()

#Set variables
x = dataset[var2]
y = dataset[var3]
z = dataset[var1]

#Linear Regression
plt.figure()
sns.regplot(x=x, y=y, data=dataset)
plt.show()

#jointplot creates its own figure, so no plt.figure() call is needed before it
sns.jointplot(x=x, y=y, data=dataset, kind="reg")
plt.show()

#Set seaborn style
sns.set_style("white")

# Basic 2D density plot
plt.figure()
sns.kdeplot(x=x, y=y)
plt.show()

# Customize the color, add fill and bandwidth (fill=True replaces the deprecated shade=True)
plt.figure()
sns.kdeplot(x=x, y=y, cmap="Reds", fill=True, bw_adjust=.5)
plt.show()

# Add thresh parameter
plt.figure()
sns.kdeplot(x=x, y=y, cmap="Blues", fill=True, thresh=0)
plt.show()

#Joint plot
sns.jointplot(x=x, y=y, data=dataset, kind='hex')
plt.show()
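The column headers in this second spreadsheet carry a `data-` prefix (`data-Name`, `data-Age`, `data-Weight`), which then shows up in legends and axis labels unless they are set by hand. As a small sketch, the prefix can be stripped once after loading. The helper name `strip_prefix` and the sample rows are hypothetical.

```python
import pandas as pd


def strip_prefix(frame, prefix="data-"):
    """Return a copy of the frame with the prefix removed from matching column names."""
    return frame.rename(
        columns=lambda name: name[len(prefix):] if name.startswith(prefix) else name
    )


# Hypothetical rows standing in for the submitted data
sample = pd.DataFrame(
    {
        "data-Name": ["Asha", "Juma"],
        "data-Age": [31, 40],
        "data-Weight": [58.5, 72.0],
    }
)

clean = strip_prefix(sample)
print(list(clean.columns))
```

With the cleaned frame, the variables in the script could be set to `"Name"`, `"Age"`, and `"Weight"`, matching the first example in this thread. `rename` returns a new frame, so the original headers remain available in `sample`.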

#5

Here is the code to test real categorical research data:

#Plot some graphs
#Import required libraries
import gspread
import urllib.request
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from statsmodels.graphics.mosaicplot import mosaic
#REDCap
textstr = 'Created at \nwww.tssfl.com'

#Let's visualize
#Graph styles and font size
sns.set_style('darkgrid') # darkgrid, whitegrid, dark, white and ticks
plt.rc('axes', titlesize=18)     # fontsize of the axes title
plt.rc('axes', labelsize=14)    # fontsize of the x and y labels
plt.rc('xtick', labelsize=13)    # fontsize of the tick labels
plt.rc('ytick', labelsize=13)    # fontsize of the tick labels
plt.rc('legend', fontsize=13)    # legend fontsize
plt.rc('font', size=13)          # controls default text sizes

sheet_id = "1pm1mGdRgpitrYQiGqUNSHPdR43e-ZSXCavYr-TcqtwU"
sheet_name = "Sheet1"
url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"

#If there is only one sheet use this
#url = "https://docs.google.com/spreadsheets/export?id={}&exportFormat=csv".format(sheet_id)

data = pd.read_csv(url)

#print(data)
#Drop first row
#df = data.drop(labels=0, axis=0)
#df = data.drop(data.index[0])
#Exclude the HIV/AIDS rows; .copy() lets the column be modified below without a SettingWithCopyWarning
df = data[~data['Ailment cured'].isin(['HIV/AIDS'])].copy()
#df['Ailment cured'] = df['Ailment cured'].replace({'Gonorrhoea, syphilis':'Gonorrhoea & Syphilis'})
df["Ailment cured"] = df['Ailment cured'].replace('Gonorrhoea, syphilis', 'Gonorrhoea & Syphilis')
#print(df)

#Growth form vs Citation
plt.figure(figsize=(8,5))
sns.boxplot(x='Growth form', y='Citation', data=data, palette='rainbow')
plt.show()

#Citation vs Growth form
plt.figure(figsize=(8,5))
sns.boxplot(x='Citation', y='Growth form', data=data, palette='rainbow')
plt.show()

#Citation vs Part used
plt.figure(figsize=(8,5))
sns.boxplot(x='Citation', y='Part used', data=data, palette='rainbow')
plt.tight_layout() #figure.savefig('myplot.png', bbox_inches='tight')
plt.show()

#Citation vs Ailment cured
plt.figure(figsize=(10,5))
sns.boxplot(x=df["Ailment cured"], y=df['Citation'], data=df, palette='rainbow')
plt.xlabel("Ailment cured", labelpad=15)
plt.tight_layout()
plt.show()

#Swarm plot
#catplot is a figure-level function and creates its own figure, so resize the figure it returns
g = sns.catplot(x="Citation", y="Scientific name", hue="Ailment cured", kind="swarm", data=df)
g.figure.set_size_inches(30, 30)
plt.tight_layout()
plt.show()


#Adding hue
#Citation vs Growth form
plt.figure(figsize=(8,5))
sns.boxplot(x='Citation', y='Growth form', data=data, hue='Part used', palette='rainbow')
plt.tight_layout()
plt.show()

plt.figure(figsize=(8,5))
sns.boxplot(x='Citation', y='Growth form', data=data, hue='Ailment cured', palette='rainbow')
plt.tight_layout()
plt.show()

#Violin plots
plt.figure(figsize=(8,6))
sns.violinplot(x='Citation', y='Growth form', data=data, hue='Part used', palette='rainbow')
plt.show()

#Violin plots
plt.figure(figsize=(8,6))
sns.violinplot(x='Citation', y='Growth form', data=data, hue='Ailment cured', palette='rainbow')
plt.show()

#Boxen plots
plt.figure(figsize=(8,6))
sns.boxenplot(x='Citation', y='Growth form', data=data, hue='Part used', palette='rainbow')
plt.show()

plt.figure(figsize=(8,6))
sns.boxenplot(x='Citation', y='Part used', data=data, hue='Ailment cured', palette='rainbow')
plt.tight_layout()
plt.show()

#Bar plots
plt.figure(figsize=(12,6))
sns.barplot(x='Growth form', y='Citation', data=data, palette='rainbow', hue='Part used')
plt.tight_layout()
plt.show()

plt.figure(figsize=(12,6))
ax = plt.subplot(111)
sns.barplot(x='Ailment cured', y='Citation', data=data, palette='rainbow', hue='Part used')
plt.tight_layout()
ax.legend(bbox_to_anchor=(0.8, 0.45))
#plt.legend(loc=1)
plt.show()

#Point plot
plt.figure(figsize=(10,6))
sns.pointplot(x='Citation', y='Growth form', data=data)
plt.show()

plt.figure(figsize=(10,6))
sns.pointplot(x='Citation', y='Growth form', data=data, hue='Part used')
plt.show()

plt.figure(figsize=(10,6))
sns.pointplot(x='Citation', y='Growth form', data=data, hue='Ailment cured')
plt.show()

#Count plot
plt.figure(figsize=(10,6))
sns.countplot(x='Growth form', data=data, palette='rainbow')
plt.show()

plt.figure(figsize=(10,6))
sns.countplot(x='Growth form', data=data, hue='Part used', palette='rainbow')
plt.legend(loc=1)
plt.show()

plt.figure(figsize=(10,6))
sns.countplot(x='Growth form', data=data, hue='Ailment cured', palette='rainbow')
plt.legend(loc=2)
plt.show()


#Strip plot - Categorical Scatter Plots
plt.figure(figsize=(12,8))
sns.stripplot(x='Citation', y='Growth form', data=data, jitter=True, hue='Part used', dodge=True, palette='viridis')
plt.show()

#Swarm plots
plt.figure(figsize=(10,6))
sns.swarmplot(x='Citation', y='Ailment cured', data=data, hue='Growth form', dodge=True, palette='viridis')
plt.tight_layout()
plt.show()

#Disabled examples: remove the triple quotes to run them
"""
#Combining plots
plt.figure(figsize=(12,8))
sns.violinplot(x='Citation',y="Growth form", data=data, hue='Part used', dodge=True, palette='rainbow')
sns.swarmplot(x='Citation',y="Growth form", data=data, hue='Part used', dodge=True, color='grey', alpha=.8, s=4)
plt.show()

#Plot 2
plt.figure(figsize=(12,8))
sns.boxplot(x='Citation',y='Part used',hue='Growth form',data=data, palette='rainbow')
sns.swarmplot(x='Citation',y='Part used',hue='Growth form', dodge=True,data=data, alpha=.8,color='grey',s=4)

#Plot 3
plt.figure(figsize=(12,7))
sns.barplot(x='Growth form',y='Citation',data=data, palette='rainbow', hue='Part used')
sns.stripplot(x='Growth form',y="Citation",data=data, hue='Citation', dodge=True, color='grey', alpha=.8, s=2)
plt.show()

#Faceting Data with Catplot
#https://towardsdatascience.com/a-complete-guide-to-plotting-categorical-variables-with-seaborn-bfe54db66bec
g = sns.catplot(x='Citation',y='Growth form', col = 'Local name', data=data,
           kind='bar', aspect=.6, palette='Set2')
(g.set_axis_labels("Class", "Survival Rate")
.set_titles("{col_name}")
.set(ylim=(0,1)))
plt.tight_layout()
plt.savefig('seaborn_catplot.png', dpi=1000)
"""

#One bar chart of value counts per categorical column
categorical_features = ["Growth form", "Part used", "Ailment cured", "Citation"]
fig, ax = plt.subplots(1, len(categorical_features), figsize=(16,8))
for i, categorical_feature in enumerate(categorical_features):
    data[categorical_feature].value_counts().plot(kind="bar", ax=ax[i]).set_title(categorical_feature)
plt.tight_layout()
plt.show()

"""
#print(data)
#print(data['Local Name'])
data['Growth form'].value_counts().plot(kind='bar')
plt.show()
#data['Growth form'].value_counts().plot(kind='hist')

plt.figure()
from statsmodels.graphics.mosaicplot import mosaic
plt.rcParams['font.size'] = 16.0
mosaic(data, ['Growth form', 'Part used']);
plt.show()
"""
plt.figure()
sns.barplot(x=df['Growth form'].head(3), y=df['Citation'], data=df)
plt.show()

#Add frequencies/counts and percentages on bar tops
total = float(len(data))
print(total)
plt.figure(figsize=(10,6))
ax = sns.countplot(x='Growth form', data=data, hue='Part used', palette='rainbow')

#Counts, centered above each non-empty bar
for p in ax.patches:
    h = p.get_height()
    x = p.get_x() + p.get_width()/2.0
    if h != 0:
        ax.annotate("%g" % h, xy=(x, h-0.19), xytext=(0,4), rotation=0,
                   textcoords="offset points", ha="center", va="bottom", color='green')

#Percentages of all rows, written vertically at the right edge of each bar
for p in ax.patches:
    percentage = '{:.2f}%'.format(100 * p.get_height()/total)
    x = p.get_x() + p.get_width()
    y = p.get_height()
    if y != 0:
        ax.annotate(percentage, (x-0.02, y+0.45), ha='center', rotation=90, color='red')

plt.tight_layout()
plt.ylabel("Counts")
plt.legend(loc=1)
plt.gcf().text(0.1, 0.77, textstr, fontsize=14, color='green')
plt.show()
plt.clf()

plt.figure(figsize=(10,6))
ax = sns.countplot(x='Growth form', data=data, hue='Ailment cured', palette='rainbow')
for p in ax.patches:
    h = p.get_height()
    x = p.get_x() + p.get_width()/2.0
    if h != 0:
        ax.annotate("%g" % h, xy=(x, h-0.19), xytext=(0,4), rotation=0,
                textcoords="offset points", ha="center", va="bottom", color='green')

for p in ax.patches:
    percentage = '{:.2f}%'.format(100 * p.get_height()/total)
    x = p.get_x() + p.get_width()
    y = p.get_height()
    if y != 0:
        ax.annotate(percentage, (x-0.06, y+0.30), ha='center', rotation=90, color='red')

plt.tight_layout()
plt.ylabel("Counts")
plt.legend(loc=1)
plt.gcf().text(0.1, 0.77, textstr, fontsize=14, color='green')
plt.show()
plt.clf()

#Add frequencies only
plt.figure(figsize=(10,6))
ax = sns.countplot(x='Growth form', data=data, hue='Part used', palette='rainbow')

for p in ax.patches:
    h = p.get_height()
    x = p.get_x() + p.get_width()/2.0
    if h != 0:
        ax.annotate("%g" % h, xy=(x, h), xytext=(0,4), rotation=0,
                   textcoords="offset points", ha="center", va="bottom", color='green')
plt.tight_layout()
  276. plt.ylabel("Counts")
  277. plt.legend(loc=1)
  278. plt.gcf().text(0.1, 0.77, textstr, fontsize=14, color='green')
  279. plt.show()
  280. plt.clf()
  281.  
  282. #Add percentages only
  283. plt.figure(figsize=(10,6))
  284. ax = sns.countplot(x='Growth form',data=data, hue='Part used', palette='rainbow')
  285.  
  286. for p in ax.patches:
  287.     percentage = '{:.2f}%'.format(100 * p.get_height()/total)
  288.     x = p.get_x() + p.get_width()/2.0
  289.     h = p.get_height()
  290.     if h !=0:
  291.         ax.annotate(percentage, xy=(x,h+0.1), ha="center", va="bottom", rotation=90, color='red') #textcoords="offset points",
  292.  
  293. plt.tight_layout()
  294. plt.ylabel("Counts")
  295. plt.legend(loc=1)
  296. plt.gcf().text(0.1, 0.77, textstr, fontsize=14, color='green')
  297. plt.show()
  298. plt.clf()
  299.  
  300. #Add frequencies only
  301. plt.figure(figsize=(10,6))
  302. ax = sns.countplot(x='Growth form',data=data, hue='Ailment cured', palette='rainbow')
  303. for p in ax.patches[0:]:
  304.     h = p.get_height()
  305.     x = p.get_x()+p.get_width()/2.0
  306.     if h != 0:
  307.         ax.annotate("%g" % p.get_height(), xy=(x,h), xytext=(0,4), rotation=0,
  308.                 textcoords="offset points", ha="center", va="bottom", color='green')
  309.  
  310. plt.tight_layout()
  311. plt.ylabel("Counts")
  312. plt.legend(loc=1)
  313. plt.gcf().text(0.1, 0.77, textstr, fontsize=14, color='green')
  314. plt.show()
  315. plt.clf()
  316.  
  317. #Add percentages only
  318. plt.figure(figsize=(10,6))
  319. ax = sns.countplot(x='Growth form',data=data, hue='Ailment cured', palette='rainbow')
  320.  
  321. for p in ax.patches:
  322.     percentage = '{:.2f}%'.format(100 * p.get_height()/total)
  323.     x = p.get_x() + p.get_width()/2.0
  324.     h = p.get_height()
  325.     if h !=0:
  326.         ax.annotate(percentage, xy=(x,h+0.1), ha="center", va="bottom", rotation=90, color='red') #textcoords="offset points",
  327.  
  328. plt.tight_layout()
  329. plt.ylabel("Counts")
  330. plt.legend(loc=1)
  331. plt.gcf().text(0.1, 0.77, textstr, fontsize=14, color='green')
  332. plt.show()
  333. plt.clf()
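
The listing above repeats the same two annotation loops for every count plot. As a minimal sketch, and not part of the original script, the loops could be folded into one helper. The name annotate_bars, its parameters, and the fixed label offsets are assumptions made for illustration; the helper only relies on the patches and annotate members that a Matplotlib Axes provides.

```python
def annotate_bars(ax, total=None, count_color="green", pct_color="red"):
    """Label every non-empty bar of a bar/count plot.

    ax    -- a Matplotlib Axes, e.g. the object returned by sns.countplot()
    total -- if given, a percentage label (100 * height / total) is added too
    Returns the number of bars that received a label.
    """
    labelled = 0
    for patch in ax.patches:
        height = patch.get_height()
        # Skip empty bars; "not height > 0" also skips NaN heights
        if not height > 0:
            continue
        x_center = patch.get_x() + patch.get_width() / 2.0
        # Count label, placed 4 points above the bar top
        ax.annotate("%g" % height, xy=(x_center, height), xytext=(0, 4),
                    textcoords="offset points", ha="center", va="bottom",
                    color=count_color)
        if total:
            # Percentage label, rotated and placed above the count label
            ax.annotate("{:.2f}%".format(100 * height / total),
                        xy=(x_center, height), xytext=(0, 18),
                        textcoords="offset points", ha="center", va="bottom",
                        rotation=90, color=pct_color)
        labelled += 1
    return labelled
```

With such a helper, each plot would need a single call, for example annotate_bars(ax, total=len(data)) for counts plus percentages, or annotate_bars(ax) for counts only. The 18-point offset for the percentage label is a guess and may need tuning for other font sizes.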

Eli

#6

We can also collect data with Google Survey forms, send the responses to a Google spreadsheet, and then extract statistics, visualize, and analyze the results with the TSSFL ODF tools. Here is an example:

Eli

#7

Eli

#8

Here is the code that TSSFL ODF runs on the Google Survey Form responses:

#Plot some graphs
#Import required libraries
#import gspread  #only needed for the commented-out gspread alternative below
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

#Read data from Google Sheets into pandas without the Google Sheets API
#This URL form selects a sheet by name, which is useful for multiple sheets

sheet_id = "1UnkRYcOhLFMgyT_CzByvupvdaD5cL5b_nCcOeoy1uy8"
sheet_name = "Sheet1"
url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"

#Use for a single sheet
#url = "https://docs.google.com/spreadsheets/export?id={}&exportFormat=csv".format(sheet_id)

#Get the pandas dataframe
dataset = pd.read_csv(url)
print(dataset.columns)
#worksheet = sh.sheet1
#age = worksheet.col_values(3)[1:]
age = dataset["Age"]
print("Ages:", age)
#Height = worksheet.col_values(4)[1:]
Height = dataset["Height"]
print("Heights:", Height)

#Pandas is very useful for Google spreadsheets
#gspread alternative: get all data records as dictionaries
#and convert them to a pandas dataframe
#data = worksheet.get_all_records()
#df = pd.DataFrame.from_dict(data)

#Let's get some statistics
#age_arr = np.array(age)
#age_array = age_arr.astype(float)
#h_arr = np.array(Height)
#h_array = h_arr.astype(float)

print("Average Age:", np.mean(age))
print("Mean Height:", np.mean(Height))
print("Minimum and Maximum Age:", np.min(age), np.max(age))
print("Minimum and Maximum Height:", np.min(Height), np.max(Height))

#Let's visualize
#Graph styles and font size
sns.set_style('darkgrid') # darkgrid, whitegrid, dark, white and ticks
plt.rc('axes', titlesize=18)     # fontsize of the axes title
plt.rc('axes', labelsize=14)    # fontsize of the x and y labels
plt.rc('xtick', labelsize=13)    # fontsize of the tick labels
plt.rc('ytick', labelsize=13)    # fontsize of the tick labels
plt.rc('legend', fontsize=13)    # legend fontsize
plt.rc('font', size=13)          # controls default text sizes

#Seaborn color palettes
#print(sns.color_palette('deep'), sns.color_palette("pastel"), sns.color_palette("Set2"))
#If your file only has one sheet, replace sheet_url
#sheet_url = "https://docs.google.com/spreadsheets/d/1XqOtPkiE_Q0dfGSoyxrH730RkwrTczcRbDeJJpqRByQ/edit#gid=0"
#url_1 = sheet_url.replace('/edit#gid=', '/export?format=csv&gid=')

#print(dataset)

#Names = worksheet.col_values(2)[1:]
#Names = data["Name"]
#print(Names)

df_names = dataset["Name"]
df_ages = dataset["Age"]
df_Heights = dataset["Height"]
print(df_names)

#Preprocessing: mean of the numeric columns per name
plots = dataset.groupby(['Name'], as_index=False).mean(numeric_only=True)
#print(plots)

#Bar plot in Matplotlib with plt.bar()
#Name vs Age
plt.figure(figsize=(10,5), tight_layout=True)
colors = sns.color_palette('pastel')
plt.bar(dataset['Name'], dataset['Age'], color=colors[:5])
plt.xlabel('Name')
plt.xticks(rotation=90)
plt.ylabel('Age')
plt.title('Barplot')
plt.show()

#Name vs Height
plt.figure(figsize=(10,5), tight_layout=True)
colors = sns.color_palette('deep')
plt.bar(dataset['Name'], dataset['Height'], color=colors[:6])
plt.xlabel('Name')
plt.xticks(rotation=90)
plt.ylabel('Height')
plt.title('Barplot')
plt.show()

#Bar plot in Seaborn with sns.barplot()
#errorbar=None needs seaborn >= 0.12; use ci=None on older versions
plt.figure(figsize=(10,5), tight_layout=True)
ax = sns.barplot(x=dataset['Name'], y=dataset['Age'], palette='pastel', errorbar=None)
ax.set(title='Barplot with Seaborn', xlabel='Names', ylabel='Age')
plt.xticks(rotation=90)
plt.show()

#Bar plot of data grouped by a third variable (hue)
plt.figure(figsize=(12, 6), tight_layout=True)
ax = sns.barplot(x=dataset['Age'], y=dataset['Height'], hue=dataset['Name'], palette='pastel')
ax.set(title='Age vs Height', xlabel='Age', ylabel='Height')
ax.legend(title='Names', title_fontsize='13', loc='upper right')
plt.show()

#Histograms with plt.hist() or sns.histplot()
bins = [40, 50, 60, 70, 80]
# matplotlib
plt.figure(figsize=(10,6), tight_layout=True)
plt.hist(dataset['Height'], bins=bins, color=sns.color_palette('Set2')[2], linewidth=2)
plt.title('Histogram')
plt.xlabel('Height (cm)')
plt.ylabel('Count')
plt.show()
# seaborn (drawn on its own figure so the two histograms do not overlap)
plt.figure(figsize=(10,6), tight_layout=True)
ax = sns.histplot(data=dataset, x='Height', bins=bins, color=sns.color_palette('Set2')[2], linewidth=2)
ax.set(title='Histogram', xlabel='Height (cm)', ylabel='Count')
plt.show()

#Boxplot
plt.figure(figsize=(10,6), tight_layout=True)
ax = sns.boxplot(data=dataset, x='Name', y='Age', palette='Set2', linewidth=2.5)
ax.set(title='Boxplot', xlabel='Names', ylabel='Age (Years)')
plt.xticks(rotation=90)
plt.show()

#Scatter plot
plt.figure(figsize=(10,6), tight_layout=True)
ax = sns.scatterplot(data=dataset, x='Age', y='Height', hue='Name', palette='Set2', s=60)
ax.set(xlabel='Age (Years)', ylabel='Height (cm)')
ax.legend(title='People', title_fontsize=12)
plt.show()

#Mean age and height per name
pivot = dataset.groupby(['Name'], as_index=False).mean(numeric_only=True)
relationship = pivot.loc[:,"Age":"Height"]
print(relationship)

#Plot the same data with every pandas chart type in the list
charts = ["bar", "line", "barh", "hist", "box", "kde", "density", "area"]
for chart_type in charts:
    relationship.plot(kind=chart_type)
    plt.title("%s plot" % chart_type)
    plt.show()

#Seaborn line plots
plt.figure()
sns.set_style("darkgrid")
sns.lineplot(data=dataset, x="Age", y="Height")
plt.show()

plt.figure()
sns.set_style("darkgrid")
sns.lineplot(data=dataset, x="Height", y="Age")
plt.show()

#relplot
#relplot is figure-level and creates its own figure, so no plt.figure() is needed
sns.set_theme(style="darkgrid")
sns.relplot(x="Age", y="Height", hue="Name", data=dataset)
plt.show()

#Hexbin
#Use about 15 hexagons across the x-direction
plt.figure()
nbins = 15
plt.title('Hexbin')
plt.hexbin(dataset["Age"], dataset["Height"], gridsize=nbins, color=colors[:5])
plt.show()

#Hexbin 2
#Same grid, with a color map
plt.figure()
nbins = 15
plt.title('Hexbin')
plt.hexbin(dataset["Age"], dataset["Height"], gridsize=nbins, cmap=plt.cm.BuGn_r)
plt.show()

#2-D Hist
plt.figure()
plt.title('2-D Histogram')
plt.hist2d(dataset["Age"], dataset["Height"], bins=nbins, color=colors[:6])
plt.show()

#2-D Hist 2
plt.figure()
plt.title('2-D Histogram')
plt.hist2d(dataset["Age"], dataset["Height"], bins=nbins, cmap=plt.cm.BuGn_r)
plt.show()

#Set variables
x = dataset["Age"]
y = dataset["Height"]
z = dataset["Name"]

#Linear Regression
plt.figure()
sns.regplot(x=x, y=y, data=dataset)
plt.show()

#jointplot is figure-level and creates its own figure
sns.jointplot(x=x, y=y, data=dataset, kind="reg")
plt.show()

#Set seaborn style
sns.set_style("white")

# Basic 2D density plot
plt.figure()
sns.kdeplot(x=x, y=y)
plt.show()

# Customize the color, fill the contours and adjust the bandwidth
# (fill=True replaces the deprecated shade=True)
plt.figure()
sns.kdeplot(x=x, y=y, cmap="Reds", fill=True, bw_adjust=.5)
plt.show()

# Add thresh parameter
plt.figure()
sns.kdeplot(x=x, y=y, cmap="Blues", fill=True, thresh=0)
plt.show()

#Joint plot
sns.jointplot(x=x, y=y, data=dataset, kind='hex')
plt.show()
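
The script above pastes sheet_name directly into the URL, which is fine for "Sheet1" but may fail for sheet names that contain spaces or other special characters. The following is a minimal sketch of a URL builder that percent-encodes the name first. The function name sheet_csv_url and the placeholder ID are hypothetical; the two URL patterns are the ones already used in the script, and the spreadsheet is assumed to be shared so that anyone with the link can view it.

```python
from urllib.parse import quote


def sheet_csv_url(sheet_id, sheet_name=None):
    """Return a CSV URL for a Google spreadsheet.

    With sheet_name, use the gviz endpoint from the script above and
    percent-encode the name; without it, use the single-sheet export URL.
    """
    if sheet_name is None:
        return ("https://docs.google.com/spreadsheets/export"
                "?id={}&exportFormat=csv".format(sheet_id))
    return ("https://docs.google.com/spreadsheets/d/{}/gviz/tq"
            "?tqx=out:csv&sheet={}".format(sheet_id, quote(sheet_name, safe="")))


# Hypothetical ID and sheet name, for illustration only
print(sheet_csv_url("SHEET_ID", "Form Responses 1"))
```

The returned string can be passed straight to pd.read_csv(), exactly as url is in the script.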

Eli

#9

Every technology or software tool can be called on any page of the forum, even multiple times:

Eli

#10

ODK Training with Janeth:

