TSSFL TECHNOLOGY STACK

Posted: **Mon Apr 19, 2021 1:37 am**

The Open Data Kit (ODK) is a free, open-source suite of tools for collecting data on Android mobile devices and submitting it to an online server, even when no Internet connection or mobile service is available at the time of collection. ODK Collect replaces traditional paper forms with electronic forms that can capture text, numeric data, GPS locations, photos, videos, barcodes, and audio, and upload them to an online server. ODK has become a standard tool that helps organizations and researchers build and manage mobile data collection solutions. Google mentions ODK among the tools that create new knowledge, raise awareness, or enable people to take action to change the world.

This post summarizes the integration of a versatile set of tools and systems, namely the TSSFL Stack, Open Data Kit (ODK) Collect, and Google Drive, to collect, store, manage, process, and analyze data. All three are designed for teams and collaboration. ODK Collect allows multiple team members to collect data with their Android phones at different times, paces, and locations, while all the collected data is sent to the same Google spreadsheet for storage and management. From the Google spreadsheet, the data is then acquired programmatically by TSSFL ODF for processing and analysis, and TSSFL ODF offers various collaboration options to the team members while doing so. Traditionally, ODK mainly worked with KoBoToolbox, a suite of tools for field data collection in challenging environments. The main collection application used by KoBoToolbox is built on, and compatible with, the ODK ecosystem, which means that any form built for ODK Collect should also work with KoBoToolbox and vice versa.

The whole process is streamlined as follows:

1. Creating a form for data collection and submissions. The form is created using ODK Build at http://build.opendatakit.org/:

2. Linking to Google Drive and hosting the survey form built with ODK Build, exported as XML, in Google Drive so that the project team can download it to their Android phones. This step includes creating the Google spreadsheet to which the completed survey responses from the project team will be sent and in which they will be stored:

- Go to Edit -> Form Properties.
- Fill in the Title on Device, Instance Name, and Public Key fields (all of these are optional).
- Copy the URL of the Google spreadsheet that will collect the data and paste it into the Submission URL field.
- Click Done.
- Next, go to File -> Export to XML and download the form.
- Place the form in the same Google Drive folder as the Google spreadsheet that will collect the data.

3. Installing & Configuring ODK Collect (from Google Play Store) or updating it to the latest version. Configuring includes uploading the form (stored in Google Drive) created using ODK Build into ODK Collect:

4. Collecting data using ODK Collect:

5. Sending the collected data to Google Spreadsheet in Google Drive:

6. Viewing the collected data stored in the spreadsheet (see below).

7. Integrating the spreadsheet into TSSFL Stack for Open Science and Collaborations between teams. This includes embedding the spreadsheet and enabling communication between TSSFL Stack and Google spreadsheets via Google Python APIs (Google Sheets API v4 and Google Drive API):

As the survey continues and more data is collected and submitted to the Google spreadsheet via ODK Collect, the spreadsheet updates automatically to contain the latest information.

8. Reading, processing, and analyzing the data on TSSFL ODF with Python, and automating various tasks; see:

- Automating the Google Spreadsheet Tasks with Python and TSSFL Stack

- Automate Multiple Excel Sheets and Produce Reports Using Python

- Automate Reports with Python and Pandas, Save the Output to HTML

- How to Use Python Pandas Pivot Table for Data Presentation and Analysis

- How to Generate PDF Reports with Pandas, Jinja and WeasyPrint

The sample code below reads and prints the data submitted to the Google spreadsheet using ODK Collect. It can be extended to run more useful analyses on this data:

Code:

#Plot some graph
#Import required libraries
import gspread
import urllib.request
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
 
"""
urllib.request.urlretrieve("https://www.dropbox.com/s/mqsyfuetv8potvd/credentials.json?dl=1", "credentials.json")
 
gc = gspread.service_account(filename="credentials.json")
sh = gc.open_by_key("1BC48PKPZW71AC6hOn1SNesvZk1PoBjE4wXwl1YlWFpY") #Open the spreadsheet;
#the spreadsheet ID is the string between "" in the line above
"""
#Alternative
#If your file only has one sheet, replace sheet_url
#sheet_url = "https://docs.google.com/spreadsheets/d/1BC48PKPZW71AC6hOn1SNesvZk1PoBjE4wXwl1YlWFpY/edit#gid=0"
#url_1 = sheet_url.replace('/edit#gid=', '/export?format=csv&gid=')
 
#For a file with a single sheet, sheet_id and the commented-out export URL below are enough, but that method is slow on a slow connection
sheet_id = "1BC48PKPZW71AC6hOn1SNesvZk1PoBjE4wXwl1YlWFpY"
#url = "https://docs.google.com/spreadsheets/export?id={}&exportFormat=csv".format(sheet_id)
sheet_name = "Sheet1"
url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
 
data = pd.read_csv(url)
 
#worksheet = sh.sheet1
 
#Define variables
var1 = "Name"
var2 = "Age"
var3 = "Height"
 
#age = worksheet.col_values(3)[1:]
age = data[var2]
print("Ages:", age)
#height = worksheet.col_values(4)[1:]
height = data[var3]
print("Heights:", height)
 
#Pandas is very useful for working with Google spreadsheets
#Convert the json to Pandas dataframe
#Get all data records as dictionary
#data = worksheet.get_all_records()
#df = pd.DataFrame.from_dict(data)
 
#Let's get some statistics
#age_arr = np.array(age)
#age_array = age_arr.astype(float)
#h_arr = np.array(height)
#h_array = h_arr.astype(float)
 
print("Average Age:", np.mean(age))
print("Mean Height:", np.mean(height))
print("Minimum and Maximum Age:", np.min(age), np.max(age))
print("Minimum and Maximum Height:", np.min(height), np.max(height))
 
#Let's visualize
#Graph styles and font size
sns.set_style('darkgrid') # darkgrid, whitegrid, dark, white and ticks
plt.rc('axes', titlesize=18)     # fontsize of the axes title
plt.rc('axes', labelsize=14)    # fontsize of the x and y labels
plt.rc('xtick', labelsize=13)    # fontsize of the tick labels
plt.rc('ytick', labelsize=13)    # fontsize of the tick labels
plt.rc('legend', fontsize=13)    # legend fontsize
plt.rc('font', size=13)          # controls default text sizes
 
#sns list of color palettes
#print(sns.color_palette('deep'), sns.color_palette("pastel"), sns.color_palette("Set2"))
 
#Let's Read Data from Google Sheets into Pandas without the Google Sheets API
#Useful for multiple sheets
#sheet_id = "1BC48PKPZW71AC6hOn1SNesvZk1PoBjE4wXwl1YlWFpY"
#sheet_name = "Sheet1"
#url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
 
#If your file only has one sheet, replace sheet_url
#sheet_url = "https://docs.google.com/spreadsheets/d/1XqOtPkiE_Q0dfGSoyxrH730RkwrTczcRbDeJJpqRByQ/edit#gid=0"
#url_1 = sheet_url.replace('/edit#gid=', '/export?format=csv&gid=')
 
#Get Pandas dataframe
dataset = pd.read_csv(url)
#print(dataset)
 
#Names = worksheet.col_values(2)[1:]
#Names = data[var1]
#print(Names)
 
df_names = dataset[var1]
df_ages = dataset[var2]
df_heights = dataset[var3]
print(df_names)
 
#Preprocessing
plots = dataset.groupby([var1], as_index=False).mean(numeric_only=True) #average only the numeric columns
#print(plots)
 
#Bar Plot in MatplotLib with plt.bar()
#Names vs Age
plt.figure(figsize=(10,5), tight_layout=True)
colors = sns.color_palette('pastel')
plt.bar(dataset[var1], dataset[var2], color=colors[:5])
plt.xlabel(var1)
plt.xticks(rotation=90)
plt.ylabel('Age')
plt.title('Barplot')
plt.show()
 
#Name Vs Height
plt.figure(figsize=(10,5), tight_layout=True)
colors = sns.color_palette('deep')
plt.bar(dataset[var1], dataset[var3], color=colors[:6])
plt.xlabel(var1)
plt.xticks(rotation=90)
plt.ylabel('Height')
plt.title('Barplot')
plt.show()
 
#Bar Plot in Seaborn with sns.barplot()
plt.figure(figsize=(10,5), tight_layout=True)
ax = sns.barplot(x=dataset[var1], y=dataset[var2], palette='pastel', ci=None)
ax.set(title='Barplot with Seaborn', xlabel='Names', ylabel='Age')
plt.xticks(rotation=90)
plt.show()
 
#Barplot grouped data by "n" variables
plt.figure(figsize=(12, 6), tight_layout=True)
ax = sns.barplot(x=dataset[var2], y=dataset[var3], hue=dataset[var1], palette='pastel')
ax.set(title='Age vs Height' ,xlabel='Age', ylabel='Height')
ax.legend(title='Names', title_fontsize='13', loc='upper right')
plt.show()
 
#Histograms with plt.hist() or sns.histplot()
plt.figure(figsize=(10,6), tight_layout=True)
bins = [160, 165, 170, 175, 180, 185, 190, 195, 200]
# matplotlib
plt.hist(dataset[var3], bins=bins, color=sns.color_palette('Set2')[2], linewidth=2)
plt.title('Histogram')
plt.xlabel('Height (cm)')
plt.ylabel('Count')
# seaborn
ax = sns.histplot(data=dataset, x=var3, bins=bins, color=sns.color_palette('Set2')[2], linewidth=2)
ax.set(title='Histogram', xlabel='Height (cm)', ylabel='Count')
plt.show()
 
#Boxplot
plt.figure(figsize=(10,6), tight_layout=True)
ax = sns.boxplot(data=dataset, x=var1, y=var2, palette='Set2', linewidth=2.5)
ax.set(title='Boxplot', xlabel='Names', ylabel='Age (Years)')
plt.xticks(rotation=90)
plt.show()
 
#Scatter plot
plt.figure(figsize=(10,6), tight_layout=True)
ax = sns.scatterplot(data=dataset, x=var2, y=var3,   hue=var1, palette='Set2', s=60)
ax.set(xlabel='Age (Years)', ylabel='Height (cm)')
ax.legend(title='People', title_fontsize = 12)
plt.show()
 
#Something else
pivot = dataset.groupby([var1], as_index=False).mean(numeric_only=True) #average only the numeric columns
relationship = pivot.loc[:,var2:var3]
print(relationship)
 
#Plot some graph
charts = ["bar", "line", "barh", "hist", "box", "kde", "density", "area"]
for chart_type in charts:
    relationship.plot(kind=chart_type) #one figure per chart type in the list above
    plt.title("%s plot" % chart_type)
    plt.show()
 
#Seaborn
plt.figure()
sns.set_style("darkgrid")
sns.lineplot(data = dataset, x = var2, y = var3)
plt.show()
 
plt.figure()
sns.set_style("whitegrid")
sns.lineplot(data = dataset, x = var2, y = var3)
plt.show()
 
#Hexbin
#Split the plotting window into 20 hexbins
plt.figure()
nbins = 20
plt.title('Hexbin')
plt.hexbin(dataset[var2], dataset[var3], gridsize=nbins, color=colors[:6])
plt.show()
 
#2-D Hist
plt.figure()
plt.title('2-D Histogram')
plt.hist2d(dataset[var2], dataset[var3], bins=nbins, color=colors[:5])
plt.show()
 
#Set variables
x = dataset[var2]
y = dataset[var3]
z = dataset[var1]
 
#Linear Regression
plt.figure()
sns.regplot(x = x, y = y, data=dataset);
plt.show()
 
#jointplot is figure-level and creates its own figure, so no plt.figure() here
sns.jointplot(x=x, y=y, data=dataset, kind="reg")
plt.show()
 
#Set seaborn style
sns.set_style("white")
 
# Basic 2D density plot
plt.figure()
sns.kdeplot(x=x, y=y)
plt.show()
 
# Custom the color, add shade and bandwidth
plt.figure()
sns.kdeplot(x=x, y=y, cmap="Reds", fill=True, bw_adjust=.5) #fill replaces the deprecated shade
plt.show()
 
# Add thresh parameter
plt.figure()
sns.kdeplot(x=x, y=y, cmap="Blues", fill=True, thresh=0) #fill replaces the deprecated shade
plt.show()
 
#Joint plot
#jointplot creates its own figure, so no plt.figure() here
sns.jointplot(x=x, y=y, data=dataset, kind='hex')
plt.show()
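
The code above builds the CSV URL by hand in three slightly different ways. As a minimal sketch (the helper names `sheet_id_from_url` and `csv_url` are hypothetical, not part of the original post), the spreadsheet ID can be pulled out of the address shown in the browser and the `gviz` URL built from it, with the sheet name percent-encoded so that names containing spaces do not break the request:

```python
import re
from urllib.parse import quote


def sheet_id_from_url(sheet_url):
    """Extract the spreadsheet ID from a .../spreadsheets/d/<id>/... address."""
    match = re.search(r"/spreadsheets/d/([A-Za-z0-9_-]+)", sheet_url)
    if match is None:
        raise ValueError("not a Google Sheets URL: %r" % sheet_url)
    return match.group(1)


def csv_url(sheet_id, sheet_name="Sheet1"):
    """Build the gviz CSV URL used in the post, percent-encoding the sheet name."""
    return (
        "https://docs.google.com/spreadsheets/d/%s/gviz/tq?tqx=out:csv&sheet=%s"
        % (sheet_id, quote(sheet_name, safe=""))
    )


# The address is the one quoted in the commented-out lines of the post.
url = csv_url(sheet_id_from_url(
    "https://docs.google.com/spreadsheets/d/"
    "1BC48PKPZW71AC6hOn1SNesvZk1PoBjE4wXwl1YlWFpY/edit#gid=0"
))
print(url)
```

The resulting `url` can be passed to `pd.read_csv(url)` exactly as in the code above; the spreadsheet still has to be shared so that anyone with the link can view it.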

TSSFL ODF is integrated with various Data Science Tools and Toolboxes for performing almost any data-related task.

Find detailed information regarding ODK Google Drive integration here.

ODK Central's real-time data feed for dashboards, integrations and more:

Posted: **Thu Apr 22, 2021 11:23 am**

Posted: **Fri Apr 30, 2021 1:20 pm**

Further Tests with TSSFL ODF - ODK Integration

Posted: **Fri Apr 30, 2021 3:20 pm**

Here is the code for the latter data:

Code:

#Plot some graph
#Import required libraries
import gspread
import urllib.request
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
 
sheet_id = "1426rTslBl2mgggHIQnLR7WivW4xor6cp5H1Su-1SrdI"
sheet_name = "Sheet1"
url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
 
#For a single sheet, use this
#sheet_id = "1426rTslBl2mgggHIQnLR7WivW4xor6cp5H1Su-1SrdI"
#url = "https://docs.google.com/spreadsheets/export?id={}&exportFormat=csv".format(sheet_id)
 
#Or
#If your file only has one sheet, replace sheet_url
#sheet_url = "https://docs.google.com/spreadsheets/d/1426rTslBl2mgggHIQnLR7WivW4xor6cp5H1Su-1SrdI/edit#gid=0"
#url = sheet_url.replace('/edit#gid=', '/export?format=csv&gid=')
 
data = pd.read_csv(url)
 
#worksheet = sh.sheet1
 
#Define variables
var1 = "data-Name"
var2 = "data-Age"
var3 = "data-Weight"
 
#age = worksheet.col_values(3)[1:]
age = data[var2]
print("Ages:", age)
#weight = worksheet.col_values(4)[1:]
weight = data[var3]
print("Weights:", weight)
 
#Pandas is very useful for working with Google spreadsheets
#Convert the json to Pandas dataframe
#Get all data records as dictionary
#data = worksheet.get_all_records()
#df = pd.DataFrame.from_dict(data)
 
#Let's get some statistics
#age_arr = np.array(age)
#age_array = age_arr.astype(float)
#h_arr = np.array(weight)
#h_array = h_arr.astype(float)
 
print("Average Age:", np.mean(age))
print("Mean Weight:", np.mean(weight))
print("Minimum and Maximum Age:", np.min(age), np.max(age))
print("Minimum and Maximum Weight:", np.min(weight), np.max(weight))
 
#Let's visualize
#Graph styles and font size
sns.set_style('darkgrid') # darkgrid, whitegrid, dark, white and ticks
plt.rc('axes', titlesize=18)     # fontsize of the axes title
plt.rc('axes', labelsize=14)    # fontsize of the x and y labels
plt.rc('xtick', labelsize=13)    # fontsize of the tick labels
plt.rc('ytick', labelsize=13)    # fontsize of the tick labels
plt.rc('legend', fontsize=13)    # legend fontsize
plt.rc('font', size=13)          # controls default text sizes
 
#sns list of color palettes
#print(sns.color_palette('deep'), sns.color_palette("pastel"), sns.color_palette("Set2"))
 
#Let's Read Data from Google Sheets into Pandas without the Google Sheets API
#Useful for multiple sheets
#sheet_id = "1426rTslBl2mgggHIQnLR7WivW4xor6cp5H1Su-1SrdI"
#sheet_name = "Sheet1"
#url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
 
#If your file only has one sheet, replace sheet_url
#sheet_url = "https://docs.google.com/spreadsheets/d/1XqOtPkiE_Q0dfGSoyxrH730RkwrTczcRbDeJJpqRByQ/edit#gid=0"
#url_1 = sheet_url.replace('/edit#gid=', '/export?format=csv&gid=')
 
#Get Pandas dataframe
dataset = pd.read_csv(url)
#print(dataset)
 
#Names = worksheet.col_values(2)[1:]
#Names = data[var1]
#print(Names)
 
df_names = dataset[var1]
df_ages = dataset[var2]
df_weights = dataset[var3]
print(df_names)
 
#Preprocessing
plots = dataset.groupby([var1], as_index=False).mean(numeric_only=True)
#print(plots)
 
#Bar Plot in MatplotLib with plt.bar()
#Names vs Age
plt.figure(figsize=(10,5), tight_layout=True)
colors = sns.color_palette('pastel')
plt.bar(dataset[var1], dataset[var2], color=colors[:5])
plt.xlabel('Name')
plt.xticks(rotation=90)
plt.ylabel('Age')
plt.title('Barplot')
plt.show()
 
#Name Vs Weight
plt.figure(figsize=(10,5), tight_layout=True)
colors = sns.color_palette('deep')
plt.bar(dataset[var1], dataset[var3], color=colors[:6])
plt.xlabel('Name')
plt.xticks(rotation=90)
plt.ylabel('Weight')
plt.title('Barplot')
plt.show()
 
#Bar Plot in Seaborn with sns.barplot()
plt.figure(figsize=(10,5), tight_layout=True)
ax = sns.barplot(x=dataset[var1], y=dataset[var2], palette='pastel', ci=None)
ax.set(title='Barplot with Seaborn', xlabel='Names', ylabel='Age')
plt.xticks(rotation=90)
plt.show()
 
#Barplot grouped data by "n" variables
plt.figure(figsize=(12, 6), tight_layout=True)
ax = sns.barplot(x=dataset[var2], y=dataset[var3], hue=dataset[var1], palette='pastel')
ax.set(title='Age vs Weight' ,xlabel='Age', ylabel='Weight')
ax.legend(title='Names', title_fontsize='13', loc='upper right')
plt.show()
 
#Histograms with plt.hist() or sns.histplot()
plt.figure(figsize=(10,6), tight_layout=True)
bins = [40, 50, 60, 70, 80]
# matplotlib
plt.hist(dataset[var3], bins=bins, color=sns.color_palette('Set2')[2], linewidth=2)
plt.title('Histogram')
plt.xlabel('Weight (kg)')
plt.ylabel('Count')
# seaborn
ax = sns.histplot(data=dataset, x=var3, bins=bins, color=sns.color_palette('Set2')[2], linewidth=2)
ax.set(title='Histogram', xlabel='Weight (kg)', ylabel='Count')
plt.show()
 
#Boxplot
plt.figure(figsize=(10,6), tight_layout=True)
ax = sns.boxplot(data=dataset, x=var1, y=var2, palette='Set2', linewidth=2.5)
ax.set(title='Boxplot', xlabel='Names', ylabel='Age (Years)')
plt.xticks(rotation=90)
plt.show()
 
#Scatter plot
plt.figure(figsize=(10,6), tight_layout=True)
ax = sns.scatterplot(data=dataset, x=var2, y=var3,   hue=var1, palette='Set2', s=60)
ax.set(xlabel='Age (Years)', ylabel='Weight (kg)')
ax.legend(title='People', title_fontsize = 12)
plt.show()
 
#Something else
pivot = dataset.groupby([var1], as_index=False).mean(numeric_only=True)
relationship = pivot.loc[:,var2:var3]
print(relationship)
 
#Plot some graph
charts = ["bar", "line", "barh", "hist", "box", "kde", "density", "area"]
for chart_type in charts:
    relationship.plot(kind="%s" % chart_type) #Replace bar with line, barh, hist, box, kde, density, area
    plt.title("%s plot" % chart_type)
    plt.show()
 
#Seaborn
plt.figure()
sns.set_style("darkgrid")
sns.lineplot(data = dataset, x = var2, y = var3)
plt.show()
 
plt.figure()
sns.set_style("darkgrid")
sns.lineplot(data = dataset, x = var3, y = var2)
plt.show()
 
#relplot (figure-level, so it creates its own figure and needs no plt.figure())
sns.set_theme(style="darkgrid")
sns.relplot(x=var2, y=var3, hue=var1, data=data)
plt.show()
 
#Hexbin
#Use a grid that is nbins (15) hexagons wide
plt.figure()
nbins = 15
plt.title('Hexbin')
plt.hexbin(dataset[var2], dataset[var3], gridsize=nbins, color=colors[:5])
plt.show()
 
#Hexbin 2
#Same grid of nbins (15) hexagons, this time with a colormap
plt.figure()
nbins = 15
plt.title('Hexbin')
plt.hexbin(dataset[var2], dataset[var3], gridsize=nbins, cmap=plt.cm.BuGn_r)
plt.show()
 
#2-D Hist
plt.figure()
plt.title('2-D Histogram')
plt.hist2d(dataset[var2], dataset[var3], bins=nbins, color=colors[:6])
plt.show()
 
#2-D Hist 2
plt.figure()
plt.title('2-D Histogram')
plt.hist2d(dataset[var2], dataset[var3], bins=nbins, cmap=plt.cm.BuGn_r)
plt.show()
 
#Set variables
x = dataset[var2]
y = dataset[var3]
z = dataset[var1]
 
#Linear Regression
plt.figure()
sns.regplot(x = x, y = y, data=dataset);
plt.show()
 
#jointplot is figure-level and creates its own figure, so no plt.figure() here
sns.jointplot(x=x, y=y, data=dataset, kind="reg")
plt.show()
 
#Set seaborn style
sns.set_style("white")
 
# Basic 2D density plot
plt.figure()
sns.kdeplot(x=x, y=y)
plt.show()
 
# Custom the color, add shade and bandwidth
plt.figure()
sns.kdeplot(x=x, y=y, cmap="Reds", fill=True, bw_adjust=.5) #fill replaces the deprecated shade
plt.show()
 
# Add thresh parameter
plt.figure()
sns.kdeplot(x=x, y=y, cmap="Blues", fill=True, thresh=0) #fill replaces the deprecated shade
plt.show()
 
#Joint plot
#jointplot creates its own figure, so no plt.figure() here
sns.jointplot(x=x, y=y, data=dataset, kind='hex')
plt.show()
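
The column names in this second sheet carry a `data-` prefix (`data-Name`, `data-Age`, `data-Weight`), presumably taken from the form's top-level element; that origin is an assumption. If the plotting code is to be shared with the first example, the prefix can be stripped once after loading. A minimal sketch, where `strip_prefix` is a hypothetical helper and the last column name in the example list is only illustrative:

```python
def strip_prefix(columns, prefix="data-"):
    """Map each column name that starts with `prefix` to the name without it."""
    return {name: name[len(prefix):] for name in columns if name.startswith(prefix)}


# Column names as used in the code above, plus one illustrative unprefixed-by-"data-" name.
columns = ["data-Name", "data-Age", "data-Weight", "meta-instanceID"]
mapping = strip_prefix(columns)
print(mapping)
```

With pandas, the mapping can be applied as `dataset = dataset.rename(columns=strip_prefix(dataset.columns))`, after which `var1`, `var2`, and `var3` could be set to the plain names.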

Posted: **Mon May 03, 2021 7:25 pm**

Here is the code for a test with real categorical research data:

Code:

#Plot some graph
#Import required libraries
import gspread
import urllib.request
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from statsmodels.graphics.mosaicplot import mosaic
#REDCap
textstr = 'Created at \nwww.tssfl.com'
 
#Let's visualize
#Graph styles and font size
sns.set_style('darkgrid') # darkgrid, whitegrid, dark, white and ticks
plt.rc('axes', titlesize=18)     # fontsize of the axes title
plt.rc('axes', labelsize=14)    # fontsize of the x and y labels
plt.rc('xtick', labelsize=13)    # fontsize of the tick labels
plt.rc('ytick', labelsize=13)    # fontsize of the tick labels
plt.rc('legend', fontsize=13)    # legend fontsize
plt.rc('font', size=13)          # controls default text sizes
 
sheet_id = "1pm1mGdRgpitrYQiGqUNSHPdR43e-ZSXCavYr-TcqtwU"
sheet_name = "Sheet1"
url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
 
#If there is only one sheet use this
#url = "https://docs.google.com/spreadsheets/export?id={}&exportFormat=csv".format(sheet_id)
 
data = pd.read_csv(url)
 
#print(data)
#Drop first row
#df = data.drop(labels=0, axis=0)
#df = data.drop(data.index[0])
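#Drop the HIV/AIDS rows; .copy() makes df independent of data, so the replacement below does not trigger SettingWithCopyWarning
df = data[~data['Ailment cured'].isin(['HIV/AIDS'])].copy()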
#df['Ailment cured'] = df['Ailment cured'].replace({'Gonorrhoea, syphilis':'Gonorrhoea & Syphilis'})
df["Ailment cured"] = df['Ailment cured'].replace('Gonorrhoea, syphilis', 'Gonorrhoea & Syphilis')
#print(df)
 
#Growth form vs Citation
plt.figure(figsize=(8,5))
sns.boxplot(x='Growth form',y='Citation',data=data, palette='rainbow')
plt.show()
 
#Citation vs Growth form
plt.figure(figsize=(8,5))
sns.boxplot(x='Citation',y='Growth form',data=data, palette='rainbow')
plt.show()
 
#Citation vs Part used
plt.figure(figsize=(8,5))
sns.boxplot(x='Citation',y='Part used',data=data, palette='rainbow')
plt.tight_layout() #figure.savefig('myplot.png', bbox_inches='tight')
plt.show()
 
#Citation vs Ailment cured
plt.figure(figsize=(10,5))
sns.boxplot(x=df["Ailment cured"],y=df['Citation'],data=df, palette='rainbow')
plt.xlabel("Ailment cured", labelpad=15)
plt.tight_layout()
plt.show()
 
#Swarm plot
#catplot is figure-level and creates its own figure, so the size is set with height/aspect (in inches)
sns.catplot(x="Citation", y="Scientific name", hue="Ailment cured", kind="swarm", data=df, height=30, aspect=1)
plt.tight_layout()
plt.show()
 
 
#Adding hue
#Citation vs Growth form
plt.figure(figsize=(8,5))
sns.boxplot(x='Citation',y='Growth form',data=data, hue ='Part used', palette='rainbow')
plt.tight_layout()
plt.show()
 
plt.figure(figsize=(8,5))
sns.boxplot(x='Citation',y='Growth form',data=data, hue ='Ailment cured', palette='rainbow')
plt.tight_layout()
plt.show()
 
#Violin plots
plt.figure(figsize=(8,6))
sns.violinplot(x='Citation',y='Growth form',data=data, hue ='Part used', palette='rainbow')
plt.show()
 
#Violin plots
plt.figure(figsize=(8,6))
sns.violinplot(x='Citation',y='Growth form',data=data, hue ='Ailment cured',palette='rainbow')
plt.show()
 
#Boxen plots
plt.figure(figsize=(8,6))
sns.boxenplot(x='Citation',y='Growth form',data=data, hue ='Part used', palette='rainbow')
plt.show()
 
plt.figure(figsize=(8,6))
sns.boxenplot(x='Citation',y='Part used',data=data, hue ='Ailment cured', palette='rainbow')
plt.tight_layout()
plt.show()
 
#Bar plots
plt.figure(figsize=(12,6))
sns.barplot(x='Growth form',y='Citation',data=data, palette='rainbow', hue='Part used')
plt.tight_layout()
plt.show()
 
plt.figure(figsize=(12,6))
ax = plt.subplot(111)
sns.barplot(x='Ailment cured',y='Citation',data=data, palette='rainbow', hue='Part used')
plt.tight_layout()
ax.legend(bbox_to_anchor=(0.8, 0.45))
#plt.legend(loc=1)
plt.show()
 
#Point plot
plt.figure(figsize=(10,6))
sns.pointplot(x='Citation',y='Growth form',data=data)
plt.show()
 
plt.figure(figsize=(10,6))
sns.pointplot(x='Citation',y='Growth form',data=data, hue='Part used')
plt.show()
 
plt.figure(figsize=(10,6))
sns.pointplot(x='Citation',y='Growth form',data=data, hue='Ailment cured')
plt.show()
 
#Count plot
plt.figure(figsize=(10,6))
sns.countplot(x='Growth form',data=data, palette='rainbow')
plt.show()
 
plt.figure(figsize=(10,6))
sns.countplot(x='Growth form',data=data, hue='Part used', palette='rainbow')
plt.legend(loc=1)
plt.show()
 
plt.figure(figsize=(10,6))
sns.countplot(x='Growth form',data=data, hue='Ailment cured', palette='rainbow')
plt.legend(loc=2)
plt.show()
 
 
#Strip plot - Categorical Scatter Plots
plt.figure(figsize=(12,8))
sns.stripplot(x='Citation', y='Growth form', data=data, jitter=True, hue= 'Part used', dodge=True, palette='viridis')
plt.show()
 
#Swarm plots
plt.figure(figsize=(10,6))
sns.swarmplot(x='Citation', y='Ailment cured', data=data, hue='Growth form', dodge=True, palette='viridis')
plt.tight_layout()
plt.show()
 
"""
#Combining plots
plt.figure(figsize=(12,8))
sns.violinplot(x='Citation',y="Growth form", data=data, hue='Part used', dodge=True, palette='rainbow')
sns.swarmplot(x='Citation',y="Growth form", data=data, hue='Part used', dodge=True, color='grey', alpha=.8, s=4)
plt.show()
 
#Plot 2
plt.figure(figsize=(12,8))
sns.boxplot(x='Citation',y='Part used',hue='Growth form',data=data, palette='rainbow')
sns.swarmplot(x='Citation',y='Part used',hue='Growth form', dodge=True,data=data, alpha=.8,color='grey',s=4)
 
#Plot 3
plt.figure(figsize=(12,7))
sns.barplot(x='Growth form',y='Citation',data=data, palette='rainbow', hue='Part used')
sns.stripplot(x='Growth form',y="Citation",data=data, hue='Citation', dodge=True, color='grey', alpha=.8, s=2)
plt.show()
 
#Faceting Data with Catplot
#https://towardsdatascience.com/a-complete-guide-to-plotting-categorical-variables-with-seaborn-bfe54db66bec
g = sns.catplot(x='Citation',y='Growth form', col = 'Local name', data=data,
            kind='bar', aspect=.6, palette='Set2')
(g.set_axis_labels("Class", "Survival Rate")
.set_titles("{col_name}")
.set(ylim=(0,1)))
plt.tight_layout()
plt.savefig('seaborn_catplot.png', dpi=1000)
"""
 
categorical_features = ["Growth form", "Part used", "Ailment cured", "Citation"]
fig, ax = plt.subplots(1, len(categorical_features), figsize=(16,8))
for i, categorical_feature in enumerate(data[categorical_features]):
    data[categorical_feature].value_counts().plot(kind="bar", ax=ax[i]).set_title(categorical_feature)
plt.tight_layout()
plt.show()
 
"""
#print(data)
#print(data['Local Name'])
data['Growth form'].value_counts().plot(kind='bar')
plt.show()
#data['Growth form'].value_counts().plot(kind='hist')
 
plt.figure()
from statsmodels.graphics.mosaicplot import mosaic
plt.rcParams['font.size'] = 16.0
mosaic(data, ['Growth form', 'Part used']);
plt.show()
"""
plt.figure()
sns.barplot(x='Growth form', y='Citation', data=df.head(3)) #first three rows only
plt.show()
 
#Add frequencies/counts and percentages on bar tops
total = float(len(data))
print(total)
plt.figure(figsize=(10,6))
ax = sns.countplot(x='Growth form',data=data, hue='Part used', palette='rainbow')
 
for p in ax.patches[0:]:
    h = p.get_height()
    x = p.get_x()+p.get_width()/2.0
    if h != 0:
        ax.annotate("%g" % p.get_height(), xy=(x,h-0.19), xytext=(0,4), rotation=0,
                   textcoords="offset points", ha="center", va="bottom", color='green')
 
for p in ax.patches:
    percentage = '{:.2f}%'.format(100 * p.get_height()/total)
    x = p.get_x() + p.get_width()
    y = p.get_height()
    if y != 0:
        ax.annotate(percentage, (x-0.02, y+0.45),ha='center', rotation=90, color='red')
 
plt.tight_layout()
plt.ylabel("Counts")
plt.legend(loc=1)
plt.gcf().text(0.1, 0.77, textstr, fontsize=14, color='green')
plt.show()
plt.clf()
 
plt.figure(figsize=(10,6))
ax = sns.countplot(x='Growth form',data=data, hue='Ailment cured', palette='rainbow')
for p in ax.patches[0:]:
    h = p.get_height()
    x = p.get_x()+p.get_width()/2.0
    if h != 0:
        ax.annotate("%g" % p.get_height(), xy=(x,h-0.19), xytext=(0,4), rotation=0,
                textcoords="offset points", ha="center", va="bottom", color='green')
 
for p in ax.patches:
    percentage = '{:.2f}%'.format(100 * p.get_height()/total)
    x = p.get_x() + p.get_width()
    y = p.get_height()
    if y != 0:
        ax.annotate(percentage, (x-0.06, y+0.30),ha='center', rotation=90, color='red')
 
plt.tight_layout()
plt.ylabel("Counts")
plt.legend(loc=1)
plt.gcf().text(0.1, 0.77, textstr, fontsize=14, color='green')
plt.show()
plt.clf()
 
#Add frequencies only
plt.figure(figsize=(10,6))
ax = sns.countplot(x='Growth form',data=data, hue='Part used', palette='rainbow')
 
for p in ax.patches[0:]:
    h = p.get_height()
    x = p.get_x()+p.get_width()/2.0
    if h != 0:
        ax.annotate("%g" % p.get_height(), xy=(x,h), xytext=(0,4), rotation=0,
                   textcoords="offset points", ha="center", va="bottom", color='green')
plt.tight_layout()
plt.ylabel("Counts")
plt.legend(loc=1)
plt.gcf().text(0.1, 0.77, textstr, fontsize=14, color='green')
plt.show()
plt.clf()
 
#Add percentages only
plt.figure(figsize=(10,6))
ax = sns.countplot(x='Growth form',data=data, hue='Part used', palette='rainbow')
 
for p in ax.patches:
    percentage = '{:.2f}%'.format(100 * p.get_height()/total)
    x = p.get_x() + p.get_width()/2.0
    h = p.get_height()
    if h !=0:
        ax.annotate(percentage, xy=(x,h+0.1), ha="center", va="bottom", rotation=90, color='red') #textcoords="offset points",
 
plt.tight_layout()
plt.ylabel("Counts")
plt.legend(loc=1)
plt.gcf().text(0.1, 0.77, textstr, fontsize=14, color='green')
plt.show()
plt.clf()
 
#Add frequencies only
plt.figure(figsize=(10,6))
ax = sns.countplot(x='Growth form',data=data, hue='Ailment cured', palette='rainbow')
for p in ax.patches[0:]:
    h = p.get_height()
    x = p.get_x()+p.get_width()/2.0
    if h != 0:
        ax.annotate("%g" % p.get_height(), xy=(x,h), xytext=(0,4), rotation=0,
                textcoords="offset points", ha="center", va="bottom", color='green')
 
plt.tight_layout()
plt.ylabel("Counts")
plt.legend(loc=1)
plt.gcf().text(0.1, 0.77, textstr, fontsize=14, color='green')
plt.show()
plt.clf()
 
#Add percentages only
plt.figure(figsize=(10,6))
ax = sns.countplot(x='Growth form',data=data, hue='Ailment cured', palette='rainbow')
 
for p in ax.patches:
    percentage = '{:.2f}%'.format(100 * p.get_height()/total)
    x = p.get_x() + p.get_width()/2.0
    h = p.get_height()
    if h !=0:
        ax.annotate(percentage, xy=(x,h+0.1), ha="center", va="bottom", rotation=90, color='red') #textcoords="offset points",
 
plt.tight_layout()
plt.ylabel("Counts")
plt.legend(loc=1)
plt.gcf().text(0.1, 0.77, textstr, fontsize=14, color='green')
plt.show()
plt.clf()
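
The count and percentage blocks above repeat the same loop over `ax.patches` several times. One way to factor it out is sketched below. The helper names `bar_labels` and `annotate_bars` are hypothetical (not from the original post), and the label offsets are chosen for illustration rather than copied from the plots above:

```python
def bar_labels(height, total):
    """Return (count_label, percent_label) for one bar, or None for an empty bar."""
    if not height or height != height:  # skip bars of height 0 and NaN
        return None
    return "%g" % height, "{:.2f}%".format(100 * height / total)


def annotate_bars(ax, total, show_count=True, show_percent=True):
    """Write the count and/or the percentage above every non-empty bar on ax."""
    for patch in ax.patches:
        labels = bar_labels(patch.get_height(), total)
        if labels is None:
            continue
        count_label, percent_label = labels
        # Anchor both labels at the top centre of the bar.
        x = patch.get_x() + patch.get_width() / 2.0
        y = patch.get_height()
        if show_count:
            ax.annotate(count_label, xy=(x, y), xytext=(0, 4),
                        textcoords="offset points", ha="center", va="bottom",
                        color="green")
        if show_percent:
            # Lift the percentage above the count when both are shown.
            ax.annotate(percent_label, xy=(x, y),
                        xytext=(0, 18 if show_count else 4),
                        textcoords="offset points", ha="center", va="bottom",
                        rotation=90, color="red")
```

Each block would then reduce to `ax = sns.countplot(...)` followed by `annotate_bars(ax, total=len(data))`, with `show_count=False` or `show_percent=False` for the "percentages only" and "frequencies only" variants.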

Posted: **Thu May 06, 2021 7:05 am**

We can also collect data with Google Survey forms, post the responses into a spreadsheet, and then extract statistics and visualize and analyze the results with TSSFL ODF tools. Here is an example:
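
As a rough sketch of the "extract statistics" step (not the author's code), the snippet below summarizes the numeric columns of a responses sheet with pandas. The rows are made-up sample data standing in for the published sheet, the `Timestamp` column is an assumption about how the responses are stored, and `summarize` is a hypothetical helper; with real data the frame would come from `pd.read_csv(url)`:

```python
import io

import pandas as pd

# Made-up sample rows; the last response deliberately leaves Height blank.
SAMPLE_CSV = """Timestamp,Name,Age,Height
2021-05-06 07:10:00,Asha,25,160
2021-05-06 07:12:00,Baraka,31,175
2021-05-06 07:15:00,Neema,28,
"""


def summarize(frame, columns):
    """Return count, mean, min and max for the given columns, ignoring blanks."""
    # Coerce to numbers first so that stray text in a response becomes NaN.
    numeric = frame[columns].apply(pd.to_numeric, errors="coerce")
    return numeric.agg(["count", "mean", "min", "max"])


responses = pd.read_csv(io.StringIO(SAMPLE_CSV))
summary = summarize(responses, ["Age", "Height"])
print(summary)
```

Coercing with `pd.to_numeric` before aggregating keeps a single mistyped answer from turning a whole column into text, which would otherwise make `np.mean` fail.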

Posted: **Thu May 06, 2021 7:09 am**

Posted: **Thu May 06, 2021 7:45 am**

Here is the code for the TSSFL ODF Google Survey Form responses:

Code:

#Plot some graph
#Import required libraries
import gspread
import urllib.request
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
 
#Let's Read Data from Google Sheets into Pandas without the Google Sheets API
#Useful for multiple sheets
 
sheet_id = "1UnkRYcOhLFMgyT_CzByvupvdaD5cL5b_nCcOeoy1uy8"
sheet_name = "Sheet1"
url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
 
#Use for single sheet
#url = "https://docs.google.com/spreadsheets/export?id={}&exportFormat=csv".format(sheet_id)
 
#Get Pandas dataframe
dataset = pd.read_csv(url)
print(dataset.columns)
#worksheet = sh.sheet1
#age = worksheet.col_values(3)[1:]
age = dataset["Age"]
print("Ages:", age)
#Height = worksheet.col_values(4)[1:]
Height = dataset["Height"]
print("Heights:", Height)
 
#Pandas is very useful for working with Google spreadsheets
#Convert the json to Pandas dataframe
#Get all data records as dictionary
#data = worksheet.get_all_records()
#df = pd.DataFrame.from_dict(data)
 
#Let's get some statistics
#age_arr = np.array(age)
#age_array = age_arr.astype(float)
#h_arr = np.array(Height)
#h_array = h_arr.astype(float)
 
print("Average Age:", np.mean(age))
print("Mean Height:", np.mean(Height))
print("Minimum and Maximum Age:", np.min(age), np.max(age))
print("Minimum and Maximum Height:", np.min(Height), np.max(Height))
 
#Let's visualize
#Graph styles and font size
sns.set_style('darkgrid') # darkgrid, whitegrid, dark, white and ticks
plt.rc('axes', titlesize=18)     # fontsize of the axes title
plt.rc('axes', labelsize=14)    # fontsize of the x and y labels
plt.rc('xtick', labelsize=13)    # fontsize of the tick labels
plt.rc('ytick', labelsize=13)    # fontsize of the tick labels
plt.rc('legend', fontsize=13)    # legend fontsize
plt.rc('font', size=13)          # controls default text sizes
 
#Seaborn color palettes
#print(sns.color_palette('deep'), sns.color_palette("pastel"), sns.color_palette("Set2"))
#If your file only has one sheet, replace sheet_url
#sheet_url = "https://docs.google.com/spreadsheets/d/1XqOtPkiE_Q0dfGSoyxrH730RkwrTczcRbDeJJpqRByQ/edit#gid=0"
#url_1 = sheet_url.replace('/edit#gid=', '/export?format=csv&gid=')
 
#print(dataset)
 
#Names = worksheet.col_values(2)[1:]
#Names = data["Name"]
#print(Names)
 
df_names = dataset["Name"]
df_ages = dataset["Age"]
df_Heights = dataset["Height"]
print(df_names)
 
#Preprocessing
plots = dataset.groupby(['Name'], as_index=False).mean(numeric_only=True)
#print(plots)
 
#Bar Plot in MatplotLib with plt.bar()
#Names vs Age
plt.figure(figsize=(10,5), tight_layout=True)
colors = sns.color_palette('pastel')
plt.bar(dataset['Name'], dataset['Age'], color=colors[:5])
plt.xlabel('Name')
plt.xticks(rotation=90)
plt.ylabel('Age')
plt.title('Barplot')
plt.show()
 
#Name vs Height
plt.figure(figsize=(10,5), tight_layout=True)
colors = sns.color_palette('deep')
plt.bar(dataset['Name'], dataset['Height'], color=colors[:6])
plt.xlabel('Name')
plt.xticks(rotation=90)
plt.ylabel('Height')
plt.title('Barplot')
plt.show()
 
#Bar Plot in Seaborn with sns.barplot()
#Note: ci=None hides the error bars; seaborn 0.12+ deprecates it in favor of errorbar=None
plt.figure(figsize=(10,5), tight_layout=True)
ax = sns.barplot(x=dataset['Name'], y=dataset['Age'], palette='pastel', ci=None)
ax.set(title='Barplot with Seaborn', xlabel='Names', ylabel='Age')
plt.xticks(rotation=90)
plt.show()
 
#Barplot grouped data by "n" variables
plt.figure(figsize=(12, 6), tight_layout=True)
ax = sns.barplot(x=dataset['Age'], y=dataset['Height'], hue=dataset['Name'], palette='pastel')
ax.set(title='Age vs Height', xlabel='Age', ylabel='Height')
ax.legend(title='Names', title_fontsize=13, loc='upper right')
plt.show()
 
#Histograms with plt.hist() or sns.histplot()
#Each version is drawn on its own figure so the two do not overlap
bins = [40, 50, 60, 70, 80]
# matplotlib
plt.figure(figsize=(10,6), tight_layout=True)
plt.hist(dataset['Height'], bins=bins, color=sns.color_palette('Set2')[2], linewidth=2)
plt.title('Histogram')
plt.xlabel('Height (cm)')
plt.ylabel('Count')
plt.show()
# seaborn
plt.figure(figsize=(10,6), tight_layout=True)
ax = sns.histplot(data=dataset, x='Height', bins=bins, color=sns.color_palette('Set2')[2], linewidth=2)
ax.set(title='Histogram with Seaborn', xlabel='Height (cm)', ylabel='Count')
plt.show()
 
#Boxplot
plt.figure(figsize=(10,6), tight_layout=True)
ax = sns.boxplot(data=dataset, x='Name', y='Age', palette='Set2', linewidth=2.5)
ax.set(title='Boxplot', xlabel='Names', ylabel='Age (Years)')
plt.xticks(rotation=90)
plt.show()
 
#Scatter plot
plt.figure(figsize=(10,6), tight_layout=True)
ax = sns.scatterplot(data=dataset, x='Age', y='Height', hue='Name', palette='Set2', s=60)
ax.set(xlabel='Age (Years)', ylabel='Height (cm)')
ax.legend(title='People', title_fontsize = 12)
plt.show()
 
#Mean age and height for each name
pivot = dataset.groupby(['Name'], as_index=False).mean(numeric_only=True)
relationship = pivot.loc[:,"Age":"Height"]
print(relationship)
 
#Plot the per-name means once with each pandas chart type
#Note: "kde" and "density" require SciPy to be installed
charts = ["bar", "line", "barh", "hist", "box", "kde", "density", "area"]
for chart_type in charts:
    relationship.plot(kind=chart_type)
    plt.title("%s plot" % chart_type)
    plt.show()
 
#Seaborn
plt.figure()
sns.set_style("darkgrid")
sns.lineplot(data = dataset, x = "Age", y = "Height")
plt.show()
 
plt.figure()
sns.set_style("darkgrid")
sns.lineplot(data = dataset, x = "Height", y = "Age")
plt.show()
 
#relplot
#sns.relplot() is figure-level and creates its own figure, so no plt.figure() call is needed
sns.set_theme(style="darkgrid")
sns.relplot(x="Age", y="Height", hue="Name", data=dataset)
plt.show()
 
#Hexbin
#gridsize sets the number of hexagons along the x-axis (nbins)
plt.figure()
nbins = 15
plt.title('Hexbin')
plt.hexbin(dataset["Age"], dataset["Height"], gridsize=nbins, color=colors[:5])
plt.show()
 
#Hexbin 2
#gridsize sets the number of hexagons along the x-axis (nbins)
plt.figure()
nbins = 15
plt.title('Hexbin')
plt.hexbin(dataset["Age"], dataset["Height"], gridsize=nbins, cmap=plt.cm.BuGn_r)
plt.show()
 
#2-D Hist
plt.figure()
plt.title('2-D Histogram')
plt.hist2d(dataset["Age"], dataset["Height"], bins=nbins, color=colors[:6])
plt.show()
 
#2-D Hist 2
plt.figure()
plt.title('2-D Histogram')
plt.hist2d(dataset["Age"], dataset["Height"], bins=nbins, cmap=plt.cm.BuGn_r)
plt.show()
 
#Set variables
x = dataset["Age"]
y = dataset["Height"]
z = dataset["Name"]
 
#Linear Regression
plt.figure()
sns.regplot(x=x, y=y, data=dataset)
plt.show()
 
#sns.jointplot() is figure-level and creates its own figure
sns.jointplot(x=x, y=y, data=dataset, kind="reg")
plt.show()
 
#Set seaborn style
sns.set_style("white")
 
# Basic 2D density plot
plt.figure()
sns.kdeplot(x=x, y=y)
plt.show()
 
# Customize the color, fill the contours, and adjust the bandwidth
# fill=True replaces the deprecated shade=True
plt.figure()
sns.kdeplot(x=x, y=y, cmap="Reds", fill=True, bw_adjust=.5)
plt.show()
 
# Add thresh parameter
plt.figure()
sns.kdeplot(x=x, y=y, cmap="Blues", fill=True, thresh=0)
plt.show()
 
#Joint plot
#sns.jointplot() creates its own figure
sns.jointplot(x=x, y=y, data=dataset, kind='hex')
plt.show()
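
The script above builds the CSV export URL with an f-string, which works for a tab named `Sheet1` but can break when the worksheet name contains spaces or other reserved characters. A minimal sketch of a safer URL builder follows; the helper name `build_sheet_csv_url` and the tab name `Form Responses 1` are assumptions for illustration and are not part of the original code.

```python
from urllib.parse import quote


def build_sheet_csv_url(sheet_id: str, sheet_name: str) -> str:
    """Return the gviz CSV export URL for one worksheet of a Google Sheet."""
    # Percent-encode spaces and reserved characters in the worksheet name
    encoded_name = quote(sheet_name, safe="")
    return (
        f"https://docs.google.com/spreadsheets/d/{sheet_id}"
        f"/gviz/tq?tqx=out:csv&sheet={encoded_name}"
    )


# Sheet ID taken from the script above; the tab name is a hypothetical example
url = build_sheet_csv_url(
    "1UnkRYcOhLFMgyT_CzByvupvdaD5cL5b_nCcOeoy1uy8", "Form Responses 1"
)
print(url)
```

The resulting string can be passed to `pd.read_csv(url)` exactly as in the script, provided the spreadsheet is shared for viewing by link.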

Posted: **Thu May 06, 2021 7:48 am**

Every supported technology or software tool can be called on any page of the forum, even multiple times on the same page:

Posted: **Wed Sep 01, 2021 1:44 pm**

ODK Training with Janeth:

TSSFL Stack, ODK Collect, and Google Drive Integration to Collect, Store, Manage, Process and Analyze Data
