TSSFL Stack, ODK Collect, and Google Drive Integration to Collect, Store, Manage, Process and Analyze Data
Posted: Mon Apr 19, 2021 1:37 am
The Open Data Kit (ODK) is a free, open-source suite of tools that allows data collection using Android mobile devices and data submission to an online server, even without an Internet connection or mobile communication services at the time of data collection. ODK Collect replaces the traditional paper forms with electronic forms that allow text, numeric data, GPS, photo, video, barcodes, and audio uploads to an online server. The Open Data Kit has become a standard software that helps organizations and authors/researchers to collect and manage mobile data collection solutions. ODK is mentioned by Google among tools that create new knowledge, raise awareness, or enable people to take action to change the world.
This is a summary of the integration of a versatile set of tools and systems, featuring the TSSFL Stack, the Open Data Kit (ODK) Collect, and Google Drive to collect, store, manage, process, and analyze data. All these three architectures are designed for teams and collaborations. ODK Collect allows multiple team members to collect data with their android phones, at different times, paces, and locations, and yet the collected data is sent to the same Google spreadsheet for storage and management. From Google spreadsheet, the data is then programmatically acquired by TSSFL ODF for processing and analysis. TSSFL ODF does this while offering various collaboration options among the team members in question. Traditionally, ODK mainly worked with KoBoToolbox - a suite of tools for field data collection for use in challenging environments. However, the main collection application used by the KoBoToolbox is built/compatible with the ODK ecosystem. This means any form built for/by ODK Collect should also work for KoBoToolbox and vice-versa, read more.
The whole process is streamlined as follows:
1. Creating a form for data collection and submissions. The form is created using ODK Build at http://build.opendatakit.org/:

2. Linking to Google Drive and hosting the survey form we built with ODK Build via XML in Google Drive so that the project team is able to download it to their Android phones. This step includes creating a Google spreadsheet that is placed where the completed survey responses from the project team will be sent and stored:


4. Collecting data using ODK Collect:


5. Sending the collected data to Google Spreadsheet in Google Drive:


6. Viewing the collected data stored in the spreadsheet (see below).
7. Integrating the spreadsheet into TSSFL Stack for Open Science and Collaborations between teams. This includes embedding the spreadsheet and enabling communication between TSSFL Stack and Google spreadsheets via Google Python APIs (Google Sheets API v4 and Google Drive API):
As the survey continues and more data is collected and submitted to Google sheet via ODK Collect, this spreadsheet will automatically update to contain the latest information.
8. Read, Process and Analyze data over TSSFL ODF with Python, and automate various tasks, see
- Automating the Google Spreadsheet Tasks with Python and TSSFL Stack
- Automate Multiple Excel Sheets and Produce Reports Using Python
- Automate Reports with Python and Pandas, Save the Output to HTML
- How to Use Python Pandas Pivot Table for Data Presentation and Analysis
- How to Generate PDF Reports with Pandas, Jinja and WeasyPrint
The sample code below can read and print data submitted to Google spreadsheet using ODK Collect, it can be extended to do extremely useful analyses using this data:
TSSFL ODF is integrated with various Data Science Tools and Toolboxes for performing almost any data-related task.
Find detailed information regarding ODK Google Drive integration here.
ODK Central's real-time data feed for dashboards, integrations and more:
This is a summary of the integration of a versatile set of tools and systems, featuring the TSSFL Stack, the Open Data Kit (ODK) Collect, and Google Drive to collect, store, manage, process, and analyze data. All these three architectures are designed for teams and collaborations. ODK Collect allows multiple team members to collect data with their android phones, at different times, paces, and locations, and yet the collected data is sent to the same Google spreadsheet for storage and management. From Google spreadsheet, the data is then programmatically acquired by TSSFL ODF for processing and analysis. TSSFL ODF does this while offering various collaboration options among the team members in question. Traditionally, ODK mainly worked with KoBoToolbox - a suite of tools for field data collection for use in challenging environments. However, the main collection application used by the KoBoToolbox is built/compatible with the ODK ecosystem. This means any form built for/by ODK Collect should also work for KoBoToolbox and vice-versa, read more.
The whole process is streamlined as follows:
1. Creating a form for data collection and submissions. The form is created using ODK Build at http://build.opendatakit.org/:
2. Linking to Google Drive and hosting the survey form we built with ODK Build via XML in Google Drive so that the project team is able to download it to their Android phones. This step includes creating a Google spreadsheet that is placed where the completed survey responses from the project team will be sent and stored:
- Go to Edit - > Form Properties
- Fill the Title on Device, Instance Name and Public Key (All these are optional)
- Copy and Paste the URL of the Google Spreadsheet that will collect data in the place of Submission URL
- Click Done
- Next go to File -> Export to XML and Download the form
- Place the form in the same folder in the Google Drive as the Google Spreadsheet that will collect data
4. Collecting data using ODK Collect:
5. Sending the collected data to Google Spreadsheet in Google Drive:
6. Viewing the collected data stored in the spreadsheet (see below).
7. Integrating the spreadsheet into TSSFL Stack for Open Science and Collaborations between teams. This includes embedding the spreadsheet and enabling communication between TSSFL Stack and Google spreadsheets via Google Python APIs (Google Sheets API v4 and Google Drive API):
As the survey continues and more data is collected and submitted to Google sheet via ODK Collect, this spreadsheet will automatically update to contain the latest information.
8. Read, Process and Analyze data over TSSFL ODF with Python, and automate various tasks, see
- Automating the Google Spreadsheet Tasks with Python and TSSFL Stack
- Automate Multiple Excel Sheets and Produce Reports Using Python
- Automate Reports with Python and Pandas, Save the Output to HTML
- How to Use Python Pandas Pivot Table for Data Presentation and Analysis
- How to Generate PDF Reports with Pandas, Jinja and WeasyPrint
The sample code below can read and print data submitted to Google spreadsheet using ODK Collect, it can be extended to do extremely useful analyses using this data:
- #Plot some graph
- #Import required libraries
- import gspread
- import urllib.request
- import numpy as np
- import matplotlib.pyplot as plt
- import seaborn as sns
- import pandas as pd
- """
- urllib.request.urlretrieve("https://www.dropbox.com/s/mqsyfuetv8potvd/credentials.json?dl=1", "credentials.json")
- gc = gspread.service_account(filename="credentials.json")
- sh = gc.open_by_key("1BC48PKPZW71AC6hOn1SNesvZk1PoBjE4wXwl1YlWFpY") #Open spreadsheet,
- #the spreadsheet ID starts with 1019ke.... between "" in the line above
- """
- #Alternative
- #If your file only has one sheet, replace sheet_url
- #sheet_url = "https://docs.google.com/spreadsheets/d/1BC48PKPZW71AC6hOn1SNesvZk1PoBjE4wXwl1YlWFpY/edit#gid=0"
- #url_1 = sheet_url.replace('/edit#gid=', '/export?format=csv&gid=')
- #Use the first two lines for a single sheet -- but the method is very slow for slow connection
- sheet_id = "1BC48PKPZW71AC6hOn1SNesvZk1PoBjE4wXwl1YlWFpY"
- #url = "https://docs.google.com/spreadsheets/export?id={}&exportFormat=csv".format(sheet_id)
- sheet_name = "Sheet1"
- url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
- data = pd.read_csv(url)
- #worksheet = sh.sheet1
- #Define variables
- var1 = "Name"
- var2 = "Age"
- var3 = "Height"
- #age = worksheet.col_values(3)[1:]
- age = data[var2]
- print("Ages:", age)
- #height = worksheet.col_values(4)[1:]
- height = data[var3]
- print("Heights:", height)
- #Pandas is extremely very useful for Google spreadsheets
- #Convert the json to Pandas dataframe
- #Get all data records as dictionary
- #data = worksheet.get_all_records()
- #df = pd.DataFrame.from_dict(data)
- #Let's get some statistics
- #age_arr = np.array(age)
- #age_array = age_arr.astype(float)
- #h_arr = np.array(height)
- #h_array = h_arr.astype(float)
- print("Average Age:", np.mean(age))
- print("Mean Height:", np.mean(height))
- print("Minimum and Maximum Age:", np.min(age), np.max(age))
- print("Minimum and Maximum Height:", np.min(height), np.max(height))
- #Let's visualize
- #Graph styles and font size
- sns.set_style('darkgrid') # darkgrid, white grid, dark, white and ticks
- plt.rc('axes', titlesize=18) # fontsize of the axes title
- plt.rc('axes', labelsize=14) # fontsize of the x and y labels
- plt.rc('xtick', labelsize=13) # fontsize of the tick labels
- plt.rc('ytick', labelsize=13) # fontsize of the tick labels
- plt.rc('legend', fontsize=13) # legend fontsize
- plt.rc('font', size=13) # controls default text sizes
- #sns list of color plettes
- #print(sns.color_palette('deep'), sns.color_palette("pastel"), sns.color_palette("Set2"))
- #Let's Read Data from Google Sheets into Pandas without the Google Sheets API
- #Useful for multiple sheets
- #sheet_id = "1BC48PKPZW71AC6hOn1SNesvZk1PoBjE4wXwl1YlWFpY"
- #sheet_name = "Sheet1"
- #url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
- #If your file only has one sheet, replace sheet_url
- #sheet_url = “https://docs.google.com/spreadsheets/d/1XqOtPkiE_Q0dfGSoyxrH730RkwrTczcRbDeJJpqRByQ/edit#gid=0"
- #url_1 = sheet_url.replace(‘/edit#gid=’, ‘/export?format=csv&gid=’)
- #Get Pandas dataframe
- dataset = pd.read_csv(url)
- #print(dataset)
- #Names = worksheet.col_values(2)[1:]
- #Names = data[var1]
- #print(Names)
- df_names = dataset[var1]
- df_ages = dataset[var2]
- df_heights = dataset[var3]
- print(df_names)
- #Preprocessing
- plots = dataset.groupby([var1], as_index=False).mean()
- #print(plots)
- #Bar Plot in MatplotLib with plt.bar()
- #Names vs Age
- plt.figure(figsize=(10,5), tight_layout=True)
- colors = sns.color_palette('pastel')
- plt.bar(dataset[var1], dataset[var2], color=colors[:5])
- plt.xlabel(var1)
- plt.xticks(rotation=90)
- plt.ylabel('Age')
- plt.title('Barplot')
- plt.show()
- #Name Vs Height
- plt.figure()
- plt.figure(figsize=(10,5), tight_layout=True)
- colors = sns.color_palette('deep')
- plt.bar(dataset[var1], dataset[var3], color=colors[:6])
- plt.xlabel(var1)
- plt.xticks(rotation=90)
- plt.ylabel('Height')
- plt.title('Barplot')
- plt.show()
- #Bar Plot in Seaborn with sns.barplot()
- plt.figure(figsize=(10,5), tight_layout=True)
- ax = sns.barplot(x=dataset[var1], y=dataset[var2], palette='pastel', ci=None)
- ax.set(title='Barplot with Seaborn', xlabel='Names', ylabel='Age')
- plt.xticks(rotation=90)
- plt.show()
- #Barplot grouped data by "n" variables
- plt.figure(figsize=(12, 6), tight_layout=True)
- ax = sns.barplot(x=dataset[var2], y=dataset[var3], hue=dataset[var1], palette='pastel')
- ax.set(title='Age vs Height' ,xlabel='Age', ylabel='Height')
- ax.legend(title='Names', title_fontsize='13', loc='upper right')
- plt.show()
- #Histograms with plt.hist() or sns.histplot()
- plt.figure(figsize=(10,6), tight_layout=True)
- bins = [160, 165, 170, 175, 180, 185, 190, 195, 200]
- # matplotlib
- plt.hist(dataset[var3], bins=bins, color=sns.color_palette('Set2')[2], linewidth=2)
- plt.title('Histogram')
- plt.xlabel('Height (cm)')
- plt.ylabel('Count')
- # seaborn
- ax = sns.histplot(data=dataset, x=var3, bins=bins, color=sns.color_palette('Set2')[2], linewidth=2)
- ax.set(title='Histogram', xlabel='Height (cm)', ylabel='Count')
- plt.show()
- #Boxplot
- plt.figure(figsize=(10,6), tight_layout=True)
- ax = sns.boxplot(data=dataset, x=var1, y=var2, palette='Set2', linewidth=2.5)
- ax.set(title='Boxplot', xlabel='Names', ylabel='Age (Years)')
- plt.xticks(rotation=90)
- plt.show()
- #Scatter plot
- plt.figure(figsize=(10,6), tight_layout=True)
- ax = sns.scatterplot(data=dataset, x=var2, y=var3, hue=var1, palette='Set2', s=60)
- ax.set(xlabel='Age (Years)', ylabel='Height (cm)')
- ax.legend(title='People', title_fontsize = 12)
- plt.show()
- #Something else
- pivot = dataset.groupby([var1], as_index=False).mean()
- relationship = pivot.loc[:,var2:var3]
- print(relationship)
- #Plot some graph
- charts = ["bar", "line", "barh", "hist", "box", "kde", "density", "area"]
- for chart_type in charts:
- relationship.plot(kind="%s" % chart_type) #Replace bar with line, barh, hist, box, kde, density, area
- plt.title("%s plot" % chart_type)
- plt.show()
- #Seaborn
- plt.figure()
- sns.set_style("darkgrid")
- sns.lineplot(data = dataset, x = var2, y = var3)
- plt.show()
- plt.figure()
- sns.set_style("whitegrid")
- sns.lineplot(data = dataset, x = var2, y = var3)
- plt.show()
- #Hexbin
- #Split the plotting window into 20 hexbins
- plt.figure()
- nbins = 20
- plt.title('Hexbin')
- plt.hexbin(dataset[var2], dataset[var3], gridsize=nbins, color=colors[:6])
- plt.show()
- #2-D Hist
- plt.figure()
- plt.title('2-D Histogram')
- plt.hist2d(dataset[var2], dataset[var3], bins=nbins, color=colors[:5])
- plt.show()
- #Set variables
- x = dataset[var2]
- y = dataset[var3]
- z = dataset[var1]
- #Linear Regression
- plt.figure()
- sns.regplot(x = x, y = y, data=dataset);
- plt.show()
- plt.figure()
- sns.jointplot(x=x, y=y, data=dataset, kind="reg");
- plt.show()
- #Set seaborn style
- sns.set_style("white")
- # Basic 2D density plot
- plt.figure()
- sns.kdeplot(x=x, y=y)
- plt.show()
- # Custom the color, add shade and bandwidth
- plt.figure()
- sns.kdeplot(x=x, y=y, cmap="Reds", shade=True, bw_adjust=.5)
- plt.show()
- # Add thresh parameter
- plt.figure()
- sns.kdeplot(x=x, y=y, cmap="Blues", shade=True, thresh=0)
- plt.show()
- #Joint plot
- plt.figure()
- sns.jointplot(x = x,y = y,data = dataset,kind = 'hex')
- plt.show()
TSSFL ODF is integrated with various Data Science Tools and Toolboxes for performing almost any data-related task.
Find detailed information regarding ODK Google Drive integration here.
ODK Central's real-time data feed for dashboards, integrations and more: