TSSFL TECHNOLOGY STACK

Posted: **Tue May 04, 2021 6:55 am**

This topic is an extension of the topics Automating Spreadsheets and Automating Multiple Excel Sheets with TSSFL ODF. It briefly showcases how various data tasks, such as preprocessing/cleaning can be carried out. Data cleaning/preprocessing is an important screening stage before carrying out data analysis. The goal of data preprocessing is to discard irrelevant and redundant or noisy information and unreliable data which can produce misleading results.

The Python snippet below performs the following tasks:

Read data from the spreadsheet by using spreadsheet ID, create a worksheet, and then create a Pandas DataFrame (df1) from the worksheet.
Preprocesses data by creating the second Pandas DataFrame (df2) for which all strings named HIV/AIDS are removed from the column named "Ailment cured", similarly, it renames all strings named "Gonorrhoea, syphilis" to "Gonorrhoea & Syphilis". Note that Python is case-sensitive.
It creates a new spreadsheet and names it "A New Test Spreadsheet". It also creates a worksheet (worksheet2) within the newly created spreadsheet.
It updates worksheet2 by copying the Pandas DataFrame df2 into it.
Finally, it sends an updated spreadsheet to an email.

Code: [Select all] [Expand/Collapse]

import gspread
import urllib.request
urllib.request.urlretrieve("https://www.dropbox.com/s/m728v370159b2xm/credentials.json?dl=1", "credentials.json")
 
gc = gspread.service_account(filename="credentials.json")
sh = gc.open_by_key("1pm1mGdRgpitrYQiGqUNSHPdR43e-ZSXCavYr-TcqtwU") #Open spreadsheet by ID
worksheet = sh.sheet1
 
import pandas as pd
 
df1 = pd.DataFrame(worksheet.get_all_records())
#Preprocess Data
df2 = df1[~df1['Ailment cured'].isin(['HIV/AIDS'])]
#df['Ailment cured'] = df['Ailment cured'].replace({'Gonorrhoea, syphilis':'Gonorrhoea & Syphilis'})
df2["Ailment cured"] = df2['Ailment cured'].replace('Gonorrhoea, syphilis', 'Gonorrhoea & Syphilis')
print(df2)
 
#Let's create a new blank spreadsheet:
 
sh2 = gc.create('A New Test Spreadsheet')
worksheet2 = sh2.sheet1
 
#Let's write df2 to a new worksheet
worksheet2.update([df2.columns.values.tolist()] + df2.values.tolist())
#Share the new worksheet to an email:
sh2.share('ey@tssfl.co', perm_type='user', role='writer')
 
#Finally,
import os
os.remove("credentials.json")
#We can also combine/concatenate sheets

See more functionalities at Examples of gspread Usage

TSSFL TECHNOLOGY STACK

Data Preprocessing/Cleaning with TSSFL Stack, Pandas, and Gspread

Data Preprocessing/Cleaning with TSSFL Stack, Pandas, and Gspread