Data Preprocessing/Cleaning with TSSFL Stack, Pandas, and Gspread

by Eli

This topic extends the topics Automating Spreadsheets and Automating Multiple Excel Sheets with TSSFL ODF. It briefly shows how data tasks such as preprocessing/cleaning can be carried out. Data cleaning/preprocessing is an important screening stage before data analysis: its goal is to discard irrelevant, redundant, noisy, or unreliable data that could otherwise produce misleading results.

The Python snippet below performs the following tasks:

It reads data from a spreadsheet identified by its spreadsheet ID, opens the first worksheet, and creates a Pandas DataFrame (df1) from that worksheet.
It preprocesses the data by creating a second Pandas DataFrame (df2) in which every row whose "Ailment cured" column equals "HIV/AIDS" is removed and every "Gonorrhoea, syphilis" entry in that column is renamed to "Gonorrhoea & Syphilis". Note that Python string comparison is case-sensitive, so only exact matches are affected.
It creates a new spreadsheet named "A New Test Spreadsheet" and opens its first worksheet (worksheet2).
It updates worksheet2 by writing the Pandas DataFrame df2 into it.
Finally, it shares the new spreadsheet with an email address.
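
The filtering and renaming steps can be tried offline before touching any spreadsheet. The following is a minimal sketch, assuming a small made-up table in place of the real worksheet: the sample rows and the "Plant" column are hypothetical, and only the "Ailment cured" column name and the two string values come from this post.

```python
import pandas as pd

# Hypothetical sample rows standing in for worksheet.get_all_records()
records = [
    {"Plant": "A", "Ailment cured": "HIV/AIDS"},
    {"Plant": "B", "Ailment cured": "Gonorrhoea, syphilis"},
    {"Plant": "C", "Ailment cured": "Malaria"},
    {"Plant": "D", "Ailment cured": "hiv/aids"},  # different case: not matched
]
df1 = pd.DataFrame(records)

# Drop rows whose value is exactly "HIV/AIDS"; .copy() detaches df2 from df1
df2 = df1[~df1["Ailment cured"].isin(["HIV/AIDS"])].copy()

# Rename exact matches only
df2["Ailment cured"] = df2["Ailment cured"].replace(
    "Gonorrhoea, syphilis", "Gonorrhoea & Syphilis"
)

cleaned = df2["Ailment cured"].tolist()
print(cleaned)
```

Because the comparison is case-sensitive, the lowercase "hiv/aids" row survives. If such variants should be dropped too, normalising the column (for example with .str.lower()) before matching is one option.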

Code:

import os
import urllib.request

import gspread
import pandas as pd

# Download the service-account credentials file
urllib.request.urlretrieve("https://www.dropbox.com/s/m728v370159b2xm/credentials.json?dl=1", "credentials.json")

gc = gspread.service_account(filename="credentials.json")
sh = gc.open_by_key("1pm1mGdRgpitrYQiGqUNSHPdR43e-ZSXCavYr-TcqtwU")  # Open spreadsheet by ID
worksheet = sh.sheet1  # First worksheet of the spreadsheet

# Build df1 from the worksheet; the first row supplies the column names
df1 = pd.DataFrame(worksheet.get_all_records())

# Preprocess data: drop rows where "Ailment cured" is exactly "HIV/AIDS".
# .copy() makes df2 independent of df1 and avoids a SettingWithCopyWarning below.
df2 = df1[~df1["Ailment cured"].isin(["HIV/AIDS"])].copy()
# Rename cells that exactly match "Gonorrhoea, syphilis"
df2["Ailment cured"] = df2["Ailment cured"].replace("Gonorrhoea, syphilis", "Gonorrhoea & Syphilis")
print(df2)

# Create a new blank spreadsheet
sh2 = gc.create("A New Test Spreadsheet")
worksheet2 = sh2.sheet1

# Write df2 to the new worksheet: header row first, then the data rows
worksheet2.update([df2.columns.values.tolist()] + df2.values.tolist())

# Share the new spreadsheet with an email address
sh2.share("ey@tssfl.co", perm_type="user", role="writer")

# Finally, delete the local credentials file
os.remove("credentials.json")
# We can also combine/concatenate sheets
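
The closing comment mentions combining sheets. One way to do that, sketched here under the assumption that each sheet has already been loaded into a DataFrame with the same columns, is pd.concat. The two small frames and the "Plant" column are made up for illustration.

```python
import pandas as pd

# Hypothetical frames standing in for two worksheets with identical columns
sheet_a = pd.DataFrame({"Plant": ["A", "B"], "Ailment cured": ["Malaria", "Cough"]})
sheet_b = pd.DataFrame({"Plant": ["C"], "Ailment cured": ["Fever"]})

# Stack the rows and rebuild a clean 0..n-1 index
combined = pd.concat([sheet_a, sheet_b], ignore_index=True)

# Same header-plus-rows layout that is passed to worksheet2.update() in the post
rows = [combined.columns.values.tolist()] + combined.values.tolist()
print(rows)
```

The resulting list of rows can then be written to a worksheet in the same way df2 is written above.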

See more functionalities at Examples of gspread Usage
