Data Preprocessing/Cleaning with TSSFL Stack, Pandas, and Gspread

Posted by Eli (Senior Expert Member, Tanzania)

This topic extends the topics Automating Spreadsheets and Automating Multiple Excel Sheets with TSSFL ODF. It briefly shows how data tasks such as preprocessing/cleaning can be carried out. Data cleaning/preprocessing is an important screening stage before data analysis: its goal is to discard irrelevant, redundant, noisy, or unreliable data, which can otherwise produce misleading results.

The Python snippet below performs the following tasks:

It opens the spreadsheet by its spreadsheet ID, selects the first worksheet, and creates a Pandas DataFrame (df1) from that worksheet.
It preprocesses the data by creating a second Pandas DataFrame (df2) in which every row whose "Ailment cured" value is "HIV/AIDS" is removed; it also renames every "Gonorrhoea, syphilis" entry in that column to "Gonorrhoea & Syphilis". Note that string matching in Python is case-sensitive.
It creates a new spreadsheet named "A New Test Spreadsheet" and takes its first worksheet (worksheet2).
It updates worksheet2 by writing the Pandas DataFrame df2 into it.
Finally, it shares the new spreadsheet with an email address.
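
The preprocessing step can be tried on its own, without Google credentials. The following is a minimal sketch, assuming a small in-memory DataFrame in place of the worksheet; the "Plant" column and the sample rows are hypothetical and only serve to illustrate the filtering, the renaming, and the case-sensitivity note.

```python
import pandas as pd

# Hypothetical sample rows standing in for the worksheet contents
df1 = pd.DataFrame({
    "Plant": ["A", "B", "C", "D"],
    "Ailment cured": ["HIV/AIDS", "Gonorrhoea, syphilis", "Malaria", "hiv/aids"],
})

# Keep only rows whose value is not exactly "HIV/AIDS";
# .copy() makes df2 independent of df1 before we assign to it
df2 = df1[~df1["Ailment cured"].isin(["HIV/AIDS"])].copy()

# Rename exact matches of "Gonorrhoea, syphilis"
df2["Ailment cured"] = df2["Ailment cured"].replace(
    "Gonorrhoea, syphilis", "Gonorrhoea & Syphilis"
)

# Matching is case-sensitive, so the lowercase "hiv/aids" row is kept
print(df2["Ailment cured"].tolist())
# → ['Gonorrhoea & Syphilis', 'Malaria', 'hiv/aids']
```

If case-insensitive matching is wanted, one option is to normalise the column first (for example with .str.upper()) before filtering.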

Code:

import os
import urllib.request

import gspread
import pandas as pd

# Download the service-account credentials file
urllib.request.urlretrieve("https://www.dropbox.com/s/m728v370159b2xm/credentials.json?dl=1", "credentials.json")

gc = gspread.service_account(filename="credentials.json")
sh = gc.open_by_key("1pm1mGdRgpitrYQiGqUNSHPdR43e-ZSXCavYr-TcqtwU")  # Open spreadsheet by ID
worksheet = sh.sheet1  # First worksheet of the spreadsheet

# Build a DataFrame from the worksheet; the first row supplies the column names
df1 = pd.DataFrame(worksheet.get_all_records())

# Preprocess data: drop rows whose "Ailment cured" value is exactly "HIV/AIDS".
# .copy() makes df2 independent of df1, which avoids a SettingWithCopyWarning below.
df2 = df1[~df1["Ailment cured"].isin(["HIV/AIDS"])].copy()
# Rename exact matches of "Gonorrhoea, syphilis"
df2["Ailment cured"] = df2["Ailment cured"].replace("Gonorrhoea, syphilis", "Gonorrhoea & Syphilis")
print(df2)

# Let's create a new blank spreadsheet
sh2 = gc.create("A New Test Spreadsheet")
worksheet2 = sh2.sheet1

# Let's write df2 to the new worksheet: header row first, then the data rows
worksheet2.update([df2.columns.values.tolist()] + df2.values.tolist())

# Share the new spreadsheet with an email address
sh2.share("ey@tssfl.co", perm_type="user", role="writer")

# Finally, delete the local credentials file
os.remove("credentials.json")
# We can also combine/concatenate sheets
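
The closing comment mentions combining sheets. One possible way to do that is pandas.concat; the sketch below assumes two small in-memory DataFrames with identical columns standing in for two worksheets (the names sheet_a, sheet_b and the sample rows are hypothetical). With gspread, each DataFrame would typically be built from a worksheet with pd.DataFrame(ws.get_all_records()), as in the snippet above.

```python
import pandas as pd

# Hypothetical stand-ins for DataFrames read from two worksheets
sheet_a = pd.DataFrame({"Plant": ["A", "B"], "Ailment cured": ["Malaria", "Cough"]})
sheet_b = pd.DataFrame({"Plant": ["C"], "Ailment cured": ["Fever"]})

# Stack the rows of both sheets; ignore_index=True renumbers the rows 0..n-1
combined = pd.concat([sheet_a, sheet_b], ignore_index=True)

print(combined.shape)
# → (3, 2)
```

The combined DataFrame can then be written to a worksheet in the same way as df2.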

See more functionalities at Examples of gspread Usage
