Combine Multiple Google Sheets to One Master Sheet Using Column Names

Including Cython, Jython, IronPython, PyPy, Django framework, and interpreters: Ipython, IPython Jupyter/Notebook, CPython


Post Reply
User avatar
Eli
Senior Expert Member
Reactions: 183
Posts: 5334
Joined: 9 years ago
Location: Tanzania
Has thanked: 75 times
Been thanked: 88 times
Contact:

#1

This topic builds on the two previous topics:

1. Automating the Google Spreadsheet Tasks with Python and TSSFL Open Discussion Forums -- This topic explains the authentication process and how to work with Google Spreadsheets from TSSFL ODF.
2. Automate Multiple Excel Sheets and Produce Reports Using Python -- This topic is about combining multiple excel sheets and automating reports from the combined excel data.

In this topic, we present a Python snippet below that is able to do the following:
  • Read data from several Google sheets/worksheets and create Pandas dataframe for each.
  • Combine the dataframes created in the previous step.
  • Clean data to remove unrequired values.
  • Store the data from the combined dataframe into a new Google sheet -- Master Sheet.
In contrast to the previous topics, we open our spreadsheet by URL:


  1. import gspread
  2. import urllib.request
  3. urllib.request.urlretrieve("https://www.dropbox.com/s/m728v370159b2xm/credentials.json?dl=1", "credentials.json")
  4.  
  5. gc = gspread.service_account(filename="credentials.json")
  6. wb = gc.open_by_url("https://docs.google.com/spreadsheets/d/1O19ketp1DpZwUytV_WqAlxafOBfwiCHiR-0F5eTir6k/edit#gid=0") #Open spreadsheet,
  7.  
  8. #print(wb.worksheets()) #Try wb.worksheets
  9. #print(wb.title)
  10. import pandas as pd
  11. list_of_dataframes = []
  12. for ws in wb.worksheets():
  13.     if ws.title != "Master_Sheet":
  14.         rows = ws.get_all_values()
  15.         #Create data frames
  16.         df = pd.DataFrame.from_records(rows[1:], columns=rows[0]) #Omit the first/header row, pass columns
  17.         list_of_dataframes.append(df)
  18.  
  19. #Print the first dataframe
  20. #print(list_of_dataframes[0])
  21.  
  22. combined_dataframes = pd.concat(list_of_dataframes)
  23. print(combined_dataframes)
  24. #Do some cleaning, replace NAN with blank string
  25. clean_df = combined_dataframes.fillna("") #Try combined_dataframes.dropna("")
  26.  
  27. #Let's create a list of lists for the dataframes including columns
  28. combined_Data = [clean_df.columns.tolist()] + clean_df.to_numpy().tolist()
  29.  
  30. mastersheet = wb.worksheet("Master_Sheet")
  31.  
  32. #Let's update the Master Sheet, start pasting from cell A1
  33. mastersheet.update("A1", combined_Data, value_input_option="USER_ENTERED")
  34.  
  35. import os
  36. os.remove("credentials.json")



Here the data in Sheet1 and Sheet2 have been combined by column names to a data in the Master_Sheet:





See related approach that uses Google Colab and a different authentication process:

0
TSSFL -- A Creative Journey Towards Infinite Possibilities!
Post Reply

Return to “Python Programming”

  • Information
  • Who is online

    Users browsing this forum: No registered users and 16 guests