How to Generate PDF Reports with Pandas, Jinja and WeasyPrint

Including Cython, Jython, IronPython, PyPy, Django framework, and interpreters: Ipython, IPython Jupyter/Notebook, CPython


Post Reply
User avatar
Eli
Senior Expert Member
Reactions: 183
Posts: 5326
Joined: 9 years ago
Location: Tanzania
Has thanked: 75 times
Been thanked: 88 times
Contact:

#1

Pandas is a powerful tool for manipulating large amounts of data, summarizing it, and producing a clean report. Pandas supports output to CSV, Excel, HTML, JSON, and more. In this topic, we programmatically describe how to combine multiple pieces of data. See a different approach,Automate Multiple Excel Sheets and Produce Reports Using Python, and also take a look at How to Use Python Pandas Pivot Table for Data Presentation and Analysis into an HTML template and then converting it to a standalone PDF document using Jinja templates and WeasyPrint. We will use Jinja and HTML as templating tools for generating structured data, and WeasyPrint for producing a pdf report.

Jinja templating is very powerful and has advanced features, it is usually used as a companion tool by Django and Flask for developing various Python and web applications.

In order to use Jinja, we need to create a template, add variables into the context of the template, and render the template into HTML. Let's call such a simple template report.html:

  1. <!DOCTYPE html>
  2. <html>
  3. <head lang="en">
  4.     <meta charset="UTF-8">
  5.     <title>{{ title }}</title>
  6. </head>
  7. <body>
  8.     <h2>Sales Report</h2>
  9.      {{pivot_table }}
  10. </body>
  11. </html>


In this template, {{ title }} and {{ pivot_table }} are placeholders for variables that we will supply when we render the document.


Here is the whole Python code (Run this code here):

  1. #Import the required libraries
  2. from __future__ import print_function
  3. import pandas as pd
  4. import numpy as np
  5. import urllib.request
  6. #print("This is Pandas version:", pd.__version__)
  7.  
  8. #Download and read the data into DataFrame
  9.  
  10. #Download and read the data into DataFrame
  11. df = urllib.request.urlretrieve("https://www.dropbox.com/s/s80a85w8j92szqu/sales-sheet.xlsx?dl=1", "sales-sheet.xlsx")
  12. #df = urllib.request.urlretrieve("https://tssfl.com/download/file.php?id=1243", "sales-sheet.xlsx")
  13. df = pd.read_excel("./sales-sheet.xlsx")
  14. #print(df)
  15. #print(df.head())
  16.  
  17. #Pivot the data to summarize.
  18. sales_report = pd.pivot_table(df, index=["Manager", "Rep", "Product"], values=["Price", "Quantity"],
  19.                            aggfunc=[np.sum, np.mean], fill_value=0)
  20. #print(sales_report.head())
  21.  
  22. #Show some statistics, for example, the average quantity and price for CPU and Software sales
  23. print(df[df["Product"]=="CPU"]["Quantity"].mean())
  24. print(df[df["Product"]=="CPU"]["Price"].mean())
  25. print(df[df["Product"]=="Software"]["Quantity"].mean())
  26. print(df[df["Product"]=="Software"]["Price"].mean())
  27.  
  28. #Let's create a Jinja environment and get the report.html template
  29.  
  30. from jinja2 import Environment, FileSystemLoader
  31. env = Environment(loader=FileSystemLoader('.'))
  32. urllib.request.urlretrieve("https://www.dropbox.com/s/djpii5trdesfb4x/report.html?dl=1", "report.html")
  33. template = env.get_template("./report.html")
  34.  
  35. """env variable above shows how we pass content to the template
  36. We create a dictionary called temp_variables that
  37. contains all the variable we want to pass to the template"""
  38.  
  39. temp_variables = {"title" : "Sales Report",
  40.                  "pivot_table": sales_report.to_html()}
  41.  
  42. #Finally, let's render the HTML with the variables included in the output
  43. #This will create a string that we will eventually pass to our PDF creation engine - WeasyPrint
  44.  
  45. html_output = template.render(temp_variables)
  46.  
  47. #Generate PDF
  48. #We create a pdf by passing string to the PDF generator
  49. from weasyprint import HTML
  50. HTML(string=html_output).write_pdf("report.pdf")
  51. import os
  52. os.remove("sales-sheet.xlsx")
  53. os.remove("report.html")


Next time we will see how to beautify the pdf reports.

The output pdf report is:
Reference: PBpython
Attachments
sales-sheet.xlsx
(5.55 KiB) Downloaded 43 times
sales-sheet.xlsx
(5.55 KiB) Downloaded 43 times
0
TSSFL -- A Creative Journey Towards Infinite Possibilities!
User avatar
Eli
Senior Expert Member
Reactions: 183
Posts: 5326
Joined: 9 years ago
Location: Tanzania
Has thanked: 75 times
Been thanked: 88 times
Contact:

#2

Next, we apply CSS styling to our report in order to beautify it. Here is the full code and the output report2.pdf:

  1. #Import the required libraries
  2. from __future__ import print_function
  3. import pandas as pd
  4. import numpy as np
  5. import urllib.request
  6. #print("This is Pandas version:", pd.__version__)
  7.  
  8. #Download and read the data into DataFrame
  9.  
  10. #Download and read the data into DataFrame
  11. df = urllib.request.urlretrieve("https://www.dropbox.com/s/s80a85w8j92szqu/sales-sheet.xlsx?dl=1", "sales-sheet.xlsx")
  12. #df = urllib.request.urlretrieve("https://tssfl.com/download/file.php?id=1243", "sales-sheet.xlsx")
  13. df = pd.read_excel("./sales-sheet.xlsx")
  14. #print(df)
  15. #print(df.head())
  16.  
  17. #Pivot the data to summarize.
  18. sales_report = pd.pivot_table(df, index=["Manager", "Rep", "Product"], values=["Price", "Quantity"],
  19.                            aggfunc=[np.sum, np.mean], fill_value=0)
  20. #print(sales_report.head())
  21.  
  22. #Show some statistics, for example, the average quantity and price for CPU and Software sales
  23. print(df[df["Product"]=="CPU"]["Quantity"].mean())
  24. print(df[df["Product"]=="CPU"]["Price"].mean())
  25. print(df[df["Product"]=="Software"]["Quantity"].mean())
  26. print(df[df["Product"]=="Software"]["Price"].mean())
  27.  
  28. #Let's create a Jinja environment and get the report.html template
  29.  
  30. from jinja2 import Environment, FileSystemLoader
  31. env = Environment(loader=FileSystemLoader('.'))
  32. urllib.request.urlretrieve("https://www.dropbox.com/s/djpii5trdesfb4x/report.html?dl=1", "report.html")
  33. template = env.get_template("./report.html")
  34.  
  35. """env variable above shows how we pass content to the template
  36. We create a dictionary called temp_variables that
  37. contains all the variable we want to pass to the template"""
  38.  
  39. temp_variables = {"title" : "Sales Report",
  40.                  "pivot_table": sales_report.to_html()}
  41.  
  42. #Finally, let's render the HTML with the variables included in the output
  43. #This will create a string that we will eventually pass to our PDF creation engine - WeasyPrint
  44.  
  45. html_output = template.render(temp_variables)
  46.  
  47. #Generate PDF
  48. #We create a pdf by passing string to the PDF generator
  49. from weasyprint import HTML
  50. HTML(string=html_output).write_pdf("report1.pdf")
  51.  
  52. #Apply stylesheet
  53. ss = urllib.request.urlretrieve("https://www.dropbox.com/s/xd7kk9t17sjfwrr/style.css?dl=1", "style.css")
  54. HTML(string=html_output).write_pdf("report2.pdf", stylesheets=["./style.css"])
  55.  
  56. import os
  57. os.remove("sales-sheet.xlsx")
  58. os.remove("report.html")
  59. os.remove("style.css")


PDF report:
0
TSSFL -- A Creative Journey Towards Infinite Possibilities!
Post Reply

Return to “Python Programming”

  • Information
  • Who is online

    Users browsing this forum: No registered users and 1 guest