How to Generate PDF Reports with Pandas, Jinja and WeasyPrint

#1

Pandas is a powerful tool for manipulating large amounts of data, summarizing it, and producing a clean report. Pandas supports output to CSV, Excel, HTML, JSON, and more. In this topic, we describe how to programmatically combine multiple pieces of data into an HTML template and then convert it to a standalone PDF document using Jinja templates and WeasyPrint. For a different approach, see Automate Multiple Excel Sheets and Produce Reports Using Python, and also take a look at How to Use Python Pandas Pivot Table for Data Presentation and Analysis. We will use Jinja and HTML as templating tools for laying out the structured data, and WeasyPrint for producing the PDF report.

Jinja is a powerful templating engine with advanced features. It is the default template engine in Flask and can also be used with Django when developing Python web applications.

To use Jinja, we create a template, add variables to the template's context, and render the template into HTML. Let's call such a simple template report.html:

Code:

<!DOCTYPE html>
<html>
<head lang="en">
    <meta charset="UTF-8">
    <title>{{ title }}</title>
</head>
<body>
    <h2>Sales Report</h2>
    {{ pivot_table }}
</body>
</html>

In this template, {{ title }} and {{ pivot_table }} are placeholders for variables that we will supply when we render the document.
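
To see how the placeholders behave before involving any files, here is a minimal sketch that renders a trimmed-down copy of the template from a string. The names TEMPLATE_SOURCE and render_report, and the one-cell sample table, are hypothetical and only for illustration; the real script loads report.html from disk instead.

```python
from jinja2 import Template

# Trimmed-down, in-memory stand-in for report.html (same two placeholders).
TEMPLATE_SOURCE = """<title>{{ title }}</title>
<h2>Sales Report</h2>
{{ pivot_table }}"""


def render_report(title, table_html):
    """Fill the two placeholders and return the resulting HTML string."""
    return Template(TEMPLATE_SOURCE).render(title=title, pivot_table=table_html)


# Hypothetical table markup; in the real script this comes from DataFrame.to_html().
html = render_report("Sales Report", "<table><tr><td>42</td></tr></table>")
print(html)
```

Note that a plain Template does not escape HTML by default, which is why the table markup passes through unchanged. If autoescaping is switched on, the table string would need to be marked as safe.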

Here is the complete Python code:

Code:

# Import the required libraries
import os
import urllib.request

import pandas as pd
from jinja2 import Environment, FileSystemLoader
from weasyprint import HTML

# print("This is Pandas version:", pd.__version__)

# Download the workbook and read it into a DataFrame
urllib.request.urlretrieve("https://www.dropbox.com/s/s80a85w8j92szqu/sales-sheet.xlsx?dl=1", "sales-sheet.xlsx")
# Alternative source:
# urllib.request.urlretrieve("https://tssfl.com/download/file.php?id=1243", "sales-sheet.xlsx")
df = pd.read_excel("./sales-sheet.xlsx")
# print(df.head())

# Pivot the data to summarize it.
# String names for aggfunc avoid the FutureWarning that recent pandas raises for np.sum/np.mean.
sales_report = pd.pivot_table(df, index=["Manager", "Rep", "Product"], values=["Price", "Quantity"],
                              aggfunc=["sum", "mean"], fill_value=0)
# print(sales_report.head())

# Show some statistics, for example, the average quantity and price for CPU and Software sales
print(df[df["Product"] == "CPU"]["Quantity"].mean())
print(df[df["Product"] == "CPU"]["Price"].mean())
print(df[df["Product"] == "Software"]["Quantity"].mean())
print(df[df["Product"] == "Software"]["Price"].mean())

# Create a Jinja environment that looks for templates in the current directory,
# then download and load the report.html template
env = Environment(loader=FileSystemLoader("."))
urllib.request.urlretrieve("https://www.dropbox.com/s/djpii5trdesfb4x/report.html?dl=1", "report.html")
template = env.get_template("report.html")

# Collect all the variables we want to pass to the template in a dictionary
temp_variables = {"title": "Sales Report",
                  "pivot_table": sales_report.to_html()}

# Render the template with the variables filled in.
# The result is an HTML string that we pass to our PDF engine, WeasyPrint.
html_output = template.render(temp_variables)

# Generate the PDF from the HTML string
HTML(string=html_output).write_pdf("report.pdf")

# Clean up the downloaded files
os.remove("sales-sheet.xlsx")
os.remove("report.html")
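
If you want to try the pivot step without downloading anything, the following is a small sketch on made-up data. The rows below are hypothetical stand-ins with the same column names the script uses; they are not the contents of sales-sheet.xlsx.

```python
import pandas as pd

# Hypothetical rows standing in for sales-sheet.xlsx (the real data is not shown here).
df = pd.DataFrame({
    "Manager": ["Ann", "Ann", "Bob", "Bob"],
    "Rep": ["Cy", "Cy", "Di", "Di"],
    "Product": ["CPU", "CPU", "Software", "CPU"],
    "Price": [100, 200, 50, 300],
    "Quantity": [1, 3, 2, 4],
})

# Same pivot as in the script: one row per Manager/Rep/Product,
# with sum and mean computed for Price and Quantity.
sales_report = pd.pivot_table(
    df,
    index=["Manager", "Rep", "Product"],
    values=["Price", "Quantity"],
    aggfunc=["sum", "mean"],
    fill_value=0,
)

# This HTML string is what gets passed to the template as pivot_table.
table_html = sales_report.to_html()

# Average quantity across all CPU rows.
cpu_mean_qty = df.loc[df["Product"] == "CPU", "Quantity"].mean()

print(sales_report)
print(cpu_mean_qty)
```

Because aggfunc is a list, the resulting columns are two-level: the first level is the function name and the second is the value column, for example ("sum", "Price").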

Next time we will see how to beautify the PDF reports.

The output PDF report is:

Reference: PBpython

#2

Next, we apply CSS styling to beautify the report. Here is the full code; the unstyled output is written to report1.pdf and the styled output to report2.pdf:

Code:

# Import the required libraries
import os
import urllib.request

import pandas as pd
from jinja2 import Environment, FileSystemLoader
from weasyprint import HTML

# print("This is Pandas version:", pd.__version__)

# Download the workbook and read it into a DataFrame
urllib.request.urlretrieve("https://www.dropbox.com/s/s80a85w8j92szqu/sales-sheet.xlsx?dl=1", "sales-sheet.xlsx")
# Alternative source:
# urllib.request.urlretrieve("https://tssfl.com/download/file.php?id=1243", "sales-sheet.xlsx")
df = pd.read_excel("./sales-sheet.xlsx")
# print(df.head())

# Pivot the data to summarize it.
# String names for aggfunc avoid the FutureWarning that recent pandas raises for np.sum/np.mean.
sales_report = pd.pivot_table(df, index=["Manager", "Rep", "Product"], values=["Price", "Quantity"],
                              aggfunc=["sum", "mean"], fill_value=0)
# print(sales_report.head())

# Show some statistics, for example, the average quantity and price for CPU and Software sales
print(df[df["Product"] == "CPU"]["Quantity"].mean())
print(df[df["Product"] == "CPU"]["Price"].mean())
print(df[df["Product"] == "Software"]["Quantity"].mean())
print(df[df["Product"] == "Software"]["Price"].mean())

# Create a Jinja environment that looks for templates in the current directory,
# then download and load the report.html template
env = Environment(loader=FileSystemLoader("."))
urllib.request.urlretrieve("https://www.dropbox.com/s/djpii5trdesfb4x/report.html?dl=1", "report.html")
template = env.get_template("report.html")

# Collect all the variables we want to pass to the template in a dictionary
temp_variables = {"title": "Sales Report",
                  "pivot_table": sales_report.to_html()}

# Render the template with the variables filled in.
# The result is an HTML string that we pass to our PDF engine, WeasyPrint.
html_output = template.render(temp_variables)

# Generate the unstyled PDF from the HTML string
HTML(string=html_output).write_pdf("report1.pdf")

# Download the stylesheet and generate the styled PDF
urllib.request.urlretrieve("https://www.dropbox.com/s/xd7kk9t17sjfwrr/style.css?dl=1", "style.css")
HTML(string=html_output).write_pdf("report2.pdf", stylesheets=["./style.css"])

# Clean up the downloaded files
os.remove("sales-sheet.xlsx")
os.remove("report.html")
os.remove("style.css")
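
The contents of the downloaded style.css are not shown in this post, so here is a rough sketch of what such a stylesheet could look like and how it could be written locally instead of downloaded. The CSS rules, the file name style_example.css, and the helper names write_stylesheet and html_to_styled_pdf are all assumptions made for illustration. The table.dataframe selector relies on pandas adding class="dataframe" to the tables produced by to_html().

```python
from pathlib import Path

# Hypothetical stylesheet; the rules in the real style.css may differ.
REPORT_CSS = """
@page { size: A4 landscape; margin: 1.5cm; }
body { font-family: sans-serif; }
h2 { text-align: center; }
table.dataframe { border-collapse: collapse; width: 100%; }
table.dataframe th, table.dataframe td { border: 1px solid #999; padding: 4px 8px; }
table.dataframe th { background: #eee; }
"""


def write_stylesheet(path="style.css"):
    """Save the CSS rules to disk so WeasyPrint can load them by filename."""
    Path(path).write_text(REPORT_CSS, encoding="utf-8")
    return path


def html_to_styled_pdf(html_string, pdf_path, css_path):
    """Render an HTML string to a PDF with a user stylesheet applied."""
    # Imported here so the CSS helper above works even without WeasyPrint installed.
    from weasyprint import HTML
    HTML(string=html_string).write_pdf(pdf_path, stylesheets=[css_path])


css_file = write_stylesheet("style_example.css")
# With html_output from the script above, the styled PDF would be produced by:
# html_to_styled_pdf(html_output, "report2.pdf", css_file)
```

The @page rule controls the paper size and margins of the PDF, which is often the first thing to adjust when a wide pivot table does not fit on a portrait page.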

PDF report:
