PDF CHUNK SUMMARIZATION
Simon-Pierre Boucher
2024-09-14
In [1]:
import os
import requests
import pdfplumber
from fpdf import FPDF
from dotenv import load_dotenv
import json

# Load environment variables from the .env file
load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

# Context window of the original GPT-4, in tokens
MAX_TOKENS = 8192

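# Convert a PDF to text with pdfplumber
def pdf_to_text(pdf_path):
    text = ""
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # Depending on the pdfplumber version, extract_text() may return None
            # for a page with no extractable text, so fall back to an empty string
            text += (page.extract_text() or "") + "\n"
    return text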

# Split the text into chunks of whole words.
# Note: the budget is measured in characters (word length + 1 per space),
# not in model tokens, so it is only a rough proxy for the token count.
def split_text_into_chunks(text, max_chars):
    words = text.split()
    chunks = []
    current_chunk = []
    current_length = 0
    for word in words:
        current_chunk.append(word)
        current_length += len(word) + 1  # +1 for the space
        if current_length >= max_chars:
            chunks.append(" ".join(current_chunk))
            current_chunk = []
            current_length = 0
    if current_chunk:
        chunks.append(" ".join(current_chunk))
    return chunks

# Hand-rolled helper for calling the OpenAI chat completions API.
# Note: failures are returned as strings starting with "Error:" rather than raised.
def homemade_openai_api_call(api_key, prompt, model="gpt-4o-mini", temperature=0.7, max_tokens=2000, stop=None):
    url = "https://api.openai.com/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens
    }
    if stop:
        body["stop"] = stop

    # A timeout keeps a stalled connection from hanging the whole run
    response = requests.post(url, headers=headers, data=json.dumps(body), timeout=60)
    if response.status_code != 200:
        return f"Error: API request failed with status code {response.status_code} - {response.text}"

    result = response.json()
    if 'choices' in result and result['choices']:
        return result['choices'][0]['message']['content'].strip()
    else:
        return "Error: Unable to fetch response from OpenAI API"

# Generate a one-sentence summary for each chunk
def summarize_chunks(chunks):
    summaries = []
    for i, chunk in enumerate(chunks):
        print(f"Processing chunk {i+1}/{len(chunks)}")
        prompt = f"Summarize the following text in one sentence:\n\n{chunk}"
        summary = homemade_openai_api_call(OPENAI_API_KEY, prompt)
        summaries.append(summary)
        print(f"Chunk {i+1} summarized: {summary}")
    return summaries

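# Generate the final summary from the chunk summaries
def final_summary(summaries):
    combined_summary = " ".join(summaries)
    # Note: with the helper's default max_tokens=2000, the reply will be far shorter than 10,000 words
    prompt = f"Summarize the following text in a 10000 word summary:\n\n{combined_summary}"
    # Return directly so no local variable shadows the function name
    return homemade_openai_api_call(OPENAI_API_KEY, prompt)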

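# Create a PDF containing the final summary
def create_pdf(summary, output_pdf):
    pdf = FPDF()
    pdf.add_page()
    pdf.set_font("Arial", size=12)
    # The built-in Arial font only covers Latin-1; replace any other character with "?"
    safe_summary = summary.encode("latin-1", "replace").decode("latin-1")
    pdf.multi_cell(0, 10, safe_summary)
    pdf.output(output_pdf)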
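The splitter above budgets by character count, although the limit it is given is expressed in tokens. As a rough illustration only, the sketch below packs words against an estimated token budget instead. The names `estimate_tokens` and `split_by_token_budget` are hypothetical, and the 4-characters-per-token figure is a crude assumption for English text, not an exact tokenizer; a real token counter can be passed in through `count_tokens`.

```python
def estimate_tokens(text):
    # Crude heuristic (assumption): roughly 4 characters per token for English text.
    return max(1, len(text) // 4)


def split_by_token_budget(text, max_tokens, count_tokens=estimate_tokens):
    """Greedily pack whole words into chunks whose summed per-word
    token estimate stays within max_tokens.

    A single word whose estimate exceeds the budget still gets a chunk of its own.
    """
    chunks, current, current_tokens = [], [], 0
    for word in text.split():
        word_tokens = count_tokens(word + " ")
        # Close the current chunk before this word would push it over the budget.
        if current and current_tokens + word_tokens > max_tokens:
            chunks.append(" ".join(current))
            current, current_tokens = [], 0
        current.append(word)
        current_tokens += word_tokens
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Unlike the character-based version, this one closes a chunk before the budget is exceeded rather than after, so no chunk overshoots the limit unless a single word does.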
In [2]:
# Example usage
pdf_path = "Book.pdf"


# Convert the PDF to text
text = pdf_to_text(pdf_path)

# Split the text into chunks (a budget of 4096, which the splitter counts in characters)
chunks = split_text_into_chunks(text, MAX_TOKENS // 2)

# Generate a summary for each chunk
summaries = summarize_chunks(chunks)

# Generate the final summary
book_summary = final_summary(summaries)
Processing chunk 1/32
Chunk 1 summarized: The document outlines a detailed list of authorized infrastructure projects in Québec for 2024-2034, classified by 19 activity sectors, as part of the 2024-2025 Expenditure Budget, while noting accessibility limitations and providing assistance options.
Processing chunk 2/32
Chunk 2 summarized: The text outlines a comprehensive overview of planned infrastructure investments across various sectors in Québec from 2024 to 2034, detailing specific projects and their funding statuses in areas such as agriculture, culture, education, health, transportation, and community development.
Processing chunk 3/32
Chunk 3 summarized: The text outlines various infrastructure projects in Québec, detailing their status, costs, and planned contributions from the Gouvernement du Québec for the 2024-2034 period, including ongoing maintenance, renovations, and new constructions across multiple regions.
Processing chunk 4/32
Chunk 4 summarized: The text outlines the financial contributions and progress statuses of various cultural and infrastructure projects in Québec for the 2024-2034 period, detailing investments of over $20 million, with a total planned contribution of $2.164 billion from the Gouvernement du Québec.
Processing chunk 5/32
Chunk 5 summarized: The 2024-2034 QIP outlines a total investment of $1.434 billion for various projects in Quebec, including 12 projects valued at $20 million or more, with significant contributions planned for sports development and indigenous initiatives.
Processing chunk 6/32
Chunk 6 summarized: The 2024-2034 Québec Infrastructure Plan (QIP) outlines a total investment of approximately $22.7 billion across 168 projects, including significant contributions for educational and cultural infrastructure, with various projects at different stages of planning and progress.
Processing chunk 7/32
Chunk 7 summarized: The text lists multiple construction, renovation, and expansion projects for primary and secondary schools across various school service centers in Montreal and other regions of Quebec.
Processing chunk 8/32
Chunk 8 summarized: The text lists various school construction, expansion, and renovation projects across different regions in Quebec, highlighting the ongoing efforts to improve educational facilities.
Processing chunk 9/32
Chunk 9 summarized: The text outlines various construction and renovation projects for primary and secondary schools across different regions in Quebec, detailing the planned investments and contributions from the Gouvernement du Québec for the years 2024-2034.
Processing chunk 10/32
Chunk 10 summarized: The text outlines various school construction and renovation projects in Quebec, detailing their financial contributions and statuses across multiple educational institutions.
Processing chunk 11/32
Chunk 11 summarized: The text outlines various construction and renovation projects for primary and secondary schools across multiple regions, detailing their funding contributions and total costs, amounting to 3,405.2 million dollars for 53 projects.
Processing chunk 12/32
Chunk 12 summarized: The text lists various construction, reconstruction, and expansion projects for elementary and secondary schools across different regions in Quebec, detailing specific schools and their respective service centers.
Processing chunk 13/32
Chunk 13 summarized: The text outlines the status updates of various educational infrastructure projects in Québec as part of the 2024-2034 QIP, detailing changes in progress from "under study" to "in progress," as well as financial contributions and project counts in different stages.
Processing chunk 14/32
Chunk 14 summarized: The text outlines various construction and renovation projects at educational institutions in Montréal and surrounding areas, detailing their budgets, statuses, and contributions from the Government of Québec, with a total of 15 higher education projects planned or in progress valued at approximately 1.8 billion dollars.
Processing chunk 15/32
Chunk 15 summarized: The 2024-2034 QIP outlines planned and ongoing investments in social and community housing in Quebec, totaling approximately $3.78 billion across 46 projects, with 20 projects valued at over $20 million currently in the planning stage.
Processing chunk 16/32
Chunk 16 summarized: The text outlines various construction projects for social and community housing across multiple regions in Québec, detailing the number of housing units, project statuses, and financial contributions from the government planned for the 2024-2034 period.
Processing chunk 17/32
Chunk 17 summarized: The Gouvernement du Québec has allocated $104.4 million for four infrastructure projects in the 2024-2034 QIP, including flood protection and wastewater treatment facilities in various regions.
Processing chunk 18/32
Chunk 18 summarized: The 2024-2034 QIP outlines various infrastructure projects in Quebec, including significant investments in road networks and educational facilities, with a total planned expenditure of $34.5 billion across 103 projects.
Processing chunk 19/32
Chunk 19 summarized: The text outlines various construction, reconstruction, and refurbishment projects for roads and bridges across Quebec, detailing specific locations, project phases, and funding contributions planned for the 2024-2034 Quebec Infrastructure Plan, totaling $5.73 billion across 44 projects.
Processing chunk 20/32
Chunk 20 summarized: The text outlines various ongoing and planned road construction and reconstruction projects across Quebec, detailing their costs, phases, and locations.
Processing chunk 21/32
Chunk 21 summarized: The text outlines various infrastructure projects planned or in progress for the 2024-2034 period in Quebec, detailing contributions, costs, and statuses for multiple highway constructions, renovations, and public transit improvements totaling approximately $13.99 billion across 48 projects.
Processing chunk 22/32
Chunk 22 summarized: The Gouvernement du Québec plans to invest a total of approximately $790 million in eight projects and $222.4 million in two projects as part of its 2024-2034 QIP, focusing on digital health, integrated resource management, and various administrative transformations.
Processing chunk 23/32
Chunk 23 summarized: The Gouvernement du Québec plans to invest a total of approximately $23.8 billion in 95 health and social services projects from 2024 to 2034, with various projects at different stages of development and significant funding allocated for projects valued at $20 million or more.
Processing chunk 24/32
Chunk 24 summarized: The text outlines various health and social services construction and renovation projects in Quebec, detailing contributions and planned expenditures totaling approximately $5.4 billion for 35 projects from 2024 to 2034.
Processing chunk 25/32
Chunk 25 summarized: The text outlines various healthcare construction and renovation projects in Montreal and surrounding areas, detailing the funding contributions from Québec for facilities such as hospitals, nursing homes, and rehabilitation centers.
Processing chunk 26/32
Chunk 26 summarized: The text details various construction and expansion projects in healthcare, social services, and tourism sectors in Quebec, with financial figures and project statuses for the 2023-2033 and 2024-2034 periods, highlighting a total planned investment of approximately $1.8 billion across 15 projects.
Processing chunk 27/32
Chunk 27 summarized: The text outlines various infrastructure projects in Québec with budgets of $20 million or more, detailing their progress status (under study, in planning, and in progress), associated government contributions, and a note on projects removed from the published list.
Processing chunk 28/32
Chunk 28 summarized: The text outlines the division of a major project related to the Olympic Park in Montreal into two separate initiatives focusing on the replacement of mechanical and electrical systems, while also detailing various public transit investments planned by the Gouvernement du Québec for 2024-2034.
Processing chunk 29/32
Chunk 29 summarized: The Gouvernement du Québec has planned a total investment of $7,676.7 million for 20 public transit projects in the 2024-2034 QIP, with various contributions from Québec and partners aimed at improving and expanding transportation infrastructure in the region.
Processing chunk 30/32
Chunk 30 summarized: The 2023-2033 QIP has seen changes in project statuses, with some projects moving to "in progress," investments being planned for various transportation sectors, and a total planned investment of $4.69 billion for 2024-2034, while some projects were removed from programming in favor of others.
Processing chunk 31/32
Chunk 31 summarized: The text outlines various infrastructure projects in Quebec, including the refurbishment and reconstruction of railways and ports, with a total investment of approximately 1.54 billion, while also noting changes in project statuses and removals from the planning list.
Processing chunk 32/32
Chunk 32 summarized: The original project encompassing the Édifices Hector-Fabre and Marie-Fitzbach in Québec has been divided into two separate projects, while another project related to the Parc olympique in Montréal has also been split, with the work for one project completed and financial assistance provided in advance, leading to a reallocation of funds to other cultural infrastructure projects.
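The run above issues 32 sequential requests, and `homemade_openai_api_call` reports a failure by returning a string that starts with "Error:", which would then be stored as if it were a summary. A minimal sketch of one way to guard against that is shown below. The name `call_with_retries` and its parameters are hypothetical, and the exponential backoff schedule is an assumption rather than anything the API requires.

```python
import time


def call_with_retries(call, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Invoke call() until it returns a string that does not start with "Error:",
    waiting base_delay * 2**attempt seconds between attempts."""
    result = None
    for attempt in range(max_attempts):
        result = call()
        if not result.startswith("Error:"):
            return result
        # Back off before the next attempt, but not after the last one.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    # Still an error string after the final attempt; the caller decides what to do.
    return result
```

Inside `summarize_chunks`, this could be used as `summary = call_with_retries(lambda: homemade_openai_api_call(OPENAI_API_KEY, prompt))`. One caveat: a genuine summary that happens to begin with "Error:" would be retried too.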
In [3]:
print(book_summary)
The document provides an extensive overview of the planned infrastructure projects in Québec for the 2024-2034 period, categorizing these projects into 19 different activity sectors as part of the 2024-2025 Expenditure Budget. The focus is on accessibility limitations and available assistance options for these initiatives.

**Investment Overview:**
The planned investments amount to approximately $22.7 billion across 168 projects. Significant attention is given to various sectors, including agriculture, culture, education, health, transportation, and community development. A total investment of $1.434 billion is specifically earmarked for 12 projects valued at $20 million or more, with particular emphasis on sports development and indigenous initiatives.

**Educational Infrastructure:**
The document highlights numerous construction, renovation, and expansion projects for primary and secondary schools throughout Quebec. With a total of 3,405.2 million dollars allocated for 53 educational projects, it details their statuses, financial contributions, and specific service centers. Additionally, 15 higher education projects valued at approximately $1.8 billion are planned or currently ongoing.

**Social and Community Housing:**
A significant investment of around $3.78 billion is projected for social and community housing, encompassing 46 projects. Notably, 20 projects are valued at over $20 million and are currently in the planning phase.

**Transportation Infrastructure:**
Transportation projects represent a major focus, with planned expenditures amounting to $34.5 billion across 103 projects. This includes approximately $5.73 billion for 44 road and bridge projects, as well as around $13.99 billion for highway constructions, renovations, and public transit improvements across 48 projects. The Gouvernement du Québec is set to invest approximately $7.68 billion for 20 public transit projects, aimed at enhancing transportation networks and infrastructure.

**Healthcare and Social Services:**
The document outlines a robust investment of about $23.8 billion in 95 health and social services projects from 2024 to 2034. This includes $5.4 billion for 35 construction and renovation projects specifically related to healthcare facilities, such as hospitals and nursing homes.

**Cultural Infrastructure:**
The cultural sector also sees significant planned investments, with various projects being monitored for progress. Noteworthy is the division of major projects related to the Olympic Park in Montreal into two separate initiatives.

**Conclusion:**
The comprehensive plan for infrastructure development in Québec from 2024 to 2034 indicates a strategic focus on improving various sectors vital for community welfare and economic development. The document emphasizes government contributions, project statuses, and significant funding allocated to ensure progress in education, healthcare, transportation, and community development, thereby shaping a robust infrastructure landscape for the province.
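`final_summary` joins every chunk summary into a single prompt. That fits comfortably for 32 one-sentence summaries, but a much longer document could produce more text than one request can hold. The sketch below shows one possible way to merge summaries in batches first. `reduce_summaries`, `summarize`, `max_chars`, and `max_rounds` are hypothetical names, and the character budget is an assumption in the same spirit as the notebook's splitter.

```python
def reduce_summaries(summaries, summarize, max_chars=4096, max_rounds=5):
    """Merge summaries batch by batch until the joined text fits in max_chars,
    or until max_rounds passes have been made."""
    for _ in range(max_rounds):
        if len(summaries) <= 1 or len(" ".join(summaries)) <= max_chars:
            break
        # Group consecutive summaries into batches that fit the budget.
        batches, current, size = [], [], 0
        for s in summaries:
            if current and size + len(s) + 1 > max_chars:
                batches.append(current)
                current, size = [], 0
            current.append(s)
            size += len(s) + 1
        batches.append(current)
        # Summarize each batch; the next round works on these shorter texts.
        summaries = [summarize(" ".join(batch)) for batch in batches]
    return " ".join(summaries)
```

Here `summarize` is any function that maps text to a shorter text, for example a small wrapper around `homemade_openai_api_call`. The `max_rounds` guard stops the loop if the summaries fail to shrink.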
In [ ]: