WeasyPrint + pypdf: видимые, нередактируемые escape-символы в скобках.


Anonymous


Post by Anonymous » 23 Dec 2025, 14:12

I'm building a PDF form from HTML (WeasyPrint), filling its AcroForm fields with pypdf, and then merging that page onto a four-page template.
However, when the data contains parentheses, PDF viewers (the browser's viewer, Acrobat Reader) display them with escape characters. The escapes exist only in the field's rendered appearance, not in its value, so they cannot be removed by editing the field:

[img]https://i.sstatic.net/2fRlwaaM.jpg[/img]

Edit mode:

[img]https://i.sstatic.net/fPCUdK6t.jpg[/img]
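The post doesn't say where the visible escape comes from. PDF literal strings are delimited by parentheses, and inside them a backslash escapes (, ) and the backslash itself; a viewer removes one level of these escapes when it draws the string. One mechanism that would produce exactly this symptom (an assumption here, not confirmed by the post) is the value being escaped twice while the appearance stream is built: the viewer strips only one level, so a backslash stays on screen. Escaped text written into a hex string <...>, where backslashes are not escapes at all, would look the same. The stdlib-only sketch below illustrates the double-escaping case; escape_pdf_literal and unescape_pdf_literal are hypothetical helpers (not pypdf functions), "Berlin (DE)" is just an example value, and the decoder only handles the three escapes mentioned.

```python
# Sketch: PDF literal-string escaping, and how escaping a value twice leaves
# a visible backslash after the single decode a viewer performs.


def escape_pdf_literal(text: str) -> str:
    """Escape text for use inside a PDF literal string (...)."""
    # Double existing backslashes first, so the backslashes added for the
    # parentheses are not themselves doubled.
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def unescape_pdf_literal(body: str) -> str:
    """Undo the backslash escapes for (, ) and backslash, as a viewer would.

    Octal and control-character escapes are not handled in this sketch.
    """
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body) and body[i + 1] in "\\()":
            out.append(body[i + 1])  # keep the escaped character, drop the backslash
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


value = "Berlin (DE)"              # example field value
once = escape_pdf_literal(value)   # correct: escaped exactly once
twice = escape_pdf_literal(once)   # hypothetical bug: escaped a second time

print(unescape_pdf_literal(once))   # Berlin (DE)
print(unescape_pdf_literal(twice))  # Berlin \(DE\)
```

Escaping once round-trips cleanly; escaping twice leaves one backslash in front of each parenthesis after decoding, which matches the kind of output shown in the screenshots.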

What I tried: deleting the /AP entries from the annotations on every page. That removed the escape characters, but now Acrobat Reader shows the page as blank until I click (or edit) a field, and then only that field's value becomes visible:

[img]https://i.sstatic.net/FyaHOrTV.jpg[/img]
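For context on why deleting /AP blanks the page: a widget's normal appearance (/AP /N) is a small content stream that paints the already-escaped text. Without it a viewer has nothing to draw unless it builds an appearance itself, which is what the /NeedAppearances flag in the /AcroForm dictionary asks it to do. The sketch below shows roughly what such a stream contains for a single-line text field, with the value escaped exactly once. It is illustrative only: the function name is hypothetical, the font resource name /Helv, the size and the text offsets are assumptions (a real field takes them from its /DA string and /Rect), and because it encodes to Latin-1, non-Latin text would need a font with a suitable encoding, which is out of scope here.

```python
# Sketch: content of a minimal normal appearance stream for a text field.
# "/Helv", the font size and the offsets are illustrative assumptions.


def escape_pdf_literal(text: str) -> str:
    # Backslash first, then parentheses (PDF literal-string rules).
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def text_field_appearance(value: str, font: str = "/Helv", size: float = 10,
                          x: float = 2, y: float = 3) -> bytes:
    """Return appearance-stream content that draws `value`, escaped once."""
    lines = [
        "/Tx BMC",                           # marked content for variable text
        "q",                                 # save graphics state
        "BT",                                # begin text object
        f"{font} {size:g} Tf",               # select font resource and size
        f"{x:g} {y:g} Td",                   # move to the text origin
        f"({escape_pdf_literal(value)}) Tj", # show the once-escaped string
        "ET",                                # end text object
        "Q",                                 # restore graphics state
        "EMC",                               # end marked content
    ]
    return "\n".join(lines).encode("latin-1")


print(text_field_appearance("Berlin (DE)").decode("latin-1"))
```

For the example value, the text operator reads (Berlin \(DE\)) Tj, one backslash per parenthesis, which a viewer renders as Berlin (DE).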

Relevant code:
[code]from __future__ import annotations

import io

import django.template.loader
import pypdf
from django.contrib.staticfiles import finders
from weasyprint import HTML

# ObjA, ObjB, DATA_ROWS and data_method are defined elsewhere in the project.


def generate_pdf_file(
    a: ObjA,
    rows: list[ObjB],
) -> bytes:
    # 1) Build a one-page, form-enabled PDF from HTML (WeasyPrint)
    form_pdf_bytes = generate_pdf_form(a, rows)  # already filled

    # 2) Load filled form (1 page) + template (4 pages)
    template_path = finders.find("pdf_file.pdf")
    assert template_path, "CMR template file not found"

    template_reader = pypdf.PdfReader(template_path)
    form_reader = pypdf.PdfReader(io.BytesIO(form_pdf_bytes))

    # 3) Create the final 4-page PDF: the filled form page is merged on top of each template page
    writer = pypdf.PdfWriter()

    for page_num in range(4):
        # Reuse the single filled form page as an overlay
        form_page = form_reader.pages[0]
        base_page = template_reader.pages[page_num]
        base_page.merge_page(form_page)
        writer.add_page(base_page)

    out_buf = io.BytesIO()
    writer.write(out_buf)
    return out_buf.getvalue()


def generate_pdf_form(a: ObjA, rows: list[ObjB]) -> bytes:
    """
    Render the HTML form with WeasyPrint (with AcroForm fields),
    then fill those fields using pypdf and return the filled PDF bytes.
    """
    html_form = django.template.loader.render_to_string(
        "form.html",
        context=dict(data_range=range(DATA_ROWS)),
    )

    # Create an empty (but form-enabled) PDF
    empty_form_pdf = HTML(string=html_form).render(pdf_forms=True).write_pdf()

    # Fill fields
    form_data = data_method(a, rows)
    filled_pdf = fill_pdf_form(io.BytesIO(empty_form_pdf), form_data)
    return filled_pdf


def fill_pdf_form(document: io.BytesIO, data: dict) -> bytes:
    """
    Fill AcroForm fields using pypdf.
    """
    reader = pypdf.PdfReader(document)
    writer = pypdf.PdfWriter()

    writer.append(reader)

    for page in writer.pages:
        writer.update_page_form_field_values(page, data, auto_regenerate=False)
        ### Commented out what worked to remove escape characters, but broke Acrobat Reader view
        # (needs: from typing import Any, List, MutableMapping, cast;
        #  from pypdf.generic import BooleanObject, IndirectObject, NameObject)
        # annots = page.get("/Annots")
        # if annots:
        #     for annot_ref in cast(List[IndirectObject], annots):
        #         annot = annot_ref.get_object()
        #         if annot is None:
        #             continue
        #         annot_dict = cast(MutableMapping[str, Any], annot)
        #         if "/AP" in annot_dict:
        #             del annot_dict["/AP"]

    # Note: /NeedAppearances is an entry of the /AcroForm dictionary, not of the
    # document catalog; pypdf's writer.set_need_appearances_writer(True) sets it there.
    # writer._root_object.update({NameObject("/NeedAppearances"): BooleanObject(True)})
    out = io.BytesIO()
    writer.write(out)
    return out.getvalue()
[/code]
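To see what the viewer is actually asked to draw, one can search the generated PDF for a doubled escape: three backslashes before a parenthesis, which decodes to a visible backslash followed by the parenthesis. The stdlib-only sketch below is a rough heuristic rather than a PDF parser: it locates streams by the stream/endstream keywords, inflates FlateDecode (zlib) data when it can, and does not look at hex strings or other filters, so an empty result does not rule anything out. find_double_escaped is a hypothetical helper name, and a field value that genuinely contains a backslash before a parenthesis would also be reported.

```python
import re
import zlib

# A stream's data sits between "stream" + EOL and the "endstream" keyword.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# Three backslashes before a parenthesis: an escaped backslash followed by an
# escaped parenthesis, which a viewer draws as a literal "\(" or "\)".
DOUBLE_ESCAPED_RE = re.compile(rb"\\\\\\[()]")


def find_double_escaped(pdf_bytes: bytes, context: int = 20) -> list[bytes]:
    """Return short snippets of stream data around each doubled escape."""
    hits = []
    for match in STREAM_RE.finditer(pdf_bytes):
        data = match.group(1)
        try:
            # Most content streams are FlateDecode; the EOL before
            # "endstream" is left over as unused data and ignored.
            data = zlib.decompressobj().decompress(data)
        except zlib.error:
            pass  # not zlib data (uncompressed or another filter): scan as-is
        for hit in DOUBLE_ESCAPED_RE.finditer(data):
            start = max(hit.start() - context, 0)
            hits.append(data[start:hit.end() + context])
    return hits
```

For example, passing the bytes returned by fill_pdf_form (from the code above) to find_double_escaped and printing each snippet shows the surrounding stream text for every match.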
Is there a way, or some other workaround, to get rid of the escape characters before parentheses while keeping the fields visible in Acrobat?

More details here: [url]https://stackoverflow.com/questions/79853586/weasyprint-pypdf-visible-non-editable-escape-characters-on-parentheses[/url]