How to Generate a Table of Contents in Docx with Python
Tables of contents (TOCs) are an essential part of long documents, providing a quick overview of the document's structure and enabling easy navigation. While most Office software packages offer built-in TOC generation mechanisms, they can be difficult to call from code.
This blog post will show you how to generate a TOC in Docx with Python. We will use the following steps:
- Create a TOC placeholder in the Docx document.
- Refresh the TOC.
1. Create a TOC placeholder in the Docx document
The first step is to create a TOC placeholder in the Docx document. This can be done using the following Python code:
We will use the python-docx
package.
pip install python-docx
from docx.oxml.ns import qn from docx.oxml import OxmlElement def insert_toc(d, levels="1-3"): """ Insert "Table of Contents" to Document Parameters: ------------ d: Document Object 文档对象 levels: string default "1-3" 根据 addheading 更新目录 """ sdt = OxmlElement('w:sdt') sdtpr = OxmlElement('w:sdtPr') docpartobj = OxmlElement('w:docPartObj') docpartgallery = OxmlElement('w:docPartGallery') docpartgallery.set(qn('w:val'), 'Table of Contents') docpartunique = OxmlElement('w:docPartUnique') docpartunique.set(qn('w:val'), 'true') docpartobj.append(docpartgallery) docpartobj.append(docpartunique) sdtpr.append(docpartobj) sdt.append(sdtpr) sdtcontent = OxmlElement('w:sdtContent') p = OxmlElement('w:p') r = OxmlElement('w:r') t = OxmlElement('w:t') t.text = 'Contents' r.append(t) p.append(r) sdtcontent.append(p) fldChar = OxmlElement('w:fldChar') # creates a new element fldChar.set(qn('w:fldCharType'), 'begin') # sets attribute on element instrText = OxmlElement('w:instrText') instrText.set(qn('xml:space'), 'preserve') # sets attribute on element instrText.text = f'TOC \\o "{levels}" \\h \\z \\u' # change 1-3 depending on heading levels you need fldChar2 = OxmlElement('w:fldChar') fldChar2.set(qn('w:fldCharType'), 'separate') # fldChar3 = OxmlElement('w:t') # fldChar3.text = "Right-click to update field." fldChar3 = OxmlElement('w:updateFields') fldChar3.set(qn('w:val'), 'true') fldChar2.append(fldChar3) fldChar4 = OxmlElement('w:fldChar') fldChar4.set(qn('w:fldCharType'), 'end') p2 = OxmlElement('w:p') r2 = OxmlElement('w:r') r2.append(fldChar) r2.append(instrText) r2.append(fldChar2) r2.append(fldChar4) p2.append(r2) sdtcontent.append(p2) sdt.append(sdtcontent) d._element.body.insert_element_before(sdt, *('w:sectPr',)) return d
2. Generate the TOC
We now have a TOC section in our document, but it's empty. How do we refresh the TOC? There are two ways to do this:
2.1. Method 1: Using a LibreOffice macro
- Create Macro Module
REM ***** BASIC ***** Option Explicit Sub UpdateIndexes(path As String) '''Update indexes, such as for the table of contents''' Dim doc As Object Dim args() doc = StarDesktop.loadComponentFromUrl(convertToUrl(path), "_default", 0, args()) Dim i As Integer With doc ' Only process Writer documents If .supportsService("com.sun.star.text.GenericTextDocument") Then For i = 0 To .getDocumentIndexes().count - 1 .getDocumentIndexes().getByIndex(i).update() Next i End If End With ' ThisComponent doc.store() doc.close(True) End Sub ' UpdateIndexes
- Import the Macro Module
$ mv ~/.config/libreoffice/4/user/basic ~/basic_backup $ cp basic ~/.config/libreoffice/4/user/ -r
- Run the Command
$ soffice --headless "macro:///Standard.YourModuleName.UpdateIndex(/path/to/file.odt)"
2.2. Method 2: Using Unoserver
- Pull the Unoserver Docker Image
docker pull chanmo/unoserver
- Run the Unoserver container
docker run -p 5000:5000 chanmo/unoserver
- Update the TOC Using HTTPie
http -f POST :5000/convert/docx file@/path/to/demo.docx -o demo.docx
The disadvantage of using a LibreOffice macro is that the server needs to have LibreOffice installed. If you don't want to install LibreOffice, you can use the Unoserver method instead.