How to Generate a Table of Contents in Docx with Python

Tables of contents (TOCs) are an essential part of long documents, providing a quick overview of the document's structure and enabling easy navigation. While most Office software packages offer built-in TOC generation mechanisms, they can be difficult to call from code.

This blog post will show you how to generate a TOC in Docx with Python. We will use the following steps:

  1. Create a TOC placeholder in the Docx document.
  2. Refresh the TOC.

1. Create a TOC placeholder in the Docx document

The first step is to create a TOC placeholder in the Docx document. We will use the python-docx package, which can be installed with pip:

pip install python-docx

The placeholder can then be inserted with the following Python code:

from docx.oxml.ns import qn
from docx.oxml import OxmlElement

def insert_toc(d, levels="1-3"):
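      """
      Insert a "Table of Contents" placeholder into a document.

      Parameters:
      ------------

      d: Document object
         The document object.

      levels: string
              Heading levels to include, default "1-3".

      The TOC entries are built from the headings added with add_heading.
      """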
      sdt = OxmlElement('w:sdt')
      sdtpr = OxmlElement('w:sdtPr')
      docpartobj = OxmlElement('w:docPartObj')
      docpartgallery = OxmlElement('w:docPartGallery')
      docpartgallery.set(qn('w:val'), 'Table of Contents')
      docpartunique = OxmlElement('w:docPartUnique')
      docpartunique.set(qn('w:val'), 'true')
      docpartobj.append(docpartgallery)
      docpartobj.append(docpartunique)
      sdtpr.append(docpartobj)
      sdt.append(sdtpr)

      sdtcontent = OxmlElement('w:sdtContent')

      p = OxmlElement('w:p')
      r = OxmlElement('w:r')
      t = OxmlElement('w:t')
      t.text = 'Contents'
      r.append(t)
      p.append(r)
      sdtcontent.append(p)

      # A complex field is a sequence of: begin, instruction text, separate, end.
      fldChar = OxmlElement('w:fldChar')
      fldChar.set(qn('w:fldCharType'), 'begin')
      instrText = OxmlElement('w:instrText')
      instrText.set(qn('xml:space'), 'preserve')  # keep the spaces in the instruction
      # \o: heading levels to include, \h: make entries hyperlinks,
      # \z: hide page numbers in Web layout view, \u: use outline levels
      instrText.text = f'TOC \\o "{levels}" \\h \\z \\u'

      fldChar2 = OxmlElement('w:fldChar')
      fldChar2.set(qn('w:fldCharType'), 'separate')
      # Intended to ask the word processor to update the field on open.
      # Note: w:updateFields is normally a document setting, so this placement
      # is non-standard. An alternative is to append a w:t element with the
      # text "Right-click to update field." instead.
      update_fields = OxmlElement('w:updateFields')
      update_fields.set(qn('w:val'), 'true')
      fldChar2.append(update_fields)

      fldChar4 = OxmlElement('w:fldChar')
      fldChar4.set(qn('w:fldCharType'), 'end')

      p2 = OxmlElement('w:p')
      r2 = OxmlElement('w:r')
      r2.append(fldChar)
      r2.append(instrText)
      r2.append(fldChar2)
      r2.append(fldChar4)
      p2.append(r2)

      sdtcontent.append(p2)
      sdt.append(sdtcontent)
      # Place the TOC at the end of the body, just before the section properties.
      d._element.body.insert_element_before(sdt, 'w:sectPr')

      return d
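
Before moving on to the refresh step, it can be useful to confirm that the placeholder really ended up in the saved file. The sketch below uses only the standard library: a .docx file is a zip archive, and the field instruction lives in word/document.xml. The helper name find_toc_instructions is hypothetical and not part of python-docx.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix used in the code above).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def find_toc_instructions(docx_file):
    """Return the text of every TOC field instruction in a .docx file.

    docx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    instructions = []
    for node in root.iter(f"{{{W_NS}}}instrText"):
        text = (node.text or "").strip()
        # A TOC field instruction starts with the keyword "TOC".
        if text.split(" ", 1)[0] == "TOC":
            instructions.append(text)
    return instructions
```

Calling find_toc_instructions on a file saved after insert_toc should return a list with one instruction string. An empty list suggests the placeholder was not written.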


2. Refresh the TOC

The document now contains a TOC placeholder, but it has no entries until the field is refreshed. There are two ways to refresh it:

2.1. Method 1: Using a LibreOffice macro

  1. Create Macro Module
REM  *****  BASIC  *****

 Option Explicit

 Sub UpdateIndexes(path As String)
     '''Update indexes, such as for the table of contents''' 
     Dim doc As Object
     Dim args()

     doc = StarDesktop.loadComponentFromUrl(convertToUrl(path), "_default", 0, args())

     Dim i As Integer

     With doc ' Only process Writer documents
         If .supportsService("com.sun.star.text.GenericTextDocument") Then
             For i = 0 To .getDocumentIndexes().count - 1
                 .getDocumentIndexes().getByIndex(i).update()
             Next i
         End If
     End With ' doc

     doc.store()
     doc.close(True)

 End Sub ' UpdateIndexes  


  2. Import the Macro Module
$ mv ~/.config/libreoffice/4/user/basic ~/basic_backup
$ cp -r basic ~/.config/libreoffice/4/user/
  3. Run the Command
$ soffice --headless "macro:///Standard.YourModuleName.UpdateIndexes(/path/to/file.odt)"
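
If the refresh needs to happen from the same Python script that builds the document, the command can be wrapped with subprocess. This is a minimal sketch that assumes soffice is on the PATH, that the macro has been imported as described, and that the path is absolute. The function names are hypothetical, YourModuleName is the same placeholder as in the command, and UpdateIndexes is the Sub defined in the macro module.

```python
import subprocess


def build_macro_command(path, module="YourModuleName", macro="UpdateIndexes"):
    """Build the soffice command line that runs the macro on one file."""
    macro_url = f"macro:///Standard.{module}.{macro}({path})"
    return ["soffice", "--headless", macro_url]


def refresh_toc(path, **kwargs):
    """Run the macro; raises CalledProcessError if soffice exits non-zero."""
    command = build_macro_command(path, **kwargs)
    # The timeout is an arbitrary safety net so a hung soffice does not block forever.
    subprocess.run(command, check=True, timeout=120)
```

Passing the arguments as a list avoids shell quoting problems around the parentheses in the macro URL.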

2.2. Method 2: Using Unoserver

  1. Pull the Unoserver Docker Image
docker pull chanmo/unoserver
  2. Run the Unoserver container
docker run -p 5000:5000 chanmo/unoserver
  3. Update the TOC Using HTTPie
http -f POST :5000/convert/docx file@/path/to/demo.docx -o demo.docx
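
The same request can be sent from Python without HTTPie. The sketch below uses only the standard library and assumes the container accepts a multipart form upload with a field named file at /convert/docx on localhost port 5000, as the HTTPie command implies. Both function names are hypothetical.

```python
import os
import urllib.request
import uuid


def encode_multipart_file(field, filename, payload):
    """Encode one file as multipart/form-data; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def refresh_via_unoserver(src, dst, url="http://localhost:5000/convert/docx"):
    """POST src to the conversion endpoint and write the response to dst."""
    with open(src, "rb") as handle:
        body, content_type = encode_multipart_file(
            "file", os.path.basename(src), handle.read()
        )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    # Read the whole response before writing, so src and dst may be the same file.
    with urllib.request.urlopen(request, timeout=120) as response:
        converted = response.read()
    with open(dst, "wb") as handle:
        handle.write(converted)
```

For example, refresh_via_unoserver("/path/to/demo.docx", "demo.docx") mirrors the HTTPie call above.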

The disadvantage of the LibreOffice macro method is that LibreOffice must be installed on the machine that runs it. If you don't want to install LibreOffice, you can use the Unoserver method instead, where the conversion service runs in a Docker container.