PHP Classes

PHP DOCX to XML: Extract XML files from Microsoft Word DOCX files

Version: docxtoxml 1.0.2
License: GNU General Publi...
PHP version: 5
Categories: XML, PHP 5, Files and Folders
Description 

This class can extract XML files from Microsoft Word DOCX files.

It can take the path of a Microsoft Word file in DOCX format and extract its contents to save the XML files that it contains.

The extracted XML files are saved to a given directory.
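As a rough illustration of what such a file contains: a DOCX file is a ZIP archive, so its XML parts can be listed with PHP's ZipArchive extension. The helper below is a minimal sketch and is not part of this package; the function name `listDocxXmlParts` is hypothetical.

```php
<?php
// Illustrative helper (not part of the WordXML class): list the XML parts
// inside a DOCX file using PHP's ZipArchive extension.
function listDocxXmlParts(string $docxPath): array
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        throw new RuntimeException("Cannot open $docxPath as a ZIP archive");
    }

    $parts = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        // Keep XML parts and the .rels relationship files (which are also XML).
        if ($name !== false && preg_match('/\.(xml|rels)$/i', $name)) {
            $parts[] = $name;
        }
    }
    $zip->close();

    return $parts;
}
```

For a typical Word document the returned list would be expected to include entries such as `[Content_Types].xml` and `word/document.xml`.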

Innovation Award
PHP Programming Innovation award nominee, October 2021, Number 5
The Microsoft Word program uses the DOCX format to save all the information in a word processing document created with that program.

A file in DOCX format is a compressed ZIP archive that contains mainly XML files for the document's different parts.

This class extracts the XML files contained in a DOCX file so that applications can perform additional processing of the word processing document.

Manuel Lemos
Name: Timothy Edwards <contact>
Classes: 4 packages by
Country: United Kingdom
Innovation award
Nominee: 2x

Example

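<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>

<body>
<?php
// Load the class, then extract the XML files from sample.docx.
require_once('wordxml.php');
$rt = new WordXML(false); // false: save the XML files without displaying them
$rt->readDocument('sample.docx');
?>
</body>
</html>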


Details

A PHP class that extracts all the XML files from a Word DOCX document and saves them as separate XML files.

Description

This PHP class takes a Word document in DOCX format and extracts all the XML files in it. They are all saved in a directory with the same name as the original DOCX file, which is created automatically if it does not exist. In normal mode the class produces no output on screen. A PHP demonstration file (xmltest.php) is included.
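The behaviour described above can be approximated with PHP's ZipArchive extension. The following is a minimal sketch, not the code in wordxml.php: the function name `extractDocxXml`, the rule that "report.docx" is extracted to a "report" directory beside it, and the flattened output file names are all assumptions made for illustration.

```php
<?php
// Sketch only: approximates the described behaviour with ZipArchive.
// The function name, directory rule and file naming are assumptions.
function extractDocxXml(string $docxPath): string
{
    // Assumed rule: "report.docx" is extracted to a "report" directory beside it.
    $info = pathinfo($docxPath);
    $outDir = $info['dirname'] . DIRECTORY_SEPARATOR . $info['filename'];

    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        throw new RuntimeException("Cannot open $docxPath as a ZIP archive");
    }
    // Create the output directory if it does not exist yet.
    if (!is_dir($outDir) && !mkdir($outDir, 0777, true)) {
        throw new RuntimeException("Cannot create directory $outDir");
    }

    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        if ($name === false || !preg_match('/\.(xml|rels)$/i', $name)) {
            continue; // skip images and other non-XML parts
        }
        // Flatten archive paths so every file lands directly in $outDir,
        // e.g. "word/document.xml" becomes "word_document.xml".
        $target = $outDir . DIRECTORY_SEPARATOR . str_replace('/', '_', $name);
        file_put_contents($target, $zip->getFromIndex($i));
    }
    $zip->close();

    return $outDir;
}
```

Flattening the archive paths keeps the sketch short; a fuller version might recreate the archive's subdirectories instead.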

New in v1.0.1 - Now saves the footnote and endnote relationship XML files if they exist. Note that the class name has been changed to WordXML, which is what it should have been originally.

New in v1.0.2 - Updated to work in PHP 8.

USAGE

Include the class in your PHP script

require_once('wordxml.php');

Normal mode to save all the XML files (no output to screen) - (note the change for v1.0.1)

$rt = new WordXML(false); or $rt = new WordXML();

Display on screen the contents of all XML files found after saving them (note the change for v1.0.1)

$rt = new WordXML(true);

Set the encoding - Only needed when displaying the XML files on screen, to ensure that the displayed encoding matches that of the calling PHP script

$rt = new WordXML(true, 'encoding');

Read docx file and save all the XML Files found

$rt->readDocument('FILENAME');
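Once the XML files have been saved, an application can process them further. As one possible example, the sketch below collects the text of each paragraph from a saved copy of the main document part (stored as word/document.xml inside the DOCX). It is independent of the WordXML class: the function name `docxParagraphs` is hypothetical, and the path to the saved file is left to the caller because it depends on how the class names its output.

```php
<?php
// Illustrative post-processing step, independent of the WordXML class:
// collect the text of each paragraph from a WordprocessingML document.xml.
function docxParagraphs(string $xmlPath): array
{
    $dom = new DOMDocument();
    if (!$dom->load($xmlPath)) {
        throw new RuntimeException("Cannot parse $xmlPath");
    }

    $xpath = new DOMXPath($dom);
    // Standard WordprocessingML namespace, bound here to the w: prefix.
    $xpath->registerNamespace(
        'w',
        'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
    );

    $paragraphs = [];
    foreach ($xpath->query('//w:p') as $p) {
        $text = '';
        // A paragraph's text is split across one or more w:t elements.
        foreach ($xpath->query('.//w:t', $p) as $t) {
            $text .= $t->textContent;
        }
        $paragraphs[] = $text;
    }

    return $paragraphs;
}
```

This simple query treats every w:p element alike, so paragraphs nested in tables or text boxes are returned in document order alongside body paragraphs.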

Files

File          Role      Description
LICENSE       Lic.      License text
README.md     Doc.      Documentation
wordxml.php   Class     Class source
xmltest.php   Example   Example script
