Convert From One Format to Another
Wordize’s document conversion feature lets developers convert documents from one format to another while preserving formatting, styles, and content as closely as the target format allows.
With a variety of modules tailored to specific needs, Wordize offers powerful and flexible APIs for working with MS Word documents, web formats, eBooks, PDFs, images, and OpenOffice files.
Supported Conversion Modules
The following modules and formats are currently available:
Conversion Module | Supported formats |
---|---|
Wordize Core for .NET | Load/Save MS Word document formats (DOC, DOCX, RTF, FlatOpc, XML, TXT, XLSX) |
Wordize Web for .NET | Load/Save Web formats (HTML, MHTML, MD, CHM) |
Wordize eBook for .NET | Load/Save eBook formats (EPUB, AZW3, MOBI) |
Wordize OpenOffice for .NET | Load/Save OpenOffice documents (ODT, OTT) |
Wordize PDF Load for .NET | Load PDF documents (PDF) |
Wordize Rendering for .NET | Save documents to fixed-page formats (PDF, XPS, SVG, PostScript (PS), images, etc.) |
Choose the conversion modules that cover the formats you need. Prices for each module are listed on the Purchase page.
In the API Reference, all import formats supported by Wordize are listed in the LoadFormat enumeration, and saving formats are listed in the SaveFormat enumeration.
How to Work With Conversion Modules
Wordize Core for .NET is the base module: it loads and saves documents in Microsoft Word formats, and none of the other modules work without it.
To convert to or from a wider range of formats, add the corresponding conversion modules:
- To convert DOC to DOCX, you only need Wordize Core for .NET.
- To convert DOCX to HTML, you need Wordize Core for .NET and Wordize Web for .NET.
- To convert DOCX to HTML and EPUB, you need Wordize Core for .NET, Wordize Web for .NET, and Wordize eBook for .NET.
- Other combinations follow the same pattern: Wordize Core for .NET plus one module for each additional format group.
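The module rules above amount to a lookup: Wordize Core for .NET is always required, plus the module that handles the input format and the module that handles the output format. The sketch below encodes a subset of the module table in C# so the requirement for a given file-to-file conversion can be checked up front. `ModulePlanner`, its extension lists, and `RequiredModules` are hypothetical names for illustration and are not part of the Wordize API; the extension-to-module mapping is our reading of the table (note that PDF maps to Wordize PDF Load for .NET when loading and to Wordize Rendering for .NET when saving).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, NOT part of the Wordize API: encodes a subset of the
// module table to work out which modules a file-to-file conversion needs.
public static class ModulePlanner
{
    public const string Core = "Wordize Core for .NET";
    public const string Web = "Wordize Web for .NET";
    public const string EBook = "Wordize eBook for .NET";
    public const string OpenOffice = "Wordize OpenOffice for .NET";
    public const string PdfLoad = "Wordize PDF Load for .NET";
    public const string Rendering = "Wordize Rendering for .NET";

    // Formats that the same module can both load and save.
    private static readonly Dictionary<string, string> LoadSave =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["doc"] = Core, ["docx"] = Core, ["rtf"] = Core, ["txt"] = Core,
            ["html"] = Web, ["mhtml"] = Web, ["md"] = Web,
            ["epub"] = EBook, ["azw3"] = EBook, ["mobi"] = EBook,
            ["odt"] = OpenOffice, ["ott"] = OpenOffice,
        };

    // Load-only formats.
    private static readonly Dictionary<string, string> LoadOnly =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["pdf"] = PdfLoad,
        };

    // Save-only fixed-page formats (an assumed subset of "PDF, XPS, SVG, images").
    private static readonly Dictionary<string, string> SaveOnly =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["pdf"] = Rendering, ["xps"] = Rendering, ["svg"] = Rendering,
            ["png"] = Rendering, ["jpeg"] = Rendering, ["tiff"] = Rendering,
        };

    public static SortedSet<string> RequiredModules(string inputFile, string outputFile)
    {
        // Wordize Core for .NET is always required.
        var modules = new SortedSet<string>(StringComparer.Ordinal) { Core };
        modules.Add(Lookup(inputFile, LoadOnly));
        modules.Add(Lookup(outputFile, SaveOnly));
        return modules;
    }

    // Resolves the module by file extension: shared Load/Save formats first,
    // then the direction-specific (load-only or save-only) formats.
    private static string Lookup(string path, Dictionary<string, string> directionSpecific)
    {
        string ext = Path.GetExtension(path).TrimStart('.');
        if (LoadSave.TryGetValue(ext, out string module) ||
            directionSpecific.TryGetValue(ext, out module))
        {
            return module;
        }
        throw new NotSupportedException($"No module mapping for extension '.{ext}'.");
    }
}
```

For example, `ModulePlanner.RequiredModules("Document.docx", "Document.html")` returns a set containing Wordize Core for .NET and Wordize Web for .NET, matching the second bullet above. Using two direction-specific maps keeps the PDF case honest, since the same extension needs a different module depending on whether it is the source or the target.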
What is Document Conversion?
Conversion is the process of loading a document and then saving it in a target format. Both the source and the target can be either a file or a stream.
Wordize simplifies this process: instead of loading and saving the document in separate steps, you call one of the Convert or ConvertToImages method overloads and the conversion happens in a single call.
A full list of code examples for document conversion is available in the Wordize.Converting section of the API Reference. The following subsections show three simple conversions: DOCX to PDF, DOCX to HTML, and DOCX to PNG.
Convert DOCX to PDF
If you want to convert DOCX to PDF, you will need the Wordize Core for .NET and Wordize Rendering for .NET modules and the corresponding licenses.
The following code example shows how to convert DOCX to PDF using the Convert(string, string) method:
// The target format (PDF) is determined by the extension of the output file name.
var doc = "Document.docx";
Converter.Convert(doc, "ConvertedDocument.pdf");
Convert DOCX to HTML
If you want to convert DOCX to HTML, you will need the Wordize Core for .NET and Wordize Web for .NET modules and the corresponding licenses.
The following code example shows how to convert DOCX to HTML using the Convert(string, string) method:
// The target format (HTML) is determined by the extension of the output file name.
var doc = "Document.docx";
Converter.Convert(doc, "ConvertedDocument.html");
Convert DOCX to PNG
If you want to convert DOCX to PNG, you will need the Wordize Core for .NET and Wordize Rendering for .NET modules and the corresponding licenses.
When converting a multi-page document to images, keep in mind how the output is written:
- When saving to a file, each page is saved as a separate file.
- When saving to a stream, only the first page is saved. To save all pages, use Converter.ConvertToImages.
The following code example shows how to convert DOCX to PNG using the ConvertToImages(string, ImageSaveOptions) method, which returns one stream per page:
// Each element of the returned array holds one rendered page as a PNG image.
Stream[] pages = Converter.ConvertToImages("Document1.docx", new ImageSaveOptions(SaveFormat.Png));
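Because ConvertToImages returns an array of streams rather than writing files, you still need to persist each page yourself. The following sketch writes every page stream to a numbered file using only standard .NET I/O. `PageWriter.SavePages` and the `Page_1.png`, `Page_2.png`, ... naming scheme are hypothetical choices for illustration, not part of the Wordize API.

```csharp
using System.IO;

// Hypothetical helper, NOT part of the Wordize API: writes each page stream
// (for example, the array returned by Converter.ConvertToImages) to its own file.
public static class PageWriter
{
    // Writes pages[i] to "<directory>/<baseName>_<i + 1>.<extension>" and
    // returns the file paths in page order. The caller still owns (and should
    // dispose) the page streams.
    public static string[] SavePages(Stream[] pages, string directory, string baseName, string extension)
    {
        Directory.CreateDirectory(directory);
        var paths = new string[pages.Length];

        for (int i = 0; i < pages.Length; i++)
        {
            paths[i] = Path.Combine(directory, $"{baseName}_{i + 1}.{extension}");

            // Rewind seekable streams in case they are positioned at the end.
            if (pages[i].CanSeek)
            {
                pages[i].Position = 0;
            }

            using (FileStream file = File.Create(paths[i]))
            {
                pages[i].CopyTo(file);
            }
        }

        return paths;
    }
}
```

In practice you would pass the `pages` array from the example above, for example `PageWriter.SavePages(pages, "output", "Page", "png")`, and dispose each stream afterwards, since the helper deliberately leaves stream ownership with the caller.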
Document Conversion Accuracy
The internal Wordize document object model is designed primarily around MS Word formats such as DOC, DOCX, and RTF. Although Wordize supports a wide range of formats, many of them are not native to MS Word, so converting one format to another may result in some loss of fidelity.
Different Object Models
Fidelity deserves particular attention when the source and target formats have very different object models, because it is not always possible to map one model onto the other with 100% accuracy. Converting HTML-based formats to MS Word formats is a typical example.
Using Fonts
Fidelity can also be reduced when converting from a flow-layout format to a fixed-layout format. In flow-layout formats, such as Word formats, the geometry of objects and their position on the page are not fixed; in fixed-layout formats, such as PDF or image formats, they are. Converting to a fixed-page format therefore requires building a document layout.
Wordize mimics MS Word when building a document page layout, and an accurate layout requires the fonts used in the original document. If Wordize cannot find a font used in the document, it substitutes another font, which may cause layout differences because the font metrics differ. To minimize these differences, you can configure font settings globally using Wordize.Settings.DefaultFontSettings:
// Use the fonts in C:\Fonts globally; true also scans its subfolders.
Wordize.Settings.DefaultFontSettings.SetFontsFolder(@"C:\Fonts", true);
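Since Wordize substitutes fonts it cannot find, a wrong folder path may go unnoticed until the output layout differs. The hypothetical helper below (not part of the Wordize API) confirms that a folder actually contains font files before you pass it to SetFontsFolder, using standard .NET file enumeration. The list of font extensions is an assumption; extend it to match the fonts you deploy.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper, NOT part of the Wordize API: finds font files in a
// folder so you can confirm it is usable before passing it to Wordize.
public static class FontFolderCheck
{
    // Assumed set of font file extensions; extend it for the fonts you deploy.
    private static readonly string[] FontExtensions = { ".ttf", ".otf", ".ttc" };

    // Returns the font files in 'folder' (and its subfolders when 'recursive'
    // is true), or an empty array if the folder does not exist.
    public static string[] FindFontFiles(string folder, bool recursive)
    {
        if (!Directory.Exists(folder))
        {
            return Array.Empty<string>();
        }

        SearchOption option = recursive ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly;
        return Directory.EnumerateFiles(folder, "*", option)
            .Where(path => FontExtensions.Contains(Path.GetExtension(path).ToLowerInvariant()))
            .ToArray();
    }
}
```

For example, you might call `SetFontsFolder(@"C:\Fonts", true)` only when `FontFolderCheck.FindFontFiles(@"C:\Fonts", true)` returns at least one path, and log a warning otherwise.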
Converting PDF to Other Formats
Wordize also supports PDF as an input format (with the Wordize PDF Load for .NET module). PDF is a fixed-layout format, so when a PDF document is converted to a flow-layout format, Wordize has to convert the fixed-layout structure into its flow-layout document object model. This conversion does not guarantee 100% accuracy either.
Converting PDF to another fixed-layout format, on the other hand, does not require transforming the document model. Therefore, converting PDF to the following formats provides better performance and accuracy: Pdf, Tiff, Png, Bmp, Jpeg, Gif, Svg, Xps, HtmlFixed, Pcl, OpenXps, WebP, Emf, Eps.
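When a PDF has to be converted anyway, the list above can serve as a quick check of whether the chosen target stays within fixed-layout formats. The sketch below is a hypothetical helper, not Wordize API; it compares format names as strings copied from the list above, so it does not depend on any enumeration details beyond member names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, NOT part of the Wordize API: checks whether converting a
// PDF to the given target stays within fixed-layout formats (no conversion to
// the flow-layout document model), based on the list in this section.
public static class PdfTargetCheck
{
    // Target format names copied from the list above; compared case-insensitively.
    private static readonly HashSet<string> FixedLayoutTargets =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "Pdf", "Tiff", "Png", "Bmp", "Jpeg", "Gif", "Svg", "Xps",
            "HtmlFixed", "Pcl", "OpenXps", "WebP", "Emf", "Eps",
        };

    public static bool KeepsFixedLayout(string targetFormat) =>
        FixedLayoutTargets.Contains(targetFormat);
}
```

Assuming the SaveFormat member names match the list, `PdfTargetCheck.KeepsFixedLayout(SaveFormat.Png.ToString())` returns true, while a flow-layout target such as "Docx" or "Html" returns false.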