227

I have found several open-source/freeware programs that allow you to convert .doc files to .pdf files, but they're all of the application/printer driver variety, with no SDK attached.

I have found several programs that do have an SDK allowing you to convert .doc files to .pdf files, but they're all of the proprietary type, $2,000 a license or thereabouts.

Does anyone know of any clean, inexpensive (preferably free) programmatic solution to my problem, using C# or VB.NET?

Thanks!

Shaul Behr
  • 1
    Check if [Pandoc](http://pandoc.org/) has [bindings for your favourite language](https://github.com/jgm/pandoc/wiki/Pandoc-Extras#pandoc-wrappers-and-interfaces). The command line interface is also dead easy `pandoc manual.docx -o manual.pdf` – Colonel Panic Sep 30 '16 at 15:29
  • Also, check the [GemBox.Document](https://www.gemboxsoftware.com/document) SDK. It has a free version and an inexpensive version. It uses neither a printer driver nor MS Office to convert Word files to PDF. – hertzogth Feb 11 '20 at 08:47
  • You can use docx2pdf to make this conversion: https://github.com/AlJohri/docx2pdf – Al Johri Feb 18 '20 at 07:33

9 Answers

206

Use a foreach loop instead of a for loop - it solved my problem.

int j = 0;
foreach (Microsoft.Office.Interop.Word.Page p in pane.Pages)
{
    var bits = p.EnhMetaFileBits;
    var target = path1 +j.ToString()+  "_image.doc";
    try
    {
        using (var ms = new MemoryStream((byte[])(bits)))
        {
            var image = System.Drawing.Image.FromStream(ms);
            var pngTarget = Path.ChangeExtension(target, "png");
            image.Save(pngTarget, System.Drawing.Imaging.ImageFormat.Png);
        }
    }
    catch (System.Exception ex)
    {
        MessageBox.Show(ex.Message);  
    }
    j++;
}

Here is a modification of a program that worked for me. It uses Word 2007 with the Save As PDF add-in installed. It searches a directory for .doc files, opens them in Word and then saves them as a PDF. Note that you'll need to add a reference to Microsoft.Office.Interop.Word to the solution.

using Microsoft.Office.Interop.Word;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

...

// Create a new Microsoft Word application object
Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.Application();

// Before C# 4.0 there were no optional arguments, so we'll need a dummy value
object oMissing = System.Reflection.Missing.Value;

// Get list of Word files in specified directory
DirectoryInfo dirInfo = new DirectoryInfo(@"\\server\folder");
FileInfo[] wordFiles = dirInfo.GetFiles("*.doc");

word.Visible = false;
word.ScreenUpdating = false;

foreach (FileInfo wordFile in wordFiles)
{
    // Cast as Object for word Open method
    Object filename = (Object)wordFile.FullName;

    // Use the dummy value as a placeholder for optional arguments
    Document doc = word.Documents.Open(ref filename, ref oMissing,
        ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
        ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
        ref oMissing, ref oMissing, ref oMissing, ref oMissing);
    doc.Activate();

    // ChangeExtension swaps only the extension, so ".docx" does not become ".pdfx"
    object outputFileName = Path.ChangeExtension(wordFile.FullName, ".pdf");
    object fileFormat = WdSaveFormat.wdFormatPDF;

    // Save document into PDF Format
    doc.SaveAs(ref outputFileName,
        ref fileFormat, ref oMissing, ref oMissing,
        ref oMissing, ref oMissing, ref oMissing, ref oMissing,
        ref oMissing, ref oMissing, ref oMissing, ref oMissing,
        ref oMissing, ref oMissing, ref oMissing, ref oMissing);

    // Close the Word document, but leave the Word application open.
    // doc has to be cast to type _Document so that it will find the
    // correct Close method.                
    object saveChanges = WdSaveOptions.wdDoNotSaveChanges;
    ((_Document)doc).Close(ref saveChanges, ref oMissing, ref oMissing);
    doc = null;
}

// word has to be cast to type _Application so that it will find
// the correct Quit method.
((_Application)word).Quit(ref oMissing, ref oMissing, ref oMissing);
word = null;
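
On C# 4.0 or later, the same loop can be written without the `oMissing` placeholders, because optional COM arguments can simply be left out. The following is a minimal sketch, not a definitive implementation. It assumes the Word 2010 or later interop assembly (for `SaveAs2`) is referenced and that Word is installed; the names `DocToPdf` and `ConvertFolder` are hypothetical, chosen only for illustration.

```csharp
using System.IO;
using Word = Microsoft.Office.Interop.Word;

static class DocToPdf   // hypothetical helper name
{
    // Converts every file matching *.doc in the given folder to PDF,
    // writing each PDF next to its source document.
    public static void ConvertFolder(string folder)
    {
        var word = new Word.Application { Visible = false };
        try
        {
            foreach (string file in Directory.GetFiles(folder, "*.doc"))
            {
                // Optional COM parameters are omitted; ReadOnly is passed by name.
                Word.Document doc = word.Documents.Open(file, ReadOnly: true);
                try
                {
                    string pdf = Path.ChangeExtension(file, ".pdf");
                    doc.SaveAs2(pdf, Word.WdSaveFormat.wdFormatPDF);
                }
                finally
                {
                    // The cast selects the Close method rather than the Close event.
                    ((Word._Document)doc).Close(Word.WdSaveOptions.wdDoNotSaveChanges);
                }
            }
        }
        finally
        {
            // Quit in a finally block so a failed conversion does not leave Word running.
            ((Word._Application)word).Quit();
        }
    }
}
```

The try/finally blocks are the main design choice here: if one document fails to convert, Word is still closed instead of staying behind as a hidden process.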
w5m
Eric Ness
  • 3
    Thank you! I may just go with Aspose anyway, if it's faster than Word automation. But if I can tolerate a little bit of slowness, I'll prolly use your solution. Thanks again! – Shaul Behr Mar 04 '09 at 07:54
  • 4
    Yes, it's not the fastest but it's hard to beat the price. :-) Glad I could help. – Eric Ness Mar 04 '09 at 14:28
  • 11
    With Office 2007 SP2 you no longer need the save as PDF download. I've also used this technique successfully for Excel and Powerpoint. – RichardOD Sep 30 '09 at 09:07
  • 6
    Have you used this method on a server with a web application? I'm getting a lot of issues, not to mention that it's not recommended by MS. http://support.microsoft.com/default.aspx?scid=kb;EN-US;q257757#kb2 I heard Aspose is great but it's quite dear. – Prabu Dec 22 '09 at 04:43
  • I haven't tried this with a web application so I can't guarantee that it will work in that environment. – Eric Ness Dec 22 '09 at 16:19
  • Would the program need to be deployed together with the Microsoft.Office.Interop.Word assembly? – Chry Cheng Jul 21 '10 at 15:59
  • 1
    If Word isn't installed on the machine that you're deploying it on then you'd have to include that assembly. – Eric Ness Jul 21 '10 at 21:04
  • 1
    Note that with the new optional parameters in C# you can skip all those `ref missing` calls. Incidentally interop calls like this one were one of the things VB.Net used to do better than C#. – Keith Jun 23 '11 at 12:21
  • Great solution. Did a proof-of-concept analysis with this code against Office 2003, saving Word docs as RTF and HTML. – Gary Kindel Jul 12 '11 at 20:23
  • 6
    Um... if word isn't installed, I think packaging the interop assembly will be the least of your worries. This code REQUIRES word to be installed. – BrainSlugs83 Oct 25 '11 at 02:25
37

To sum it up for VB.NET users, the free option (must have Office installed):

Microsoft Office assemblies download:

VB.NET example:

        Dim word As Application = New Application()
        Dim doc As Document = word.Documents.Open("c:\document.docx")
        doc.Activate()
        doc.SaveAs2("c:\document.pdf", WdSaveFormat.wdFormatPDF)
        doc.Close()
        ' Quit Word so that no hidden WINWORD process is left running
        word.Quit()
turbanoff
Elger Mensonides
13

PDFCreator has a COM component, callable from .NET or VBScript (samples included in the download).

But, it seems to me that a printer is just what you need - just mix that with Word's automation, and you should be good to go.
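
Mixing a PDF printer with Word automation could look roughly like the sketch below. It assumes Word is installed and that a virtual PDF printer is already set up to save its output without prompting; the printer name is passed in by the caller, and the names `PrintViaDriver` and `Print` are hypothetical. Where the PDF ends up depends entirely on the printer driver's own configuration.

```csharp
using Word = Microsoft.Office.Interop.Word;

static class PrintViaDriver   // hypothetical helper name
{
    // Opens a document in Word and prints it to the named printer.
    public static void Print(string docPath, string printerName)
    {
        var word = new Word.Application { Visible = false };
        try
        {
            // Remember the current printer so it can be restored afterwards.
            string previousPrinter = word.ActivePrinter;
            word.ActivePrinter = printerName;

            Word.Document doc = word.Documents.Open(docPath, ReadOnly: true);
            try
            {
                // Background: false makes PrintOut return only after spooling.
                doc.PrintOut(Background: false);
            }
            finally
            {
                ((Word._Document)doc).Close(Word.WdSaveOptions.wdDoNotSaveChanges);
                word.ActivePrinter = previousPrinter;
            }
        }
        finally
        {
            ((Word._Application)word).Quit();
        }
    }
}
```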

Mark Brackett
  • where's this COM component? And what does "mik" mean? Was that meant to be "mix"? – Shaul Behr Mar 03 '09 at 19:49
  • The COM component is included in the download, along with samples. And yes, that was supposed to be "mix". – Mark Brackett Mar 04 '09 at 11:01
  • 5
    FYI - if you go this route, PDFCreator bundles malware in the installer. This has been an ongoing issue with PDFCreator since 2009. – Phil Gorley May 28 '15 at 20:22
  • 2
    @PhilGorley Malware? and this answer is +8... – Mzn Aug 05 '15 at 10:18
  • @Mzn - FWIW, paying attention and unchecking the addon installs always works for me. I don't see it as any different than Oracle bundling crap in the Java installer; it's annoying, but not worth avoiding the software for me (yeah, OK, PdfCreator's adware is probably infinitely less useful and more intrusive than whatever Oracle is pushing these days...I still don't want either one of them). – Mark Brackett Aug 05 '15 at 16:09
  • Ah, so you can opt out from it? I thought it installs it in a sneaky way! But a library need not be installed, it's better to have it provided as a bunch of libraries or easy to build source code – Mzn Aug 07 '15 at 13:07
12

Just wanted to add that I used the Microsoft.Office.Interop libraries, specifically the ExportAsFixedFormat function, which I did not see used in this thread.

using Microsoft.Office.Interop.Word;
using System.Runtime.InteropServices;
using System.IO;
using Microsoft.Office.Core;

Application app;

public string CreatePDF(string path, string exportDir)
{
    Application app = new Application();
    app.DisplayAlerts = WdAlertLevel.wdAlertsNone;
    app.Visible = false;

    var objPresSet = app.Documents;
    var objPres = objPresSet.Open(path, MsoTriState.msoTrue, MsoTriState.msoTrue, MsoTriState.msoFalse);

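    // Keep only the file name; a rooted path would make Path.Combine ignore exportDir
    var pdfFileName = Path.GetFileName(Path.ChangeExtension(path, ".pdf"));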
    var pdfPath = Path.Combine(exportDir, pdfFileName);

    try
    {
        objPres.ExportAsFixedFormat(
            pdfPath,
            WdExportFormat.wdExportFormatPDF,
            false,
            WdExportOptimizeFor.wdExportOptimizeForPrint,
            WdExportRange.wdExportAllDocument
        );
    }
    catch
    {
        pdfPath = null;
    }
    finally
    {
        objPres.Close();
        app.Quit();
    }
    return pdfPath;
}
Felix D.
zeta
  • 8
    Just a note for those that don't know that you need Office installed on the machine to use the Microsoft Interop libraries. – Sam Rueby Feb 23 '18 at 15:31
  • 1
    Nice! I suggest setting `app.Visible = false;` and adding a call to `app.Quit();` in the finally block. – Dan Korn May 05 '20 at 20:29
7

There's an entire discussion of libraries for converting Word to PDF on Joel's discussion forums. Some suggestions from the thread:

Todd Gamblin
  • 12
    Thanks, but all the suggestions there fall under the two categories I described above: either not programmatic, or hugely expensive. I specifically need .doc to .pdf programmatically. – Shaul Behr Mar 03 '09 at 19:41
5

I went through the Word-to-PDF pain when someone dumped 10,000 Word files on me to convert to PDF. I did it in C# using Word interop, but it was slow and crashed if I tried to use the PC at all. Very frustrating.

This led me to discover that I could drop interop and its slowness. For Excel I use EPPlus, and then I discovered a free tool called Spire that can convert to PDF, with limitations!

http://www.e-iceblue.com/Introduce/free-doc-component.html#.VtAg4PmLRhE

Kyle Alons
Ggalla1779
  • Thanks for this - great solution without using Interop. Why is it so hard to find a free docx to PDF converter? – mbdavis Feb 14 '18 at 20:03
  • I had high hopes for this but the free version is limited to 3 pages of PDF output. The full version is very expensive if you need unlimited deployments. – grinder22 Feb 27 '19 at 00:28
  • grinder22 GemBox.Document also has a free version with size limitation and a paid version. However, it includes a royalty free deployment so you're able to build and publish an unlimited number of projects for no extra cost. – hertzogth Feb 11 '20 at 08:51
2

Simple code and solution using Microsoft.Office.Interop.Word to convert Word to PDF:

using Word = Microsoft.Office.Interop.Word;

private void convertDOCtoPDF()
{

  object misValue = System.Reflection.Missing.Value;
  String  PATH_APP_PDF = @"c:\..\MY_WORD_DOCUMENT.pdf";

  var WORD = new Word.Application();

  Word.Document doc   = WORD.Documents.Open(@"c:\..\MY_WORD_DOCUMENT.docx");
  doc.Activate();

  doc.SaveAs2(@PATH_APP_PDF, Word.WdSaveFormat.wdFormatPDF, misValue, misValue, misValue, 
  misValue, misValue, misValue, misValue, misValue, misValue, misValue);

  doc.Close();
  WORD.Quit();


  releaseObject(doc);
  releaseObject(WORD);

}

Add this procedure to release memory:

private void releaseObject(object obj)
{
  try
  {
      System.Runtime.InteropServices.Marshal.ReleaseComObject(obj);
      obj = null;
  }
  catch (Exception ex)
  {
      //TODO
  }
  finally
  {
     GC.Collect();
  }
}
daniele3004
  • Is it necessary to call GC.Collect? Isn't there a different way to only mark the part of memory that is related to this for freeing on the next automatic GC? – Preza8 Jan 30 '20 at 11:48
2

There seems to be some relevant info here:

Converting MS Word Documents to PDF in ASP.NET

Also, with Office 2007 having publish to PDF functionality, I guess you could use office automation to open the *.DOC file in Word 2007 and Save as PDF. I'm not too keen on office automation as it's slow and prone to hanging, but just throwing that out there...

Community
MikeW
1

The Microsoft PDF add-in for Word seems to be the best solution for now, but bear in mind that it does not convert all Word documents correctly, and in some cases you will see huge differences between the Word document and the output PDF. Unfortunately, I couldn't find any API that converts all Word documents correctly. The only way I found to ensure the conversion was 100% correct was to convert the documents through a printer driver. The downside is that documents are queued and converted one by one, but you can be sure the resulting PDF has exactly the same layout as the Word document.

I personally preferred UDC (Universal Document Converter). I also installed Foxit Reader (free version) on the server, then printed the documents by starting a Process whose StartInfo.Verb is set to "print". You can also use FileSystemWatcher to signal when the conversion has completed.
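
The print-verb and FileSystemWatcher combination might be sketched as follows. This assumes the default printer is a virtual PDF printer configured to write its output into a known folder without prompting, and that an application registered for the file type supports the "print" verb. The names `PrintToPdf` and `PrintAndWait` and the timeout parameter are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Threading;

static class PrintToPdf   // hypothetical helper name
{
    // Prints a document via the shell "print" verb and waits until a PDF
    // appears in outputDir. Returns false if nothing shows up before the timeout.
    public static bool PrintAndWait(string docPath, string outputDir, TimeSpan timeout)
    {
        using (var done = new ManualResetEventSlim(false))
        using (var watcher = new FileSystemWatcher(outputDir, "*.pdf"))
        {
            // Start watching before printing so the Created event is not missed.
            watcher.Created += (sender, e) => done.Set();
            watcher.EnableRaisingEvents = true;

            var startInfo = new ProcessStartInfo(docPath)
            {
                Verb = "print",            // ask the registered application to print
                UseShellExecute = true,    // verbs require shell execution
                WindowStyle = ProcessWindowStyle.Hidden
            };
            using (Process.Start(startInfo)) { }

            return done.Wait(timeout);
        }
    }
}
```

Note that the Created event fires when the file first appears, which may be before the printer driver has finished writing it, so a caller may still need to wait until the file can be opened.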

Arvand