Friday, December 24, 2010

Batch Convert Images to PDF using PDFill and AutoIt

I’m in the process of completely digitizing my filing cabinet and shredding all but the most important paper-based records. I recently purchased a Canon PIXMA MX870 multifunction printer/scanner unit purely for its Auto Document Feeder (ADF) with duplex scanning support. It turns out the ADF duplex support isn’t all I had hoped for, as it regularly complains about paper jams. This really irritates me, Canon! I’m furious, actually.

Anyway, back to the digital filing cabinet …

I’m leveraging three really cool pieces of software:

  1. FileCenter by Lucion http://lucion.com/filecenter-overview.html
  2. PDFill PDF tools (FREE) by PlotSoft  http://www.pdfill.com/pdf_tools_free.html
  3. AutoIt  http://www.autoitscript.com/autoit3

FileCenter provides the interface to my new digital filing cabinet, with file management capabilities along with scanning and OCR (optical character recognition) features. It has a number of really useful bells and whistles, including smart file (re)naming capabilities that really help with the overall organization of the filing cabinet. The Standard edition, which I initially purchased (on special now for $50), seems to do its job quite well. The only downside is that the supplied OCR engine is sometimes not up to the task. The FileCenter Pro Plus edition (on special now for $199) ships with an Advanced OCR engine that seems to do a much better job of extracting text from the various scans/images I supply it. I may persist with the Standard edition for a little longer as money is tight, but if Lucion want to give me a free upgrade, please contact me :). Lucion provides a 30-day trial of FileCenter Pro Plus with no limitations whatsoever, so check it out. It’s a great piece of software.

PDFill PDF tools are FREE and are without doubt some of the most useful and powerful tools I have ever come across. I use these tools regularly to do things like:

  • Merging PDF files together
  • Reordering pages of PDF files
  • Converting images to PDF
  • Rotating PDF pages

Whilst FileCenter appears to provide many of these capabilities, the level of customization and tweaking that the PDFill PDF tools provide is unmatched. The only downside is that these tools are all GUI-based, requiring lots of clicks and user input, and don’t appear to provide any command-line control for tasks like batch conversion/processing.

This is where the final piece of software, “AutoIt v3”, comes into its own. The guys that wrote this are studs to release it for free. This tool/scripting language lets you construct little programs, much like batch files or shell scripts, that automatically control and interact with Windows applications without any end-user input. Thus, I can write a little program that, for example, simulates mouse clicks and keyboard input from a user.
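To give a feel for what that looks like, here is a minimal sketch that is not part of my conversion script. It assumes the classic Windows Notepad, whose main window class is "Notepad"; on other versions the window may need a different title or class.

```autoit
; Minimal sketch: launch Notepad and type into it, as a user would.
; Assumption: classic Notepad, whose main window class is "Notepad".
Run("notepad.exe")

; Wait up to 10 seconds for the Notepad window to become active.
; WinWaitActive returns 0 if the timeout expires.
If Not WinWaitActive("[CLASS:Notepad]", "", 10) Then
  MsgBox(0, "Error", "Notepad did not become active in time")
  Exit
EndIf

; Simulated keystrokes go to whichever window is currently active.
Send("Hello from AutoIt{ENTER}")
```

The same three steps (run, wait for the window, send input) are what the PDFill script further down repeats for each dialog.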

I have a number of JPG image files that are essentially scans of various paper-based documents that I want to convert to PDF. PDFill provides a tool, "Convert Images to PDF", that allows me to manually navigate to the appropriate source image and then specify the output PDF file along with the PDF output page size, margins, image layout and so on. This is quite a laborious, repetitive task, so I decided to automate it with an AutoIt v3 script.

The script essentially searches for JPG files in a particular source directory, then iterates through each image file returned supplying it automatically to the PDFill tool. Appropriate output options / output file etc get set using a combination of automated mouse gestures and keyboard input. Upon conversion completion, the script detects Adobe Reader automatically launching to preview the output and closes it down.

Without further ado, I give you the automated batch PDFill conversion script:

; Script by Matt Shannon - Dec 2010

$srcDirectory = 'C:\Documents and Settings\Administrator\Desktop\scans'
$destDirectory = 'C:\Documents and Settings\Administrator\Desktop\output'

; change current directory to src directory
FileChangeDir($srcDirectory)
$search = FileFindFirstFile("*.jpg") 

If $search = -1 Then
  MsgBox(0, "Error", "No files/directories matched the search pattern")
  Exit
EndIf

While 1
  $file = FileFindNextFile($search)
  If @error Then ExitLoop
  ; MsgBox(4096, "File:", $file)

  ; derive the PDF name by swapping the trailing .jpg (any case) for .pdf
  $dest = StringRegExpReplace($file, "(?i)\.jpg$", "") & ".pdf"

  ConvertToPDF($srcDirectory, $file, $destDirectory, $dest)
 
WEnd

; Close the search handle
FileClose($search)

Func ConvertToPDF($srcdir, $srcfile, $destdir, $destfile)

  Run('C:\Program Files\PlotSoft\PDFill\PDFill_PDF_Tools.exe', "", @SW_MAXIMIZE)
  WinWaitActive("[REGEXPTITLE:PDFill PDF Tools 7.0.*]")

  ControlClick("[REGEXPTITLE:PDFill PDF Tools 7.0.*]", "", "[CLASS:Button; TEXT: 9. Convert Images to PDF]")
  WinWaitActive("[TITLE:Free PDF Tools: Convert images to PDF]")

  ; set paper size output to A4
  ControlClick("[TITLE:Free PDF Tools: Convert images to PDF]", "", "[CLASS:ComboBox; INSTANCE:1]")
  Send("A{ENTER}")

  ; set margins to 0
  ControlClick("[TITLE:Free PDF Tools: Convert images to PDF]", "", "[CLASS:Edit; INSTANCE:3]")
  Send("{HOME}{SHIFTDOWN}{END}{SHIFTUP}{DEL}0")

  ControlClick("[TITLE:Free PDF Tools: Convert images to PDF]", "", "[CLASS:Edit; INSTANCE:4]")
  Send("{HOME}{SHIFTDOWN}{END}{SHIFTUP}{DEL}0")

  ControlClick("[TITLE:Free PDF Tools: Convert images to PDF]", "", "[CLASS:Edit; INSTANCE:5]")
  Send("{HOME}{SHIFTDOWN}{END}{SHIFTUP}{DEL}0")

  ControlClick("[TITLE:Free PDF Tools: Convert images to PDF]", "", "[CLASS:Edit; INSTANCE:6]")
  Send("{HOME}{SHIFTDOWN}{END}{SHIFTUP}{DEL}0")

  ; add an image
  ControlClick("[TITLE:Free PDF Tools: Convert images to PDF]", "", "[CLASS:Button; TEXT:Add an Image]")
  WinWaitActive("[TITLE:Select Image files to add into PDF]")

  ; send image location followed by enter
  ControlClick("[TITLE:Select Image files to add into PDF]", "", "[CLASS:Edit; INSTANCE:1]")
  Send($srcdir & "\" & $srcfile & "{ENTER}")

  ; wait for window to return
  WinWaitActive("[TITLE:Free PDF Tools: Convert images to PDF]")

  ; click save-as
  ControlClick("[TITLE:Free PDF Tools: Convert images to PDF]", "", "[CLASS:Button; TEXT:Save As ...]")
  WinWaitActive("[TITLE:Save all the images as ... ]")

  ; send pdf output location followed by enter
  ControlClick("[TITLE:Save all the images as ... ]", "", "[CLASS:Edit; INSTANCE:1]")
  Send($destdir & "\" & $destfile & "{ENTER}")

  ; wait for adobe to open - and then close it down
  WinWaitActive("[TITLE:" & $destfile & " - Adobe Reader]")
  WinClose("[TITLE:" & $destfile & " - Adobe Reader]")

  ; wait for pdf image tools to return - and then close it down
  WinWaitActive("[TITLE:Free PDF Tools: Convert images to PDF]")
  WinClose("[TITLE:Free PDF Tools: Convert images to PDF]")

  ; wait for main pdf tools to return - and then close it down
  WinWaitActive("[REGEXPTITLE:PDFill PDF Tools 7.0.*]")
  WinClose("[REGEXPTITLE:PDFill PDF Tools 7.0.*]")

EndFunc
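Two things in the function above are fragile: every WinWaitActive call blocks forever if the expected window never appears, and Send() treats characters such as + ! ^ # { } in a file name as special keys. The sketch below shows one way that could be tightened up. WaitOrExit and EnterPath are hypothetical helper names of my own, and I am assuming the PDFill file dialogs accept text set through ControlSetText, which I have not verified.

```autoit
; Hypothetical helper, not part of the original script: wait for a window
; with a timeout instead of blocking forever.
Func WaitOrExit($title, $seconds = 30)
  If Not WinWaitActive($title, "", $seconds) Then
    MsgBox(0, "Error", "Timed out waiting for: " & $title)
    Exit
  EndIf
EndFunc

; Hypothetical helper: put a path into a dialog's first edit box, then
; press Enter. ControlSetText writes the text directly, so characters that
; Send() would treat as special keys are taken literally.
; Assumption: the dialog's file name field is [CLASS:Edit; INSTANCE:1].
Func EnterPath($title, $path)
  WaitOrExit($title)
  ControlSetText($title, "", "[CLASS:Edit; INSTANCE:1]", $path)
  ControlSend($title, "", "[CLASS:Edit; INSTANCE:1]", "{ENTER}")
EndFunc
```

With these in place, the "send image location" step could become EnterPath("[TITLE:Select Image files to add into PDF]", $srcdir & "\" & $srcfile). A smaller change with a similar effect is to keep Send() but pass 1 as its second argument, which sends the keys raw, and then send {ENTER} in a separate call.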
