Convert HTML to image

Tags: ghostscript, gvim, imagemagick, syntax highlighting

Background

The goal is to batch convert various syntax-highlighted source files (C, SQL, Java, PHP, batch, bash) into high-resolution (600 dpi) images suitable for both an eBook and a printed book.

Failed Solutions

A number of attempts so far:

  • OpenOffice or LibreOffice. The source code must be re-imported into the document every time the source file changes. (That is, the solution cannot be easily automated for hundreds or thousands of source files.)
  • enscript. Cannot easily change colours, imperfectly renders output, not comprehensive.
  • LyX / LaTeX. Imperfectly renders output.
  • gvim to HTML → HTMLDOC to PostScript → Ghostscript to PNG. HTMLDOC ignores font tags.
  • gvim to HTML → html2ps → Ghostscript to PNG. RGB colours are not recognized by html2ps.
  • Firefox to PostScript → Ghostscript to PNG. Obnoxiously circuitous.
  • gvim to HTML → OmniFormat to anything. Free version unsuitable for batch processing; lots of advertising pop-ups.
  • pygments. Cannot easily change image resolution; does not have gvim's range of colour schemes.

Closest Solution

The solution that almost works is:

  • gvim to HTML → wkhtmltopdf to PDF. This will require post-processing with ImageMagick (wkhtmltoimage cannot set image resolution, only page width).

Requirements

  • Runs on Windows and Linux (ideally both, but either is acceptable)
  • Free or OSS
  • Command line only (suitable for batch processing)
  • Easily change colour scheme
  • Support: PHP, batch, bash, Java, JavaScript, R, C, and SQL

Question

Any other ways to convert syntax-highlighted source code to a high-resolution (600dpi) image?

Thank you!

Best Answer

Software Requirements

The following software packages are available for both Windows and Linux systems, and are required for a complete, working solution:

  • gvim - Used to export syntax highlighted source code to HTML.
  • moria - Colour scheme for syntax highlighting.
  • wkhtmltoimage - Used to convert HTML documents to PNG files.
  • gawk and sed - Text processing tools.
  • ImageMagick - Used to trim the PNG and add a border.

General Steps

Here is how the solution works:

  1. Load the source code into an editor that can add splashes of colour.
  2. Export the source code as an HTML document (with embedded FONT tags).
  3. Strip the background attribute from the HTML document (to allow transparency).
  4. Convert the HTML document to a PNG file.
  5. Trim the PNG border.
  6. Add a small, 25-pixel border around the image.
  7. Delete temporary files.
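
As a rough cross-platform illustration of steps 1 to 6, the sketch below assembles the external commands as argument lists without running them. It uses the same options as the batch file in this answer, but the function name `build_pipeline` and the `convert` parameter are hypothetical, the wkhtmltoimage output is written straight to the `.orig.png` name instead of being moved afterwards, and the optional width padding is left out for brevity.

```python
def build_pipeline(src, line_numbers=False, convert="convert"):
    """Return the commands for steps 1-6 as argument lists (nothing is executed)."""
    html = src + ".html"          # written by gvim's :TOhtml
    raw_png = src + ".orig.png"   # untrimmed render
    final_png = src + ".png"      # trimmed, bordered result

    # Steps 1-2: open the file in gvim, apply the colour scheme, export to HTML.
    gvim = ["gvim", "-e", src, "-c", "set nobackup"]
    if line_numbers:
        gvim += ["-c", "set number"]
    gvim += ["-c", ":colorscheme moria", "-c", ":TOhtml", "-c", "wq", "-c", ":q"]

    # Step 3: strip background colours from the generated CSS.
    sed = ["sed", "-i", r"s/background-color: #......; \(.*\)}$/\1 }/g", html]

    # Step 4: render the HTML to a large, transparent PNG.
    render = ["wkhtmltoimage", "--format", "png", "--transparent",
              "--minimum-font-size", "80", "--quality", "100",
              "--width", "3600", html, raw_png]

    # Steps 5-6: trim the empty margin, then add a 25-pixel transparent border.
    trim = [convert, "-format", "png", raw_png, "-density", "150x150",
            "-background", "none", "-antialias", "-trim", "+repage",
            "-bordercolor", "none", "-border", "25", final_png]

    return [gvim, sed, render, trim]
```

Each list could then be handed to `subprocess.run(cmd, check=True)` in order, followed by deleting the `.html` and `.orig.png` files for step 7. On Windows, the full path to ImageMagick's `convert.exe` would be passed as `convert`.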

For source files whose lines are all 80 characters or fewer, the script generates images of the same width. Source files with longer lines produce images as wide as necessary to fit the longest line.
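
The width rule can be sketched in a few lines of Python. This is only an illustration of the decision the batch file makes with gawk: the function names are hypothetical, and the 80-character limit and 3950-pixel width are taken from the batch file's own comments. Like gawk's `length()`, it counts a tab as a single character.

```python
def longest_line(path):
    """Return the length of the longest line, excluding line terminators."""
    longest = 0
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            longest = max(longest, len(line.rstrip("\r\n")))
    return longest


def extent_args(length, limit=80, width_px=3950):
    """Return the ImageMagick padding arguments, or none for over-long files."""
    if length > limit:
        return []                      # leave the image as wide as it needs to be
    return ["-extent", f"{width_px}x"]  # pad every short listing to one width
```

The returned list would be spliced into the ImageMagick command line between the trim and the border options.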

Installation

Install the components into the following locations:

  • gvim - C:\Program Files\Vim
  • moria - C:\Program Files\Vim\vim73\colors
  • wkhtmltoimage - C:\Program Files\wkhtml
  • ImageMagick - C:\Program Files\ImageMagick
  • Gawk and Sed - C:\Program Files\GnuWin32

Note: ImageMagick's convert.exe shares its name with the built-in Windows convert command (the utility that converts FAT volumes to NTFS), and it cannot supersede that command. Because of this, the full path to ImageMagick's convert.exe must be hard-coded in the batch file (as opposed to adding ImageMagick to the PATH).

Environment Variables

Append the following directories to the PATH environment variable:

"C:\Program Files\Vim\vim73";"C:\Program Files\wkhtml";"C:\Program Files\GnuWin32\bin"

Batch File

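Pass the source file as the first argument; any second argument turns on line numbers. For example, to convert the batch file itself, run it using:

src2png.bat src2png.bat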
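
Because the batch file handles one source file per call, converting hundreds of files needs a small driver. The Python sketch below is one possible way to build those calls. The extension list is an assumption based on the languages named in the question, and `find_sources` and `invocations` are hypothetical helper names.

```python
from pathlib import Path

# Assumed file extensions for the languages named in the question.
EXTENSIONS = {".c", ".sql", ".java", ".php", ".bat", ".sh", ".js", ".r"}


def find_sources(root, extensions=EXTENSIONS):
    """Return the source files under root, sorted, matched by lower-cased suffix."""
    return sorted(p for p in Path(root).rglob("*")
                  if p.is_file() and p.suffix.lower() in extensions)


def invocations(root, line_numbers=False):
    """Build one src2png.bat command (as an argument list) per source file."""
    commands = []
    for path in find_sources(root):
        cmd = ["src2png.bat", str(path)]
        if line_numbers:
            cmd.append("1")  # any non-empty second argument enables numbering
        commands.append(cmd)
    return commands
```

On Windows, each command could be run with something like `subprocess.run(["cmd", "/c", *cmd], check=True)`. Paths containing spaces would probably need extra care, since the batch file uses `%1` unquoted.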

Create a batch file called src2png.bat by copying the following contents:

@ECHO OFF

SET NUMBERS=-c "set number"
IF "%2" == "" SET NUMBERS=

ECHO Converting %1 to %1.html...
gvim -e %1 -c "set nobackup" %NUMBERS% -c ":colorscheme moria" ^
  -c :TOhtml -c wq -c :q

REM Remove all background-color occurrences (without being self-referential)
sed -i "s/background-color: #......; \(.*\)}$/\1 }/g" %1.html

ECHO Converting %1.html to %1.png...
wkhtmltoimage --format png --transparent --minimum-font-size 80 ^
  --quality 100 --width 3600 ^
  %1.html %1.png

move %1.png %1.orig.png

REM If the text file has lines that exceed 80 characters, don't crop the
REM resulting image. (The book automatically shrinks large images to fit.)
REM The 3950 is the 80 point font at 80 characters with padding for line
REM numbers.
SET LENGTH=0
FOR /F %%l IN ('gawk ^
  "BEGIN {x=0} {if( length($0)>x ) x=length()} END {print x;}" %1') ^
DO (
  SET LENGTH=%%l
)
SET EXTENT=-extent 3950x
IF %LENGTH% GTR 80 SET EXTENT=

REM Trim the image height, then extend the width for 80 columns, if needed.
REM The result is that all images will be resized the same amount, thus
REM making the font size the same maximum for all source listings. Source
REM files beyond the 80 character limit will be scaled as necessary.
ECHO Trimming %1.png...
"C:\Program Files\ImageMagick\convert.exe" -format png %1.orig.png ^
  -density 150x150 ^
  -background none -antialias -trim +repage ^
  %EXTENT% ^
  -bordercolor none -border 25 ^
  %1.png

ECHO Removing old files...
IF EXIST %1.orig.png DEL /q %1.orig.png
IF EXIST %1.html DEL /q %1.html
IF EXIST sed*. DEL /q sed*.

Improvements and optimizations welcome.

Note: The latest version of wkhtmltoimage properly handles overriding the background colour. Thus the line to remove the CSS for background colours is no longer necessary, in theory.
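
To see what the background-stripping substitution does, and to check whether it is still needed for a given stylesheet, the sed expression can be reproduced with Python's `re` module. This is a sketch rather than part of the script: the helper name `strip_background` is hypothetical, and the sample CSS rules in the comments are made up to resemble typical :TOhtml output.

```python
import re

# Same pattern as the sed expression; sed's \( \) group is ( ) in Python syntax.
PATTERN = re.compile(r"background-color: #......; (.*)}$")


def strip_background(line):
    """Drop a six-digit hex background-color declaration from one CSS rule."""
    return PATTERN.sub(r"\1 }", line)


# "body { color: #d0d0d0; background-color: #202020; }"
#     becomes "body { color: #d0d0d0;  }"
# ".Visual { background-color: #404040; color: #ffffff; }"
#     becomes ".Visual { color: #ffffff;  }"
# ".Comment { color: #d0d0d0; }" is left unchanged.
```

The pattern only matches a six-character colour followed by a semicolon and a space, so named colours, three-digit hex colours, or a declaration without a trailing semicolon pass through untouched.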