Site Crunch

What is Site Crunch?

A while ago I had a discussion with Billy Hoffman of Zoompf! about automatically optimizing web site files on Unix systems. I wrote a quick script to wrap some intelligence around a few common programs, and Site Crunch was born. Since then it has become more polished and easier to use, but the fundamental program hasn't changed.

Why Optimize?

In the early days of the Internet--when 56k modems were lining store shelves--keeping sites and images small was almost a necessity. No one wanted to wait forever while pulling content through AOL, CompuServe and eWorld!

As faster broadband became available, the massive hordes of users and sites on the 'net seemed to forget that things should be efficient... but the cycle is coming back. There are a lot of reasons for this resurgence:

  • Faster sites have an inherently better user experience,
  • users are impatient--they will go elsewhere rather than wait for your slow content,
  • less data transmitted means lower bandwidth expenses,
  • and perhaps most importantly, search rankings are starting to take speed into account.

How does it work?

Site Crunch uses freely available software to compress or optimize a site's content. It does this by calling these programs on files it can handle (JPG, PNG, CSS, HTML and JavaScript) and outputting them in a more optimized form.
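As a rough illustration of the "files it can handle" step, here is a minimal sketch of how a directory tree could be walked and files grouped by type. It is not taken from site_crunch.pl: the extension map and the names type_for_file and collect_files are assumptions made for this example.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical extension map; the real script may recognize other suffixes.
my %TYPE_FOR_EXT = (
    jpg => 'jpg', jpeg => 'jpg',
    png => 'png',
    css => 'css',
    htm => 'htm', html => 'htm',
    js  => 'js',
);

# Return the type key for a path, or undef if the file is not handled.
sub type_for_file {
    my ($path) = @_;
    my ($ext) = $path =~ /\.([^.\/]+)$/ or return;
    return $TYPE_FOR_EXT{ lc $ext };
}

# Walk a directory tree and group the handled files by type.
sub collect_files {
    my ($dir) = @_;
    my %found;
    find(
        {
            no_chdir => 1,    # keep full paths in $File::Find::name
            wanted   => sub {
                return unless -f $File::Find::name;
                my $type = type_for_file($File::Find::name);
                push @{ $found{$type} }, $File::Find::name if defined $type;
            },
        },
        $dir,
    );
    return \%found;
}
```

Each group could then be handed to the matching external program or module; files with any other extension are simply left alone.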

Is it safe?

Mostly. It is highly recommended that you run Site Crunch in a development or test environment. It is also recommended that you output to a new directory and not replace your source files. See the end of this document for known bugs or problems before running. That's pretty important, so I'm going to repeat: it is highly recommended that you run Site Crunch in a development or test environment.

How much smaller will my site be?

This depends on a lot of factors, and there are certainly many optimization techniques not covered here. In testing, the average reduction in size seems to be around 10% without any noticeable difference in image appearance or site operation.
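To check the savings on your own site, compare total byte counts before and after a run. The sketch below only shows the arithmetic; the helper name percent_saved and the byte counts are made up for illustration.

```perl
use strict;
use warnings;

# Percentage by which $after is smaller than $before (both in bytes).
sub percent_saved {
    my ($before, $after) = @_;
    return 0 if $before == 0;    # avoid dividing by zero on empty input
    return 100 * ($before - $after) / $before;
}

# Hypothetical sizes: 250,000 bytes before, 225,000 bytes after.
printf "%.1f%% smaller\n", percent_saved(250_000, 225_000);    # prints "10.0% smaller"
```

For a single file, Perl's -s file test returns its size in bytes, so the same helper can be applied file by file.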

Usage/Options

  • -checksetup  check for required components
  • -dir+*          directory to start in
  • -htmltidy     run Tidy against HTML files
  • -logfile+       log file
  • -mirror+       mirror start directory to here (*all* files copied)
  • -skipregex+  don't process files/dirs that match this regex
  • -type+           type of file to process (or csv list) (jpg/png/css/htm/js or all)

+ requires value
* required option
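The option table can be read as a parsing specification. The sketch below assumes the options are handled with something like Perl's Getopt::Long, which the source does not confirm; parse_options and should_skip are hypothetical names, and matching -skipregex against the full path is also an assumption.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse an argument list according to the option table.
# "=s" marks the options flagged with "+" (requires value).
sub parse_options {
    my @args = @_;
    my %opt;
    GetOptionsFromArray(
        \@args, \%opt,
        'checksetup',     # flag
        'dir=s',          # required, takes a value
        'htmltidy',       # flag
        'logfile=s',
        'mirror=s',
        'skipregex=s',
        'type=s',         # single type or comma-separated list
    ) or die "bad options\n";

    # -dir is marked "*" (required); assume -checksetup can run without it.
    die "-dir is required\n"
        unless defined($opt{dir}) || $opt{checksetup};

    $opt{types} = [ split /,/, $opt{type} ] if defined $opt{type};
    return \%opt;
}

# True (1) if a path matches the -skipregex pattern, if one was given.
sub should_skip {
    my ($path, $skipregex) = @_;
    return 0 unless defined $skipregex && length $skipregex;
    return $path =~ /$skipregex/ ? 1 : 0;
}
```

Under these assumptions, a cautious run might look like `perl site_crunch.pl -dir /var/www/dev-site -mirror /tmp/crunched -type jpg,png`, where both paths are placeholders and the exact value syntax may differ in the real script.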

Requirements

Customization

For PNG and JPG, custom options can be specified inside the site_crunch.pl program, on these lines:

$PROG{'png'}->{'options'} = "";
$PROG{'jpg'}->{'options'} = "-copy none -progressive -outfile *INFILE*";

Note: *INFILE* will be replaced as appropriate.
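The *INFILE* placeholder can be pictured as a plain string substitution. This is a sketch of the idea rather than the script's actual code; expand_options is a hypothetical helper and the image path is invented.

```perl
use strict;
use warnings;

# Replace every *INFILE* placeholder in an options string with a file path.
# Shell quoting of the path is deliberately left out of this sketch.
sub expand_options {
    my ($options, $infile) = @_;
    (my $expanded = $options) =~ s/\*INFILE\*/$infile/g;
    return $expanded;
}

my $jpg_options = "-copy none -progressive -outfile *INFILE*";
print expand_options($jpg_options, "images/logo.jpg"), "\n";
# prints "-copy none -progressive -outfile images/logo.jpg"
```

An options string without the placeholder, such as the empty PNG default, passes through unchanged.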

If you wish to change the options to HTML::Tidy, JavaScript::Minifier, CSS::Minifier or HTML::Clean, they are specified elsewhere in the program.

If you make options changes which result in smaller files, please email the author!

Bugs/Problems/Limitations

  • HTML Tidy may actually create larger files because it also attempts to fix validation errors.
  • HTML Tidy may not handle UTF-8 encoded files properly--this appears to be a module bug.
  • JavaScript::Minifier croaks on certain .js files, often in a WordPress install (unfixed).
  • There is no GIF support. Convert GIFs to PNG.
  • Applying a patch to CSS/JS/HTML files after they have been optimized may fail, since the text is no longer the same as the original. This really can't be avoided--as a workaround, replace the optimized version with a full update and compress it again.

License

This code is licensed under the Reciprocal Public License (RPL) 1.5.

Where do I get it?

Download: sitecrunch_1.0.tar.bz2