The Art of Writing Software

Easy sanity testing of a webapp with Python

Tags [ Cactus, CruiseControl, HTMLParser, integration test, Maven, Python ]

I needed to whip up some automated sanity testing of a webapp, and Cactus seems to have a somewhat steep learning curve. Eventually Cactus might be the way to go, because we should be able to integrate it nicely with Maven and CruiseControl, but I needed something quick.

In particular, I needed to check whether a certain number of images were showing up in a portion of a JSP-generated page.

Step 1. Prep the JSP to make it testable. Primarily, we just want to generate a unique HTML id attribute for the <img> tags in question:

<c:forEach items="${items}" var="item" varStatus="itemStatus">
  <c:set var="imageId">
    specialImage<c:out value="${itemStatus.index}"/>
  </c:set>
  <img id="${imageId}" .../>
</c:forEach>

This numbers the images with ids like specialImage0, specialImage1, etc.

Step 2. Fetch and parse the page with Python’s HTMLParser library. Basically, we write a parser that only pays attention to the tags we’ve marked in the JSP above:

import HTMLParser
import re

class TestSpecialImageCount(HTMLParser.HTMLParser):

    def reset(self):
        # The base class must reset its own parser state first.
        HTMLParser.HTMLParser.reset(self)
        self.imageIds = []

    def handle_startendtag(self, tag, attrs):
        if tag == "img":
            for key, val in attrs:
                if key == "id":
                    if re.match("specialImage[0-9]+", val):
                        if val not in self.imageIds:
                            # Record each distinct image id once.
                            self.imageIds.append(val)

    def testPassed(self):
        return (len(self.imageIds) == 6)
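
The parser above targets Python 2, where the module is called HTMLParser. For anyone on Python 3, here is a rough port using the standard html.parser module. It is a sketch rather than the code we actually ran: the class name SpecialImageCounter, the expected parameter, and the inline sample markup are made up for illustration. It also overrides handle_starttag instead of handle_startendtag, since the default handle_startendtag delegates to it and this way both <img ...> and <img .../> are counted.

```python
from html.parser import HTMLParser
import re


class SpecialImageCounter(HTMLParser):
    """Collects the distinct specialImageN ids found on <img> tags."""

    def reset(self):
        # HTMLParser.__init__ calls reset(), so state set here exists before feed().
        super().reset()
        self.image_ids = []

    def handle_starttag(self, tag, attrs):
        # The default handle_startendtag() delegates here, so this sees
        # both <img ...> and self-closing <img ... />.
        if tag != "img":
            return
        for key, val in attrs:
            if key == "id" and val and re.fullmatch(r"specialImage[0-9]+", val):
                if val not in self.image_ids:
                    self.image_ids.append(val)

    def test_passed(self, expected=6):
        # 'expected' is a hypothetical parameter; the original hard-codes 6.
        return len(self.image_ids) == expected


# Hypothetical sample markup standing in for the JSP output.
sample = "".join('<img id="specialImage%d" src="x.png"/>' % i for i in range(6))
counter = SpecialImageCounter()
counter.feed(sample)
counter.close()
print("OK" if counter.test_passed() else "FAILED")
```

Testing against an inline string like this is also a handy way to check the parser itself before pointing it at a running webapp.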

With the class saved in TestSpecialImageCount.py, running the test is as easy as:

import urllib
import TestSpecialImageCount

test = TestSpecialImageCount.TestSpecialImageCount()
page = urllib.urlopen("http://localhost:8080/webapp/home.htm").read()
test.feed(page)  # run the fetched HTML through the parser
test.close()
if test.testPassed():
    print "OK"
else:
    print "FAILED"

You could pretty easily batch this up and instantiate several parsers for different tests so you have one uber Python script to run all the tests (this is what we did). But the point here is that you don’t have to write very much code at all to get the sanity test written, because you can write the HTML parsers so compactly in Python.
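
To give a feel for the batching idea, here is a minimal sketch in Python 3 (html.parser). It is not our actual script: TagCounter, run_checks, the check names, and the inline page are all hypothetical stand-ins. The only convention it assumes is that every parser exposes a test_passed() method, so one loop can feed the same page to each of them.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Hypothetical generic check: count start tags with a given name."""

    def __init__(self, tag, expected):
        self.tag = tag
        self.expected = expected
        super().__init__()  # calls reset(), which initializes self.count

    def reset(self):
        super().reset()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags via the default handle_startendtag().
        if tag == self.tag:
            self.count += 1

    def test_passed(self):
        return self.count == self.expected


def run_checks(html, checks):
    """Feed the same page to every parser and return {check name: passed}."""
    results = {}
    for name, parser in checks.items():
        parser.feed(html)
        parser.close()
        results[name] = parser.test_passed()
    return results


# Inline stand-in for a fetched page, so the sketch runs without a server.
page = "<html><body><img src='a.png'/><img src='b.png'/><form></form></body></html>"
checks = {
    "two images": TagCounter("img", 2),
    "one form": TagCounter("form", 1),
    "one table": TagCounter("table", 1),
}
results = run_checks(page, checks)
for name, passed in results.items():
    print(name, "OK" if passed else "FAILED")
```

In a real run, page would come from the webapp rather than a string literal, for example via urllib.request.urlopen(...).read().decode() in Python 3.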