Checking Confluence with Python

Heaven and hell.

As a new starter at my new job, eager to absorb as much information as I can in as short a time as possible, I immediately gravitated towards the Confluence wiki and its plethora of pages (1800+) as a starting point.

There is a lot of information there, but the organisation of it could be better, so in line with the take-the-ball-and-run-with-it culture, I did.

First concern: how to improve things without making them worse?
Broken links suck.
Broken links I created suck even more.

Sadly, and inexplicably, Confluence to this day does not include any tools to check for broken links, beyond the basics of “Orphaned pages” and “Links to new pages that haven’t been created yet”.

A quick search on the internet found a tool by an Atlassian employee that seemed promising: BustedStuffReport. Point it at a Wiki and it will scan all the pages and do some regex-magic to heuristically determine if all is in order.

Sadly, it only works on public wikis, it does not follow any links to check them, it seems to target a somewhat older version of Confluence, and it uses Python 2.

With most of a solution already there, I decided I could hack (ahem, improve) it to make it useful enough. Just let me get the Python language reference out and see what happens!?

After a week of playing around after work and in between tasks, I have a mostly rewritten script that is converging on the target I want to hit. I’ll post on Sciurus with the full details once I get there.

In the meantime, the experience has taught me the following:

  • The XML-RPC API to Confluence is very rich and regular, and reaches into almost all the corners I need (Yay!)
  • This XML-RPC API was deprecated in favour of REST… while the REST API has not yet reached functional parity (D’oh!)
  • Confluence very helpfully does *some* classification of links through CSS classes… no idea why this isn’t visually represented by default (Huh?)
  • Python makes it incredibly easy to access the XML-RPC API (Yay!)
  • Python still makes my skin crawl with its lack of type-safety, and no, I don’t want to write unit tests for a small tool like this (Boo!)
  • List comprehensions are awesome (just like LINQ is (double-Yay!))
  • Why do I need to end my conditional statement lines with a colon? I guess I can live with this, but for a language that strives for visual sparsity it seems like an odd requirement (*shrug*)
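
To make a few of these points concrete, here is a minimal sketch (not the actual script) of reading rendered pages over XML-RPC and picking links out by their CSS class with a list comprehension. The `confluence2` method names (`login`, `getPages`, `renderContent`) and the `/rpc/xmlrpc` endpoint are what I believe the remote API exposes, and the class names `createlink` and `external-link` are assumptions about Confluence's markup; the helper names are made up for illustration.

```python
import xmlrpc.client
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, css_classes) pairs from the <a> tags of rendered HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href:
            # A link may carry several classes, or none at all.
            classes = (attr_map.get("class") or "").split()
            self.links.append((href, classes))


def collect_links(html):
    """Return a list of (href, css_classes) tuples found in `html`."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def rendered_pages(base_url, username, password, space_key):
    """Yield (title, rendered HTML) for each page in a space.

    Assumption: the server has the XML-RPC remote API enabled and exposes
    the confluence2 namespace. Nothing connects until a method is called.
    """
    server = xmlrpc.client.ServerProxy(base_url + "/rpc/xmlrpc")
    token = server.confluence2.login(username, password)
    for page in server.confluence2.getPages(token, space_key):
        # Empty content asks the server to render the page's stored body.
        html = server.confluence2.renderContent(token, space_key, page["id"], "")
        yield page["title"], html


# Hypothetical rendered fragment, standing in for a real page.
sample = (
    '<p><a class="external-link" href="http://example.com/">out</a> '
    '<a class="createlink" href="/pages/createpage.action?title=Todo">todo</a> '
    '<a href="/display/DEV/Home">home</a></p>'
)

links = collect_links(sample)
# Links to pages that do not exist yet (class name is an assumption).
uncreated = [href for href, classes in links if "createlink" in classes]
# Links pointing outside the wiki, which would need an HTTP check.
external = [href for href, classes in links if "external-link" in classes]
```

Keeping the parsing separate from the XML-RPC calls means the link classification can be tried on a saved HTML fragment without a Confluence server at hand.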

I think I can see why people love Python for scripts. But I’m still not convinced that the productivity gained from the flexible type system isn’t outweighed by the extra test cases you’d need to write for a non-trivial application. So for now, that leaves the trivial ones for me.