How to reverse engineer a spreadsheet

How to reverse engineer a spreadsheet

Every business I’ve worked with has at least one spreadsheet taken as gospel, yet how it works is a mystery. Usually these are simple tracking tools, used to record entries and made sure actions happen. Occasionally they’re Excel workbooks that churn out predictions and analysis the business relies on. The worst example I found calculated various solvency margins and took so long to work it was often left to run overnight. How it worked was jokingly called “magic”.

Over the years I’ve become adept at breaking Excel workbooks apart. Usually that’s a little different from what it actually does. Often the person who put it together has either left, or can’t remember quite what they were doing when they put it together.

This is the broad approach I take. The example I’m using is a pricing engine used to calculate premiums for insurance. It’s based on one I pulled apart a few years ago a broker had relied on for pricing a specialist scheme. 

Example Excel spreadsheet being used for analysis and breaking down.
The pricing spreadsheet being used throughout this article. It’s formatted to print on an A4 letterhead.

What is the spreadsheet supposed to do?

The first question to ask is what is it supposed to do. Look at how it fits into the way everyone works, not just what people think it does. You should be able to find how data feeds it, what people do with that data and where the outputs go.

The obvious users of the pricing spreadsheet were the salespeople quoting for insurance. Less obvious was the pricing analyst who fed it the numbers it used.

Start with the “User Interface”

I find the easiest place to start is the worksheet everyone uses most often. This could be full of charts or data entry fields. Start picking it apart, identifying cells that have formulas or validations, and how charts and diagrams are fed. Typically, I’ll colour-code cells, using blue for data entry, yellow where there’s also a validation and green for a calculation. If the spreadsheet has rows of calculations, I’ll only do the first row to save time.

Color-coded worksheet showing where data is entered, there is validation and formulae

Start naming cells

Cell references aren’t easy to read. B13 makes a lot less sense than “Date of Birth”. I give every cell that’s used a meaningful name that starts with the worksheet where it’s found. Later this will make the formulas much easier to read and correct.

Pull apart the formulas

As with cell references, Excel’s formulas are not easy to read. Copy and paste them into a text editor, then split them out with each function on a new line. If there’s embedded functions (such as IF( ) logic) you may find it easier to split these out too. Adding a tab before each block of functions can make them easier to read.

The pricing spreadsheet has a single formula that calculates the premium. In Excel it looks like this:

=B22*(VLOOKUP(ROUNDDOWN((TODAY()-B13)/365,0),’Age Table’!A2:C49,IF(B14=”Male”,3,2))*VLOOKUP(B15,’Relationship Table’!A1:B4,2)*VLOOKUP(B17,’Occupation Table’!A1:B8,2)*VLOOKUP(B18,’Employment Table’!A1:B10,2))+VLOOKUP(IF(B16>8,8,B16),’Dependents Table’!A1:B8,2)

Breaking it into structured code it becomes:

=B22*(

    VLOOKUP(

        ROUNDDOWN((TODAY()-B13)/365,0)

        ,’Age Table’!A2:C49,

        IF(B14=”Male”,3,2))

    *VLOOKUP(B15,’Relationship Table’!A1:B4,2)

    *VLOOKUP(B17,’Occupation Table’!A1:B8,2)

    *VLOOKUP(B18,’Employment Table’!A1:B10,2)

    )

    +VLOOKUP(

        IF(B16>8,8,B16),

        ‘Dependents Table’!A1:B8,2)

This structured version is easier to read. It becomes even easier after I’ve swapped cell references for the names I gave them:

=B22*(

    VLOOKUP(

        ROUNDDOWN((TODAY()- Data_DOB )/365,0),

        ‘Age Table’!A2:C49,

        IF( Data_Gender =”Male”,3,2))

    *VLOOKUP(Data_Relationship,’Relationship Table’!A1:B4,2)

    *VLOOKUP(Data_Employment,’Occupation Table’!A1:B8,2)

    *VLOOKUP(Data_ServiceLength,’Employment Table’!A1:B10,2)

    )

    +VLOOKUP(

        IF(Data_Dependents>8,8,Data_Dependents),’Dependents Table’!A1:B8,2)

For complex formulas that involve a lot of logic or interconnected formulae, I’ll create an influence chart or a flowchart.

Simple Explanations

A challenge for spreadsheet design is we get lost in the formula and lose sight of what it is supposed to do in business terms. With the formula broken apart and structured, it’s a relatively simple task to describe what they do in plain language. This helps non-spreadsheet users validate they’re doing the right thing and explain where changes are needed.

In the pricing spreadsheet, I could describe the premium calculation as “add the age/gender, relationship, employment and service length rating factors together, multiply this by the sum insured and add the additional premium for their dependents”. This was a lot easier to explain than trying to walk through Excel functions.

Split the calculation

Splitting out complicated calculations into individual formulae makes re-engineering a spreadsheet a simpler task. It also helps with identifying problems in the existing calculations. I create a separate worksheet and populate it with plain language explanations of what the calculations are doing. This approach makes it easier to adjust formula without breaking an entire spreadsheet or introducing difficult to find bugs.

Premium calculation split apart. Each element of the calculation is in its own cell and can be modified independently.
Premium calculation split apart. Each element of the calculation is in its own cell and can be modified independently.

Check everything

By this stage I have:

  • an understanding of how the spreadsheet is used, populated and maintained
  • a clear picture of where data entry, validation and calculations are happening
  • easy-to-understand formula with plain language descriptions
  • the spreadsheets main formula in one place

From this I can verify the spreadsheet is doing what’s expected. I can validate the business logic by checking the plain language descriptions. I can also validate the formulae by running through and making sure the right function is being used in the correct way.

The pricing spreadsheet has three issues. The first is a bug associated with the age vlookup. If the prospect is male, the spreadsheet returns the female rating factor and vice versa. This means it’s been quoting incorrect premiums since it was introduced. However, the second issue is gender should have been removed as a rating factor entirely. Finally, the postcode of the addresss should be used as an additional factor, but this hasn’t happened.

Rebuild from here

I don’t recommend rebuilding or correcting a spreadsheet as you go, even if the bugs are obvious. It’s far better to have a full understanding of the current state of play than to fix on the fly. Taking a hacking approach invariably introduces new bugs and problems.

Bottom line

Almost every business has a spreadsheet or two whose purpose has been lost in the mists of time. There will be a time when you have to lift the lid, look inside and try to work out what it does. If you take a disciplined approach mix Excel formulae with plain language and split things apart, you stand a much better chance of success.

Got stuck? Drop me a line

If you’re having difficulty or want to talk through getting a spreadsheet taken apart or put together, get in touch.

Privacy: I'll only use this information to reply to you.