File SizeEarlier this year, I asked members of the Financial Modelling Group for their best tips for reducing financial model file size and got some fantastic tips. I’ve edited my original list of personal favourites to include some of the best tips and tricks and credited the authors below:

  1. File Type Save the file as .xlsx or even better, .xlsb instead of .xls. Saving your model as an Excel binary workbook will compress the file making it smaller and calculate faster. The only disadvantage is that you cannot zip the file or read the data from Power Query -which might be a deal breaker.
  2. Data Model Use it! In some cases it’s possible to drop the data sheet and just keep the (invisible) pivot cache which saves no end of space.  (Martin Goerlitz) You can prevent duplicate data storage in your PivotTable by deselecting the option “Save source data with file”. (Gary Knott) [1]
  3. Formatting Remove any unnecessary formatting or formulas, and only format the range. Check for unused conditional formatting as this can create a real mess. Inserting or copying formats can create multiple copies of a conditional format over different ranges instead of creating one conditional format over the ranges. (Anton Dworzak) Using styles instead of manual formatting is recommended. Apart from reducing storage demands, there is a real payoff when it comes to adjusting a workbook to conform to a new corporate identity. (Peter Bartholomew)
  4. Minimise the Used Range If a cell contains data which has since been removed Excel can remember the previously used range which can cause bloating. [2]
  5. Named Ranges & External Links Keep an eye on range names and make sure they don’t have external links as this can really slow things down. [3] Also remove unnecessary external links, which can often be found in named ranges causing the phantom link Keep the source file open if you must use external links.
  6. DRY Follow the #DRY Principle (Don’t Repeat Yourself) which is popular in software development and certainly applicable to modelling, not to mention other areas of life! If any calculation has to be repeated more than twice, it should either be kept in a separate cell or in a name (if the calculation is likely to return an array) and should be referenced. It should not be repeated over and over again. (Viswanathan M)
  7. Rationalise Formulas If your model has a long formula repeated many times, include a defined name that evaluates to the formula rather than using the formula directly. (Levi Bailey) The new LET function can help with this! Especially don’t repeat volatile functions. For example, if the TODAY() function is used, then you should put the today function in a cell and refer to that cell instead of repeating the function.
    Make your file grow mostly from top to bottom, or vertically. Avoid, as much as possible, letting it grow horizontally. (Djalma Rodrigues). Long formulas work faster than when split between several cells; but the obvious cost of it is reduced readability and increased risk of errors (Andrei Okhlopkovf)
  8. Keep Function Simple Functions such as LOOKUPS, INDIRECT, OFFSETS, INDEX and MATCH can use more memory than simple formulas. INDEX with a row MATCH column, which goes straight to the relevant cell, is much more efficient than VLOOKUP which has to search down the column for the first match. (Tim Allison) In a virtual meetup we ran back in May, Jonathon Power ran some calculation speed tests (see video below) to compare INDEX MATCH functions against VLOOKUP and XLOOKUP, and Wyn Hopkins also ran some tests with dynamic arrays which gave some surprising results.
  9. Manually Calculate Data Tables If your model contains a lot of data tables, which take a while to calculate, so set the Excel calculations to “Automatic except for data tables”
  10. Clean up Rubbish Remove logos, images and unused shapes or styles. Unhide hidden rows or columns, and find the end cell in each worksheet and delete everything past the last used row and column. (Yaswanth K) You can find hidden properties and other unnecessary “rubbish” in a file by using Inspect Document which you can find in the Info tab under the File menu. [4]


[1] When creating a Pivot it creates a copy of the original data that you can’t see (the pivot cache). Therefore, if you want to reduce the file size when working with Pivot tables you can either choose to delete either (1) the pivot cache or (2) the original source data. With the first option you uncheck the box ‘Save source data with file’ which then doesn’t save the pivot cache when saving the workbook. Then you can check the box to ‘Refresh data when opening the file’ which temporarily recreates the cache each time – alternatively you can refresh it to recreate it manually. With the second option, if the source data isn’t going to change and all you want are the pivot tables, then you can delete the original source data, which sometimes saves a lot of space. If you ever need to recreate the source data after deleting it then you can do this by double clicking on the grand total in the pivot anyway. (Adam Burch)
Reply: Just make sure you don’t do step (1) as well as (2). In other words, do one or the other 🙂 If you do both, your file will be tiny, but it will no longer be a PivotTable. You won’t be able to make any changes to it and you also won’t be able to drill down to the detail. (Nico Mellett)

[2] Cycle through the worksheets and see if there is excess formatting at the bottom of any of them. Sometimes the user will inadvertently apply formatting or paste something all the way down to row 1,000,000+ which can blow out the size of the file. Press Ctrl-End on each sheet, and it will take you to the last mapped cell. If this happens to be, say, cell AA1000000 then you know you have a problem. Scroll back up to where your data really ends (using Ctrl-Up Arrow). Then select the range running from the first blank row just underneath your data, all the way to the bottom of the sheet, and DELETE it all. (Press Ctrl-Shift-End, Then press Ctrl-[minus sign], and select Delete Entire Row. Save the file, close it and reopen it… and watch your 96Mb suddenly become 2Mb. Voila! (Nico MellettAndrea Lynch says she uses a macro to delete unused rows and columns.

[3] It is important to realise that Names do not hold values; they hold the formula by which the values may be recreated.  If you apply the formula to 100000 cells there is still one formula; one point of potential error, rather than 100000 copies which may or may not be the same.  That is a massive gain in terms of risk management. On the other hand, it makes little difference to computational efficiency.  If you have an expensive computational step (a search of unsorted data, say) that is used more than once, it is committing the calculation to the cells that reduces the number of evaluations. There is also a difference between referencing the name once from an array formula and multiple times from each cell of the range.  Whereas the array formula will evaluate the name once, a ‘copy right’ formula will evaluate it once per cell. (Peter Bartholomew)

[4] I would typically duplicate the model and troubleshoot with the copy by deleting one sheet per time, checking the new file size after every delete. That way I can pinpoint the trouble-maker(s). (Joseph Adewoyin) A few other considerations that may be useful; compress / remove high resolution images if applicable, delete all rows below the active section, and all columns to the right of the active section. When in trouble, delete one worksheet at the time to locate the issue (monitor files size when saving), reduce/remove repeated calcs (common for lookups etc). No calculation should be done more than once. There are a few add-ins which may be useful too, but generally speaking a few best-practice rules keep you out of trouble. Last resort tip is to start a new workbook and carefully transfer calcs without formatting, graphics or functionality. (Rickard Wärnelid)