A Weird Imagination

Formulas in Gnumeric

Posted in

The problem#

I mostly use Gnumeric for my personal spreadsheets, which are generally quite simple. But recently I wanted do some analysis that required writing a formula more complicated than just using the SUM() function and found Gnumeric's function documentation quite lacking in sufficiently detailed explanations and examples to be able to figure out how to use any of the more complicated functions.

The specific problem I was trying to solve was that I had some data on payments over time labeled with categories and wanted to summarize payments by category for each time period.

The solution#

The short version is that Gnumeric's formula language is nearly the same as Microsoft Excel's formula language, so there's no need to look for Gnumeric-specific help on writing formulas. There's plenty of advice online on writing formulas for Excel, and the information transfers without modification to writing formulas for Gnumeric.

For my specific problem, my layout was that each column was a time period and each row was a payee with column A being the name of the payee and column B being the category. I made entries further down in column A of the categories and wanted the cells in those rows to be the sum of the payments for the corresponding column's time period but only for the payees matching the category for the row. The function to do that is called SUMIF(), but that documentation page from Gnumeric is very unhelpful except for the line

This function is Excel compatible.

The Excel help page on SUMIF() has examples and a lot more explanation. Here's what the formula looks like in cell P50:

=SUMIF($B$2:$B$40,$A50,P$2:P$40)

where B2:B40 is the range where the categories are given for each payee (named in A2:A40), A50 contains the category being summed, and P2:P40 contains the actual values being conditionally summed (i.e., the values for the time period named in P1).

The details#

Excel compatibility#

The main observation is that for most spreadsheet functions, most spreadsheet programs, including Gnumeric, act the same as Microsoft Excel. Since it's the standard spreadsheet program, all of the others aim for compatibility which includes copying the core function set. Gnumeric claims to support

all of the functions from the North American edition of Microsoft Excel 2000 (TM).

The same applies to LibreOffice Calc, although its documentation is much more detailed than Gnumeric's, more like Excel's. While LibreOffice Calc is significantly more popular then Gnumeric (after all, Gnumeric only runs on Linux and similar operating systems) so LibreOffice-specific help is likely easier to find, in general when trying to get help designing a spreadsheet formula, it may be worth looking for Excel-specific help even when not using Excel.

Formula details#

The formula is a little unintuitive. The main issue is that when thinking of SUMIF() as a modified SUM(), the range being summed is both the source of the values and the cells being conditioned on. But for this usage, the cells being conditioned on are actually the categories in column B and there's no way in the condition to refer to a different cell. In a more expressive programming language, you might expect to be able to write a function that gets passed the value and index of each cell, but the "criteria" in spreadsheet formulas are much more limited than that. So, instead, SUMIF() lets you optionally specify two separate ranges: the first range is the cell to condition on, and the second parallel range is the cells with the values to actually sum if the condition is true on the corresponding cell. The middle argument is the condition, for which a cell reference means to check for string equality between that cell and the cell being checked.

Making it draggable#

That means, for every sum, the condition will be on the category names in column B, so that range B2:B40 in the first argument will be identical for every sum. Adding $ before a column/row referent in a formula makes it not change when dragging the formula, so that's written $B$2:$B$40. The data does change depending on which column the sum is in, so we leave off the $ before the P since that's the column being summed, but keep it on the rows because which row the sum is just changes which category is being summed, not where the data is coming from: P$2:P$40. And that last detail is why the middle argument looks like $A50: the category to sum is in the first column so the A doesn't change when dragging, but which category does so there's no $ before the 50. To repeat the above, all together that gives us the formula

=SUMIF($B$2:$B$40,$A50,P$2:P$40)

Pivot tables#

The observant reader may have noticed the problem I wanted to solve is already solved by pivot tables. Or you may just be correctly assuming that most simple summarization tasks can be expressed using pivot tables. And I have used a pivot table to do a very similar task in a different spreadsheet.

Mainly, I didn't use a pivot table here because I wanted the summary to be displayed alongside the data.

Comments

Have something to add? Post a comment by sending an email to comments@aweirdimagination.net. You may use Markdown for formatting.

There are no comments yet.