3 minute read

My current focus at work is developing the in house linear programming solver, mostly written in GAMS. This “Generic Algebraic Modelling System” is a high level language for describing linear optimisation problems, that it can then compile into any low level solver, such as CPLEX.

As an example, say we have 5 items, $i$, and a rucksack with capacity 100, how do we decide which items to put into the bag? Each one has a different weight, $w_{i}$ and value, $v_{i}$ and we can set up the problem by introducing a new binary variable $x_i$ to represent if we pick that item or not.

We can then write the objective function to maximise as: \(Z = x_1*v_{1} + x_2*v_{2} + x_3*v_{3} + x_4*v_{4} + x_5*v_{5}\)

subject to the constraint: \(x_1*w_{1} + x_2*w_{2} + x_3*w_{3} + x_4*w_{4} + x_5*w_{5} \le 100\)

This representation obviously does not scale well with 1000s of items, so we can refactor it in GAMS like so:

Z =e= sum(i, x(i) * v(i) )

and

sum(i, x(i) * w(i) ) =l= 100

That all seems fairly innocuous, but when you write an application to handle complex business logic it can turn into a sea of parentheses…

This issue gets worse when you discover that there are no formatting rules in GAMS! Imagine my horror when trying to understand what a thousand lines like this could be doing

equation_1(a)$(condition_1(i,j,t) and value_1(x) < value_2(x)).. sum((ab(a,b),bc(b,c),cd(c,d)), value(a,b,c,d)$(condition_2(a,b,c,d))) =l= 100

Topiary

I couldn’t find a formatter, like ruff, that could understand this GAMS code, so I made the terrible decision of trying to write my own…

Things started off well after finding this very recent (Jan 2025) blog post about a new “universal formatting engine” called Topiary. Apparently, all I needed to do was use the tree-sitter package to generate a grammar.js file and I would be off and away.

This great intro to syntax trees shows how we can define rules to turn expressions like $x*y+z$ into the syntax tree below

Because we could parse this tree as (x + y) + z or x + ( y + z) we need to explicitly tell the parser that we prefer the left option (in the grammar.js this is done with prec.left() wrapped around the rule)

In GAMS you can assign a set like x = 1, a subset of the set x(i) = 1 or an attribute x.lo(i) = 1. We can set up these rules like so:

parameter_reference: $ => seq(
	$.identifer,
	optional(seq(token("."), $.attribute)),
	optional($.indexing)
)

$.identifer: $ => /[a-zA-Z_][a-zA-Z0-9_]*/, //matches letters

$.attribute: $ => choice(
	"l",
	"lo",
	"up",
	"scale",
	"fx"
),

$.indexing: $ => seq(
	"(",
	$.identifier,
	optional(seq(",", $.identifier)),
	")"
)

which should be able to handle:

  • x as identifer
  • x(i) as identifier(index)
  • x.lo(i) as identifier.attribute(index)

However, I couldn’t vibe code my way past the issue that tree-sitter reads from left to right, meaning it reads the x in x.lo(i) and instantly assigns it as its own parameter_reference before waiting to read the whole x.lo(i)!!!

I tried changing the precedence of rules (as explained above) but nothing worked. Weirdly if I have something like x.lo(i) = y.lo(i) it parses properly on the RHS but not the LHS!!

At this point I gave up, because GAMS has a lot of syntax rules and I couldn’t even get past x.lo(i)

…and then we got access to GitHub copilot at work. I asked copilot to format my files for me and hey presto, I got something like this

equation_1(a)$(
	condition_1(i,j,t) 
	and value_1(x) < value_2(x)
	)
	.. 
	sum(
		(
			ab(a,b),
			bc(b,c),
			cd(c,d)
		),
		value(a,b,c,d)$(condition_2(a,b,c,d))
	) =l= 100

So, who needs to meticulously build a parser these days when all the languages on the internet have been modelled in an LLM?

At least I now have more respect whenever I see syntax highlighting.

As an aside, these types of syntax trees are useful for understanding grammar in natural languages (as I read in The Sense of Style)

Tags:

Published: