I was amused to read that a mathematical method that I first learned as an undergraduate has been found to help make AI models more processing efficient. The jump is pretty significant, if the theories hold in practice: a drop in 50x power consumption. This translates into huge cost savings: some estimate that the daily electric bill for running ChatGPT v3.5 is $700,000.

The method is called matrix multiplication and you can find a nice explanation here if you really want to learn what it is. MM is at the core of many mathematical models, and while I was in school we didn’t have the kind of computers (or built-in to our digital spreadsheets or in Python code) to make this easier, so we had to do these by hand as we were walking miles uphill to and from school in the snow.

MM dates back to the early 1800s when a French mathematician Jacques Binet figured it out. It became the foundational concept of linear algebra, something taught to math, engineering and science majors early on in their college careers.

The researchers figured out that, with the right custom silicon, they could run a billion-parameter model for about 13 watts. How do you make the connection between the AI models and MM? Well, your models are using words, and each word is represented by some random number, which are then organized into matrices. You do the MM to create phrases and figure out the relationships between adjacent words. Sounds easy, no?

Well, imagine that you have to do these multiplications a gazillion times. That adds up to a lot of processing. The researchers figured out a clever way to reduce the multiplications to simple addition, and then designed a special chipset that was optimized accordingly for these operations.

It is a pretty amazing story, and just shows you the gains that AI is making literally at the speed of light. It also shows you how some foundational math concepts are still valid in the modern era.