Hi and welcome to our May update on AI for Accounting.
It’s been an incredible month (again) for AI, and for generative AI specifically, which can now generate impressive images, text, music and much more from human (mostly language) inputs. What a time to be alive!
For images, remember the hype started with DALL·E from OpenAI, but it really exploded with the release of the free and open-source Stable Diffusion model. This led to a Cambrian explosion of use cases and integrations in games, image software, design, technical drawing and so much more. There seems to be no bound on the creativity and applications once these open models are available.
Then came the language models. LARGE language models (LLMs). It turns out, quite surprisingly, that if you keep scaling these LLMs up to billions, tens of billions and, yes, hundreds of billions of parameters, then at some point new capabilities spontaneously emerge. These capabilities do not grow linearly or even exponentially: smaller (yet still huge) models simply cannot perform the task at all, but past a certain threshold the ability suddenly appears, and more than 100 such abilities have been documented so far. In physics, this behaviour is called emergence, more popularly known as ‘the whole is more than the sum of its parts’.
Source: Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance
Source: 137 emergent abilities of large language models
OpenAI clearly demonstrated this with GPT-4, which was able to pass many exams and tests. Microsoft even called it ‘sparks of artificial general intelligence’. Interestingly, OpenAI surprised the market not only with its model quality, but also by offering an API and even fine-tuning capabilities, putting the raw power in the hands of developers. This ultimately forced the hand of Google, the original inventor and frontrunner in LLMs, to open up its LLMs as well and release PaLM 2.
These are exciting times for developers, with these great powers now easily available, but with great power comes great responsibility. Or in this case, unique challenges for deploying these models with respect to versioning, monitoring, testing for bias, interoperability, etc., now coined LLMOps. In some cases you even need a second LLM to keep an eye on what your first LLM is doing in production. Just imagine, models watching models.
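Just to make that ‘models watching models’ idea concrete, here is a minimal sketch in Python. The `call_llm` callable is a hypothetical stand-in for whatever model API you use; this illustrates the pattern, it is not a production guardrail:

```python
from typing import Callable

def answer_with_guardrail(user_question: str, call_llm: Callable[[str], str]) -> str:
    # First LLM drafts an answer for the user.
    draft = call_llm(f"Answer this accounting question:\n{user_question}")

    # Second LLM reviews the draft before it is shown in production.
    verdict = call_llm(
        "You are a reviewer. Reply with OK if the answer below is on-topic, "
        "safe and free of obvious errors; otherwise reply with REJECT.\n\n"
        f"Question: {user_question}\nAnswer: {draft}"
    )

    # Only release the draft if the second model approves it.
    return draft if verdict.strip().upper().startswith("OK") else "Escalated to a human reviewer."
```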
But how do you apply these LLMs to a domain-specific task, such as accounting, or have these general LLMs use your customer’s data or internal knowledge? Several options are available:
- Prompt engineering: you can give your general LLM a few examples of questions and expected answers before the actual user prompt and it will generalise to this new situation. But you’ll also need to very precisely describe the output you expect. It’s a powerful and widely used approach, but can be somewhat frustrating at times (see the first sketch after this list). And of course, you can now have other LLMs help you prompt your original LLM or even use custom-designed computer languages for prompt engineering. Yeah, it’s getting pretty crazy out there.
- Fine-tuning: if you have hundreds of examples or a specific static knowledge base, you may use them to tune the parameters of the LLM directly and thereby create your own, unique LLM (although in the background it will actually be a superposition of the general and your custom parameters). In contrast to prompt engineering, you can reuse this model without having to prepend a custom prompt, and you can still use prompt engineering on top of this fine-tuned model. However, the API costs for creating and running fine-tuned models are usually quite a bit heftier (see the second sketch after this list).
- Embed (dynamic) knowledge in a vector database: well, a lot of AI lingo to unpack here! Suppose you have a database of customer data or accounting documents that may change over time, and you want your LLM to reason over these in real time, e.g. to summarise them, retrieve them, or use them to fill in other documents. What you can do is encode these documents as text embeddings, which you may think of as breaking them up into small text snippets and having a language model add a lot of semantic tags to them. These embeddings are then stored in a vector database, which is designed for fast approximate retrieval (note that an embedding is represented by a vector, basically a list of numbers). So when your LLM needs some relevant information, it simply queries the vector database for the text snippets closest to the question and uses those to give an informed result (see the third sketch after this list). It sounds complicated, but it’s nothing more than allowing the LLM to use real-time (private) information to give the user a better-informed answer.
- Other solutions exist as well: prompt tuning (think of it as something between prompt engineering and fine-tuning), multimodal input (including pictures, spoken text or other modalities in the prompt – think invoices directly), longer input prompts (Claude with up to 100k tokens, which allows a full book as input), and more will no doubt be invented soon.
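To make the first option concrete, here is a minimal sketch of few-shot prompt engineering. The invoice examples and the `call_llm` callable are hypothetical; the point is simply that worked examples and a precise output description are placed in front of the user’s actual input:

```python
from typing import Callable

# A few worked examples the model can generalise from (hypothetical content).
FEW_SHOT_EXAMPLES = [
    ("Invoice: 'Office chairs, 1,210 EUR incl. 21% VAT'", "Net: 1000.00 EUR, VAT: 210.00 EUR"),
    ("Invoice: 'Laptop, 605 EUR incl. 21% VAT'", "Net: 500.00 EUR, VAT: 105.00 EUR"),
]

def build_few_shot_prompt(user_input: str) -> str:
    # Describe the expected output precisely, show the examples, then append the real input.
    parts = ["Extract the net amount and VAT. Answer exactly as 'Net: <amount> EUR, VAT: <amount> EUR'."]
    for example_input, example_output in FEW_SHOT_EXAMPLES:
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    parts.append(f"Input: {user_input}\nOutput:")
    return "\n\n".join(parts)

def ask(user_input: str, call_llm: Callable[[str], str]) -> str:
    # The LLM sees the examples first and generalises to the new input.
    return call_llm(build_few_shot_prompt(user_input))
```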
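For fine-tuning, most of the work is in preparing the training examples. Below is a minimal sketch assuming a provider that accepts prompt/completion pairs in JSONL format; the exact format, field names and upload mechanism vary per provider, and the example pairs are made up:

```python
import json

# Hypothetical training pairs; in practice you would export hundreds of these
# from your own (static) knowledge base or historical data.
training_pairs = [
    {"prompt": "Classify this expense: 'Monthly Adobe subscription'", "completion": "Software subscriptions"},
    {"prompt": "Classify this expense: 'Train ticket Brussels-Ghent'", "completion": "Travel"},
]

# Many fine-tuning APIs expect one JSON object per line (JSONL); check your provider's docs.
with open("finetune_dataset.jsonl", "w", encoding="utf-8") as f:
    for pair in training_pairs:
        f.write(json.dumps(pair, ensure_ascii=False) + "\n")
```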
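And for the vector-database option, the core mechanism fits in a few lines. This sketch uses a hypothetical `embed` callable as a stand-in for any embedding model, keeps the vectors in a plain Python list and uses exact cosine similarity; a real vector database does the same thing at scale with approximate nearest-neighbour search:

```python
from typing import Callable, List, Tuple

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def build_index(snippets: List[str], embed: Callable[[str], np.ndarray]) -> List[Tuple[str, np.ndarray]]:
    # Store each text snippet together with its embedding vector.
    return [(snippet, embed(snippet)) for snippet in snippets]

def retrieve(question: str, index: List[Tuple[str, np.ndarray]],
             embed: Callable[[str], np.ndarray], top_k: int = 3) -> List[str]:
    # Rank snippets by how close their embeddings are to the question's embedding.
    query_vec = embed(question)
    ranked = sorted(index, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [snippet for snippet, _ in ranked[:top_k]]

def answer_with_context(question: str, index: List[Tuple[str, np.ndarray]],
                        embed: Callable[[str], np.ndarray],
                        call_llm: Callable[[str], str]) -> str:
    # Prepend the retrieved snippets so the LLM answers from real, up-to-date data.
    context = "\n".join(retrieve(question, index, embed))
    return call_llm(f"Use the context below to answer the question.\n\nContext:\n{context}\n\nQuestion: {question}")
```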
With all of this, let’s not forget the typical challenges with AI:
1) a focus on data quality, where we actually use AI to assist our users in standardising their data,
2) data availability requires a strong cloud platform,
3) security and privacy remain key to build trust, and
4) as always, the value is not in the AI itself, but the value we are able to bring to the user through integration in the product.
So how does all of this impact accounting? Well, GPT-3 wasn’t able to pass the CPA exam, but GPT-4 does pass when it can use a calculator. This shows the strength of an LLM as a general reasoning agent: it can use simple tools like a calculator or a Python coding environment to answer questions, as well as generalise remarkably well from just a few examples.
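As an illustration of that tool-use pattern (not how GPT-4 actually sat the exam; the `call_llm` callable and the prompt wording are hypothetical), you can ask the model to produce a calculation rather than a final number, and let plain Python do the arithmetic:

```python
from typing import Callable

def answer_with_calculator(question: str, call_llm: Callable[[str], str]) -> str:
    # Ask the model to delegate the arithmetic instead of guessing at the number itself.
    expression = call_llm(
        "Write a single Python arithmetic expression that answers the question. "
        "Return only the expression.\n\n" + question
    )
    # The 'calculator tool': evaluate the expression with no builtins available.
    # NOTE: eval is only for illustration; a production tool would use a safe expression parser.
    result = eval(expression, {"__builtins__": {}}, {})
    return f"{expression} = {result}"
```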
At our Silverfin Fast Forward event in early May, we showed a GPT-4 proof of concept coming out of our Silverfin Labs that’s now live with a few of our customers – more on that in a follow-up blog post. We expect that in a few years there will be a divide between those technology providers that embrace AI to the core and those left behind, which is exactly why we are exploring these technologies.
Furthermore, we’re already witnessing a direct impact of generative AI on business. For example, Chegg, a billion-dollar company that provides textbook-based homework help in education, lost more than half (!) of its market value when the impact of generative AI on its business model became clear. We’ve also seen AI companies raising hundreds of millions of dollars or reaching billion-dollar valuations, with the clear winner being Nvidia, which sells AI chips – the classic shovel seller during a gold rush.
I don’t believe in such a dramatic evolution in accountancy. If anything, there’s a growing shortage of qualified accountants, and AI will not replace them; rather, it will allow them to spend more time with their clients and focus on the human aspect instead of the tedious compliance tasks that will be more and more assisted by AI.