Should You Be Banking on Open Source Analytics?
Banks see open source as a hotbed of innovation – and a governance nightmare. Do the rewards outweigh the risks?
This week Finance Monthly hears from Caroline Hermon, Head of Adoption of Artificial Intelligence and Machine Learning at SAS UK & Ireland, on the adoption of open source analytics in the finance sector and beyond.
Open source software used to be treated almost as a joke in the financial services sector. If you wanted to build a new system, you bought tried and tested, enterprise-grade software from a large, reputable vendor. You didn’t gamble with your customers’ trust by adopting tools written by small groups of independent programmers, especially when those tools came with no formal support contracts and no guarantee that they would continue to be maintained in the future.
Fast-forward to today, and the received wisdom seems to have turned on its head. Why invest in expensive proprietary software when you can use an open source equivalent for free? Why wait months for the official release of a new feature when you can edit the source code and add it yourself? And why lock yourself into a vendor relationship when you can create your own version of the tool and control your own destiny?
Enthusiasm for open source software is especially prevalent in business domains where innovation is the top priority. Data science is probably the most notable example. In recent years, open source languages such as R and Python have built an increasingly dominant position in the spheres of artificial intelligence and machine learning.
As a result, open source is now firmly on the agenda for decision makers at the world’s leading financial institutions. The thinking is that to drive digital transformation, their businesses need real-time insight. To gain that insight, they need AI. And to deliver AI, they need to be able to harness open source tools.
The open source trend encompasses more than just the IT department. It’s spreading to the front office too. Notably, Barclays recently revealed that it is pushing all its equities traders to learn Python. At SAS, we’ve seen numerous examples of similar initiatives across banking domains from risk management to customer intelligence. For example, many of our clients are building their models in R rather than in traditional proprietary languages.
A fool’s paradise?
However, despite its current popularity, the open source software model is not a panacea. Banks should still have legitimate concerns about support, governance and traceability.
The code of an open source project may be available for anyone to review. But tracing the web of dependencies between packages can quickly become extremely difficult. This poses significant risks for any financial institution that wants to build on open source software.
Essentially, if you build a credit risk model or a customer analytics application that depends on an open source package, your systems also depend on all the dependencies of that package. Each of those dependencies may be maintained by a different individual or group of developers. If they make changes to their package, and those changes introduce a bug, or break compatibility with a package further up the dependency tree, or include malicious code, there could be an impact on the functionality or integrity of your model or application.
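To make the point concrete, here is a minimal Python sketch of how indirect dependencies accumulate. The dependency graph and every package name in it are invented for illustration; a real project would read this information from its package manager rather than from a hand-written dictionary.

```python
from collections import deque

# Hypothetical dependency graph: each package maps to its direct dependencies.
# All package names are invented for illustration only.
DIRECT_DEPS = {
    "credit_risk_model": ["stats_toolkit", "data_loader"],
    "stats_toolkit": ["linear_algebra", "plotting"],
    "data_loader": ["csv_parser", "linear_algebra"],
    "linear_algebra": [],
    "plotting": ["image_codec"],
    "csv_parser": [],
    "image_codec": [],
}


def transitive_dependencies(package, graph):
    """Return every package that `package` depends on, directly or indirectly."""
    seen = set()
    queue = deque(graph.get(package, []))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue  # already counted via another path
        seen.add(dep)
        queue.extend(graph.get(dep, []))
    return seen


direct = DIRECT_DEPS["credit_risk_model"]
all_deps = transitive_dependencies("credit_risk_model", DIRECT_DEPS)
print(f"{len(direct)} direct, {len(all_deps)} total dependencies")
```

In this toy graph the model declares two dependencies but actually relies on six packages, each of which could be changed by a different maintainer.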
As a result, when a bank opts for an open source approach, it either needs to put trust in a lot of people or spend a lot of time reviewing, testing and auditing changes in each package before it puts any new code into production. This can be a very significant trade-off compared to the safety of a well-tested enterprise solution from a trusted vendor, especially because banking is a highly regulated industry and the penalties for running insecure or noncompliant systems in production are significant.
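One common way to keep that review effort meaningful is to record a cryptographic hash of each package once it has been audited, and to refuse anything that no longer matches. The sketch below assumes a hypothetical in-memory allowlist and invented package names and contents; it illustrates the idea and does not describe any particular bank's or vendor's process.

```python
import hashlib


def sha256_hex(content: bytes) -> str:
    """Return the SHA-256 digest of a package's contents as a hex string."""
    return hashlib.sha256(content).hexdigest()


# Hypothetical package contents at the time of the audit.
reviewed_content = b"def score(x):\n    return 0.5 * x\n"

# Hypothetical allowlist: package name and version -> hash recorded at review.
APPROVED_HASHES = {
    "stats_toolkit-1.2.0": sha256_hex(reviewed_content),
}


def is_approved(name: str, content: bytes, approved: dict) -> bool:
    """True only if the package was reviewed and its contents are unchanged."""
    expected = approved.get(name)
    return expected is not None and sha256_hex(content) == expected


# The audited version passes; a silently modified copy does not.
tampered_content = reviewed_content + b"# unexpected extra code\n"
print(is_approved("stats_toolkit-1.2.0", reviewed_content, APPROVED_HASHES))
print(is_approved("stats_toolkit-1.2.0", tampered_content, APPROVED_HASHES))
```

A check like this does not replace the review itself. It only ensures that what reaches production is the same code that was reviewed.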
What use is power without control?
When it comes to enterprise-scale deployment, open source analytics software also often poses governance problems of a different kind for banks.
Open source projects are typically tightly focused on solving a specific set of problems. Each project is a powerful tool designed for a specific purpose: manipulating and refining large data sets, visualising data, designing machine learning models, running distributed calculations on a cluster of servers, and so on.
This “do one thing well” philosophy aids rapid development and innovation. But it also puts the responsibility on the end user – in this case, the bank – to integrate different tools into a controlled, secure and transparent workflow.
As a result, unless banks are prepared to invest in building a robust end-to-end data science platform from the ground up, they can easily end up with a tangled string of cobbled-together tools, with manual processes filling the gaps.
This quickly becomes a nightmare when banks try to move models into production because it is almost impossible to provide the levels of traceability and auditability that regulators expect.
Language doesn’t matter
The good news is that there’s a way for banks to benefit from the key advantages of open source analytics software – its flexibility and rapid innovation – without exposing themselves to unnecessary governance-related risks.
The language a bank’s data scientists choose to write their code in shouldn’t matter. By making a clean logical separation between model design and production deployment, banks can exploit all the benefits of the latest AI tools and frameworks. At the same time, they can keep their business-critical systems under tight control.
SAS plus open source
One SAS client, a large financial services provider in the UK, recently took this exact approach. The client uses open source languages to develop machine learning models for more accurate pricing. Then it uses the SAS Platform to train and deploy models into full-scale production. As a result, model training times dropped from over an hour to just two and a half minutes. And the company now has a complete audit trail for model deployment and governance. Crucially, the ability to innovate by moving from traditional regression models to a more accurate machine learning-based approach is estimated to deliver up to £16 million in financial benefits over the next three years.