Teaching Machines the Language of Finance

Financial services organizations such as wealth managers, insurance companies, credit card providers, and investment and commercial banks are under more pressure than ever to digitize their operations and build client confidence. Over the past few years, much has changed for this industry, including these companies’ top initiatives. According to Salesforce’s 2020 survey of nearly 2,800 global business leaders, implementing new technologies and improving customer loyalty have emerged as top priorities. These organizations are expected to offer increased personalization while providing 24/7 customer care.

Additionally, while the global COVID-19 pandemic has demonstrated just how productive a remote workforce can be and how much work a distributed team can accomplish, there is a flip side.

Remote work, isolation, and self-reflection have sparked a wave of resignations, transfers, and turnover. More and more, subject matter experts (SMEs) are leaving organizations and senior leaders are stepping down to explore their passions. These changes may leave financial institutions at risk on several fronts, including:

  • Loss of market expertise, creating knowledge gaps
  • Higher work intensity, leading to errors and burnout
  • Relationship attrition when client managers leave

One way to solve this problem is to capture subject matter knowledge and expertise before it’s too late. The difficulty is that 80% of corporate knowledge lives in unstructured content created by SMEs. Formats include Word documents, PowerPoint decks, PDF documents, emails, and web pages, all of which are difficult to compile, so organizations rely on the minds of these experts. If there were a way to “read” and “understand” this knowledge, the risk of losing it could be greatly mitigated.

How can natural language processing be used to teach the language of finance to machines?

Teaching a machine to read is difficult, and teaching it to understand is even more difficult. This is where natural language processing (NLP), a branch of artificial intelligence that seeks to give machines the ability to understand and respond to natural language, comes in.

The financial services industry uses complex language and concepts, so let’s look at how NLP can be used to teach machines the language of finance.

Creating a language model

The goal of a language model is to create a statistical representation of words that occur in close proximity to each other. When working with numbers, it’s easy to tell what comes next given a sequence. For example, if you are asked to guess which number comes next in the sequence 2, 4, 6, 8, it is relatively simple to teach a machine to predict the correct answer. If the machine predicts 9.5, you know it’s close and a little less than the correct answer. If it predicts 10.2, it’s even closer, but a bit high. With this feedback, the machine will eventually figure out that the correct answer is 10.
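The feedback loop described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the machine is assumed to guess "previous number plus a learned step," and the function name `learn_step`, the learning rate, and the number of passes are all hypothetical choices, not part of any particular system.

```python
def learn_step(sequence, learning_rate=0.1, epochs=200):
    """Learn the constant step of a number sequence from signed errors."""
    step = 0.0  # initial guess: the next number equals the previous one
    for _ in range(epochs):
        for prev, target in zip(sequence, sequence[1:]):
            # Positive error: the guess was too low. Negative: too high.
            error = target - (prev + step)
            # Nudge the step a little in the direction the feedback indicates.
            step += learning_rate * error
    return step


sequence = [2, 4, 6, 8]
step = learn_step(sequence)
prediction = sequence[-1] + step
print(f"next number is approximately {prediction:.3f}")
```

Because the error is a signed number, the machine always knows which way to adjust and by roughly how much, which is exactly the property that words lack.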

With words, the problem is a little more complex. Words do not follow a set sequence, and if the machine predicts a word, it is difficult to tell whether it is close to or far from the desired word. For example, if you were asked to guess which word completes the sentence “Bank of America slashed overdraft _____,” there are potentially multiple correct answers, but context is important.

The correct guess here is “fees.” If we were to pick words from the dictionary and try them in this sentence, “feel” is alphabetically very close to “fees,” but it is not a good guess because alphabetical proximity is irrelevant. However, if we were to guess “charges,” we would be close to the actual answer. In financial literature, there is a much higher probability that “overdraft/fees” or “overdraft/charges” appear next to each other than “overdraft/feel.” In the case of words, proximity is defined by context, and a word is best understood by the company it keeps.

Language models help define the statistical probability of two words occurring near each other in a given domain. The concept of domain is very important: in the field of geology, the words “bank” and “river” are close to each other, but in the field of finance they are not.
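As a rough illustration of domain-dependent proximity, the sketch below counts which words appear within a small window of a target word and turns the counts into probabilities. The two tiny corpora, the function name `cooccurrence_prob`, and the window size are made up for this example. A real language model would be estimated from a very large body of text.

```python
from collections import Counter


def cooccurrence_prob(sentences, target, window=2):
    """Estimate P(neighbor | target) from words within `window` positions."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            # Count every word in the window except the target itself.
            counts.update(tokens[j] for j in range(lo, hi) if j != i)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


# Hypothetical toy corpora, one per domain.
finance_corpus = [
    "the bank cut overdraft fees",
    "customers paid overdraft fees last year",
    "overdraft charges fell at the bank",
]
geology_corpus = [
    "the river bank eroded after the flood",
    "sediment built up on the river bank",
]

overdraft_finance = cooccurrence_prob(finance_corpus, "overdraft")
bank_finance = cooccurrence_prob(finance_corpus, "bank")
bank_geology = cooccurrence_prob(geology_corpus, "bank")

print(overdraft_finance.get("fees", 0.0), overdraft_finance.get("feel", 0.0))
print(bank_geology.get("river", 0.0), bank_finance.get("river", 0.0))
```

In these toy corpora, "river" is a frequent neighbor of "bank" only in the geology text, which is the domain effect the paragraph describes.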

A language model is built in a vector space, where a word or phrase is converted into its equivalent vector based on all the other words or phrases around it. A language model for finance can be built from scratch if a large corpus of financial texts is available, or an existing language model can be extended to learn financial jargon by retraining it.
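One simple way to picture "a word converted into a vector based on the words around it" is a co-occurrence vector compared with cosine similarity. The sketch below is a deliberately minimal stand-in for the learned embeddings that production language models use. The corpus and the helper names `build_vectors` and `cosine` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_vectors(sentences, window=2):
    """Represent each word by counts of the words that appear near it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[tok][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus.
corpus = [
    "the bank cut overdraft fees",
    "the bank cut overdraft charges",
    "analysts feel the market is calm",
]
vectors = build_vectors(corpus)
sim_fees_charges = cosine(vectors["fees"], vectors["charges"])
sim_fees_feel = cosine(vectors["fees"], vectors["feel"])
print(sim_fees_charges, sim_fees_feel)
```

Words that share contexts end up with similar vectors, while words that merely look alike do not.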

Development of a symbolic model

Consider two questions:

  1. Which bank has Jamie Dimon as CEO?
  2. Which bank has Judith Kent’s husband and Harvard Business School graduate as CEO?

The answer to both of these questions is JPMorgan Chase, but the way the answer is deduced is different for each. In the first question, there is a direct link between Jamie Dimon and JPMorgan Chase via the CEO link. In the second question, we first determine Judith Kent’s husband, confirm that he went to Harvard Business School, and then use that person’s name to make the connection to the CEO of JPMorgan Chase.

A language model can answer the first question but not the second question – this is where symbolic models come in handy. Symbolic knowledge models are composed of taxonomies and ontologies. A taxonomy defines a hierarchy of entities, for example, JPMorgan Chase > Investment Bank > Bank > Financial Institution. An ontology defines the relationships between entities, for example, JPMorgan Chase > Headquarters > New York.
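A symbolic model of this kind can be sketched as a set of (subject, relation, object) triples. The example below uses only the facts mentioned in the text. The relation names (`ceo_of`, `spouse_of`, `graduate_of`, `headquarters`, `is_a`) and the helper functions are illustrative assumptions, not a standard schema.

```python
# Facts stored as (subject, relation, object) triples.
TRIPLES = {
    ("Jamie Dimon", "ceo_of", "JPMorgan Chase"),
    ("Jamie Dimon", "spouse_of", "Judith Kent"),
    ("Jamie Dimon", "graduate_of", "Harvard Business School"),
    ("JPMorgan Chase", "headquarters", "New York"),
    ("JPMorgan Chase", "is_a", "Investment Bank"),
    ("Investment Bank", "is_a", "Bank"),
    ("Bank", "is_a", "Financial Institution"),
}


def subjects(relation, obj):
    """All subjects linked to `obj` by `relation`."""
    return {s for s, r, o in TRIPLES if r == relation and o == obj}


def objects(subject, relation):
    """All objects linked from `subject` by `relation`."""
    return {o for s, r, o in TRIPLES if s == subject and r == relation}


def ancestors(entity):
    """Walk is_a links upward (assumes each entity has at most one parent)."""
    chain = []
    parents = objects(entity, "is_a")
    while parents:
        entity = next(iter(parents))
        chain.append(entity)
        parents = objects(entity, "is_a")
    return chain


# Question 1: a single, direct link.
q1 = objects("Jamie Dimon", "ceo_of")

# Question 2: find the person first, then follow the CEO link.
people = subjects("spouse_of", "Judith Kent") & subjects(
    "graduate_of", "Harvard Business School"
)
q2 = {bank for person in people for bank in objects(person, "ceo_of")}

print(q1, q2, ancestors("JPMorgan Chase"))
```

The second question requires chaining several links, which is straightforward over explicit triples. The `ancestors` helper shows the taxonomy side of the same structure.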

Traditionally, ontologies and taxonomies have been handcrafted by experts working in the industry, which is a labor-intensive process. Recent advances in NLP allow taxonomies and ontologies to be built automatically. Models can pick up patterns in sentences, as shown in the examples below.

  • The average rate for a 30-year fixed mortgage is an annual percentage rate (APR) of 3.61%.
    1. APR – is_same_as – Annual Percentage Rate
  • Solana prioritizes scalability, but is a relatively less decentralized and secure blockchain.
    1. Solana – HIGH_PRIORITY – scalability
    2. Solana – LOW_PRIORITY – decentralization
    3. Solana – LOW_PRIORITY – secure blockchain
  • Solana and other blockchains could grab Ethereum market share.
    1. Solana – is_a – blockchain
    2. Ethereum – is_a – blockchain
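To give a feel for how sentence patterns can yield relations like these, here is a minimal rule-based sketch using regular expressions. It is an assumption-laden simplification: modern NLP systems typically rely on learned models rather than two hand-written rules, and the function names and the naive plural handling are hypothetical.

```python
import re


def extract_abbreviations(sentence):
    """Find 'long form (ABBR)' where the initials of the long form match ABBR."""
    triples = []
    for match in re.finditer(r"\(([A-Z]{2,})\)", sentence):
        abbr = match.group(1)
        # Take as many preceding words as the abbreviation has letters.
        words = sentence[: match.start()].split()
        candidate = words[-len(abbr):]
        initials = "".join(word[0].upper() for word in candidate)
        if initials == abbr:
            triples.append((abbr, "is_same_as", " ".join(candidate)))
    return triples


def extract_is_a(sentence):
    """Find 'X and other Ys' and read it as 'X is_a Y'."""
    triples = []
    for match in re.finditer(r"(\w+) and other (\w+)", sentence):
        instance, category = match.groups()
        if category.endswith("s"):
            category = category[:-1]  # naive singularization, for illustration
        triples.append((instance, "is_a", category))
    return triples


apr = extract_abbreviations(
    "The average rate for a 30-year fixed mortgage is an "
    "annual percentage rate (APR) of 3.61%."
)
chain = extract_is_a("Solana and other blockchains could grab Ethereum market share.")
print(apr, chain)
```

These two rules recover the APR relation and the Solana relation. Inferring that Ethereum is also a blockchain takes more than this surface pattern, which is where statistical models earn their keep.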

These taxonomies and ontologies may not be as robust as those handcrafted by humans, but they are a good first pass and can digest huge amounts of knowledge in a short amount of time.


Despite the challenges facing financial institutions today, the demand for personalized customer service through online or in-person experiences continues to grow. With a growing share of the financial workforce either retiring or resigning due to burnout or other reasons, the industry will need to apply innovative technology to keep up with this demand. This can only be fully realized when subject matter expertise and experience are accurately captured through knowledge management and machine learning processes such as NLP.

Read more on the topic of Nesh Avatars material.

Disclaimer: This blog post was written by a third-party contributor and does not necessarily reflect the opinion of FactSet. The information in this article is not investment advice. FactSet does not endorse or recommend any investment and disclaims any liability for any consequences related directly or indirectly to any action or inaction taken based on the information in this article.

James G. Williams