Retrieval-Augmented Generation on Financial Statements
Despite their popularity, most LLMs can only provide information up to their training cutoff, and they cannot access a company's internal knowledge, which limits their use cases in most corporations.
Retrieval-Augmented Generation (RAG) is a widely used technique that conditions a large language model (LLM) to generate output based on given documents. This prompting technique has been found to effectively provide a model with information it was not trained on, such as the latest data, and to reduce hallucination.
Technically, RAG involves a retriever that fetches the top k most relevant documents from a database using a similarity metric and passes them to an LLM. Since LLMs can already understand user intent and synthesise answers from different sources, what usually prevents them from reaching their full potential is the quality of retrieval.
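The retrieve-then-generate loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, cosine similarity is used as the similarity metric, and the prompt template in `build_prompt` is hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (hypothetical stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list[str], k: int = 3) -> list[str]:
    """Return the top-k documents ranked by similarity to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Condition the LLM on the retrieved passages (template is illustrative)."""
    passages = "\n".join(f"- {c}" for c in context)
    return f"Answer using only the context below.\n\nContext:\n{passages}\n\nQuestion: {query}"


docs = [
    "Revenue for the year increased compared with last year.",
    "The Board oversees the sustainability strategy.",
    "Current liabilities include trade payables and bank loans.",
]
question = "What happened to revenue this year?"
top = retrieve(question, docs, k=1)
print(build_prompt(question, top))
```

The resulting prompt string is what would be sent to the LLM; swapping `embed` for a real embedding model leaves the rest of the loop unchanged.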
Challenges in RAG and in Financial Statements
In general, there are a few factors that impact the quality of retrieval:
Quality of Data: is the data clean and in a format that an LLM can understand?
Quality of Retrieval: is all relevant information retrieved?
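Whether all relevant information is retrieved is commonly measured as recall at k: the share of the relevant chunks that appear among the top k results. The sketch below assumes a hand-labelled set of relevant chunk IDs per question; the IDs themselves are hypothetical.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant chunks found among the top-k retrieved chunks."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    hits = set(retrieved[:k]) & relevant
    return len(hits) / len(relevant)


# Hypothetical chunk IDs: a ratio question needs both statements.
retrieved = ["balance_sheet", "notes_borrowings", "income_statement", "esg_report"]
relevant = {"balance_sheet", "income_statement"}
print(recall_at_k(retrieved, relevant, k=2))  # 0.5
print(recall_at_k(retrieved, relevant, k=3))  # 1.0
```

At k = 2 only half of the information the question needs reaches the LLM, however capable the model is.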
Translating these factors into the setting of financial statements:
Extraction: Financial statements are commonly printed in a hierarchical layout that mixes complicated tables and text.
Retrieval of Table: while text embeds naturally for a retriever, a table is not a natural input to a retriever.
Complex Linkage of Financial Statements: no section of a financial statement can be read in isolation, because the information is interconnected. For example, a financial analysis cannot rely on the balance sheet without also looking at the income statement.
How SuperAcc Can Help
SuperAcc brings over five years of experience in parsing a wide range of financial statements. SuperAcc not only performs document-understanding extraction with state-of-the-art computer vision and natural language processing models, but also analyses each document with machine learning models that encode domain knowledge.
SuperAcc understands a document like a human, and understands financial statements like an accountant. Standard solutions fail to recognise tables as a key format in financial statements, and fail to respect the domain knowledge required to read them. By using SuperAcc to perform ETL (Extract, Transform, Load) on financial statements, we can pass this knowledge on to the LLM.
Challenges | Solutions |
---|---|
Extraction | A computer-vision-based approach to understanding the semantic structure of a document, including complicated table structures. |
Retrieval of Table | Instead of treating a table as plain text, we capture the table's title and its key items to form a description of the table. |
Complex Linkage of Financial Statements | Using our proprietary machine learning models for understanding financial statements, we build linkages among sections, which are then used to augment the retrieval process. |
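The two retrieval ideas in the table can be illustrated with a minimal sketch. It is not SuperAcc's implementation: `describe_table` simply joins a table's title, periods and first few line items into a text description that can be embedded like any other passage, and `expand_with_links` adds linked sections to the retrieved hits using a hand-written, hypothetical `LINKS` graph in place of the proprietary models.

```python
from dataclasses import dataclass


@dataclass
class Table:
    title: str
    periods: list[str]
    line_items: list[str]  # row labels in their printed order


def describe_table(table: Table, max_items: int = 4) -> str:
    """Turn a table into a short, embeddable description from its title and key items.
    'Key items' is approximated here as the first few row labels (an assumption)."""
    items = ", ".join(table.line_items[:max_items])
    return f"{table.title} for {', '.join(table.periods)}. Key items: {items}."


def expand_with_links(hits: list[str], links: dict[str, list[str]]) -> list[str]:
    """Append sections linked to each retrieved section, keeping order and dropping duplicates."""
    seen: set[str] = set()
    expanded: list[str] = []
    for section in hits:
        for s in [section, *links.get(section, [])]:
            if s not in seen:
                seen.add(s)
                expanded.append(s)
    return expanded


# Hypothetical linkage graph standing in for model-derived links.
LINKS = {
    "balance_sheet": ["income_statement", "notes_borrowings"],
    "income_statement": ["balance_sheet", "cash_flow_statement"],
}

table = Table(
    title="Consolidated Statement of Financial Position",
    periods=["2021", "2020"],
    line_items=["Non-current assets", "Current assets", "Current liabilities",
                "Non-current liabilities", "Total equity"],
)
print(describe_table(table))
print(expand_with_links(["balance_sheet"], LINKS))
```

Embedding the description instead of the raw cell grid gives the retriever a passage that reads like text, while the expansion step makes sure a hit on the balance sheet also brings in the income statement.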
Unleashing the Potential of RAG in Financial Documents
Our demo shows the potential of applying RAG to complex documents such as financial statements, provided the ETL pipeline is appropriately designed and implemented.
Instruction (company: CK Asset Holdings Limited, 2021) | Raw Response (unedited) |
---|---|
Perform financial analysis of the company. | Based on the information provided, here are some key financial analysis points for the company: 1. Revenue: The company's revenue increased from $57,623 million in 2020 to $62,094 million in 2021, representing a growth of 7.8%. The share of revenue from joint ventures also increased significantly from $13,709 million in 2020 to $21,147 million in 2021. 2. Profitability: The company's profit before taxation increased from $24,594 million in 2020 to $30,061 million in 2021, representing a growth of 22.3%. The profit for the year also increased from $17,789 million in 2020 to $21,943 million in 2021, representing a growth of 23.4%. The earnings per share also increased from $4.42 in 2020 to $5.77 in 2021. 3. Assets: The company's non-current assets decreased slightly from $327,968 million in 2020 to $322,590 million in 2021. The fixed assets and investment properties also decreased from $95,101 million in 2020 to $76,444 million in 2021 and from $128,683 million in 2020 to $132,324 million in 2021, respectively. 4. Liabilities: The company's current liabilities increased from $70,823 million in 2020 to $78,515 million in 2021, representing a growth of 10.8%. The non-current liabilities also increased from $82,662 million in 2020 to $91,921 million in 2021, representing a growth of 11.2%. 5. Cash flow: The company's net cash from operating activities decreased slightly from $24,297 million in 2020 to $25,358 million in 2021. The net cash used in investing activities decreased significantly from $7,264 million in 2020 to $3,771 million in 2021. The net cash used in financing activities also decreased from $19,380 million in 2020 to $17,807 million in 2021. The company's cash and cash equivalents increased from $58,214 million in 2020 to $62,567 million in 2021. Overall, the company's financial performance improved in 2021 compared to 2020, with significant growth in revenue and profitability. However, the company's assets decreased slightly, and its liabilities increased, which may indicate some financial risk. The company's cash flow also improved, with a significant decrease in net cash used in investing activities. |
Calculate quick ratio of the company | To calculate the quick ratio of the company, we need to use the following formula: Quick Ratio = (Current Assets - Inventory - Prepayments) / Current Liabilities From the given information, we can see that the company has not provided the exact values for inventory and prepayments separately. However, we can estimate the quick ratio by assuming that the entire amount of properties for sale and aircraft assets for sale are inventory and deducting them from the current assets. Therefore, the quick ratio of the company can be calculated as follows: Quick Ratio = (Current Assets - Inventory - Prepayments) / Current Liabilities Quick Ratio = ((234,121 - 159,230) - 8,086) / 78,515 Quick Ratio = 66,805 / 78,515 Quick Ratio = 0.85 Therefore, the quick ratio of the company is 0.85. This indicates that the company may have some difficulty in meeting its short-term obligations as it has only $0.85 of liquid assets available to cover each dollar of current liabilities. |
What is the ESG policy of the company? | Based on the information provided, the company has a strong commitment to sustainability and has established a board-level Sustainability Committee to oversee the Group's overall sustainability strategy. The company also takes into consideration the impact of its operations on employees, customers, the community, and the environment. During the COVID-19 pandemic, the company implemented various precautionary measures to maintain the stability of its operations while protecting the health and safety of its employees. The company also provides quality food options that support a healthy balanced diet and collaborates closely with its suppliers to ensure eco-friendly and sustainable products are preferred in its procurement activity. The Board of Directors is ultimately responsible for sustainability and is engaged in formulating related strategies and policies which provide direction and guidance to the Group. The company publishes an annual standalone Sustainability Report to provide transparency and accountability to stakeholders. In terms of corporate governance, the company believes that good corporate governance provides a framework that is essential for effective management, a healthy corporate culture, successful business growth, and enhancing shareholders' value. The company has applied the principles and complied with all code provisions (except as stated below) and, where applicable, the recommended best practices of the Corporate Governance Code. The company has a quality Board, sound internal controls, and transparency and accountability to all shareholders. The positions of the Chairman of the Board and the Managing Director are held by the same individual, but the Board is of the view that this is the most appropriate arrangement in the interest of the shareholders as a whole at present. The Board comprises six Independent Non-executive Directors who will continue to provide their views and comments to the Chairman and Managing Director. Furthermore, the Senior Advisor of the Company, Mr. Li Ka-shing, has continued to contribute to the Group on significant matters. |
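Because the responses above are unedited, the arithmetic in them can be re-checked directly. The sketch below recomputes the quick ratio from the figures quoted in the second response (in millions, as stated there, with the inventory and prepayment amounts being those the response assumed) and the year-on-year growth rates from the first. It only reproduces the arithmetic; it does not verify which line items the LLM picked.

```python
def quick_ratio(current_assets: float, inventory: float, prepayments: float,
                current_liabilities: float) -> float:
    """Quick ratio = (current assets - inventory - prepayments) / current liabilities."""
    return (current_assets - inventory - prepayments) / current_liabilities


def growth_pct(previous: float, current: float) -> float:
    """Year-on-year growth in percent."""
    return (current - previous) / previous * 100


# Figures in millions, copied from the responses above.
qr = quick_ratio(234_121, 159_230, 8_086, 78_515)
print(f"Quick ratio: {qr:.2f}")  # Quick ratio: 0.85

for label, prev, curr in [
    ("Revenue", 57_623, 62_094),
    ("Profit before taxation", 24_594, 30_061),
    ("Profit for the year", 17_789, 21_943),
]:
    print(f"{label}: {growth_pct(prev, curr):.1f}%")
```

A deterministic check like this is cheap to run on any unedited LLM answer before its numbers feed into further analysis.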
The Full Potential of RAG
The power of AI lies in its ability to augment human productivity at scale. With SuperAcc and RAG, thousands of financial reports spanning multiple years can be pre-processed and stored in a vector database. This only needs to be done once, and users do not need to specify any requirements beforehand.
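The ingest-once, query-many workflow can be pictured as a vector store whose chunks carry metadata such as company and reporting year, so a query can be narrowed before ranking. This is a minimal in-memory sketch; `Chunk`, `VectorStore` and the three-dimensional toy vectors are hypothetical stand-ins for a real embedding model and vector database.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    company: str
    year: int
    text: str
    vector: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class VectorStore:
    """In-memory store: chunks are added once at ingestion, then queried many times."""

    def __init__(self) -> None:
        self.chunks: list[Chunk] = []

    def add(self, chunk: Chunk) -> None:
        self.chunks.append(chunk)

    def search(self, query_vector: list[float], k: int = 3,
               company: Optional[str] = None,
               years: Optional[set[int]] = None) -> list[Chunk]:
        """Filter by metadata first, then rank the remaining chunks by cosine similarity."""
        candidates = [
            c for c in self.chunks
            if (company is None or c.company == company)
            and (years is None or c.year in years)
        ]
        candidates.sort(key=lambda c: cosine(query_vector, c.vector), reverse=True)
        return candidates[:k]


store = VectorStore()
store.add(Chunk("Company A", 2020, "Revenue analysis", [1.0, 0.0, 0.0]))
store.add(Chunk("Company A", 2021, "Revenue analysis", [0.9, 0.1, 0.0]))
store.add(Chunk("Company B", 2021, "Sustainability report", [0.0, 0.0, 1.0]))

# Query across all companies, restricted to 2021.
for chunk in store.search([1.0, 0.0, 0.0], k=2, years={2021}):
    print(chunk.company, chunk.year, chunk.text)
```

Filtering on metadata before similarity ranking is what lets one pre-built index answer both single-company questions and cross-company, industry-level ones.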
Users can query any information on demand; all it takes is a meaningful question. This is highly relevant to equity analysts, credit analysts, auditors and CFOs who want to understand industry trends, quantitatively or qualitatively, in no time.
Users can also compile a standard set of questions to extract features for quantitative modelling, such as credit scoring and return prediction. With the resulting dataset, a quant analyst can perform feature engineering through prompt engineering and explore features that were hard to obtain before the era of LLMs.
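A standard question set can be turned into a feature table by asking each question of every report and parsing the answers. In this sketch, `ask` is a hypothetical callable standing in for the RAG pipeline (`fake_ask` returns canned answers), the questions and feature names are illustrative, and the parsing rule (yes/no to 1/0, otherwise the first number found) is an assumption.

```python
import re
from typing import Callable, Optional

# Illustrative standard question set: feature name -> question.
QUESTIONS = {
    "net_gearing_pct": "What is the net gearing ratio of the company, in percent?",
    "going_concern_flag": "Does the auditor's report mention a material uncertainty "
                          "related to going concern? Answer yes or no.",
}

NUMBER = re.compile(r"-?\d+(?:,\d{3})*(?:\.\d+)?")


def parse_answer(answer: str) -> Optional[float]:
    """Map a free-text answer to a number: yes/no -> 1/0, otherwise the first number found."""
    text = answer.strip().lower()
    if re.match(r"yes\b", text):
        return 1.0
    if re.match(r"no\b", text):
        return 0.0
    match = NUMBER.search(text)
    return float(match.group().replace(",", "")) if match else None


def extract_features(report_id: str,
                     ask: Callable[[str, str], str]) -> dict[str, Optional[float]]:
    """Ask every standard question of one report and parse each answer into a feature."""
    return {name: parse_answer(ask(report_id, q)) for name, q in QUESTIONS.items()}


def fake_ask(report_id: str, question: str) -> str:
    """Stub standing in for a RAG pipeline call; returns canned answers."""
    if "going concern" in question:
        return "No, the report does not mention it."
    return "The net gearing ratio is 12.5%."


print(extract_features("report-2021", fake_ask))
```

Running `extract_features` over every report in the database yields one row per report, ready to be joined with market or default data for modelling.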
Big data is a term coined in the early 2000s, and it has powered the success of many social media and e-commerce companies. It is mostly about big structured data. However, big structured data has lost its comparative advantage over time, because everyone is doing it and everyone knows how to do it well.
It is now the beginning of big unstructured data. No one talks about it, but it is happening with the advancement of LLMs. It will empower a new generation of companies that know how to use LLMs to mine hidden information at scale.