A pair of researchers from the University of Innsbruck in Austria has developed a method for measuring how well an artificial intelligence (AI) system understands ‘temporal validity.’ The resulting benchmark could have significant implications for the use of generative AI products such as ChatGPT in the fintech sector.
Temporal validity refers to how relevant a given statement is to another statement over time; in essence, it describes the time-based value of a pair of statements. An AI being evaluated on its ability to predict temporal validity would be given a set of statements and asked to choose the one most closely related to them in time.
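To make the evaluation format described above more concrete, here is a minimal Python sketch of how a model's accuracy on such a multiple-choice benchmark could be scored. The `BenchmarkItem` class, the `accuracy` and `first_choice` functions, and the example statements and labels are all hypothetical illustrations; they are not the researchers' dataset or code, and the paper's actual task format may be more detailed than this simplified description.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BenchmarkItem:
    """Hypothetical benchmark item: a context statement, candidate statements,
    and the index of the candidate labeled as most closely related in time."""
    context: str
    candidates: List[str]
    label: int


# A "model" here is any function that maps (context, candidates) to a chosen index.
Chooser = Callable[[str, List[str]], int]


def accuracy(model: Chooser, items: List[BenchmarkItem]) -> float:
    """Return the fraction of items where the model picks the labeled candidate."""
    if not items:
        raise ValueError("benchmark is empty")
    correct = sum(1 for item in items if model(item.context, item.candidates) == item.label)
    return correct / len(items)


def first_choice(context: str, candidates: List[str]) -> int:
    """Trivial baseline that always picks the first candidate, ignoring its inputs."""
    return 0


# Two made-up items for illustration only (not taken from the paper's dataset).
items = [
    BenchmarkItem(
        context="I'm reading a book on the bus.",
        candidates=["I just reached my stop and put the book away.",
                    "Paris is the capital of France."],
        label=0,
    ),
    BenchmarkItem(
        context="The market opened higher this morning.",
        candidates=["Water boils at 100 degrees Celsius at sea level.",
                    "By noon, most of the early gains had disappeared."],
        label=1,
    ),
]

print(accuracy(first_choice, items))  # prints 0.5
```

Comparing a general-purpose model against more specialized ones would then amount to calling `accuracy` with different `Chooser` functions on the same labeled items.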
In their recently published preprint research paper titled “Temporal Validity Change Prediction,” Georg Wenzel and Adam Jatowt use the example of a statement in which a person is said to be reading a book on a bus.
The researchers created a labeled dataset of training examples, which they then used to build a benchmarking task for large language models (LLMs). They chose ChatGPT as a foundation model for testing because of its popularity with end users, and found that it underperformed less generalized models by significant margins.
The researchers wrote: “ChatGPT ranks among the lower-performing models, which is consistent with other studies on TCS [temporal commonsense] understanding. Its shortcomings may be due to the few-shot learning approach and a lack of knowledge about dataset-specific traits.”
This suggests that tasks in which temporal validity plays a role in determining usefulness or accuracy, such as generating news articles or evaluating financial markets, are likely to be handled better by targeted AI models than by more generalist services such as ChatGPT.
The researchers also demonstrated that incorporating temporal validity change prediction into an LLM's training cycle has the potential to produce higher scores on the temporal-change benchmarking task.
While the paper doesn’t specifically discuss implications beyond the experiment itself, one of the current limitations of generative AI systems is their inability to distinguish between past and present events within a body of literature.
Teaching these systems to identify the most relevant statements across a corpus, with timeliness as a determining factor, could transform AI models’ ability to make reliable real-time predictions in massive markets such as cryptocurrency and stocks.