How AI, alternative data and insights from psychology resulted in an innovative stock market indicator

A short walkthrough of how the combination of three exciting fields (behavioural finance, alternative data and machine learning) leads to new insights into the stock market

Did you ever notice that you are more cautious in traffic after a large accident has been in the news? Then you are aware that the human brain sometimes takes shortcuts, which behavioural economists call heuristics. You are probably also aware that alternative (big) data is giving rise to new real-time insights. Finally, the fact that artificial intelligence (AI), or machine learning, is rapidly changing the world probably hasn’t gone unnoticed either. In this paper I am going to describe how these three components (behavioural psychology, alternative data and AI) can be used to create an indicator for future stock market movements. The paper is divided into three parts, one per component: a behavioural psychology part, an alternative big data part and an AI part. Or just scroll down to see the final result compared to the S&P 500.

The psychology part

The fact that you are more cautious when you have just witnessed an accident is called availability bias in behavioural finance. It describes the tendency of people to judge an event as more likely when an example of it comes to mind easily. For example, you probably overestimate the probability of a shark attack when you have just read about one in the news. You could argue that the same goes for economics. People are more likely to believe inflation is coming when the news is full of it. Or that a recession is coming when everyone is tweeting about it. Or that a stock is going up when it’s all over Reddit. The point is that news can drive the market, rather than just report on or follow it. The recent work of Nobel Prize winner Robert Shiller is also relevant here. His book Narrative Economics describes how stories can propel economic events. But how can one assess what people are reading, or what news they are exposed to? This is where alternative data and AI play their part.

The alternative data part

Traditional finance data is usually nice to work with: it is clean, nicely structured and mostly readily available. Unfortunately that is also its drawback: since everyone is looking at it, it is hard to gain an edge. Alternative data, however, is usually unstructured and messy, but it is unique. One of the largest alternative data sources is the internet. Terabytes of data are generated every minute, and with this indicator we want to take advantage of that.

Therefore at Alternative-Analytics.eu we have combined financial news articles from well-known websites, financial tweets and other sources of financial news into one large dataset of unstructured alternative data. With this data we can see what news investors are exposed to. And here is the cool thing: if we think about the behavioural flaws discussed in the first part, we can make a prediction as to where investors’ availability bias is headed. Put differently, if we know whether investors are reading more negative or more positive news, we can try to predict whether their investment decisions will be positive (i.e. long) or negative (i.e. short). Unfortunately, going through all this data and judging for every news article, tweet, etc. whether it is positive or negative is not feasible by hand. But here is the good news: we can let a computer do that, and that is where the AI part comes in.

The AI part

AI is getting smarter almost daily, it seems. More importantly, big AI libraries are available to everyone with a computer running Python. Google (TensorFlow) and Facebook (PyTorch) have been kind enough to make their AI libraries open source, which puts some really powerful tools at your fingertips. In this case we will let AI do two tasks:
- Read headlines (ML zero shot sentiment classification)
- Predict returns over a certain horizon
For this paper we will only discuss the first task; the second one is for another post.

To summarize the first two steps: first we discussed how people are influenced by what they read, and second we showed that alternative data tells us what they read. The next (and hardest) part is actually reading and classifying. Since we have about 1000 news articles a day, we have to let a computer do this, which we do by means of AI/machine learning. We can use a model that is trained to score a string of text anywhere from optimistic (a maximum of +1) to pessimistic (a minimum of -1). How we do this is quite technical and will be explained in a different post, but the end result is a dataset of more than a million articles scored from +1 (optimistic) to -1 (pessimistic). Now for the final step: checking whether the theory proposed in the first part holds up against the data.
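
To make the +1 to -1 scale a bit more concrete, here is a minimal Python sketch. It is not the method behind the indicator (that is left for a different post). It assumes a zero-shot classifier has already returned probabilities for two hypothetical labels, "optimistic" and "pessimistic", and it simply maps those to a signed score and averages the scores per day. The function names, the labels, the per-day averaging and the toy numbers are all assumptions made up for this illustration.

```python
from collections import defaultdict
from statistics import mean


def signed_score(label_probs):
    """Map label probabilities to a score in [-1, +1].

    label_probs is a dict with the (hypothetical) keys
    'optimistic' and 'pessimistic'. Missing keys count as 0.
    """
    p_opt = label_probs.get("optimistic", 0.0)
    p_pes = label_probs.get("pessimistic", 0.0)
    total = p_opt + p_pes
    if total == 0:
        return 0.0  # no signal either way
    # +1 when fully optimistic, -1 when fully pessimistic
    return (p_opt - p_pes) / total


def daily_sentiment(scored_articles):
    """Average the article scores per day.

    scored_articles is an iterable of (date_string, score) pairs.
    Returns a dict ordered by date string.
    """
    by_day = defaultdict(list)
    for day, score in scored_articles:
        by_day[day].append(score)
    return {day: mean(scores) for day, scores in sorted(by_day.items())}


# Made-up probabilities standing in for a classifier's output.
examples = [
    ("day-1", {"optimistic": 0.9, "pessimistic": 0.1}),
    ("day-1", {"optimistic": 0.5, "pessimistic": 0.5}),
    ("day-2", {"optimistic": 0.2, "pessimistic": 0.8}),
]
scored = [(day, signed_score(probs)) for day, probs in examples]
indicator = daily_sentiment(scored)
print(indicator)
```

Dividing by the sum of the two probabilities keeps the score inside the range even if the classifier's outputs do not add up to one. A plain daily average is the simplest possible aggregation; a smoothed or weighted version would be an equally reasonable choice.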

Tying it all together

So to recap: academic research shows how news can influence investor sentiment, we have alternative (big) data that contains all this news, and we have a computer that can interpret that news faster than any human can. Let’s see what this all looks like in an interactive plot.

The dashed line shows the S&P 500, the other line our sentiment indicator. The vertical axis shows sentiment (negative to positive) and the horizontal axis shows time. You can clearly see how sentiment was already decreasing before the market collapsed under COVID fear. Interestingly, you can also see sentiment improving before the market recovered. So at the very least, this plot suggests that there is a relation between our indicator and S&P 500 stock prices. This shows how new technology and improvements in AI are giving us a new way of looking at the market.
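
One way to put a number on "sentiment moved first" is a lead-lag correlation: compare sentiment on day t with the market's return a few days later. The sketch below is only an illustration of that idea, not an analysis from this article. The helper names are hypothetical, the series are toy numbers constructed so that returns follow sentiment with a two-day delay, and using returns rather than price levels is an assumption of the sketch.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no information
    return cov / sqrt(var_x * var_y)


def lead_lag_correlation(sentiment, returns, lag):
    """Correlate sentiment on day t with returns on day t + lag."""
    if len(sentiment) != len(returns):
        raise ValueError("series must have the same length")
    if lag < 0 or lag > len(sentiment) - 2:
        raise ValueError("lag must leave at least two overlapping points")
    xs = sentiment[: len(sentiment) - lag]
    ys = returns[lag:]
    return pearson(xs, ys)


# Toy data: returns are built to echo sentiment two days later.
sentiment = [0.5, 0.3, -0.2, -0.6, -0.1, 0.4, 0.6]
returns = [0.0, 0.0, 0.005, 0.003, -0.002, -0.006, -0.001]

for lag in range(4):
    print(lag, round(lead_lag_correlation(sentiment, returns, lag), 3))
```

On real data, a correlation that peaks at a positive lag would be consistent with sentiment leading the market, while a peak at lag zero would point to the two simply moving together.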

Update: since this article was published, we have rebranded our website to www.narrative-investing.io. More info and background can be found there!
