An Overview of How Algorithmica’s Machine Learns
Preliminaries
Machine learning is a branch of artificial intelligence that uses statistical models to enable computers to learn from patterns in data. These patterns allow computers to make decisions without being explicitly programmed.
This method is especially potent in the dynamic realm of financial markets, where traditional static models often fall short. Predicting asset price direction with machine learning algorithms requires substantial number crunching, efficient code, and computing power.
It is a demanding process that must be designed carefully and with attention to detail to produce results that are scientifically sound, practically applicable, robust to various endogenous and exogenous risks, and able to maintain a high success rate through time.
From pulling raw data to deriving signals
To predict an asset’s price, we use historical data, assuming that this historical behavior encapsulates the dynamics we need. We then use a multi-factor approach to predict price direction by analyzing underlying macroeconomic and/or fundamental factors associated with the asset being predicted.
The nature of factors
Our analysis focuses on macroeconomic variables, sector-specific factors, and/or company-specific factors. This means any factor included in the system must have an economic rationale to serve as a predictor of an asset’s price. Under no circumstances would we allow a factor to be included in the system solely for its statistical properties.
To predict, for example, the price of gold, we have selected a dozen underlying factors, classified into three main groups (a hypothetical sketch of such a grouping follows the list):
- Economic growth factors
- Inflation predictors
- Monetary variables
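For illustration only, such a grouping might be organized as a simple mapping from group to factor series; the series names below are hypothetical placeholders, not the actual factors used by the algorithm.

```python
# Hypothetical factor taxonomy for a gold-price model.
# The series names are illustrative placeholders, not the actual inputs.
GOLD_FACTOR_GROUPS = {
    "economic_growth": ["industrial_production", "pmi_manufacturing", "retail_sales"],
    "inflation_predictors": ["cpi_yoy", "breakeven_5y", "commodity_index"],
    "monetary_variables": ["fed_funds_rate", "real_10y_yield", "dollar_index"],
}

# Flatten into the raw factor pool that the algorithm scans.
factor_pool = [name for group in GOLD_FACTOR_GROUPS.values() for name in group]
print(factor_pool)
```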
At times, some of these factors may exhibit various types of interdependencies, and the relationship between the gold price and each underlying factor may evolve over time. For instance, the well-known negative correlation between the Dollar Index and Gold Prices has temporarily turned positive several times over the past two years.
In all cases, once our team has determined the appropriate set of underlying factors, the optimization problem is handed over to our algorithm to solve. The algorithm subsequently follows a structured sequence of steps to generate prediction signals.
The curse of dimensionality and factor sectorization
A historically common challenge when analyzing multi-factor systems is the so-called “curse of dimensionality”: the volume of data needed grows exponentially as the number of factors expands. When the number of factors is too high, the amount of data required to reach statistically reliable solutions becomes prohibitive, rendering the problem practically unsolvable.
Therefore, the first step of the algorithm is to scan the available pool of factors and perform what we internally call “sectorization”: determining groups of factors with high correlation. The algorithm then only needs one representative from each group, immediately reducing the number of required factors.
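The grouping technique itself is not specified in this overview; a minimal sketch of one plausible approach applies hierarchical clustering to a correlation-distance matrix and keeps the member most correlated with the rest of its group as the representative (the threshold and the synthetic data are assumptions).

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage

def sectorize(factor_data: pd.DataFrame, corr_threshold: float = 0.8) -> dict:
    """Group factors whose pairwise correlation exceeds corr_threshold and
    keep one representative per group (illustrative sketch only)."""
    corr = factor_data.corr().abs()
    dist = 1.0 - corr.values                        # highly correlated factors are "close"
    condensed = dist[np.triu_indices_from(dist, k=1)]
    labels = fcluster(linkage(condensed, method="average"),
                      t=1.0 - corr_threshold, criterion="distance")
    groups: dict = {}
    for factor, label in zip(corr.columns, labels):
        groups.setdefault(label, []).append(factor)
    # Representative: the member most correlated, on average, with its own group.
    return {label: max(members, key=lambda f: corr.loc[f, members].mean())
            for label, members in groups.items()}

# Synthetic example: two nearly collinear factors collapse to one representative.
rng = np.random.default_rng(0)
base = rng.normal(size=500)
factor_data = pd.DataFrame({
    "dollar_index": base + 0.1 * rng.normal(size=500),
    "real_10y_yield": base + 0.1 * rng.normal(size=500),
    "cpi_yoy": rng.normal(size=500),
})
print(sectorize(factor_data))
```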
Factor non-linear transformation and cluster identification
The algorithm further reduces the still-large number of representative factors by non-linearly combining them into a smaller subset while maintaining the essential relationships between them.
This transformed factor universe often reveals clusters of high information density, which are crucial for predictive analytics. Clustering in a reduced-dimensional space identifies groups of data points that exhibit similar financial behaviors, allowing for more precise and robust predictions. The reduction also makes it easier to assess the homogeneity of the clusters, ensuring that each identified cluster is statistically significant and relevant for the predictive model. The high-density clusters discovered by the algorithm have been backtested against up to three decades of historical factor behavior.
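The actual transformation and clustering methods are not disclosed here; as a rough illustration of the general idea, a non-linear reduction such as kernel PCA can be paired with a density-based clusterer such as DBSCAN. Both are stand-ins chosen for the sketch, and the parameters and data are arbitrary.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.decomposition import KernelPCA
from sklearn.preprocessing import StandardScaler

# Placeholder observations of the representative factors (rows = dates, cols = factors).
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 6))

# Non-linear combination of the representative factors into a smaller subset.
X_scaled = StandardScaler().fit_transform(X)
embedding = KernelPCA(n_components=2, kernel="rbf").fit_transform(X_scaled)

# Density-based clustering in the reduced space; label -1 marks low-density noise.
cluster_labels = DBSCAN(eps=0.3, min_samples=20).fit_predict(embedding)
n_clusters = len(set(cluster_labels)) - (1 if -1 in cluster_labels else 0)
print(f"high-density clusters found: {n_clusters}")
```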
Once the predictive power of this universe of clusters is confirmed, the “prediction universe” is ready to be used for the asset’s prediction.
Cluster characterization
The algorithm uses techniques to characterize each cluster based on its properties. Clusters are classified as active if they exceed the auto-calculated threshold for predictive power, or dormant if they do not. This characterization allows a predictive signal label to be associated with each cluster. The signal labels are long, short, or neutral, with a corresponding confidence level of 1, 2, or 3 based on the strength of the signal: statistically stronger clusters carry stronger confidence levels.
The algorithm then checks the total number of active clusters, the areas they cover, and their density. Based on this check, it makes the final decision on whether to label a particular universe of clusters a robust signal-generation engine for the specified asset.
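The thresholding rule and confidence mapping are described only qualitatively above; the sketch below assumes a hit-rate measure of predictive power, hypothetical cut-offs for the three confidence levels (with 3 taken as the strongest), and a hypothetical minimum-coverage rule for the final robustness decision.

```python
from dataclasses import dataclass

@dataclass
class ClusterSignal:
    label: str        # "long", "short", or "neutral"
    confidence: int   # 1, 2, or 3 (3 assumed to be the strongest)
    active: bool      # does the cluster exceed the predictive-power threshold?

def characterize(hit_rate: float, mean_forward_return: float,
                 threshold: float = 0.55) -> ClusterSignal:
    """Hypothetical rule: hit_rate is the fraction of historical observations in
    the cluster where the asset subsequently moved in the cluster's dominant
    direction. The 0.55 / 0.60 / 0.70 cut-offs are illustrative assumptions."""
    if hit_rate < threshold:
        return ClusterSignal("neutral", 1, active=False)        # dormant cluster
    direction = "long" if mean_forward_return > 0 else "short"
    confidence = 3 if hit_rate >= 0.70 else 2 if hit_rate >= 0.60 else 1
    return ClusterSignal(direction, confidence, active=True)

# Robustness check over a characterized universe (hypothetical minimum of 2 active clusters).
clusters = [characterize(h, r) for h, r in [(0.72, 0.8), (0.58, -0.3), (0.51, 0.1)]]
active = [c for c in clusters if c.active]
is_robust_engine = len(active) >= 2
print(active, is_robust_engine)
```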
This process is followed for all assets to be predicted and for all prediction horizons. Currently, our horizons are weekly, monthly, quarterly, and semi-annual.
Committees of prediction engines & portfolio construction
The process yields a committee of signal-generation engines that are independent of one another. Each engine is then normalized so that the confidence of the signals generated by different engines can be compared. This is achieved via an automatic adjustment of the cluster characterization threshold.
The committee of signal-generation engines remains intact until the system indicates that it needs to be retrained, which occurs when the clusters reach the characterization threshold. At that point, the algorithm repeats the cluster characterization process with the new data that has become available since the last characterization was completed.
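Neither the normalization rule nor the retraining trigger is spelled out in detail; the sketch below assumes a committee-median threshold for normalization and a hypothetical "share of clusters sitting at the threshold" rule for deciding when an engine must be re-characterized.

```python
from dataclasses import dataclass, field

@dataclass
class Engine:
    """One independent signal-generation engine in the committee.
    The normalization and retraining rules used here are assumptions."""
    asset: str
    horizon: str
    threshold: float = 0.55                 # cluster characterization threshold
    cluster_scores: list = field(default_factory=list)

    def share_at_threshold(self) -> float:
        # Fraction of clusters whose predictive-power score has drifted to the threshold.
        if not self.cluster_scores:
            return 0.0
        near = [s for s in self.cluster_scores if abs(s - self.threshold) < 0.01]
        return len(near) / len(self.cluster_scores)

def normalize_committee(engines: list) -> None:
    # Align thresholds so confidence levels are comparable across engines
    # (hypothetical rule: adjust every engine to the committee median).
    thresholds = sorted(e.threshold for e in engines)
    median = thresholds[len(thresholds) // 2]
    for engine in engines:
        engine.threshold = median

committee = [Engine("gold", "weekly", 0.55, [0.72, 0.56, 0.61]),
             Engine("gold", "monthly", 0.60, [0.60, 0.59])]
normalize_committee(committee)
to_retrain = [e for e in committee if e.share_at_threshold() >= 0.5]
print([f"{e.asset}/{e.horizon}" for e in to_retrain])
```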
Having generated signals for each asset, the algorithm determines the weights for the portfolio strategies according to the risk-appetite profile of the client, expressed as maximum return, minimum drawdown, or maximum Sharpe ratio. The associated asset weights are calculated from the historical volatility of each asset and the cross-correlations between the constituents of each strategy.
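The weighting scheme is described only at a high level; a minimal sketch of one compatible approach treats the signals as expected-return proxies and combines them with a covariance matrix built from historical volatilities and cross-correlations (the profile rules and numbers below are assumptions).

```python
import numpy as np

def strategy_weights(signals: np.ndarray, vols: np.ndarray,
                     corr: np.ndarray, risk_profile: str = "max_sharpe") -> np.ndarray:
    """Hypothetical weighting sketch. `signals` take values in {-1, 0, +1} per asset,
    `vols` are historical volatilities, `corr` is the cross-correlation matrix.
    The actual optimization behind each risk-appetite profile is not disclosed."""
    cov = np.outer(vols, vols) * corr
    if risk_profile == "max_sharpe":
        # Unconstrained mean-variance direction, using signals as expected-return proxies.
        raw = np.linalg.solve(cov, signals.astype(float))
    else:
        # Fallback for other profiles: signal-scaled inverse-volatility weights.
        raw = signals / vols
    gross = np.abs(raw).sum()
    return raw / gross if gross > 0 else raw

signals = np.array([1, -1, 0])              # e.g. long gold, short asset B, neutral C
vols = np.array([0.15, 0.22, 0.10])         # annualized historical volatilities
corr = np.array([[1.0, 0.3, 0.1],
                 [0.3, 1.0, 0.2],
                 [0.1, 0.2, 1.0]])
print(strategy_weights(signals, vols, corr))
```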
The signal generation for a new strategy is backtested over the longest possible period, contingent upon data availability. Typically, for a weekly prediction framework the data goes back 20-25 years, and for a monthly framework it extends back 30 years. To assess the predictive capability of the system, the last 24-36 months of data are used as an out-of-sample testing period.
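A minimal sketch of the split described above, assuming weekly signals and returns held in a pandas DataFrame with the last 36 months reserved as the out-of-sample window (the data here is synthetic).

```python
import numpy as np
import pandas as pd

# Synthetic weekly history standing in for roughly 25 years of data.
dates = pd.date_range("2000-01-07", periods=25 * 52, freq="W-FRI")
rng = np.random.default_rng(2)
history = pd.DataFrame({"signal": np.sign(rng.normal(size=len(dates))),
                        "asset_return": rng.normal(0.0, 0.02, len(dates))},
                       index=dates)

# Reserve the last 36 months as the out-of-sample testing period.
oos_start = history.index[-1] - pd.DateOffset(months=36)
in_sample = history[history.index < oos_start]
out_of_sample = history[history.index >= oos_start]

# Simple out-of-sample check: hit rate of signal direction versus realized return.
hits = np.sign(out_of_sample["signal"]) == np.sign(out_of_sample["asset_return"])
print(f"in-sample weeks: {len(in_sample)}, out-of-sample weeks: {len(out_of_sample)}, "
      f"out-of-sample hit rate: {hits.mean():.2f}")
```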
Our methodology’s unique advantages
As tested and evidenced, the algorithm has proven sustainable through time. It stands out not only for its innovation, but for its enduring relevance and adaptability in a rapidly evolving world.
First and foremost, the algorithm is semi-independent from external systems. What does this mean? It means that while it does require data to function, this data is readily accessible from open sources or purchased from a variety of reliable providers. This semi-independence ensures that the algorithm remains robust and flexible, capable of functioning effectively across different environments and conditions.
Moreover, the algorithmic approach is entirely data-driven and non-discretionary. This is a critical aspect because it eliminates subjectivity from the process. Every decision made by our algorithm is based on data, not opinions. This ensures fairness, consistency, and accuracy in its operations, making it an effective tool in any application.
Lastly, the algorithm is self-sustaining. It learns autonomously and operates automatically. This means that, once deployed, it does not require continuous human oversight or intervention to improve. The model learns from new data, adapts, and subsequently evolves. This ensures the algorithm remains at the cutting edge without necessitating constant updates or maintenance from our team.
In conclusion, the algorithm is not just a tool for today but a foundation for the future. Its semi-independence, data-driven nature, and self-sustaining capabilities result in a resilient and effective projection engine in any setting.
