Forecasting stock market crashes using the log-periodic power law model and machine learning methods
The main parameters for the evaluation of machine learning methods and their classification; a study of the influence of these parameters on the probability of stock market collapse and of the possibility of using them as input for machine learning methods.
Contents
Abstract
1. Introduction
2. Literature review
2.1 Crashes
2.2 Log Periodic Power Law model
3. Methodology
3.1 Research intuition
3.2 The data
3.3 The LPPL model
3.4 Parameters estimation and basic binary regressions inference
3.5 Machine learning based classification
3.5.1 Basic binary logistic regressions
3.5.2 Stepwise binary regression estimation based on Information criterion
3.5.3 Random Forests
3.5.4 Support Vector Machine with linear kernel
3.5.5 Gradient Boosting Machine
3.6 Confusion matrices and predictive power metrics
3.7 Modeling strategy
4. Empirical results
4.1 LPPL parameters estimates significance
4.2 Test of predictive power of the model
Conclusion
References
Abstract
This paper is dedicated to the problem of crash prediction, or more precisely to crash prediction using log-periodic power law (LPPL) model parameter estimates and machine learning techniques such as random forest, gradient boosting machine, linear support vector machine and generalized linear models (simple logits and stepwise logits with the AIC criterion). The main question we want to answer with this research is whether the LPPL parameters can be used as precursors of a crash. To answer this question we check, with the help of simple logits, whether the LPPL parameters significantly influence the probability of a crash and, if so, whether these parameters can be used as input data for the machine learning techniques to obtain stable and useful predictions. Finally, we compare the results and check whether they hold for both indices, only for the developed/emerging one, or whether the hypotheses have to be rejected.
Keywords: Dow Jones crashes, MICEX crashes, LPPL parameters, stock price forecasting, machine learning
1. Introduction
Financial markets are complex systems, where different agents worldwide make decisions whether to buy or sell shares and bonds, thus directly influencing the demand and supply of securities. All market participants are supposed to be free in their choice of strategy; however, the actions they take affect the overall price dynamics, so an individual can observe them, construct a new model, make a new forecast and correct (or even change) his strategy. In this example independence is not so obvious.
Stock market crashes influence our lives significantly. Governments, corporations and banks may lose hundreds of billions of dollars if a stock market collapses. Of course, such losses push firms towards cost minimization, causing unemployment and wage reductions. All of the above are symptoms of a crisis caused by a bubble burst.
An ability to predict crashes can be useful for any decision maker. It is an opportunity to earn large profits or to smooth the business cycle and manage the market in order to avoid a bubble burst. If the crash is inevitable, agents can at least get ready for it and save the major part of their financial assets.
The arguments above explain why so many papers concerning the problem of crash prediction appear. It has to be said that much work has been done in this field, and different models and a number of suggested solutions to the problem of crash forecasting have appeared. First, behavioral models were introduced [Sornette, Johansen, Bouchaud, 1995]; then papers dedicated to statistics- and probability-based models appeared, which were supposed to predict rational expectations bubbles [Johansen, Ledoit, Sornette, 2008; Sornette D., 1999]; and then researchers managed to describe traders' behavior with a single agent-based model [Harras, Sornette, 2011], which had a game-theoretical background. To evaluate these models new techniques were developed, such as machine learning and techniques for fitting the parameters of power and exponential laws [Filimonov, Sornette, 2013], among others.
In this paper we are going to estimate the log-periodic power law (LPPL) model parameters and use them as precursors of a crash. The parameter estimates will serve as input data for different machine-learning techniques and classifiers to detect ex ante crashes of the Russian Moscow stock exchange index (MICEX) and the Dow Jones Industrial Average (DJIA). Thus, the main question we state in this paper is whether these parameter estimates have predictive power, can be used as precursors of a crash, and whether we can predict crashes using them on the emerging Russian stock market and on the mature Dow Jones one.
To answer the research question we state the following hypotheses:
1. The LPPL parameter estimates have predictive power and significantly influence the probability of a crash on the next day.
2. The model provides stable crash predictions when classifiers and machine-learning techniques use the LPPL parameter estimates as input data.
3. The model gives correct predictions both on the emerging MICEX and on the developed DJIA; otherwise the model either does not work on the emerging market or does not work at all.
The paper has the following structure: in the first section (Literature review) we discuss crashes and the LPPL model itself in order to explain the choice of the LPPL as the basis of our work. For these reasons it is important to understand the nature of the model and the cases where it is particularly useful, so we refer to the origin of the model and its evolution (here we mean the methods of parameter estimation, endogenization processes, the extension of the geography of predicted crashes and the issues of interpreting the model's parameters). The second part is dedicated to the methodology, including the data description, the methodological issues, the details of the parameter estimation techniques and the use of machine-learning techniques. The rest of the paper includes the results, their interpretation and their discussion, with a few suggestions for future research.
2. Literature review
2.1 Crashes
We will start this chapter with a set of definitions and clarifications, which are important assumptions for modeling crashes. The problem is that there is no universal definition for a bubble and crash, and the common opinion on why crashes appear and how a bubble develops is absent as well.
For our purposes we will refer to the concept of traders' rationality, which consists of two assumptions: 1) a crash is the result of local interactions among traders, and the stronger the interaction is, the faster a bubble appears and develops. This is also called "herding behavior": one trader observes the actions of his colleagues, experts, friends or any authorities who can influence his decision and acts in accordance with their choices. 2) No one knows for sure whether the crash will happen or not (sometimes no crash happens at the end of the bubble), so traders continue to invest until the bubble is over [Sornette D., 1999].
The conditions above are necessary for the market to exist; the profits for some traders are supposed to exist and to be positive (not all of the traders leave the market because of risk-averse behavior). Moreover, even if all the traders know the critical date when the bubble will end, it is still not determined what will happen then, or whether to expect a smooth transition into an anti-bubble or an abrupt crash [Brée, Joseph, 2013; Harras, Sornette, 2011].
As for the definition of the bubble itself, we will use the one suggested by the same authors in the JLS (Johansen-Ledoit-Sornette) studies of 1999 [Sornette D., 1999]. According to them, a bubble is a situation when market growth can be considered faster than exponential. This definition seems much more appropriate than its competitor concerning the fundamental value of an asset (the bubble appears when the price exceeds the fundamental value of an asset). The latter is considered classical; however, it has a significant drawback: to use this definition the fundamental value has to be properly defined, but there are no transparent criteria for what is called fundamental value.
The crash does not always happen as a result of a bubble. To identify the crash we use a value called the hazard rate. This is the probability of a crash happening under the condition that there has been no crash yet. The notation for the hazard rate is $h(t)$, where $t$ stands for the time parameter, as this is the probability at the moment of time $t$ [Sornette D., 1999; Sornette, Zhou, 2002].
A crash can be defined as an abnormal fall on the stock market that happens over a rather short period of time (from one to seven days) and is not accompanied by unusually bad news associated with public events of high impact [Hong, Stein, 2003; Jacobsson, 2009]. The fact that crashes do not depend on news was demonstrated on a sample of the 50 greatest crashes of the 20th century [Brée, Joseph, 2013; Cutler, Poterba, 1988]. The amplitude of the crash is naturally unconstrained and is proportional to the current price level; this arises from the standpoint of standard theory, where the dependent variable is not the price, but its logarithm [Johansen, Sornette, 1997; Sornette D., 1999].
2.2 Log Periodic Power Law model
In this part we review the log-periodic power law (LPPL) concept and the model's development from the point of view of its assumptions and the empirical evidence for its effectiveness, without discussing the fundamental mathematical issues of the LPPL. This is necessary in order to justify our choice of the LPPL as the main instrument for crash prediction in this paper.
We will start with a short historical reference, which is necessary here in order to substantiate the link between the physical model (LPPL) and economics. Log-periodicity was first noted in the space industry and was used for engineering purposes connected with heating liquids [Sornette, Johansen, 2001]. Then the theory of log-periodicity became the basis for a model that was successfully applied to earthquake prediction [Selçuk, 2004; Sornette, 2002; Sornette, Johansen, Bouchaud, 1995]. D. Sornette in his book "Why Stock Markets Crash" compared earthquakes with financial crashes, calling them "black swans", that is, extremely rare extraordinary events. This allowed him to construct a model based on log-periodic super-exponential price growth and to apply it successfully [Cajueiro, Tabak, Werneck, 2009; Drożdż et al., 2003; Johansen, Ledoit, Sornette, 2008; Johansen, Sornette, 1997; Johansen, Sornette, 1999; Kaizoji, 2006; Matsushita et al., 2006; Moura de, Tirnakli, Lyra, 2000; Sornette, Johansen, Bouchaud, 1995; Sornette, Zhou, 2002; Vandewalle et al., 1999].
The LPPL model can predict the end of the bubble (the so-called critical date) relying on the super-exponential price dynamics inside it; however, it is important to distinguish the day of the crash from the day of the bubble end. The crash may actually happen long before or after the critical date, because the critical date stands only for the most probable crash date [Johansen, Ledoit, Sornette, 2008].
The first reason to choose the LPPL model as the main instrument for crash prediction in this paper is the model's underlying assumption: it ignores the issue of the fundamental value of an asset. Other models were constructed with the assumption that a bubble is just the case when an asset price is much higher than its fundamental value. This causes a serious problem, because there are no strict constraints and no unambiguous definition for the fundamental value of an asset. Thus, the resulting predictions depend excessively on how one determines the fundamental value and the normal price variance.
The second important feature of the LPPL model that influenced our choice is that it satisfies the martingale condition, which guarantees consistency of the prediction. Extreme value theory models, for example [Bali, 2007; Marinelli, D'Addona, Rachev, 2007], do not satisfy even the semi-martingale condition, so their predictions are often inappropriate and inconsistent [Brée, Joseph, 2013].
An important argument for the LPPL is that it can detect bubble continuation and predict the most probable date of the crash. Moreover, the LPPL treats crashes as endogenous phenomena; here we mean that crashes are the result of speculative bubble termination, not of an event with high impact such as a war. The model relies on a log-periodic distribution and a super-exponential growth rate of the price, which allows us to detect a type of bubble invisible to other models: speculative endogenous bubbles. The price is considered to grow during a bubble, meeting small fluctuations on its way, so the LPPL does not contradict the martingale condition. Finally, in the latest research an algorithm for calibrating the LPPL parameters was suggested and improved, so the goodness-of-fit increased [Sornette et al., 2013].
There is also strong empirical evidence that the LPPL can perform well on developed markets regardless of the data we use. It works well both when the predictions are post factum, i.e. the date of the crash is already known (old data) [Filimonov, Sornette, 2013; Yan et al., 2011], and ex ante, when the crashes were foreseen by the model (documentary proofs are included in the referenced papers) [Bartolozzi et al., 2005; Jiang et al., 2009; Sornette et al., 2013; Sornette, Zhou, 2003; Zhou, Sornette, 2002a].
However, the LPPL is not perfect either. First of all, it contains seven parameters and their estimation is not an easy thing to do. Then, the model is nonlinear, which is also a drawback from the computational point of view. Moreover, the LPPL parameters have to be restricted to certain ranges in order to make the model work correctly. This mechanism is called "model fitting" and it can be difficult to fit the ranges for some markets, especially emerging ones [Cajueiro, Tabak, Werneck, 2009; Filimonov, Sornette, 2013].
The latest research of Filimonov and Sornette (2013) was more theoretical than empirical: they suggested a scheme for faster model fitting. The parameter ranges were specified and are now easier to confine. Moreover, the computational problem was solved as well by rewriting the model in an equivalent form and reducing the number of nonlinear parameters by substituting some of them with linear ones [Filimonov, Sornette, 2013].
According to all of the above, we come to the conclusion that the LPPL is the most appropriate and adequate model for crash prediction, as it satisfies all the assumptions we declare in our paper (the martingale condition holds true, crashes are endogenous, crashes are predicted ex ante, and the definition of a bubble differs from the fundamental-value one), and we decided to fit it to the Russian stock exchange as well.
3. Methodology
3.1 Research intuition
The intuition for using such a strategy can be justified by mentioning two things. First, fractal patterns can be spotted for the LPPL model [Sornette, Zhou, 2002; Sornette, Zhou, 2003; Zhou, Sornette, 2002b]. This simply means that the LPPL model can work well both for 7-10-year periods and for 0.5-2-year periods. Longer periods sometimes do not make sense, as the crash almost always occurs within such a period. Shorter periods have not been examined yet; however, we will test them as well. Second, the parameter estimates appear to be very different for bubble and pre-crash cases compared with the cases when the market just fluctuates or goes down.
Figure 1. DJIA super-exponential growth and August 24, 2015 “flash crash”
There is a lot of graphical and empirical evidence for the 4-7- and 7-10-year-long estimations, which can be found in many articles [Bolonek-Lason, Kosinski, 2011; Johansen, Ledoit, Sornette, 2008; Sornette, 2003]. Here we show a graph that describes the August 24, 2015 crash of the DJIA, which is constructed on a 2-year interval and is considered one of the most powerful crashes in DJIA history.
This figure shows a correctly calibrated model predicting the serious drawdown of the DJIA on 24 August 2015: as soon as the price predicted by the LPPL model began growing at a faster-than-exponential rate, the crash occurred. This is why we suppose that the LPPL parameter estimates take significantly different values in crash and no-crash periods.
The parameters of the model at the moment of the crash are: A ≈ 18.586; B ≈ -8.589; C ≈ -0.018; φ ≈ 0.439; m ≈ 0.006; ω ≈ 7.780. The optimal value for this model is 1.3 and R² ≈ 0.667. These are the so-called pre-crash parameter values of the model.
According to our methodology, we are going to perform such an estimation on every day starting from the initial date plus 400 days (the maximum number of observations we take). So, we start from the initial date and estimate the model repeatedly, always moving one day into the future. Thus, we obtain vectors of parameter estimates and identify whether there is a significant difference between the pre-crash parameter values and the rest of them.
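A minimal sketch of this rolling-window scheme is given below; Python is used only for illustration, and the toy series, the 400-day window and the function names are assumptions rather than the code used in the thesis.

```python
import numpy as np
import pandas as pd

def rolling_windows(log_price: pd.Series, window: int):
    """Yield (window_data, assumed_crash_date): each window ends one trading
    day before the date on which a crash is assumed to be possible."""
    for end in range(window, len(log_price)):
        yield log_price.iloc[end - window:end], log_price.index[end]

# Toy series of log prices on business days, just to show the mechanics.
log_price = pd.Series(np.log(np.linspace(100.0, 120.0, 450)),
                      index=pd.bdate_range("2014-01-01", periods=450))
for window_data, crash_date in rolling_windows(log_price, window=400):
    pass  # here the LPPL would be fitted on `window_data` for each `crash_date`
```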
To find out whether the LPPL parameter estimates make sense as precursors of crashes, we use logit models. Moreover, we continue our research by testing the predictive power of the logits themselves.
3.2 The data
For our research we use two time series, DJIA and MICEX, as was already mentioned. These indices have much in common, from the calculation methodology to the type of companies included. MICEX is the major index of the Moscow stock exchange. Since 18 December 2012 it has consisted of the stocks of the 50 biggest Russian companies. The index value is recalculated once per second. MICEX is a capitalization-weighted index adjusted for free float. The MICEX value is calculated as:
(1) $I_t = I_{t-1} \cdot \dfrac{\sum_{i=1}^{N} p_{i,t}\, q_i\, w_i\, ff_i}{\sum_{i=1}^{N} p_{i}^{close}\, q_i\, w_i\, ff_i}$,
where $p_{i,t}$ stands for the price of stock $i$ at time $t$ of the current day, $p_i^{close}$ is the close price of stock $i$ on the previous day, $q_i$ is the amount of shares being traded, $w_i$ is the weight of stock $i$ in the index, $ff_i$ is the free-float coefficient, and $N$ is the total number of different stocks (currently $N = 50$).
The Dow Jones Industrial Average is composed of 30 of the largest US companies. They may do business in different spheres and industries and still be included in the index. The index is calculated as the sum of the prices of all 30 stocks divided by the Dow divisor, which is currently equal to 0.14602128057775. Thus, every $1 change in price causes approximately a 6.8-point movement of the index. The formula for its calculation is given as follows:
(2) $\mathrm{DJIA} = \dfrac{\sum_{i=1}^{30} p_i}{d}$,
where $p_i$ stands for the price of stock $i$ and $d$ is the Dow divisor.
Other characteristics of the time series used can be derived from the table below.
Table 1: Time series summary for Dow Jones (DJIA) and MICEX
Index name | Dates | Length (observations) | Category | Number of stocks | On the market since
DJIA | Sep 22, 1997 - Mar 16, 2016 | 4603 | Developed | 30 | May 26, 1896
MICEX | Sep 22, 1997 - Mar 16, 2016 | 5306 | Emerging | 50 | Sep 22, 1997
The number of observations differs even though the indices are taken for the same time period. This is a matter of holidays and other calendar and trading issues. Generally speaking, it does not affect our estimations, as we do not try to connect and synchronize the estimations for these two indices.
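Given these two daily series, the crash label used later as the dependent variable can be constructed as a one-day drop exceeding a chosen crash level. The sketch below shows one way to do this; the threshold value and the names are illustrative, not the exact labeling code of the thesis.

```python
import pandas as pd

def label_crashes(close: pd.Series, crash_level: float = 0.02) -> pd.Series:
    """Mark day t as a crash (1) if the close-to-close drop exceeds `crash_level`."""
    daily_return = close.pct_change()
    return (daily_return <= -crash_level).astype(int)

# Example: a 2% crash level applied to a toy price series.
prices = pd.Series([100.0, 101.5, 99.0, 98.9, 95.5],
                   index=pd.date_range("2016-03-10", periods=5))
print(label_crashes(prices, crash_level=0.02))
```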
3.3 The LPPL model
First, we need to describe the behavioral mechanism that the log-periodic power law is based on. The LPPL is not just a regression formula: as it is supposed to describe real-life (sometimes irrational) behavior, we need to make some assumptions and explain how we expect traders to act.
All the traders are playing in a situation of incomplete information, so there are only two things to rely on: the time series of the price and the other traders. As for the time series, any trader $i$ in period $t$ chooses a strategy $s_i(t)$: whether to buy or to sell and how much. The selection of a strategy comes from maximizing the expected profit. The movement in price depends on all the traders' actions, i.e. on the sum of their strategies $\sum_i s_i(t)$. As soon as this sum is positive, the majority of traders follow a buy strategy and there is excess demand for the asset, which means the price will rise to return to equilibrium, so the best reaction of a trader is to buy the asset now and sell it at $t+1$ when the price has grown higher. The totally opposite happens when the sum is negative: the price seems to be overestimated at $t$ and it will have decreased by $t+1$, so we have to sell the asset now at its highest price [Sornette, 2003].
The problem here is that the traders have no information on the sign of the sum until the period has come, when it is too late. So the maximum they can do (if they do not have any inside information) is to focus on their "neighbours'" actions [Harras, Sornette, 2011; Sornette D., 1999; Sornette, 2003]. Any trader has a certain number of people (relatives, colleagues, friends etc.) who can advise him or share their opinion. Imagine that the group of such "neighbours" of trader $i$ is $N(i)$. According to this, the strategy of trader $i$ depends on the strategies of his neighbours [Harras, Sornette, 2011], which means that if the majority of neighbours play "buy", thinking that the price will grow, the trader may conclude that it most likely will.
To take into consideration such factors as interpersonal relationships and individual qualities (respect, or the strength of interactions between traders), we can specify it as
(3) $s_i = \mathrm{sign}\Big(K \sum_{j \in N(i)} s_j + \sigma \varepsilon_i\Big)$,
where
$s_i$ — the strategy of a certain trader $i$,
$K$ — the coupling strength of interaction between traders,
$N(i)$ — the group of neighbours of trader $i$,
$s_j$ — their strategies,
$\sigma$ — the tendency towards idiosyncratic, as opposed to herding, behavior,
$\varepsilon_i$ — a random draw from the normal distribution with zero mean and unit variance.
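To make the interaction rule concrete, here is a minimal simulation sketch of equation (3); the ring topology, the number of traders and the values of K and σ are illustrative assumptions, not taken from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(42)

def update_strategies(s, neighbours, K=1.0, sigma=1.5):
    """One synchronous update of eq. (3): each trader imitates the net
    opinion of his neighbours, perturbed by idiosyncratic noise."""
    new_s = np.empty_like(s)
    for i, nbrs in enumerate(neighbours):
        new_s[i] = np.sign(K * s[nbrs].sum() + sigma * rng.normal())
    return new_s

# Toy example: 100 traders on a ring, each watching 4 neighbours.
n = 100
neighbours = [np.array([(i - 2) % n, (i - 1) % n, (i + 1) % n, (i + 2) % n])
              for i in range(n)]
s = rng.choice([-1, 1], size=n)          # initial buy/sell positions
for _ in range(20):
    s = update_strategies(s, neighbours)
print("net demand:", s.sum())            # large |sum| indicates herding
```

With a coupling that is strong relative to the noise, the population tends to align on one strategy, which is the herding regime described above.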
D. Sornette divides traders into rational and noise traders, and the reason for any bubble is the noise traders [Sornette, 2003]. The process goes as follows: some clusters of traders with overly optimistic expectations appear and act like a herd (herding behavior) [Sornette, 2003]. They push the price too high, then they begin to understand it, and as soon as the opposition wins (the number of rational traders exceeds the number of noise traders) the crash occurs as a sharp decline in price [Filimonov, Sornette, 2013; Harras, Sornette, 2011; Kaizoji, 2010; Sornette et al., 2013].
The second thing that is necessary to describe is the martingale condition assumed in our model. It has to be said that, despite the noise behavior and the bounded rationality of groups, clusters and masses, individuals are thought to be rational with respect to the profits they expect to gain (at least no one wants to lose money when starting to trade on a stock exchange). This means that the expected profit, which depends on the price in future periods of time and on the risk factor, has to be higher than the initial amount of money an individual has before the investment is made. The martingale condition for this situation (model) was formulated in the work of Johansen and Sornette [Sornette D., 1999] and looks like $E[p(t') \mid p(t)] = p(t)$ for any $t' > t$.
In the limit $dt \to 0$:
(4) $E[dp \mid \text{no crash}] = \kappa\, p(t)\, h(t)\, dt$,
where
$E[dp \mid \text{no crash}]$ — the expected change in price under the condition of no crash in $[t, t+dt]$ (investors will play "buy" if they expect no crash in the next period and, moreover, the expected profits have to exceed the risks associated with a crash),
$\kappa$ — the strength of the possible crash (how far the price drops if the crash happens); later in our model it is called the Crash Level, or clevel,
$h(t)\,dt$ — the chance that the crash will occur in $[t, t+dt]$,
$p(t)$ — the price at $t$ (already known),
$dt$ — "during the time", a period from $t$ to $t+dt$.
This condition was formulated to describe the situation of a bubble (the whole model works only when predicting bubbles). Since a positive price is supposed on any market, $\kappa$ is positive as it is the percentage measure of the drop in price, and, as there are no time machines, $dt$ is always positive as well, so the expected change in price (4) is always positive. This is a very important assumption, as we expect the price to rise constantly while the bubble is growing. This is, in a sense, the condition for a bubble to exist.
Further on, we will omit the detailed explanation of the field theory and the Ising model, as Goldfeld has already presented it. Moreover, we will use the hazard rate already derived in the JLS (Johansen-Ledoit-Sornette) model to finish the LPPL derivation [Johansen, Ledoit, Sornette, 2008]:
(5) $h(t) = B\,(t_c - t)^{-\alpha}$,
where $B$ is a constant and $0 < \alpha < 1$ (the upper bound of 1 keeps the price from having to go to infinity to compensate such a hazard rate, and the lower bound of 0 restricts our parameter so that the hazard rate does not die out near the critical date).
Then, transforming (4) with the help of the martingale condition, we get:
(6) $\dfrac{dp}{p(t)} = \kappa\, h(t)\, dt$.
By integrating both sides we get the solution of this differential equation. Using the information about the neighbours' game from above and the hazard rate (5), we get:
(7) $\ln p(t) = \ln p(t_c) - \dfrac{\kappa B}{1 - \alpha}\,(t_c - t)^{1-\alpha}$.
We may rewrite this in an easier way by changing $m = 1 - \alpha$ and $A = \ln p(t_c)$. As $\ln p(t) = A$ at the point $t = t_c$, this implies $\ln p(t) = A + B'(t_c - t)^{m}$ with $B' = -\kappa B / m < 0$, which is the simple "faster than exponential growth" model [Brée, Joseph, 2013].
To modify this model into the more complicated log-periodic one, we have to add oscillations by allowing for a more elaborate type of interaction between the neighbours in the game described above, and so obtain a new hazard rate. As in the previous situation, we are not going to describe the details here, as the algorithm is absolutely the same.
The log-periodic model itself looks like
(8) $\ln p(t) = A + B\,(t_c - t)^{m} + C\,(t_c - t)^{m}\cos\big(\omega \ln(t_c - t) + \phi\big)$,
where
$A$ — the log-price at the moment the crash occurs ($A = \ln p(t_c)$),
$B$ — is meaningful when $B < 0$, since it shows the decrease in $\ln p(t)$ per unit of time before the crash (this is the model that is supposed to deal with reality, and the price here may decrease, in contrast to the model of expectations described above),
$C$ — the magnitude of fluctuations,
$m$ — the exponent of growth,
$\omega$ — the frequency of fluctuations,
$\phi$ — a shift parameter,
$t_c$ — the time-of-crash parameter.
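For reference, equation (8) translates directly into code. The sketch below is purely illustrative; the parameter values reused in the example are the pre-crash DJIA estimates quoted in the research-intuition subsection, and the assumed critical date is arbitrary.

```python
import numpy as np

def lppl_log_price(t, tc, m, omega, phi, A, B, C):
    """Log-periodic power law, eq. (8): ln p(t) for t < tc."""
    dt = tc - t                      # time remaining until the critical date
    return A + B * dt**m + C * dt**m * np.cos(omega * np.log(dt) + phi)

# Example: evaluate the curve on 499 trading days before an assumed tc = 500.
t = np.arange(0.0, 499.0)
curve = lppl_log_price(t, tc=500.0, m=0.006, omega=7.780, phi=0.439,
                       A=18.586, B=-8.589, C=-0.018)
```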
3.4 Parameters estimation and basic binary regressions inference
To estimate the parameters of model (8) we use an approach quite different from the standard one. First, we deal with a set of subsamples, assuming the probability of a crash on the next day, rather than the standard crash-date estimation. So we have no need to estimate the $t_c$ parameter, because it is already assumed. Second, we use another methodology for the parameter assessment itself.
Dealing with the first issue is easy: we substitute the $(t_c - t)$ component with another parameter that we know: the number of days left before the assumed crash. It is easy to show how this parameter works: we know the subsample size, and the assumed crash date is the next day after the upper-bound date of the subsample. So, for the last day of our estimation window the value of this parameter is 1. Then a new cycle begins as we move to a new subsample.
The second issue is that we have chosen a non-traditional optimization method for the LPPL parameters: a non-linear simulated-annealing method, a variant of simulated annealing based on the Monte Carlo optimization technique [Corana et al., 1987; Romaguera et al., 1995]. The choice can be justified by the convenience of this method without losing quality of the estimates. Moreover, this method is based on stochastic processes and takes into account only the values of the objective function, so it can be classified as a numeric stochastic method whose precision we can adjust to the results we want to get.
Because of its stochastic nature the results are not always the same, and sometimes the LPPL is estimated better and sometimes worse. We estimate the parameters on each interval several times using this method and then choose the best estimate.
The last change is the inclusion in the regressions of a new parameter, which we consider significant in this model: the RSS (residual sum of squares). The intuition for it is obvious: the closer we are to a crash caused by a bubble, the better our model fits the trend, so the RSS should be lower in this case. Including this parameter, however, causes no changes in the initial LPPL function (8). Moreover, the RSS is used to choose the best estimated model from the several ones produced by the simulated-annealing method. Thus it is not only an important crash precursor, but also a measure of the best-fit model.
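A sketch of this estimation step is given below: the six remaining parameters are found by minimizing the RSS of equation (8) under a bounded search, with SciPy's dual_annealing used as a stand-in for the simulated-annealing variant described above. The parameter bounds are illustrative assumptions, not the exact ranges used in the thesis.

```python
import numpy as np
from scipy.optimize import dual_annealing

def lppl_rss(params, t, log_price, tc):
    """Residual sum of squares of eq. (8) for a fixed (assumed) critical date tc."""
    m, omega, phi, A, B, C = params
    dt = tc - t
    fitted = A + B * dt**m + C * dt**m * np.cos(omega * np.log(dt) + phi)
    return float(np.sum((log_price - fitted) ** 2))

def fit_lppl(t, log_price, tc, seed=0):
    # Illustrative bounds; the thesis restricts the ranges following
    # Filimonov and Sornette (2013), which are not reproduced here.
    bounds = [(0.01, 0.99),      # m
              (2.0, 15.0),       # omega
              (0.0, 2 * np.pi),  # phi
              (0.0, 20.0),       # A
              (-20.0, 0.0),      # B
              (-1.0, 1.0)]       # C
    result = dual_annealing(lppl_rss, bounds, args=(t, log_price, tc), seed=seed)
    return result.x, result.fun   # best parameter vector and its RSS
```

Re-running the fit with several seeds and keeping the estimate with the lowest RSS mirrors the repeated-estimation step described above.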
The rest of this part of the methodology is dedicated to the logits and machine learning techniques, which measure the quality of the results and the predictive power of the model.
3.5 Machine learning based classification
3.5.1 Basic binary logistic regressions
A logit is a binary choice model based on the logistic cumulative distribution function. Analyzing a logit, we can derive the probability that the binary dependent variable equals one. In our case this dependent variable is the crash, and the explanatory variables are the parameters of the LPPL (6 of them) and the RSS of the model, which is estimated, like the parameters, on a set of rolling subsamples and intuitively is a precursor of the crash too: the smaller the RSS is, the higher is the probability of a crash in the classical model application.
The probability of a crash in this model is described by the logistic cumulative distribution function $P(\text{crash} = 1) = \Lambda(z) = \dfrac{1}{1 + e^{-z}}$, where $z = \beta_0 + \beta_1 x_1 + \dots + \beta_7 x_7$; $\beta_1, \dots, \beta_7$ are the coefficients for each of the seven input parameters and $\beta_0$ stands for the constant.
This model is estimated using the maximum likelihood method. The coefficients can be interpreted as follows: the larger $|\beta_k|$ is, the more impact the corresponding parameter has on the probability of a crash. If $\beta_k > 0$, then the higher the parameter itself is, the higher is the probability of a crash. The final estimates can be seen in Appendix 1.
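A minimal sketch of such a logit with statsmodels is shown below; the design matrix and the crash labels are random placeholders standing in for the seven precursors and the dependent variable.

```python
import numpy as np
import statsmodels.api as sm

# Placeholder inputs: columns stand for the six LPPL parameter estimates plus
# the RSS of the fit; `crash` is the 0/1 label described earlier.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 7))
crash = (rng.uniform(size=500) < 0.1).astype(int)

logit = sm.Logit(crash, sm.add_constant(X))   # add_constant supplies the intercept
fit = logit.fit(disp=False)                   # maximum likelihood estimation
print(fit.summary())                          # coefficients, std. errors, p-values
```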
3.5.2 Stepwise binary regression estimation based on Information criterion
The underlying mechanism of estimation for the stepwise logits is the same as for simple logits, with one slight difference. The stepwise AIC logit takes the AIC information criterion into consideration: once the model is computed, it creates a candidate model in which the least significant variables (judging by the statistics and standard errors) are dropped, then re-estimates this candidate model and measures its AIC as well. Finally, the algorithm compares both models, choosing the one with the lower AIC. The process loops until all possible combinations are compared.
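A simplified stand-in for this procedure is sketched below as backward elimination on AIC; it is not the glmStepAIC implementation itself, and the greedy one-variable-at-a-time removal is an assumption.

```python
import statsmodels.api as sm

def backward_aic_logit(y, X):
    """Drop one regressor at a time whenever its removal lowers the AIC;
    stop when no single removal improves the criterion."""
    cols = list(range(X.shape[1]))
    best_aic = sm.Logit(y, sm.add_constant(X[:, cols])).fit(disp=False).aic
    improved = True
    while improved and len(cols) > 1:
        improved = False
        for c in list(cols):
            trial = [k for k in cols if k != c]
            aic = sm.Logit(y, sm.add_constant(X[:, trial])).fit(disp=False).aic
            if aic < best_aic:                 # the lower AIC wins
                best_aic, cols, improved = aic, trial, True
                break
    return cols, best_aic                      # surviving columns and final AIC
```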
3.5.3 Random Forests
A random forest (rf) is an improved decision-making mechanism based on decision trees. Decision trees consist of leaves, branches and nodes. Leaves hold the values of the decision tree; branches determine the paths among leaves and connect them with nodes. There are 3 types of nodes: decision nodes (standing for the final decision to be made), probability nodes (marking the probabilities of different outcomes) and end nodes (finishing the tree). A simple tree-based decision-making process is to just compute the expected earnings from each possible decision, compare them and choose the best.
The rf model is also based on decision trees; however, it relies on another underlying mechanism called bagging (Bootstrap AGGregatING). Bagging is a procedure that randomly selects from the original training set (a set of examples used for learning, which is used to fit the parameters of the classifier) of size $n$ a number $B$ of subsets of size $n' \le n$; thus, so-called bootstrap samples are formed. Finally, $B$ models are fitted on the bootstrap samples and, for the classification problem, combined by voting (choosing the most popular class among the estimated models). Bagging to some extent fights the overfitting problem by decreasing the variance of the model while leaving the bias untouched.
The procedure for rf is very similar: it also splits the training set randomly, computes the outcome of each model and then selects the most popular class by voting. However, the rf algorithm has one difference: it randomly selects not only subsamples from the training sample, but also randomly splits the set of features. For a classification problem with $p$ features the model will use only a small number of randomly selected features at each split. The procedure of fitting an rf is based on the subdivision of bootstrap samples. In Appendix 3, Table 1 you can see an example of how crashes are separated from no-crash cases based on two of the parameters. The subdivision process follows the Gini criterion and continues until the bottom leaves of the tree contain only one class, the crash or the no-crash situation. The number of parameters chosen could be more than 2 or 3; actually, their number is random and they are estimated on subsamples drawn from the training sample by chance. The most common problem with rf is still overfitting. It happens if the model is too complex or too many regressors are involved in the classification with too few observations. An overfitted model has poor predictive power, because it just "memorizes" the training data instead of learning, and on the test sample it fails to predict anything.
To avoid overfitting, several techniques can be used. In this paper we use principal-components-based preprocessing of the data, which reduces the number of input independent variables by taking only those components that together explain a specified share of the total variance [Denil, Matheson, Freitas De, 2014; Kumar, Thenmozhi].
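A sketch of this setup with scikit-learn follows; the 95% variance threshold, the number of trees and the standardization step are assumptions, not the thesis' exact settings.

```python
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# PCA pre-processing followed by a random forest, mirroring the
# "PCA preprocessing" variants used later in the modeling strategy.
rf_model = make_pipeline(
    StandardScaler(),                    # PCA is sensitive to feature scales
    PCA(n_components=0.95),              # keep components explaining 95% of variance
    RandomForestClassifier(n_estimators=500, random_state=0),
)
# rf_model.fit(X_train, y_train); rf_model.predict(X_test)
```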
3.5.4 Support Vector Machine with linear kernel
A support vector machine with a linear kernel (svmLinear) is a method designed specifically for binary classification. The underlying algorithm attempts to construct a hyperplane, of dimension determined by the number of features, that best separates the objects of one class from the objects of the other. The hyperplane is built according to the following principle: the program weights the features, trying to maximize the distance between the hyperplane and the nearest points of the objects of each class.
To restate this mathematically we need to introduce the following problem. Let $X$ be the space of all features which, combined together, can separate one class from the other. The expression $(x_i, y_i)$ denotes a mapped sample, where $y_i \in \{-1, +1\}$ is the notation for one of the 2 possible classes (in our case the crash and the no-crash situation). The training data can be written as
$\{(x_1, y_1), \dots, (x_n, y_n)\}, \quad x_i \in X, \; y_i \in \{-1, +1\}$.
The hyperplane can be denoted as $w \cdot x - b = 0$, where $w$ is the vector perpendicular to the plane. If we draw two parallel hyperplanes through the end points (support vectors) of each of the classes, then find the distance between these two hyperplanes and divide it by 2, we will be able to draw the separating hyperplane the model will use for classification. The orientation of this plane should maximize the distance between the two hyperplanes drawn through the end points of the classes [Furey et al., 2000; Tong, Koller, 2001].
The main advantage of the support vector machine is that in almost all cases it can avoid overfitting without any specific additional methods.
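A corresponding scikit-learn sketch is shown below; the cost parameter and the balanced class weights are assumptions added because crash days are rare.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Linear-kernel SVM for the crash / no-crash problem.
svm_model = make_pipeline(
    StandardScaler(),                             # margins depend on feature scale
    SVC(kernel="linear", C=1.0, class_weight="balanced"),
)
# svm_model.fit(X_train, y_train); svm_model.predict(X_test)
```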
3.5.5 Gradient Boosting Machine
The gradient boosting machine (gbm) model is based on boosting, a procedure that is very similar to bagging but with some important differences. The underlying mechanism can be described as follows.
Assume the model $\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$, $f_k \in \mathcal{F}$, where $\mathcal{F}$ is the space of functions containing all classification trees and $K$ is the number of trees.
Our objective is $\mathrm{Obj} = \sum_i L(y_i, \hat{y}_i) + \sum_k \Omega(f_k)$. Here $L$ is the loss function, which shows how well the functions fit the points (an analogue of the RSS for regression), and $\Omega$ is the complexity of the trees, or equivalently the number of splitting points (regularization).
To find the $f_k$ we use additive training, or boosting. Starting from round 0, where the prediction is $\hat{y}_i^{(0)} = 0$, we move to round $t$, where the model looks like $\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$. The notation for the model built in the previous rounds is $\hat{y}_i^{(t-1)}$ and the new function is $f_t$. Thus our objective at round $t$ is to find such an $f_t$ that minimizes $\sum_i L\big(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t)$.
The last expression is written for a continuous loss function, but the logic for the binary case is the same; we just use numeric optimization methods and grow trees.
The splitting algorithm simply takes the best split by trying all variants of splitting and comparing the losses. The depth of the trees is regulated by a greedy algorithm [Friedman, 2001; Friedman, 1999].
Finally, the number of trees is limited, so the full optimization process is impossible, which reduces the chance of overfitting.
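A scikit-learn sketch of such a machine is given below; all the settings are illustrative, not the thesis' tuned values.

```python
from sklearn.ensemble import GradientBoostingClassifier

# A gradient boosting machine with a deliberately limited number of shallow trees,
# echoing the regularization argument above.
gbm_model = GradientBoostingClassifier(
    n_estimators=200,     # bounded number of trees limits the optimization
    learning_rate=0.05,   # shrinkage applied to each new tree
    max_depth=3,          # shallow trees control complexity (the Omega term)
    random_state=0,
)
# gbm_model.fit(X_train, y_train); gbm_model.predict(X_test)
```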
3.6 Confusion matrices and predictive power metrics
To represent the results in the easiest form we are going to use a simple tool such as confusion matrices. You can see a typical confusion matrix in Table 2.
Table 2. Confusion matrix structure
Actual \ Predicted | Crash (predicted) | No crash (predicted)
Crash (actual) | TP | FN
No crash (actual) | FP | TN
In the case of a binary classification problem the confusion matrix is a 2x2 matrix, where rows stand for the actual classes and columns for the predicted classes. In our case the classes denote the crash and no-crash cases. To read the confusion matrix, let's discuss what each element of the matrix means:
TP — the number of cases when the crash was predicted correctly;
FN — mistakes (the number of cases when a crash was not predicted but actually occurred);
FP — mistakes (the number of cases when a crash was predicted but did not occur);
TN — the number of cases when no crash was predicted correctly;
TP + FN + FP + TN is the number of all possible cases.
To measure the predictive power of the model, the following commonly used metrics can be calculated:
$\mathrm{PPV} = \dfrac{TP}{TP + FP}$;
$\mathrm{Sensitivity} = \dfrac{TP}{P} = \dfrac{TP}{TP + FN}$;
$\mathrm{Accuracy} = \dfrac{TP + TN}{P + N}$.
In these metrics $P$ and $N$ stand for all positive and all negative instances respectively. These metrics are bounded in the interval from 0 to 1, and the higher the value of each of them, the better the predictive power of the model.
In this paper we selected the PPV index as the tool to measure the predictive power of our models. This choice was made because we want to measure how much we can trust the predictions of the models (the PPV can also be interpreted as the probability that the model predicts a crash correctly, given that it predicts one). We do not use sensitivity, as it is not relevant and hard to interpret here. Accuracy was omitted because this index is usually high, as the models prefer to make "cautious" predictions and avoid giving a crash output often.
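The metric can be computed directly from the confusion matrix; a minimal sketch follows, where the function name and the NaN convention for the no-positive-prediction case are our own choices.

```python
import numpy as np

def confusion_and_ppv(y_true, y_pred):
    """Return the 2x2 confusion matrix cells (TP, FN, FP, TN) and the PPV.

    y_true, y_pred: arrays of 0/1 labels, where 1 denotes a crash.
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    ppv = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
    return (tp, fn, fp, tn), ppv
```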
3.7 Modeling strategy
To summarize the steps described earlier, we include a "modeling strategy" part in this paper. In this paragraph we state the steps we plan to take while answering the research question and testing the hypotheses. We suppose that the LPPL parameter estimates can be thought of as precursors of crashes and can be used as input data for crash prediction algorithms. Our modeling strategy can be described as follows:
We estimate the LPPL model and save its parameter values on a set of rolling subsamples, starting for both indices from 29 April 1999 and shifting 1 day ahead for each new estimation, supposing that the crash is to occur on the next day after the last date in the subsample. To take time scaling into account and properly define what a crash is on each scale, we repeat the parameter estimation process for different subsample sizes (20, 30, 40, ..., 400 days), and for each subsample size we define 8 possible crash powers, from 0.5% to 4% daily. Thus, at the end of this step we have the vectors of parameter estimates for each stated subsample size.
To show that these parameter estimates are significant and can be used for crash prediction, we construct 144 logit regression tables (for each of the 2 indices, each of the 9 subsample sizes and each of the 8 crash levels within each subsample size), where the dependent variable is the crash and the independent variables are the vectors of LPPL parameter estimates for one of the subsample sizes. Here we include the crash power gradation: for each subsample size we take all the crash levels declared above, so a regression table looks like Table 3 and Table 4.
Now we see that the vectors of parameter estimates can be used as crash precursors, as they significantly influence the probability of a crash. To test the predictive power of these parameter estimates, we divide our sample into 3 parts (training, validation and test samples) and apply machine learning techniques. The training sample is the part of our time series where the model is trained (on this sample the classification trees grow and the weights are assigned); it is taken from 1999 to 2009. The validation sample is needed to correct and improve the work of the model: for example, the architecture of the model can be changed while the weights remain constant. The validation sample is taken for the period from 2009 to 2014. Finally, working on the test sample, the model is supposed to predict the unknown future, so nothing changes in the model structure. We test the model on the last two-year period, from 2014 to 2016. To make the predictions we use 5 classifiers: simple logits, stepwise logits with the AIC criterion of choice (glmStepAIC), the linear support vector machine (svmLinear), the gradient boosting machine (gbm) and the random forest (rf). In order to check all possible combinations, diminish the probability of overfitting and avoid unnecessary noise, we run the procedure above in four different configurations: using PCA data preprocessing and excluding the 2008 crisis period; using PCA data preprocessing and including the 2008 crisis period; without PCA data preprocessing and excluding the 2008 crisis period; without PCA data preprocessing and including the 2008 crisis period.
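A sketch of this date-based split is given below; the exact boundary dates are assumptions, since the text only states the years, and the 2008 exclusion simply drops that calendar year.

```python
import pandas as pd

def date_split(features: pd.DataFrame, drop_2008: bool = False):
    """Split a feature table with a DatetimeIndex into training (1999-2008),
    validation (2009-2013) and test (2014-2016) parts."""
    df = features.copy()
    if drop_2008:
        df = df[df.index.year != 2008]           # "2008 crisis exclusion" variant
    train = df.loc["1999-04-29":"2008-12-31"]
    valid = df.loc["2009-01-01":"2013-12-31"]
    test = df.loc["2014-01-01":"2016-03-16"]
    return train, valid, test
```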
Finally, it has to be repeated once again that we do all of this for both the MICEX and DJIA indices in order to understand whether everything holds for both indices, only for the developed/emerging market, or does not work at all. The subdivision into training, validation and test samples remains the same for both indices, as they are taken for the same period.
4. Empirical results
4.1 LPPL parameters estimates significance
After analyzing 144 logit regression tables (2 indices, each estimated over 9 different subsample sizes, with 8 possible crash levels within each subsample size), we found that some of the parameter estimates (sometimes different ones, depending on the subsample size and crash power) are significant in most cases for both indices.
The first thing to be noticed, after the significance of the parameter estimates, is that the model fits the DJIA much better than the MICEX. Although the LPPL parameter estimates are also significant for the Russian market, they explain much more of the variance of the crash probability on the DJIA. We suppose that this can be explained by the fact that the DJIA is a much older and more sustainable index than the MICEX, which is still developing and categorized as an emerging-market index. However, for both indices it seems to be true that the LPPL model parameter values play a significant role in explaining the crash probability.
Another outcome to be noticed is that on larger subsample sizes larger crashes are captured better. This appears to happen because on such a large subsample the LPPL model does not notice small daily drops, which results in poor fitting of the model, and these unsatisfactory parameter estimates do not make sense and do not influence the probability of crashes.
Table 3: Logit regression for DJIA, subsample size 20 and crash levels from 0.5% to 4%
Dependent variable: crash occurrence
Regressor | (1) cl = 0.5% | (2) cl = 1.0% | (3) cl = 1.5% | (4) cl = 2.0% | (5) cl = 2.5% | (6) cl = 3.0% | (7) cl = 3.5% | (8) cl = 4.0%
— | -0.265*** (0.061) | -0.271*** (0.049) | -0.158*** (0.038) | -0.141*** (0.029) | -0.104*** (0.022) | -0.075*** (0.018) | -0.074*** (0.015) | -0.050*** (0.013)
— | -0.266*** (0.061) | -0.271*** (0.050) | -0.159*** (0.038) | -0.142*** (0.029) | -0.105*** (0.022) | -0.075*** (0.018) | -0.074*** (0.015) | -0.050*** (0.013)
— | -0.164*** (0.050) | -0.153*** (0.040) | -0.097*** (0.031) | -0.068*** (0.024) | -0.039** (0.018) | -0.009 (0.014) | -0.021* (0.012) | -0.006 (0.010)
— | -0.001 (0.003) | 0.002 (0.002) | 0.002 (0.002) | 0.002 (0.001) | 0.001 (0.001) | 0.0003 (0.001) | -0.0002 (0.001) | 0.0002 (0.001)
— | 0.046 (0.045) | -0.002 (0.036) | 0.013 (0.028) | 0.007 (0.022) | -0.013 (0.016) | -0.009 (0.013) | -0.005 (0.011) | -0.008 (0.009)
— | -0.007 (0.006) | -0.009* (0.005) | -0.006* (0.004) | -0.002 (0.003) | -0.003 (0.002) | -0.0004 (0.002) | -0.001 (0.001) | -0.002 (0.001)
— | -0.002*** (0.001) | -0.002*** (0.0004) | -0.001*** (0.0003) | -0.001*** (0.0002) | -0.001*** (0.0002) | -0.0005*** (0.0001) | -0.0005*** (0.0001) | -0.0003*** (0.0001)
Constant | 2.758*** (0.565) | 2.711*** (0.459) | 1.578*** (0.351) | 1.362*** (0.273) | 1.005*** (0.206) | 0.712*** (0.164) | 0.702*** (0.138) | 0.484*** (0.118)
Obs | 2768 | 2768 | 2768 | 2768 | 2768 | 2768 | 2768 | 2768
Log Lik | -1619.695 | -1046.494 | -306.206 | 396.810 | 1173.596 | 1804.344 | 2273.728 | 2721.963
AIC | 3255.390 | 2108.988 | 628.412 | -777.620 | -2331.193 | -3592.687 | -4531.456 | -5427.926
Note: *p<0.1; **p<0.05; ***p<0.01; standard errors in parentheses
Now we need to say a few words about how a change in the parameter values influences the probability of a crash. The probability of a crash in these logit models increases if the coefficient in the logit regression is positive and the parameter grows, or if the coefficient is negative and the parameter drops; otherwise the probability of a crash decreases.
In general, for both indices the price level and the magnitude of fluctuations are the most sensitive parameters on short subsamples (20-60 days), and the RSS also matters; then, on longer subsamples, the shift, the frequency of fluctuations and the growth exponent start to be significant more and more often, while the magnitude of fluctuations loses its explanatory power.
Table 4: Logit regression for MICEX subsample size 30 and crash levels from 0.5% to 4%
Dependent variable: crash occurrence
Regressor | (1) cl = 0.5% | (2) cl = 1.0% | (3) cl = 1.5% | (4) cl = 2.0% | (5) cl = 2.5% | (6) cl = 3.0% | (7) cl = 3.5% | (8) cl = 4.0%
— | -0.006 (0.011) | -0.020* (0.010) | -0.022** (0.009) | -0.020** (0.008) | -0.017** (0.007) | -0.008 (0.006) | -0.009 (0.006) | -0.004 (0.005)
— | -0.004 (0.012) | -0.019 (0.011) | -0.021** (0.010) | -0.019** (0.009) | -0.016** (0.008) | -0.004 (0.007) | -0.006 (0.007) | -0.003 (0.006)
— | -0.033 (0.063) | -0.051 (0.059) | -0.084 (0.053) | -0.046 (0.047) | -0.027 (0.042) | -0.030 (0.037) | -0.029 (0.033) | -0.044 (0.030)
— | -0.002 (0.003) | -0.002 (0.003) | -0.002 (0.003) | -0.002 (0.003) | -0.004 (0.002) | -0.001 (0.002) | -0.003 (0.002) | -0.002 (0.002)
— | 0.00001 (0.045) | -0.037 (0.042) | -0.014 (0.038) | -0.044 (0.034) | -0.019 (0.030) | -0.004 (0.026) | 0.007 (0.024) | 0.021 (0.021)
— | 0.005 (0.007) | 0.003 (0.006) | -0.0002 (0.006) | -0.002 (0.005) | -0.003 (0.005) | -0.003 (0.004) | -0.003 (0.004) | -0.002 (0.003)
— | -0.0004 (0.0004) | -0.0004 (0.0004) | -0.0004 (0.0003) | -0.0003 (0.0003) | -0.0002 (0.0003) | -0.0001 (0.0002) | -0.0002 (0.0002) | -0.0001 (0.0002)
Constant | 0.358*** (0.082) | 0.379*** (0.076) | 0.346*** (0.069) | 0.300*** (0.061) | 0.248*** (0.054) | 0.157*** (0.048) | 0.155*** (0.043) | 0.100** (0.039)
Obs | 2484 | 2484 | 2484 | 2484 | 2484 | 2484 | 2484 | 2484
Log Lik | -1676.630 | -1486.789 | -1241.731 | -935.785 | -637.789 | -328.910 | -96.070 | 190.337
AIC | 3369.260 | 2989.578 | 2499.461 | 1887.571 | 1291.579 | 673.820 | 208.139 | -364.674
Note: *p<0.1; **p<0.05; ***p<0.01; standard errors in parentheses
To explain this we need to restate once again what the parameters mean. The parameter $A$ stands for the (log) price of the index at the moment of the crash, $B$ is the decrease in $\ln p(t)$ per unit of time before the crash, $C$ is the magnitude of fluctuations, $m$ is the exponent of growth, $\omega$ stands for the frequency of fluctuations and $\phi$ is a shift parameter.
Thus, for small subsample sizes the variance of the crash probability is explained mostly by the price itself, the magnitude of fluctuations and the RSS; the smaller the RSS is, the higher is the crash probability, and the same is true for the magnitude of fluctuations. As for the price, this parameter is significant, but its sign is difficult to explain. We suppose it alternates because of the way we estimate the LPPL: it is not always estimated on bubbles only; there are many cases when the price drops insignificantly and the model explains possible future changes based on the previous drops.
As for large subsample sizes, we can notice that the magnitude of fluctuations and sometimes the price make less sense; however, more and more often the LPPL trend manages to catch changes in the shift, the frequency of fluctuations and the growth exponent, so these parameters become significant. In most cases the RSS also stays significant and influences the probability of a crash.
4.2 Test of predictive power of the model
In order to test how well our models can predict crashes in the unknown future we use the PPV (positive predictive value) indicator. It can be understood as the conditional probability of an actual crash occurring given that the model predicted it. In other words, if the model predicts that the market will drop the next day, then with probability equal to the PPV this will be a true prediction.
To represent the results in the most convenient way, for each crash level (a 0.5% to 4% daily drop in the market price) we have chosen the best of the results we managed to get with the constructed models. In some places we included more than one result, because there the top models were overfitted or there were several almost equally good models, and we wanted to show all of them.
The selection of the model was based on the test-sample results for crashes with a depth from 0.5% to 1.5% and on the validation sample for crashes with a depth from 2% to 4%. This was done not only for convenience, but in order to select the best models that have real predictive power for the unknown future (models with a crash depth from 0.5% to 1.5%) and to show that there is good potential to predict deeper crashes, although some improvements have to be made.
Table 5. Classification results: best test PPV classifiers for 0.5%-1.5% crash levels on both markets
Index | cl, % | Size | Method | 2008 crisis exclusion | PCA pre-processing | PPV training | PPV validation | PPV testing
DJIA | 0.5 | 120 | rf | + | - | 100% | 13.33% | 57.14% (7*)
DJIA | 0.5 | 200 | rf | - | - | 100% | 13.33% | 50% (4)
DJIA | 0.5 | 60 | rf | + | + | 100% | 13.33% | 40% (20)
DJIA | 1.0 | 120 | rf | + | - | 100% | 4.76% | 20% (5)
DJIA | 1.0 | 90 | rf | + | + | 100% | 0% | 50% (2)
MICEX | 0.5 | 150 | gbm | + | - | 81.25% | 27.27% | 100% (2)
MICEX | 0.5 | 60 | rf | + | - | 100% | 44.19% | 50% (12)
MICEX | 0.5 | 40 | rf | - | - | 100% | 7.5% | 47.62% (21)
MICEX | 0.5 | 40 | rf | + | - | 100% | 6.5% | 43.48% (23)
MICEX | 0.5 | 150 | rf | + | + | 100% | 11.5% | 42.62% (61)
MICEX | 0.5 | 40 | rf | + | + | 100% | 13.2% | 40.27% (72)
MICEX | 0.5 | 40 | gbm | + | + | 68.42% | 1% | 40% (5)
MICEX | 1.0 | 20 | rf | + | + | 100% | 13% | 38.71% (31)
MICEX | 1.0 | 20 | rf | + | - | 100% | 14% | 33.33% (6)
MICEX | 1.0 | 60 | rf | + | + | 100% | 24% | 31.82% (22)
MICEX | 1.0 | 120 | rf | + | + | 100% | 25% | 31.58% (19)
MICEX | 1.5 | 20 | rf | + | + | 100% | 1% | 33.33% (6)
MICEX | 1.5 | 200 | rf | + | + | 100% | 2% | 33.33% (3)
MICEX | 1.5 | 200 | rf | - | + | 100% | 1% | 22.22% (9)
* Number of times the model attempted to classify an event as a crash
From Table 5 we can observe that the MICEX generally yields better predictions than the DJIA; however, this was an expected result, as the DJIA was much less volatile and much more sustainable during the period covered by our test sample, so fewer crashes simply occurred in this index than in the MICEX. Another important piece of information is that for both indices the predictions for low-power crashes are much more accurate and trustworthy than for high-power crashes. This result is also expected, because in general few crashes occurred during the period covered by our test sample. This made crashes such rare events that they are hard for the machine learning techniques to classify. Moreover, the training and validation samples for both indices also contained fewer crashes than the machine learning algorithms needed, which resulted in such frequent overfitting of the rf models.
Table 6. Classification results: best validation PPV classifiers for 2%-4% crash levels on both markets