The stock market price forecasting based on combined methods of machine learning

Analysis of the problems of developing an adequate trading strategy based on predicting the future values of stocks and indices using the linear and nonlinear models ARIMA and ANN. Assessment of the impact of combining the methods on the forecasts for both indices.

Category	Banking, exchange business and insurance
Type	coursework
Language	English
Date added	30.08.2016

Posted on http://www.allbest.ru/

Introduction

In recent years, with the development of online trading, the stock market has become a key source of profit for many investors. Holding successful assets and earning money from them can provide a comfortable living. It is therefore unsurprising that so much importance is attached to future trends and values in financial markets. Many models and techniques have been proposed and applied to a variety of markets.

Various market players are interested in reliable forecasts because of the high potential return on investment, and even small investors can earn good profits. Banks, entrepreneurs and international corporations face uncertainty in financial values every day and have to make predictions. This uncertainty arises from unstable trends, since many factors affect stock prices and indices. It is therefore appealing to create an appropriate model with a low probability of error and to offer the best trading strategy based on that model. Taking the proposed decisions into account, investors can decide when and where to invest their money.

To construct an appropriate model, researchers have developed mathematical rules for predicting the future value of a stock. These rules are called technical analysis: the general practice of forecasting future movements of equity prices by studying historical market price data. Technical analysis relies on the principle that patterns and trends exist in markets and that they can be identified and exploited to predict price movements in the near future. Nowadays, fusion machine learning techniques are among the most widespread approaches and are reported to give the most accurate results.

However, despite the large number of technical analysis models that achieve quite accurate predictions, there is little evidence on the development of effective trading strategies. Trading strategies are based on the results of technical analysis and on the individual preferences of each trader, and they determine the most profitable point of entry into the market and the best time to exit. Thus, technical analysis is a useful tool for price forecasting and decision support in financial markets.

Nowadays the most common trading rule is the buy-and-hold strategy: for a given trading period, buy the stock at the beginning of the period and sell it at the end. It is always a good strategy in an upward-moving market and is far simpler than using technical indicators and trading systems.

Nevertheless, buy-and-hold is a passive investment strategy: once the portfolio is formed, no further shares are actively purchased before the end of the investment horizon. New trading rules are therefore needed. The most widespread trading rules use technical indicators, commonly oscillators and trend-following indicators; the type of indicator depends on the trader's preferences. Most existing studies of financial markets use moving average trading rules and oscillators. However, few studies in this area find strategies that outperform buy-and-hold, owing to trading barriers such as transaction costs.

To sum up, forecasting trading data and developing trading strategies are widely discussed and relevant topics, not only for researchers but also for many market players. We found that few studies use predictions from machine learning to develop a trading strategy. In addition, the reported results on outperforming the buy-and-hold strategy are not very promising. Moreover, we found no research that investigates the Russian stock exchange in terms of building trading strategies for stocks or indices. Stock behavior differs from market to market and has its own features, so this topic needs investigation. Our general research question is therefore:

Does a new trading strategy based on technical indicators and on values predicted by a machine learning technique outperform the simple buy-and-hold strategy on the Russian and American markets?

Moreover, it is crucial for us to investigate the predictive ability of the machine learning technique, despite the large amount of research on this topic, because the accuracy of machine learning models has not been demonstrated on Russian data.

We therefore expect that the new trading strategy based on technical indicators and on values predicted by a machine learning technique outperforms the simple buy-and-hold strategy on the Russian market. In addition, we compare our results for Russian data with a similar American index to investigate the behavior of the same trading strategy on a different market. As for predictive ability, we expect forecasts from fusion machine learning to be more accurate than individual predictions.

The rest of the paper is composed of four sections. Section 1 presents related studies on machine learning techniques and their application to trading strategies. Section 2 introduces the fundamentals of the chosen machine learning techniques as well as the technical indicators and trading rules. Section 3 presents the results of the experiments conducted on two data sets. Section 4 summarizes the study, discusses the results and concludes with future directions.

1. Literature review

Stock markets are complicated real-world systems in which a huge number of factors influence the movement of stocks and prices. There are two basic methods for predicting market trends and timing the purchase and sale of stocks: fundamental analysis and technical analysis. Fundamental analysis is based on the financial statements reported by companies, economic trends in the domestic and international environments, international relationships, and so on.

Technical analysts believe that most information about stocks is reflected in recent prices, so that if trends in price movements are observed, prices can be predicted. In other words, instead of using a security's intrinsic value and macroeconomic patterns, market behavior is predicted from past stock prices and indices. This seems quite effective, as uncertain factors such as the political situation or a corporation's image will be reflected in future data.

However, considerable research has questioned the effectiveness of technical analysis in stock and futures markets. For instance, supporters of random walk theory argued that price fluctuations occur randomly; therefore, it is futile for technical analysts to try to predict the future from previous price action [Balsara, Chen, Zheng, 2007]. P. Samuelson and E. Fama produced the first works in this field [Fama, 1965; Fama, 1970; Samuelson, 1965]. Furthermore, mechanical trading rules applied to stock prices were reported not to outperform a simple buy-and-hold strategy (buying the stock at the beginning of the period and selling it at the end) [Anderson, 1989].

In contrast, many arguments were made in favor of technical analysis. One of its first supporters was P. Cootner, who proposed the reflecting barriers model for price changes [Cootner, 1962]. His model undermines the theory that price changes are purely random. S. Anderson noted that Cootner's model opened new opportunities for technical analysis and that the use of past price information may not be as futile as many believe. In addition, later studies by R. Sweeney [Sweeney, 1988] and Brock et al. [Brock, Lakonishok, LeBaron, 1992] suggested that the arguments for the futility of technical analysis might have been premature and not entirely accurate. All in all, many current articles are based on technical analysis.

Moreover, technical analysis includes many basic models for forecasting future market data. They can be divided into linear and non-linear models. When technical analysis was first applied, the most popular model was the autoregressive integrated moving average (ARIMA) [Balsara, Chen, Zheng, 2007; Khashei, Bijari, 2011; Khashei, Bijari, Raissi Ardali, 2009; Pai, Lin, 2005b; Taskaya-Temizel, Casey, 2005]. It was used, for instance, by N. Balsara to support the superiority of technical analysis [Balsara, Chen, Zheng, 2007]. In that article, autoregressive techniques were compared with a naïve model based on the random walk assumption. The result was that the ARIMA forecasting model produced more accurate forecasts than the naïve model.

Once ARIMA modeling was established, researchers began to compare it with other, non-linear techniques. There are many of them, and this is now a promising field of research. The most widespread techniques today are artificial neural networks (ANNs), support vector machines (SVM) and random forests (RF). J. Patel, S. Shah, P. Thakkar et al. used ANN, SVM, random forest and naïve Bayes with two approaches to the inputs of these models. The results show that RF and naïve Bayes outperformed the others and that the accuracy of all these models improved significantly when they were trained on trend deterministic data. M. Hassan, B. Nath and M. Kirley proposed and implemented a fusion model combining the hidden Markov model (HMM), ANN and genetic algorithms (GA) to forecast financial market behavior [Hassan, Nath, Kirley, 2007]. Moreover, many studies used various types of ANN to predict the stock price return and the direction of its movement. ANN provided promising results in forecasting the stock price return [Karaatli et al., 2005; Olson, Mossman, 2003; Yoon, Swales, 1991]. The study [Leung, Daouk, Chen, 2000] compared various prediction models: multivariate classification techniques and a number of parametric and nonparametric level estimation models that predict the trends of the index. Empirical results suggested that the classification models outperformed the level estimation models (adaptive exponential smoothing, multivariate transfer function and multilayered feedforward neural network) in predicting the direction of stock market movement and in maximizing returns from investment trading. E. Altay compared the forecasts of neural network models with the ordinary least squares (OLS) regression model for the ISE-30 and ISE-All indexes [Altay, Satman, 2005]. Although the neural network models failed to outperform the linear regression model on daily and monthly data, they were able to predict the direction of the indexes more accurately.

Random forest (RF) is a widespread algorithm used for classification, regression and clustering tasks [Creamer, Freund, 2004]. G. Creamer and Y. Freund chose a random forest regression technique to predict performance and evaluate corporate governance risk in Latin American markets [Creamer, Freund, 2004]. They ran tenfold cross-validation experiments on one sample of Latin American ADRs (American Depositary Receipts) and on another sample of Latin American banks. In the comparison of random forest with logistic regression, the random forest model gave more accurate results [Poel den, Lariviere, 2004]. D. Van den Poel and B. Larivière used a random forest regression technique to investigate both customer retention and profitability outcomes. The authors analyzed a real-life sample of 100,000 customers taken from the data warehouse of a large European financial services company. Their findings demonstrate that random forest techniques provide a better fit for the estimation and validation samples than ordinary linear regression and logistic regression models.

In addition, since a variety of forecasting models have been developed, it has become important to achieve better prediction results. To this end, hybrid and combined models are used, and fusion machine learning models have recently appeared. T. Chen and S. Lee wrote about a weighted least squares support vector machine (LS-SVM) that uses the concept of k-nearest neighbors and mutual information. A two-stage architecture was developed by S. Hsu, J. Hsein, T. Chin et al. The results suggested that the two-stage architecture provided a promising alternative for stock price prediction [Hsu et al., 2009]. G. Huang, S. Song et al. provided a review of the extreme learning machine (ELM) [Huang et al., 2015], and W. Zhang, H. Ji, G. Liao et al. proposed a novel ELM called ELM+, which introduces privileged information into the traditional ELM method. This privileged information, which is ignored by the classical ELM but often exists in human behavior, optimizes the training stage by constructing a set of correcting functions. J. Patel, S. Shah, P. Thakkar et al. created the fusion models SVR-ANN, SVR-RF and SVR-SVR for two Indian stock market indices, CNX Nifty and S&P Bombay Stock Exchange (BSE) Sensex. Ten technical indicators were selected as the inputs to each of the prediction models. The results showed that the two-stage hybrid models perform better than the single-stage prediction models. The performance improvement is significant when ANN and RF are hybridized with SVR.

However, despite the variety of models presented, it is still unclear when and how predicted values can be used to make decisions on the market. Only a few works applied autoregressive modeling to market data and derived the best trading rule. Past works suggest that researchers had difficulty finding an optimal market strategy. For instance, S. Anderson developed P.H. Cootner's simple model of reflecting barriers. He compared the PRO strategy (investors purchase only when the share price is low relative to net asset value (NAV) and sell when the price rises relative to NAV), the REVERSE strategy (non-professional investors adopt a random strategy) and the buy-and-hold strategy. Three separate tests were made to investigate the profitability of the strategies, and returns for the PRO strategies exceeded those of buy-and-hold in 166 of the 168 trials [Anderson, 1989]. However, transaction costs were not included in that study.

In addition, technical indicators have been applied to various financial markets, e.g. stock markets [Zhang, Wu, 2009] and exchange markets [Cialenco, Protopapadakis, 2011], to describe price features or to forecast price trends. Among these indicators, moving averages are the most popular and are widely used in trading strategy optimization for financial markets [Boboc, Dinică, 2013; Cheung, Lam, Yeung, 2011; Dewachter, Lyrio, 2005], because they help to predict price changes and are easy to implement with actual investments. Traders use two moving averages of different lengths to forecast price trends. This paper chooses moving average and momentum indicators as the basic indicators to describe price changes in stock markets and to inform trading decisions.

D. Lohpetch and D. Corne claimed that the most successful trading rules today are moving averages (the mean price of a given stock or index over a given recent period) and relative strength indicators (a function of the ratio of recent upward movements to recent downward movements) [Lohpetch, Corne, 2010]. In their research, they tried to find a trading strategy that can outperform the simple trading rule. They found that buy-and-hold can be outperformed even in daily trading, but that, moving from monthly to daily trading, the performance of the evolved rules becomes increasingly dependent on prevailing market conditions.

Recently, H. Zhu, Z. Jiang, S. Li et al. studied the Shanghai Stock Exchange Composite Index (SHCI) from May 21, 1992 through December 31, 2013 and the Shenzhen Stock Exchange Component Index (SZCI) from April 3, 1991 through December 31, 2013. A t-test was used to check whether the mean return of moving average (MA) and trading range break (TRB) rules is higher than that of the buy-and-hold strategy. The researchers found that TRB rules outperform MA rules and that short-term variable moving average (VMA) rules outperform long-term VMA rules. In addition, the best trading rule outperforms the buy-and-hold strategy when transaction costs are not taken into consideration; once transaction costs are included, the trading profits are eliminated completely.

Similar results were found for European and American markets. R. Hudson, M. Dempsey and K. Keasey applied technical trading rules to the Financial Times Industrial Ordinary Index from 1935 to 1994 and found that these strategies outperformed buy-and-hold, but that trading profits would be eliminated by the inclusion of transaction costs [Hudson, Dempsey, Keasey, 1996]. Moreover, T. Coe and K. Laosethakul used the arithmetic moving average, the relative strength index, a stochastic oscillator and its moving average for 576 stocks included in the S&P 100, the NASDAQ 100 and the S&P Midcap 400 indices. They found that none of these technical trading rules could beat the market [Coe, Laosethakul, 2010]. Choe et al. presented strong arguments against technical trading rules in the G-7 stock markets (Canada, France, Germany, Italy, Japan, United Kingdom and United States) [Choe, Krausz, Nam, 2011].

In Asian markets, trading rules were found to have stronger predictive power in emerging stock markets, for instance in Malaysia, Thailand, Indonesia and the Philippines. P. Ahmed, K. Beck and E. Goldreyer examined the accuracy of short-term variable moving average (VMA) trading rules in three volatile and declining Asian stock markets from 1994 to 1999 [Ahmed, Beck, Goldreyer, 2005]. The investigation supported their predictive ability even with transaction costs included, although the results for the developed US and Japanese markets were different.

As trading strategies and rules are often applied in Forex, several studies investigate Forex data. For example, M. Ozturk and I. Toroslu presented a heuristic trading system built on popular technical indicators. EUR/USD is the most traded currency pair in the FX market, and the authors used it in their research. They used 24 technical indicators of different types. The trading rules were the crossover rule, the Bollinger bands rule and divergence. The trading rules are selected using a genetic algorithm and a greedy search heuristic. A weighted majority voting method is proposed to combine the trading rules based on technical indicators into a single trading rule. The experiments were conducted on 2 major currency pairs in 3 different time frames, and promising results were achieved [Ozturk, Toroslu, Fidan, 2016].

In the related literature, there is no common view on the effectiveness of strategies based on predictions from technical analysis. Some researchers found that trading rules based on indicators outperform the simple buy-and-hold strategy in terms of returns and profits [Lohpetch, Corne, 2010; Shilling, 1992; Tian, Wan, Guo, 2003], while others insist that indicators are often mistaken and that it is risky to rely on them. The results also differ with the chosen data: more stable indices and prices commonly yield more accurate predictions and, consequently, more adequate trading strategies.

2. Methodology

2.1 Time series forecasting technique

This section briefly describes the individual models used. In our research, we constructed a fusion model based on both linear and non-linear models. The typical models proposed today are ARIMA and ANN, and their application to time series forecasting has shown good prediction performance. P.F. Pai and C.S. Lin used real data sets of stock prices to examine the forecasting accuracy of a hybrid ARIMA and support vector machines model [Pai, Lin, 2005b]. M. Khashei and M. Bijari proposed a novel hybridization of artificial neural networks and ARIMA models for time series forecasting.

Thus, we expect that a fusion model based on predictions from both ARIMA and ANN models gives accurate results and supports an effective trading strategy. In our research, we first fitted individual ARIMA and ANN models to obtain future predictions. Next, we constructed trading indicators that can signal future trends. Finally, all these data were fed into a random forest regression, which produced the final forecasts used in constructing the trading strategies.

The rest of the section describes the theoretical framework of the chosen methods and concludes with a description of random forest, the regression technique chosen to build the fusion model.

The auto-regressive integrated moving average models

The ARIMA model was proposed by G. Box and G. Jenkins, and it has dominated many areas of time series forecasting [Khashei, Bijari, 2011; Ömer Faruk, 2010; Tan et al., 2010]. In an ARIMA model, the future value of a variable is assumed to be a linear function of several past observations and random errors.

ARIMA modeling is described in many books and studies. In our work, we follow the description of the model given in [Khashei, Bijari, 2011; Pai, Lin, 2005b; Zou et al., 2007].

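First, we define a time series of data $X_t$, where $t$ is an integer index and the $X_t$ are real numbers. An ARMA($p'$, $q$) model is then given by equation 1.

$$\left(1 - \sum_{i=1}^{p'} \alpha_i L^i\right) X_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t \qquad (1)$$

In equation 1, $L$ is the lag operator, which shifts the series one time period back, the $\alpha_i$ are the parameters of the autoregressive part of the model, the $\theta_i$ are the parameters of the moving average part and the $\varepsilon_t$ are error terms. The error terms are generally assumed to be independent, identically distributed variables sampled from a normal distribution with zero mean.

Assume now that the polynomial $1 - \sum_{i=1}^{p'} \alpha_i L^i$ has a unit root of multiplicity $d$. Then it can be rewritten as in equation 2.

$$1 - \sum_{i=1}^{p'} \alpha_i L^i = \left(1 - \sum_{i=1}^{p'-d} \varphi_i L^i\right) (1 - L)^d \qquad (2)$$

An ARIMA($p$, $d$, $q$) process expresses this polynomial factorization property with $p = p' - d$, and is given by equation 3.

$$\left(1 - \sum_{i=1}^{p} \varphi_i L^i\right) (1 - L)^d X_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t \qquad (3)$$

Equation 3 is a particular case of an ARMA($p + d$, $q$) process whose autoregressive polynomial has $d$ unit roots.

$$\left(1 - \sum_{i=1}^{p} \varphi_i L^i\right) (1 - L)^d X_t = \delta + \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t \qquad (4)$$

Equation 4 defines an ARIMA($p$, $d$, $q$) process with drift $\delta / \left(1 - \sum_{i=1}^{p} \varphi_i\right)$.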
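To make the differencing and autoregressive steps concrete, the following is a minimal sketch of a one-step ARIMA(1,1,0) forecast with a constant term, fitted by ordinary least squares. It is an illustration only and not the estimation procedure used in this study; the function names `difference`, `fit_ar1_with_const` and `forecast_arima_110` are hypothetical, and the moving average part is omitted for brevity.

```python
def difference(series, d=1):
    """Apply the (1 - L)^d operator: difference the series d times."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def fit_ar1_with_const(y):
    """Least-squares fit of y_t = delta + phi * y_{t-1} + e_t."""
    x, z = y[:-1], y[1:]
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    phi = sxz / sxx if sxx else 0.0  # constant differences: no AR effect
    delta = mz - phi * mx
    return delta, phi


def forecast_arima_110(prices):
    """One-step-ahead forecast: model the first differences, then integrate."""
    dy = difference(prices, 1)
    delta, phi = fit_ar1_with_const(dy)
    next_diff = delta + phi * dy[-1]
    return prices[-1] + next_diff


print(forecast_arima_110([100, 102, 104, 106, 108]))
```

For a series with constant increments, the fitted autoregressive coefficient is zero and the forecast simply extends the trend by the estimated constant.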

Neural network time series modeling

In this research, the Bayesian methods for neural networks were used. These methods are called Bayesian regularization for feed-forward neural networks. The feedforward neural network was the first and simplest type of artificial neural network devised. In this network, the information moves in only one direction, forward, from the input nodes, through the hidden nodes (if any) and to the output nodes. There are no cycles or loops in the network.

The BRNN function fits a two layer neural network as described in MacKay (1992) and Foresee and Hagan (1997) [Foresee, Hagan, 1997; MacKay, 1992]. It uses the Nguyen and Widrow algorithm (1990) to assign initial weights and the Gauss-Newton algorithm to perform the optimization [Nguyen, 1990].

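The BRNN model is presented in equation 5.

$$y_i = g(x_i) + e_i = \sum_{k=1}^{s} w_k \, g_k\!\left(b_k + \sum_{j=1}^{p} x_{ij} \beta_j^{[k]}\right) + e_i, \quad i = 1, \dots, n \qquad (5)$$

In equation 5, $e_i \sim N(0, \sigma_e^2)$; $s$ is the number of neurons; $w_k$ is the weight of the $k$-th neuron, $k = 1, \dots, s$; $b_k$ is the bias of the $k$-th neuron, $k = 1, \dots, s$; $\beta_j^{[k]}$ is the weight of the $j$-th input to the net, $j = 1, \dots, p$; and $g_k(\cdot)$ is the activation function, in this implementation $g_k(x) = \frac{\exp(2x) - 1}{\exp(2x) + 1}$.

The software minimizes the objective $F$ presented in equation 6.

$$F = \beta E_D + \alpha E_W \qquad (6)$$

In equation 6, $E_D = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ is the error sum of squares, $E_W$ is the sum of squares of the network parameters (weights and biases), $\beta = \frac{1}{2\sigma_e^2}$, $\alpha = \frac{1}{2\sigma_\theta^2}$, and $\sigma_\theta^2$ is a dispersion parameter for weights and biases.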
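The regularized objective can be illustrated with a small sketch that evaluates a two-layer network and the quantity $F = \beta E_D + \alpha E_W$ for fixed parameters. This only evaluates the objective; it does not perform the Gauss-Newton optimization or the Bayesian update of $\alpha$ and $\beta$. The function and argument names are hypothetical.

```python
import math


def activation(x):
    """g(x) = (exp(2x) - 1) / (exp(2x) + 1), which is the hyperbolic tangent."""
    return math.tanh(x)


def predict(x, weights, biases, betas):
    """Network output: sum over neurons of w_k * g(b_k + sum_j x_j * beta_j^[k])."""
    return sum(
        w * activation(b + sum(xj * bj for xj, bj in zip(x, beta)))
        for w, b, beta in zip(weights, biases, betas)
    )


def regularized_objective(X, y, weights, biases, betas, alpha, beta_reg):
    """F = beta_reg * E_D + alpha * E_W for the given network parameters."""
    # E_D: error sum of squares over the training sample.
    e_d = sum((yi - predict(xi, weights, biases, betas)) ** 2
              for xi, yi in zip(X, y))
    # E_W: sum of squares of all weights and biases.
    e_w = (sum(w * w for w in weights)
           + sum(b * b for b in biases)
           + sum(v * v for beta in betas for v in beta))
    return beta_reg * e_d + alpha * e_w


# One neuron, one input: the prediction is tanh(0) = 0 for every observation.
F = regularized_objective([[1.0], [2.0]], [1.0, 2.0],
                          weights=[1.0], biases=[0.0], betas=[[0.0]],
                          alpha=0.5, beta_reg=2.0)
print(F)
```

A larger $\alpha$ relative to $\beta$ penalizes large weights more strongly, which is what keeps the fitted network smooth.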

Technical indicators

Technical indicators are mathematical formulas that are applied to a price time series to produce another time series. They can be divided into three groups: trend, momentum and volatility based indicators. The different variants of the moving average are examples of trend indicators. Momentum indicators show the rate of change in price and are commonly referred to as leading indicators; RSI and the Stochastic Oscillator are examples. Volatility based indicators are built on rapid changes in price volatility.

In our trading system, 8 technical indicators were used as the basis of the trading rules: Simple Moving Average (SMA), Exponential Moving Average (EMA), Smoothed Moving Average (SMMA), Linear-Weighted Moving Average (LWMA), Momentum, Relative Strength Index (RSI), Stochastic Oscillator (STCK%) and Larry Williams' percent range (Williams %R). These indicators were chosen for their effectiveness in previous research. N. Balsara and G. Chen used the dual moving average crossover rule, which is based on the values of two moving averages with different lag lengths [Balsara, Chen, Zheng, 2007]. L. Wang and H. An used six calculation methods in their trading rule: simple moving average (SMA), weighted moving average (WMA), exponential moving average (EMA), typical price moving average (TPMA), triangular moving average (TMA) and adaptive moving average (AMA) [Wang et al., 2015]. The technical rules in J. Patel's study are based on moving average indicators and oscillators [Patel et al., 2015b].

Moving average

The moving average (MA) is a simple technical analysis tool that smooths out price data by creating a constantly updated average price. In this paper, we used 2-, 6- and 15-day intervals for the chosen moving averages. The mathematical formula of each indicator is presented in table 1. As a general guideline, if the price is above the moving average, the trend is up; if the price is below the moving average, the trend is down [Patel et al., 2015a].

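Table 1. Selected technical indicators and their formulas*

Name of indicator	Formula
Simple Moving Average	$SMA_t = \frac{1}{n} \sum_{i=0}^{n-1} C_{t-i}$
Linear-Weighted Moving Average	$LWMA_t = \frac{\sum_{i=0}^{n-1} (n - i)\, C_{t-i}}{\sum_{i=0}^{n-1} (n - i)}$
Exponential Moving Average	$EMA_t = \alpha C_t + (1 - \alpha)\, EMA_{t-1}$
Smoothed Moving Average	$SMMA_t = \frac{(n - 1)\, SMMA_{t-1} + C_t}{n}$

* $C_t$ is the closing price, $n$ is the period of smoothing, $\alpha = 2 / (n + 1)$.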
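The four moving averages can be sketched as follows, assuming their standard textbook definitions: equal weights for SMA, linearly increasing weights for LWMA, a smoothing factor of 2 / (n + 1) for EMA, and a recursive update seeded with a simple average for SMMA. The seeding choices and the function names are assumptions made for illustration.

```python
def sma(closes, n):
    """Simple moving average of the last n closing prices."""
    return sum(closes[-n:]) / n


def lwma(closes, n):
    """Linear-weighted average: newest price has weight n, oldest weight 1."""
    window = closes[-n:]
    weights = range(1, n + 1)
    return sum(w * p for w, p in zip(weights, window)) / sum(weights)


def ema(closes, n):
    """Exponential moving average with alpha = 2 / (n + 1), seeded with the first price."""
    alpha = 2.0 / (n + 1)
    value = closes[0]
    for price in closes[1:]:
        value = alpha * price + (1 - alpha) * value
    return value


def smma(closes, n):
    """Smoothed moving average, seeded with the SMA of the first n prices."""
    value = sum(closes[:n]) / n
    for price in closes[n:]:
        value = (value * (n - 1) + price) / n
    return value


closes = [10, 11, 12, 13, 14, 15]
for name, func in [("SMA", sma), ("LWMA", lwma), ("EMA", ema), ("SMMA", smma)]:
    print(name, func(closes, 2))
```

On a rising series, the averages that weight recent prices more heavily (LWMA, EMA) sit closer to the latest close than SMA and SMMA do.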

Momentum

Momentum measures the rate of rise and fall of stock prices. A positive value of momentum indicates an uptrend and is represented by "+1", while a negative value indicates a downtrend and is represented by "-1". The formula is presented in equation 7.

$$MOM_t = C_t - C_{t-n} \qquad (7)$$

In equation 7, $C_t$ is the closing price and $n$ is the period. Momentum is numerically equal to the profit that could be achieved by investing in one unit of the instrument for the period under review. In this paper, we use 2-, 6- and 15-day intervals for Momentum.
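A minimal sketch of the momentum indicator and its "+1" / "-1" encoding, assuming momentum is the difference between the latest close and the close n days earlier. The function names are hypothetical, and returning 0 for zero momentum is an added convention.

```python
def momentum(closes, n):
    """Difference between the latest close and the close n days earlier."""
    return closes[-1] - closes[-1 - n]


def momentum_signal(closes, n):
    """+1 for an uptrend, -1 for a downtrend, 0 when momentum is exactly zero."""
    value = momentum(closes, n)
    if value > 0:
        return 1
    if value < 0:
        return -1
    return 0


print(momentum([10, 11, 12, 13, 14, 15], 2), momentum_signal([5, 4, 3], 2))
```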

The Relative Strength Index (RSI) is given by equation 8.

$$RSI = 100 - \frac{100}{1 + \dfrac{\sum_{i=0}^{n-1} UP_{t-i} / n}{\sum_{i=0}^{n-1} DW_{t-i} / n}} \qquad (8)$$

In equation 8, $UP_t$ is the upward price change and $DW_t$ is the downward price change at time $t$.

RSI is generally used to identify overbought and oversold points. The level of the RSI is a measure of the stock's recent trading strength. The slope of the RSI is directly proportional to the velocity of a change in the trend, and the distance traveled by the RSI is proportional to the magnitude of the move. If the RSI exceeds the level of 70, the stock is overbought and may go down in the near future (indicating opinion "-1"); if the RSI falls below the level of 30, the stock is oversold and may go up in the near future (indicating opinion "+1") [Patel et al., 2015b].
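The overbought and oversold logic can be sketched as below, assuming the common form RSI = 100 - 100 / (1 + average gain / average loss) over the last n price changes. The function names, the neutral opinion 0 and the handling of a window without losses are assumptions made for illustration.

```python
def rsi(closes, n):
    """RSI over the last n price changes."""
    changes = [b - a for a, b in zip(closes, closes[1:])][-n:]
    avg_up = sum(c for c in changes if c > 0) / n
    avg_down = sum(-c for c in changes if c < 0) / n
    if avg_down == 0:
        return 100.0  # no downward moves in the window
    return 100.0 - 100.0 / (1.0 + avg_up / avg_down)


def rsi_opinion(value, overbought=70, oversold=30):
    """-1 when overbought, +1 when oversold, 0 otherwise."""
    if value > overbought:
        return -1
    if value < oversold:
        return 1
    return 0


value = rsi([10, 11, 10, 12, 11, 13], 5)
print(value, rsi_opinion(value))
```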

$$\%R = \frac{H_n - C_t}{H_n - L_n} \times 100 \qquad (9)$$

In equation 9, $C_t$ is the closing price of period $t$, $H_n$ is the highest price of the chosen period and $L_n$ is the lowest price of the chosen period.

Williams %R, or just %R, is a technical analysis oscillator that shows the current closing price in relation to the high and low of the past $n$ days (for a given $n$). It was developed by Larry Williams, a publisher and promoter of trading materials. Its purpose is to tell whether a stock or commodity market is trading near the high or the low, or somewhere in between, of its recent trading range.

$$\%K = \frac{C_t - LL_n}{HH_n - LL_n} \times 100 \qquad (10)$$

In equation 10, $LL_n$ and $HH_n$ are the lowest low and the highest high in the last $n$ days, respectively.

The stochastic oscillator is a technical analysis indicator that shows the position of the current price relative to the price range over a given past period, measured as a percentage. According to its author, George Lane, the main idea is that during a rising trend the closing price of the next timeframe tends to stay close to previous highs, while during a falling trend it tends to stay close to previous lows. In effect, the indicator shows where the closing price of the current period lies relative to the prices of the previous periods within the specified time window.
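Both oscillators locate the latest close within the recent high-low range, which the following sketch makes explicit. It assumes the positive-scale convention, in which %K = 100 (C - LL) / (HH - LL) and %R = 100 (HH - C) / (HH - LL); some sources report %R on a scale from -100 to 0 instead. The function names are hypothetical.

```python
def price_range(highs, lows, n):
    """Highest high and lowest low over the last n periods."""
    hh, ll = max(highs[-n:]), min(lows[-n:])
    if hh == ll:
        raise ValueError("flat price range: oscillator is undefined")
    return hh, ll


def stochastic_k(closes, highs, lows, n):
    """Position of the latest close above the lowest low, as a percentage of the range."""
    hh, ll = price_range(highs, lows, n)
    return 100.0 * (closes[-1] - ll) / (hh - ll)


def williams_r(closes, highs, lows, n):
    """Distance of the latest close below the highest high, as a percentage of the range."""
    hh, ll = price_range(highs, lows, n)
    return 100.0 * (hh - closes[-1]) / (hh - ll)


highs, lows, closes = [12, 13, 14], [9, 10, 11], [10, 12, 13]
print(stochastic_k(closes, highs, lows, 3), williams_r(closes, highs, lows, 3))
```

Under this convention the two values always sum to 100, so they carry the same information viewed from opposite ends of the range.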

Random forests technique

The first random forest model was presented by Breiman in 2001. According to Breiman, random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest [Breiman, 2001]. Random forest modeling is based on building simple decision trees, each of them grown randomly, and then averaging their predictions.

For the $k$-th tree, a random vector $\Theta_k$ is generated, independent of the past random vectors $\Theta_1, \dots, \Theta_{k-1}$ but with the same distribution. A tree is then grown using the training set and $\Theta_k$, resulting in a classifier $h(x, \Theta_k)$, where $x$ is an input vector [Kumar, Thenmozhi,]. After a large number of trees is generated, they vote for the most popular class. This procedure is called random forests.

Given an ensemble of classifiers $h_1(x), h_2(x), \dots, h_K(x)$, and with the training set drawn at random from the distribution of the random vector $Y, X$, define the margin function as written in equation 11.

$$mg(X, Y) = \mathrm{av}_k\, I(h_k(X) = Y) - \max_{j \neq Y} \mathrm{av}_k\, I(h_k(X) = j) \qquad (11)$$

In equation 11, $I(\cdot)$ is the indicator function. The margin measures the extent to which the average number of votes at $X, Y$ for the right class exceeds the average vote for any other class. The larger the margin, the more confidence in the classification. The generalization error is given by equation 12.

$$PE^* = P_{X,Y}\left(mg(X, Y) < 0\right) \qquad (12)$$

In equation 12, the subscripts $X, Y$ indicate that the probability is over the $X, Y$ space.

In random forests, $h_k(X) = h(X, \Theta_k)$. As the number of trees increases, for almost surely all sequences $\Theta_1, \Theta_2, \dots$ the generalization error $PE^*$ converges to $P_{X,Y}\left(P_\Theta(h(X, \Theta) = Y) - \max_{j \neq Y} P_\Theta(h(X, \Theta) = j) < 0\right)$.

In summary, a random forest model builds a large number of decision trees, each on a randomly chosen subset of the input data. The root of a tree defines a simple criterion by which the data are divided into several parts. A new criterion is then proposed for each part, and repeating this step generates the tree; recursion is often used to simplify the implementation. The criteria are found by brute force, i.e. the algorithm tries all possible ways of partitioning the data on each parameter and assesses how successful each partition is. Information entropy and the Gini coefficient are used to assess a partition. Finally, the classification of each case is determined by a majority vote over the ensemble of classification trees.
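The brute-force search for a splitting criterion and the final vote can be illustrated with a short sketch. It covers a single feature and a single split, scored by Gini impurity; a full random forest would repeat this recursively on random subsets of rows and features. The function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Try every threshold on one feature; return (threshold, weighted impurity)."""
    best = (None, float("inf"))
    for threshold in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        if not left or not right:
            continue  # a split must send data to both branches
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


def majority_vote(votes):
    """Class predicted by most trees (ties are not handled specially)."""
    return max(set(votes), key=votes.count)


print(best_split([1, 2, 3, 4], [-1, -1, 1, 1]), majority_vote([1, 1, -1]))
```

Here the threshold 2 separates the two classes perfectly, so the weighted impurity of the split is zero.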

2.2 Trading system

In this section, we present our trading systems, which use the daily values predicted by the fusion model discussed in the previous section. The first trading system uses only buy signals, while the second involves only sell signals. This separation shows the profit, return and other evaluation metrics of each method separately and helps to evaluate the combination of the strategies, in which both buy and sell signals are used.

Trading rules are the main building blocks of our trading systems. A trading rule is simply a rule based on the values of indicators or on predicted values. It generates buy and sell signals according to the steps it defines. A signal is a suggestion to open a position in the market. There are three types of signals: buy, sell and hold. A buy or sell signal is active and suggests buying or selling, while a hold signal is passive and means "do nothing". A trading rule may be straightforward, such as comparing the indicator value with a limit value, or complex, such as looking for a specially shaped pattern in the price [Ozturk, Toroslu, Fidan, 2016].

The way in which a trading rule generates buy and sell signals is called a trading strategy. As already mentioned, in our research we use two trading rules: "always buy" and "always sell". In the "always buy" strategy, a buy signal is issued when future growth is predicted, whereas in "always sell" a sell signal is issued when a decline is forecast. Moreover, to estimate the effectiveness of our trading strategies, we compare them with simple buy-and-hold.

In our simulation of trades, we can buy or sell only one index or stock at a time, and we cannot buy a second unit until the previous one has been sold. The initial cash is zero. To evaluate the strategies, we do not use any kind of "stop signal", in which the system automatically stops trading when the price rises above or falls below a set point. In addition, we assume an unlimited supply of money to buy an index or stock; that is, we have no budget limit.

Thus, every trade has an open day and a close day, where the close day comes after the open day.

Thus, the profitability of the transaction to buy is presented in equation 13.

(13)

In equation 13, the cost term is the transaction cost to buy the stock. The cost of a single buy or sell transaction is assumed to be 0.5% (i.e. 0.005), e.g. $5 for a transaction of volume $1,000, which is an intermediate value according to previous studies [Allen, Karjalainen, 1999; Wang, 2000; Wang et al., 2015].

The profitability of the transaction to sell is presented in equation 14.

(14)

In equation 14, the cost term is the transaction cost to sell the stock.

To identify a buy or sell signal, we compare the predicted value of the stock with the previous price. The difference between them is used as a threshold parameter in the calculations that follow. By varying this threshold, we can orient the trading strategy only towards large rises or falls of the future price. The threshold also marks the expected profit from each transaction. Thus, we can vary the level of transaction costs to estimate the expected profit. We expect the number of transactions to fall as the threshold rises.
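The comparison of the predicted value with the previous price can be sketched as below. The sketch assumes a symmetric threshold on the price difference; the name `threshold`, the function names and the sample numbers are hypothetical.

```python
def trading_signal(predicted, previous, threshold=0.0):
    """Return "buy", "sell" or "hold" from the predicted price change."""
    difference = predicted - previous
    if difference > threshold:
        return "buy"
    if difference < -threshold:
        return "sell"
    return "hold"


def count_active_signals(predictions, prices, threshold):
    """Number of buy or sell signals produced for a given threshold."""
    return sum(
        trading_signal(pred, prev, threshold) != "hold"
        for pred, prev in zip(predictions, prices)
    )


# Hypothetical previous prices and the forecasts made from them.
previous_prices = [100.0, 102.0, 105.0, 103.0, 101.0]
predicted_prices = [101.0, 104.5, 104.0, 99.0, 101.5]
counts = [
    count_active_signals(predicted_prices, previous_prices, t)
    for t in (0.0, 1.0, 3.0)
]
```

On this toy data the number of active signals falls as the threshold grows, which is the behaviour expected in the text.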

Based on equations 13 and 14, the profitability of the buy-and-hold rule can be described directly, as in equation 15.

(15)

Since we have split our trading strategies, the next subsections briefly describe the features of each trading system.

Trading rule based on growth prediction buy signals

This trading strategy is based only on buy signals. As already mentioned, we can hold only one stock/index at a time, so trading is a sequence of buy/close commands. A buy signal is issued when growth is predicted and the previous deal has been closed, i.e. we are not in the market. When we are in the market and a recession is forecast, we close our position and fix the profit. In all other situations we do nothing. For each transaction, we calculate the profit from that transaction and the cumulative profit. After each closed deal we are out of the market.

The simulation and calculation of the net profit of a trading rule is given in Figure 1.

Figure 1. Algorithm of trading strategy with buy signals
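The buy-only rule described above can be sketched as a simple loop. The sketch assumes a proportional cost of 0.005 charged on both the opening and the closing price, and it ignores a position that is still open on the last day; both are assumptions of this example, as are the function name and the sample prices.

```python
def simulate_buy_only(prices, forecasts, cost_rate=0.005):
    """Simulate the buy-only strategy.

    prices[t] is the actual price on day t and forecasts[t] is the
    price predicted on day t for the next day.
    Returns (profit per closed trade, cumulative profit after each trade).
    """
    in_market = False
    entry_price = 0.0
    profits = []
    for price, forecast in zip(prices, forecasts):
        if not in_market and forecast > price:
            # Growth is predicted and we hold nothing: buy.
            in_market, entry_price = True, price
        elif in_market and forecast < price:
            # A fall is predicted while we hold the asset: close and fix the profit.
            cost = cost_rate * (entry_price + price)
            profits.append(price - entry_price - cost)
            in_market = False
        # Otherwise: hold, do nothing.
    cumulative, total = [], 0.0
    for profit in profits:
        total += profit
        cumulative.append(total)
    return profits, cumulative


prices = [100.0, 102.0, 105.0, 103.0, 101.0, 104.0]
forecasts = [101.0, 104.0, 104.0, 102.0, 103.0, 105.0]
profits, cumulative = simulate_buy_only(prices, forecasts)
```

Here the position is opened at 100 and closed at 105; the second position, opened at 101, is still open when the data ends and is therefore not counted.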

Trading rule based on recession prediction sell signals

In contrast with the previous strategy, this strategy uses only sell signals. Since we can trade only one stock/index at a time, trading is a sequence of sell/close commands. The stock/index has to be borrowed. A sell signal is issued when a recession is predicted and the previous deal has been closed, i.e. we are not in the market. When we are in the market and growth is forecast, we close our position and fix the profit. In all other situations we do nothing. For each transaction, we calculate the profit from that transaction and the cumulative profit. After each closed deal we are out of the market.

The simulation and calculation of the net profit of a trading rule is given in Figure 2.

Figure 2. Algorithm of trading strategy with sell signals
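The sell-only rule mirrors the buy-only one: the borrowed asset is sold when a fall is predicted and bought back when growth is predicted, so the profit is the opening price minus the closing price. The sketch below again assumes a proportional cost of 0.005 on both legs and ignores a position left open at the end; the function name and the sample prices are hypothetical.

```python
def simulate_sell_only(prices, forecasts, cost_rate=0.005):
    """Simulate the sell-only (short) strategy; returns profit per closed trade."""
    short_open = False
    sell_price = 0.0
    profits = []
    for price, forecast in zip(prices, forecasts):
        if not short_open and forecast < price:
            # A fall is predicted and we are out of the market: sell the borrowed asset.
            short_open, sell_price = True, price
        elif short_open and forecast > price:
            # Growth is predicted: buy the asset back and return it.
            cost = cost_rate * (sell_price + price)
            profits.append(sell_price - price - cost)
            short_open = False
    return profits


prices = [100.0, 102.0, 105.0, 103.0, 101.0, 104.0]
forecasts = [101.0, 104.0, 104.0, 102.0, 103.0, 105.0]
short_profits = simulate_sell_only(prices, forecasts)
```

On this toy series the asset is sold at 105 and bought back at 101.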

3. Experimental results

3.1 Evaluation metrics

Evaluation metric for prediction ability

To evaluate the performance of the proposed predictive models, the MAPPE measure was used. The Mean Absolute Predicted Percentage Error (MAPPE) is suited to time series whose actual values are significantly greater than one, as in our research. To calculate it, we first take the absolute deviation between the actual value and the forecast value. Then the ratios of each deviation to its actual value are summed. The average of this sum, expressed as a percentage, is the mean absolute predicted percentage error.

$$MAPPE = \frac{100\%}{n} \sum_{t=1}^{n} \frac{|y_t - \hat{y}_t|}{y_t} \quad (16)$$

In equation 16, $y_t$ is the actual value, $\hat{y}_t$ is the forecasted value and $n$ is the number of observations. This criterion measures the prediction abilities of the hybrid and combined models in the research papers of P. Pai and C. Lin, and of M. Khashei and M. Bijari [Khashei, Bijari, 2011; Pai, Lin, 2005b].
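The calculation described above (absolute deviation, divided by the actual value, averaged and expressed as a percentage) can be written in a few lines. The function name and the sample values are hypothetical.

```python
def mappe(actual, forecast):
    """Mean absolute percentage error between actual and forecast values, in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    ratios = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(ratios) / len(ratios)


# Two hypothetical observations, each forecast with a 1% error.
error = mappe([100.0, 200.0], [101.0, 198.0])
```

The division by the actual value is why the measure is only meaningful for series that stay well away from zero, as index levels do.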

Evaluation metrics for trading strategies

Experimental results are obtained using 7 evaluation metrics. These metrics are:

· Number of trades: the total number of trades, where one trade consists of two transactions made by buying or selling, depending on whether the buying or the selling strategy is used. It is a single number for each strategy.

· Profit from each transaction: a positive or negative value indicating how much money we gain or lose from making a deal. The indicator is used for informational purposes only and varies from deal to deal.

· Summarized profit: a positive or negative value that shows the sum of the profits from each transaction. It is a key indicator of a trading strategy's profitability and is a single number for each strategy.

· Cumulative profit: the current amount of money the trader has, whether holding an asset or having sold it. The indicator is used for informational purposes only and varies from deal to deal.

· Returns: positive or negative values indicating the return from each transaction. The indicator is used for informational purposes only and varies from deal to deal.

· Drawdown: the degree of exposure of the strategy to risk. It is a key indicator of a trading strategy's riskiness and is a single number for each strategy.

· Sharpe ratio: the relation of profitability to risk. A value greater than one indicates consistency of the strategy. A negative value is also possible; it means that the trading strategy is unprofitable.
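The two risk metrics can be sketched as follows. The exact formulas used in the study are not recoverable from the text, so the example assumes two common definitions: the maximum drawdown as the largest fall of cumulative profit from a previous peak, and the Sharpe ratio as the mean return divided by the sample standard deviation of returns, with a zero risk-free rate and no annualization. The function names and sample numbers are hypothetical.

```python
import statistics


def max_drawdown(cumulative_profit):
    """Largest fall of cumulative profit from any earlier peak."""
    peak = cumulative_profit[0]
    worst = 0.0
    for value in cumulative_profit:
        peak = max(peak, value)
        worst = max(worst, peak - value)
    return worst


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by its sample standard deviation."""
    excess = [r - risk_free for r in returns]
    deviation = statistics.stdev(excess)
    if deviation == 0:
        raise ValueError("returns have zero variance")
    return statistics.mean(excess) / deviation


drawdown = max_drawdown([0.0, 5.0, 3.0, 8.0, 2.0, 6.0])  # fall from 8 to 2
ratio = sharpe_ratio([0.01, 0.03, 0.02])
```

With these definitions a strategy whose average return is negative has a negative Sharpe ratio, in line with the description in the list above.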

3.2 Data

Data collection

In our research, we used a total of six years of historical data from January 2009 to December 2015 for two oil and gas indices, from the Russian and the American stock exchange respectively. The sectoral index MICEX O&G (MICEX oil and gas) is a price index, weighted by market capitalization, of the liquid stocks of Russian issuers admitted to trading on the MICEX Stock Exchange. The Dow Jones U.S. Oil and Gas index measures the performance of the energy sector of the U.S. equity market. The index is one of ten indices that together make up the Dow Jones U.S. Index, which represents approximately 95% of U.S. market capitalization.

The historical data refers to the daily observations of variables, with 3871 observations in total. All data was obtained from http://www.finam.ru/.

Pre-processing

The set of samples had some missing data despite the daily periodicity of the series. For example, the series of the Dow Jones index has missing values on days when U.S. holidays or other interruptions closed the exchange.

In addition, the Russian data has missing values due to the holidays on which there was no trading.
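The text does not say how the missing days were treated. One common option, shown here purely as an assumption, is to carry the last known value forward over a shared calendar so that both series have a value on every date; another is to drop the dates on which either market was closed. The function name, the dates and the prices below are hypothetical.

```python
from datetime import date


def forward_fill(series, calendar):
    """Fill gaps in a date -> price mapping with the last known price.

    Dates before the first observation are left out of the result.
    """
    filled = {}
    last_price = None
    for day in calendar:
        if day in series:
            last_price = series[day]
        if last_price is not None:
            filled[day] = last_price
    return filled


calendar = [date(2015, 1, 5), date(2015, 1, 6), date(2015, 1, 7), date(2015, 1, 8)]
# Hypothetical index values with no quote on 7 January.
index_values = {
    date(2015, 1, 5): 3400.0,
    date(2015, 1, 6): 3410.0,
    date(2015, 1, 8): 3390.0,
}
filled = forward_fill(index_values, calendar)
```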

Establishing the training and tests sets

We divided each sample into three periods: one training period and two testing periods for the trading strategies. On the training set, the fusion model was built using random forests, Bayesian regularization for feed-forward neural networks and the autoregressive integrated moving average model. The first testing set was used to run the constructed trading strategies and to evaluate the returns, profits and other evaluation metrics depending on the parameters of the strategies. At this stage, we chose the most accurate and adequate parameters for our strategies. The second testing set was then used to evaluate the best trading strategies chosen on the first testing set. In this way we can check whether the trading strategy that is most accurate at present also brings profit in the future.

As we can see from Figure 4, the training set and the first testing set for the Russian data are quite similar, without any sharp declines or rises. However, the second testing period has sharp increases and changing trends, which can be difficult to predict for a model trained on more stable data.

As for DJIAO&G, Figure 3 shows different behavior in each period. In the training period, the graph has sharp upward and downward trends. In the first testing period the graph rises, whereas in the second testing period it falls sharply.
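A three-way split of this kind keeps the observations in time order, so that each testing period lies strictly after the data used before it. A minimal sketch, in which the function name and the boundary positions are hypothetical:

```python
def chronological_split(observations, train_end, test1_end):
    """Split a time-ordered sequence into training, first testing and second testing parts."""
    if not 0 < train_end < test1_end < len(observations):
        raise ValueError("boundaries must satisfy 0 < train_end < test1_end < length")
    train = observations[:train_end]
    test1 = observations[train_end:test1_end]
    test2 = observations[test1_end:]
    return train, test1, test2


# Ten hypothetical daily observations: six for training, two for each testing period.
train, test1, test2 = chronological_split(list(range(10)), train_end=6, test1_end=8)
```

Strategy parameters would be tuned on `test1` only and then evaluated once on `test2`.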

Figure 3. Samples design of the data: original DJIAO&G index data during the training and two testing periods

Figure 4. Samples design of the data: original MICEX O&G index data during the training and two testing periods

3.3 Individual ARIMA and ANN predictions

In our research, we first constructed the individual models: the autoregressive integrated moving average model and Bayesian regularization for feed-forward neural networks. The predictions are made for the period from January 2009 to December 2011. To evaluate the prediction ability of the proposed approaches, we estimated the models' specifications with the mean absolute predicted percentage error for each year of our data frame. The proposed parameters of the techniques are presented in Table 2.

Table 2. ARIMA and BRNN parameters for first stage of modeling

Techniques	Parameters	The total number of specifications, proposed for each time period
ARIMA	N	36
BRNN	Neurons=2 N	36

In our research, for the ARIMA specifications, we used the log data with the differencing parameter $d$ equal to one. Moreover, we needed to choose appropriate $p$ and $q$ parameters in order to achieve the most accurate results. To solve this problem, we developed several ARIMA specifications with $p$ and $q$ varying from 0 to 2. Next, we compared their predictive ability using the mean absolute predicted percentage error and selected the most adequate models.

When constructing the ANN model, we chose the BRNN technique as the most appropriate. BRNNs were used because of their advantages: the models are robust and a validation process is unnecessary. Thus, the models can optimize the network architecture. Moreover, BRNN models are difficult to overfit, because they are trained on the effective network weights only, excluding those that are not relevant. For the BRNN models, we varied the time lags from one to nine.

As a result, we had 72 specifications in total, with different parameters, that describe price trends and forecast future values. We compared both techniques (ARIMA and ANN) with the MAPPE criterion, and the results are shown in Tables 3 and 4 for both indices.
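The count of 72 specifications can be reproduced by enumerating the grids. The sketch below assumes that the parameter N takes the four values that appear in Tables 3, 4 and 7 (91, 183, 395 and 730); that these four values form the full set is an inference, not something the text states. The variable names are hypothetical.

```python
from itertools import product

# Values of N read off the result tables (assumed to be the complete set).
N_VALUES = (91, 183, 395, 730)

# ARIMA: p and q from 0 to 2, differencing order fixed at d = 1.
arima_specs = [
    {"N": n, "p": p, "d": 1, "q": q}
    for n, p, q in product(N_VALUES, range(3), range(3))
]

# BRNN: time lags from 1 to 9, two neurons as listed in Table 2.
brnn_specs = [
    {"N": n, "lags": lags, "neurons": 2}
    for n, lags in product(N_VALUES, range(1, 10))
]

total_specs = len(arima_specs) + len(brnn_specs)
```

Under this assumption each technique has 4 × 9 = 36 specifications, which matches the totals given in Table 2.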

Table 3. Comparison forecasting ability of two individual models (ARIMA, ANN) for MICEX O&G Index.

Period	ARIMA N	ARIMA p	ARIMA q	ARIMA MAPPE, %*	ANN N	ANN lags	ANN MAPPE, %*
Overall	730	2	0	1,04	730	7	1,06
2011	183	0	2	1,36	183	3	1,38
2012	183	1	2	0,91	730	5	0,9
2013	91	2	1	0,76	395	5	0,77
2014	91	1	2	1,05	730	7	1,08
2015	91	0	0	1,14	730	7	1,16

* Minimal MAPPE is shown in bold

Table 4. Comparison forecasting ability of two individual models (ARIMA, ANN) for DJIAO&G index

Period	ARIMA N	ARIMA p	ARIMA q	ARIMA MAPPE, %*	ANN N	ANN lags	ANN MAPPE, %*
Overall	183	0	2	0,93	730	7	0,9
2011	91	0	0	1,02	395	5	1,03
2012	730	1	1	0,54	730	7	0,6
2013	183	0	2	0,62	395	5	0,66
2014	183	0	2	0,7	395	5	0,74
2015	91	0	0	1,37	730	7	1,39

* Min MAPPE is shown in bold

The results in Table 3 show that for the majority of periods the most accurate specification is an ARIMA model, in 2013-2015 with N = 91. For the overall period the most accurate specification is also ARIMA, but with N = 730, p = 2 and q = 0. Thus, we expected that random forest would produce the most accurate results using predictions from ARIMA modeling.

The results in Table 4 show that for the DJIAO&G index the most accurate model in every individual year is ARIMA. However, for the overall period the most accurate model is ANN with N = 730 and 7 lags. Thus, we expected that random forest would produce the most accurate results using predictions from ANN modeling.

Moreover, we should note that the results for the American Dow Jones U.S. Oil and Gas index are more accurate than for MICEX O&G. We suggest that this is because the American market is more developed and its trends are more predictable.

3.4 Technical indicators

For the next step of prediction modeling, technical indicators were applied to the close prices of the chosen data. In our trading system, 8 technical indicators are used as the basis of trading rules. These technical indicators are: Simple Moving Average (SMA), Exponential Moving Average (EMA), Smoothed Moving Average (SMMA), Linear-Weighted Moving Average (LWMA), Momentum, Relative Strength Index (RSI), Stochastic Oscillator (STCK%), Larry Williams' percent range (Williams % R).
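The moving averages and the momentum can be computed from close prices alone. The sketch below uses common textbook definitions, which are assumed here because the formulas of the study are not preserved in the text: the EMA uses the smoothing factor 2/(n+1) and is seeded with the first price, the SMMA is seeded with the first simple average, and the momentum is the price difference over n days. The function names and the sample prices are hypothetical.

```python
def sma(prices, n):
    """Simple moving average over the last n prices."""
    return [sum(prices[i - n + 1:i + 1]) / n for i in range(n - 1, len(prices))]


def lwma(prices, n):
    """Linear-weighted moving average: weights 1..n, the newest price weighs most."""
    denominator = n * (n + 1) / 2
    return [
        sum(w * p for w, p in zip(range(1, n + 1), prices[i - n + 1:i + 1])) / denominator
        for i in range(n - 1, len(prices))
    ]


def ema(prices, n):
    """Exponential moving average with smoothing factor 2 / (n + 1)."""
    alpha = 2 / (n + 1)
    values = [prices[0]]
    for price in prices[1:]:
        values.append(alpha * price + (1 - alpha) * values[-1])
    return values


def smma(prices, n):
    """Smoothed moving average, seeded with the simple average of the first n prices."""
    values = [sum(prices[:n]) / n]
    for price in prices[n:]:
        values.append((values[-1] * (n - 1) + price) / n)
    return values


def momentum(prices, n):
    """Price change over n days."""
    return [prices[i] - prices[i - n] for i in range(n, len(prices))]


closes = [10.0, 12.0, 14.0, 16.0]
sma_2 = sma(closes, 2)
ema_3 = ema(closes, 3)
```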

Summary statistics of proposed indicators are presented in Table 5 and Table 6.

Table 5. Summary statistics of proposed indicators for MICEX O&G

Indicator	Max	Min	Standard deviation	Mean
SMA (2)	4813,36	2545,8	503,9754231	3440,934
SMA (6)	4768,147	2631,043	500,9364509	3439,575
SMA (15)	4677,71	2673,947	493,3534057	3435,894
EMA (2)	4829,951	2559,862	503,6616772	3440,925
EMA (6)	4751,448	2630,633	499,7554336	3439,446
EMA (15)	4689,555	2698,925	490,9986519	3435,996
SMMA (2)	4799,886	2582,871	501,6020066	3439,212
SMMA (6)	4715,544	2674,29	494,1119954	3436,408
SMMA (15)	4624,928	2757,856	477,0931273	3427,732
LWMA (2)	4813,386	2545,785	503,9733351	3440,939
LWMA (6)	4768,156	2630,901	500,9315665	3439,602
LWMA (15)	4677,939	2674,278	493,3728669	3435,972
Momentum (2)	174,77	-279,75	47,89696525	1,133825
Momentum (6)	453,56	-425,67	107,5264889	5,771167
Momentum (15)	760,44	-567,48	175,9934072	17,52037
RSI	100	1,42	23,08312745	67,09311
STCK%	79,53496	20,61886	11,24520042	52,11274
Williams % R	-0,86	-100	23,08312745	-32,9069

Table 6. Summary statistics of proposed indicators for DJIAO&G index

Indicator	Max	Min	Standard deviation	Mean
SMA (2)	402,5905	221,7283	35,52421	315,6674
SMA (6)	399,7714	227,6863	35,31011	315,744
SMA (15)	399,2889	232,4102	34,90493	315,9172
EMA (2)	402,672	222,7967	35,49866	315,667
EMA (6)	402,672	222,7967	35,48059	315,6827
EMA (15)	402,672	222,7967	35,64217	315,5547
SMMA (2)	401,9721	225,2145	35,41302	315,7001
SMMA (6)	398,7442	229,8633	34,86736	315,8954
SMMA (15)	396,2543	235,0526	33,7613	316,3187
LWMA (2)	402,8175	221,3331	35,53916	315,6558
LWMA (6)	400,9467	226,9467	35,38129	315,6907
LWMA (15)	399,1878	230,0389	35,09471	315,7832
Momentum (2)	15,40952	-24,7238	3,735048	-0,06959
Momentum (6)	31,7381	-52,2	8,652133	-0,31885
Momentum (15)	41,7	-75,5333	13,74656	-0,8128
RSI	100	2,5	28,3568	57,10238
STCK%	97,79	4,08	57,75	60,5
Williams % R	0	-100	28,3568	-42,8976

3.5 Techniques fusion approach

The first stage of the fusion modeling is to employ the ARIMA and ANN models individually. We assessed their prediction ability with the mean absolute predicted percentage error and showed the results in Tables 3 and 4. Next, we built 8 technical indicators: Simple Moving Average (SMA), Exponential Moving Average (EMA), Smoothed Moving Average (SMMA), Linear-Weighted Moving Average (LWMA), Momentum, Relative Strength Index (RSI), Stochastic Oscillator (STCK%), Larry Williams' percent range (Williams % R). Technical indicators improve the quality of future forecasts.

The second stage of the fusion approach is to apply random forest to the predictions received from the proposed models. As a result, after applying random forest, we get new predictions that can be compared with those of the individual models. As input data, we fed different combinations of the best individual predictions and technical indicators into the model to achieve the most accurate results. Tables 7 and 8 show the results only for the minimal MAPPE for each period. Results with all input data can be seen in Appendices A and B.
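The idea of the second stage can be illustrated with a deliberately simplified stand-in for random forest: an ensemble of one-split regression trees (stumps), each fitted on a bootstrap sample, whose predictions are averaged. A full random forest also grows deeper trees and samples features at each split; none of that is reproduced here. Each input row holds a first-stage forecast and one indicator value, and the target is the actual price. All names and numbers are hypothetical.

```python
import random


def fit_stump(rows, targets):
    """Find the single split that minimizes the squared error.

    Returns (feature, threshold, left_mean, right_mean).
    """
    best = None
    for feature in range(len(rows[0])):
        # Dropping the largest value keeps the right part non-empty.
        for threshold in sorted({row[feature] for row in rows})[:-1]:
            left = [y for row, y in zip(rows, targets) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, targets) if row[feature] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = sum((y - left_mean) ** 2 for y in left)
            error += sum((y - right_mean) ** 2 for y in right)
            if best is None or error < best[0]:
                best = (error, feature, threshold, left_mean, right_mean)
    if best is None:
        # All sampled rows are identical: predict the mean everywhere.
        mean = sum(targets) / len(targets)
        return (0, float("inf"), mean, mean)
    return best[1:]


def fit_ensemble(rows, targets, n_trees=25, seed=0):
    """Fit each stump on a bootstrap sample of the training rows."""
    rng = random.Random(seed)
    n = len(rows)
    ensemble = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]
        ensemble.append(fit_stump([rows[i] for i in sample], [targets[i] for i in sample]))
    return ensemble


def predict(ensemble, row):
    """Average the predictions of all stumps."""
    outputs = [lm if row[f] <= thr else rm for f, thr, lm, rm in ensemble]
    return sum(outputs) / len(outputs)


# Each row: [first-stage forecast, moving-average value]; target: actual price.
rows = [[100.5, 100.0], [101.8, 101.0], [104.6, 103.5],
        [103.2, 104.0], [101.1, 102.0], [103.9, 102.5]]
targets = [100.0, 102.0, 105.0, 103.0, 101.0, 104.0]
ensemble = fit_ensemble(rows, targets, n_trees=25, seed=1)
fused_forecast = predict(ensemble, [102.0, 102.0])
```

Different combinations of inputs are compared simply by changing which columns go into `rows` and recomputing the error on each period.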

Table 7. Minimal MAPPE for different random forests models depending on input data sets for Russian MICEX O&G Index.

Input data	MAPPE*, % training period	MAPPE*, % testing period 1	MAPPE*, % testing period 2
ARIMA, N=91	0,560825	0,876905	1,174283
ARIMA, N=91; SMA; Momentum; EMA; LWMA	1,767366	0,089222	0,131376
ANN, Lags=5,7; EMA; SMMA; LWMA; Momentum	1,780704	0,058621	0,300814

* Minimum MAPPE for each period is shown in bold

Table 8. Minimal MAPPE for different random forests models depending on input data sets for DJIAO&G index

Input data	MAPPE*, % training period	MAPPE*, % testing period 1	MAPPE*, % testing period 2
ARIMA, N=91	0,621852	0,646168	1,134526
ARIMA, N=91; SMMA; EMA; LWMA	0,999542	0,045041	0,128398
ANN, Lags=5,7; EMA; SMMA; LWMA; Momentum; RSI; Williams % R; STCK%	1,011557	0,063835	0,107372

* Minimum MAPPE for each period is shown in bold

Based on Tables 7 and 8, we can observe that fusion modeling greatly improved the accuracy of the forecasts on both testing periods, bringing the minimal MAPPE well below 1%. This is a very significant result compared with similar research in the field of forecasting. To present the comparison, we built Tables 9 and 10.

Table 9. Comparison of predictive ability among individual and fusion modeling, MICEX O&G

Data	Optimal MAPPE of ARIMA	Optimal MAPPE of ANN	Optimal MAPPE of fusion model
Training sample (2011-2013)	0,9	0,93	0,560825
Testing sample 1 (2013-2014)	0,76	0,77	0,058621
Testing sample 2 (end 2014-end 2015)	1,14	1,16	0,131376

Table 10. Comparison of predictive ability among individual and fusion modeling, DJIAO&G index

Data	Optimal MAPPE of ARIMA	Optimal MAPPE of ANN	Optimal MAPPE of fusion model
Training sample (2011-2013)	1,02	0,9	0,621852
Testing sample 1 (2013-2014)	0,54	0,66	0,045041
Testing sample 2 (end 2014-end 2015)



