Identifying non-price factors affecting beer products sales in Russia, assessment of their influence, analysis
Research of the mechanisms of regulation of the Russian beer market. The influence of weather conditions on the volume of beer sales in Russia; conditions of beer sales regression, mechanisms of weather factors action on sales in different regions.
Category | Marketing, advertising and trade |
Type | diploma thesis |
Language | English |
Date added | 23.08.2020 |
File size | 2.1 M |
Another stream of research examines unexpected deviations from seasonal patterns and deviations in daily temperature. Bertrand et al. (2015) show that seasons differ in their exposure to temperature anomalies, and that men's, women's, and children's clothing respond differently to the same weather risk. They develop a method to assess weather-related risk in sales and show how managers can use weather derivatives to hedge against such risk. In a similar setting, Changhui et al. (2011) find that apparel companies can minimize the cost associated with weather risks by studying consumer behavior. Using a demand prediction model, they calculate a cost-minimizing demand level. Babongo et al. (2018) analyze how apparel demand forecasts for the upcoming season can be made more accurate by taking into account the weather of the previous sales season, using an extensive data set for winter sports equipment.
To sum up, the reviewed authors investigated the effect of weather conditions, as an independent variable, on human behavior in general, on consumer decision making and consumption, and on the consumption of various types of products. On the basis of the reviewed studies, the following weather factors were found to have a significant influence:
- Sunlight, temperature, and air quality (Tian et al., 2018).
- Temperature (minimum, maximum and average), rainfall, snowfall, dry bulb temperature, which is the air temperature measured by a thermometer freely exposed to the air but shielded from radiation and moisture (minimum, maximum and average), humidity (minimum, maximum and average), wind direction, wind speed (minimum, maximum and average), barometric pressure (minimum, maximum and average) and sunlight (Murray et al., 2010).
- Temperature, barometric pressure, sunshine, wind speed, rain and humidity (Drapal et al., 2019).
- Maximum temperature in degrees Celsius, rainfall in millimeters, sunshine in hours (or proportions of), relative humidity as a percentage (Parsons et al., 2001).
- Average daily temperature in °C (temp), daily rainfall in mm/m2 (rain) and number of daily sunshine hours (sun) (Stulec et al., 2019).
- Temperature (Bahng and Kincade, 2012).
1.5 Impact of the weather on beer consumption
Few authors have investigated the effect of weather on beer consumption, although several studies show that the weather affects beer sales. No previous study has investigated the connection between weather conditions and sales of beer products in the Russian market.
Koksalan et al. (1999) analyzed data from one of the leaders of the Turkish beer market to develop a medium-term model and a short-term model for understanding the factors affecting beer demand and for forecasting beer demand in Turkey. The researchers used a residual modeling regression approach based on statistical process control, which identifies lurking variables that explain anomalies in demand and, using indicator variables, integrates these lurking variables into the model. Both the medium-term and short-term models produced very satisfactory results and are currently being used by the company. Unlike other studies, the authors concluded that the climate-related index did not have a significant effect on beer demand. Only the variation in temperature showed a small negative effect that was significant at the 0.08 level. In interpreting the results, the authors argued that although climate may be an important variable, it would be very hard to detect its effect in a model that uses aggregate yearly data.
A recent study by Kubista (2018) analyzed the effect of weather on sales in the Czech FMCG market. The researcher introduces a novel approach to the analysis, using tree-based machine learning algorithms. These flexible non-parametric methods can estimate complex relationships and perform automatic variable selection. For this study an extensive dataset of weekly category sales from over 1000 stores in the Czech Republic for the years 2015 to 2017 was collected, analyzed and coupled with various meteorological variables from over 80 weather stations. The results show that weather variables are a very significant predictor of sales; the accuracy of the model is solid, explaining almost 60% of the variance and predicting with an average error of 12%. If the mean daily temperature rises above 14°C for at least two days, beer sales grow by 16%. If the maximum temperature stays above 29°C for at least 5 days, sales rise by an additional 13%.
To sum up, the consumer is clearly affected by the context in which consumption occurs. The consumer makes a purchase decision not only on the basis of the product's price but also under the influence of context. Based on the literature review, we found that: companies consider their sales prediction models insufficient; weather conditions have a strong influence on consumer choice; the consumer makes a purchase guided not only by the rational choice of the best price offer but also by various contexts, one of which is weather conditions; sales of beer products depend on weather conditions in at least two markets (Czech and Turkish).
To conclude, on the basis of the reviewed literature, 7 hypotheses are formulated in this paper:
1. Weather condition variables influence the volume of beer sales in Russia.
2. Weather variables such as air temperature, atmospheric pressure, humidity and wind speed significantly influence sales of beer in Russia.
3. An increase in air temperature has a positive effect on the volume of beer sales.
4. A decrease in humidity level has a positive effect on the volume of beer sales.
5. A decrease in atmospheric pressure has a positive effect on the volume of beer sales.
6. A decrease in wind speed has a negative effect on the volume of beer sales.
7. The influence of weather conditions on beer sales varies across regions of the country.
2. Methodology
2.1 Research design
The methodology used in the study follows from the objectives of this research. It was decided that the most effective way to investigate the relation between weather conditions and the volume of beer sales is a quantitative one. This method makes it possible to analyze a huge amount of data; qualitative analysis is not applicable in this situation. Furthermore, a survey of customer preferences under different weather conditions in 24 Russian cities over three years that would give unbiased results is extremely hard to carry out. On the other hand, analyzing consumer choices through questionnaires could give a deeper understanding of the stimuli or motivation of customers. The study was made in order to gain insight into the relation between the volume of beer sales and weather conditions. The central model in this research is regression analysis. Regression analysis is a statistical method of investigating the effect of one or more independent variables on a dependent variable. Independent variables are also called regressors or predictors, and the dependent variable is also called the criterion. When building the statistical model, we took into account that a more complex model with a larger number of variables tends to show a better fit. Nevertheless, it was decided to keep the model simpler so that it is clearer to interpret and to use in practice.
2.2 Data collection
Panel data were used in the research, that is, data containing information about different objects in different periods of time. The database includes information about beer sales of 604 different SKUs (stock keeping units) and 78 brands in 24 cities, together with daily weather conditions in the same 24 cities, over three years from 2017 to 2019.
The first step in working with the data was collection. The database consists of two main parts, beer sales data and weather statistics, which were collected from different sources with different analytics tools.
The sales information was provided from the statistics of a beer production company. To obtain it, a SELECT query was written in SQL against the internal corporate data warehouse. The main constructs used in the query were UNION ALL, CASE, LEFT JOIN, ON and WHERE. This FMCG company operates in Russia and is one of the market leaders. To be precise, beer sales in this research are the volume delivered from the company's breweries to its customers in traditional trade. Traditional trade is an FMCG term for physical retail stores operated by their owners, holding a limited inventory and targeting a specific niche of products. For our research it is crucial that the sales come from this channel, for several reasons. First, traditional trade stores have limited storage and place orders with the brewery three to four times a week. By contrast, modern trade (large supermarkets and retail chains) has its own distribution centers, and its orders are more constant and less volatile. Consequently, without information about the size of normative stocks, actual stocks and logistics distances to the shops, we could not reveal the actual demand of a specific modern trade shop. In other words, analyzing shipments to the company's modern trade customers would not show the real demand, so those data are not suitable for this kind of research. Second, traditional trade consists of many unrelated small organizations with a low total sales volume, so they have less market power than modern trade retailers, which always try to buy products at a lower price. Therefore, on the one hand, forecasting for traditional trade is more complex because of the volatility of order volume; on the other hand, traditional trade sales are less closely connected with price, although still related to it.
The cities were chosen by the largest sales volume in the country. The number of cities chosen for investigation is 24: Barnaul, Vladivostok, Volgograd, Voronezh, Ekaterinburg, Kazan, Krasnodar, Krasnoyarsk, Lipetsk, Moskva, Nizhny Novgorod, Novosibirsk, Perm, Rostov, Samara, Saint-Petersburg, Saratov, Sevastopol, Stavropol, Tula, Tumen, Ufa, Khabarovsk, Yaroslavl.
It was decided to divide the database by region. The first idea was to divide the dataset by sales volume using clustering. Appendix 1 illustrates the distribution of sales volume by city. It shows that the largest volumes belong to large cities where production lines are situated. Such clustering is not meaningful for measuring weather conditions. Therefore, as the weather variables are central to the discussion, this idea of clustering was rejected, and it was decided to build subsets for each region. Besides the reasons described earlier, the regional division has further justifications. First, every region has its own local brands and SKUs, marketing strategies and its own planning and transportation departments. Second, the majority of the products are produced in regional plants. Third, the full amount of data could not be processed in one single model because of its size and hardware limits. There are five regions: Moscow Center, North West, South, Ural, and Siberia and Far East. Moscow Center includes Voronezh, Lipetsk, Moskva, Nizhny Novgorod and Tula; North West includes Saint-Petersburg and Yaroslavl; Siberia and Far East includes Barnaul, Vladivostok, Krasnoyarsk, Novosibirsk and Khabarovsk; Ural includes Ekaterinburg, Kazan, Perm, Samara, Tumen and Ufa; South includes Volgograd, Krasnodar, Rostov, Saratov, Sevastopol and Stavropol. The data were not summed by region; the aggregation remains at city level. According to the weather data, the cities within a regional group belong to a similar climate zone.
For the research, sales data were collected for 604 different SKUs (stock keeping units). SKU is a widespread abbreviation in logistics and operational processes in the FMCG and retail markets. An SKU is an item ID (article), an inventory accounting unit and a warehouse number used in trade to track statistics on goods or services sold. Each item that is sold, whether it is a product, a product variant, a set of products sold together, a service, or a fee, is assigned its own SKU. SKUs are convenient for tracking sales statistics of a particular product and for comparing sales of different product variants. The SKUs have different characteristics and can be representative of the whole market; this is described in more detail later. The database includes sales volume in dekaliters of beer and sales in rubles (the financial variable) for three years from 2017 to 2020.
As mentioned earlier, the collected database includes product-specific characteristics such as type of container (tare), type of product, volume of container and strength of the drink. Such an extensive and detailed dataset offers a unique opportunity to analyze the overall weather effects across several types of products. In more detail, the data contain the following groups:
Type of product: lager, dark, alcohol free, etc.
Type of tare: bottle, PET, can, keg, PET keg.
Volume of tare: 0.33 l; 0.44; 0.45; 0.47; 0.5, etc.
Different brands: Carlsberg, Kronenbourg, etc.
Color of tare: blue, green, yellow, etc.
Alcohol: percentage of alcohol in the drink: 0.5; 4.5; 4.6; 4.7; 4.8; 5; 5.3; 5.4; 5.5; 5.6; 6.5; 7; 8; 10.
The database collected from the FMCG company contains sales in dekaliters and in rubles for each SKU on a particular date in a particular city. For further analysis, the price per dekaliter was calculated as sales in rubles divided by sales in dekaliters. Then, to identify the discount, a special query was made that found the maximum price of each SKU in each city in each year. A yearly period was taken as the baseline for calculating the discount because the company refreshes its price strategy yearly, and prices and promo limits are also reconsidered yearly on the basis of inflation and other macro- and microeconomic factors. The result of this calculation, the maximum price per dekaliter, was then merged into the database, and the discount was calculated as the difference between the maximum price per dekaliter and the price observed in a specific row. All of these data were processed with the help of several R packages: «dplyr», «magrittr», «lubridate», «fastDummies».
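The price and discount calculation described above can be sketched as follows. This is a minimal illustration in plain Python rather than the R code used in the study; the function name, the field names (`sales_rub`, `sales_dal`, `price_max` and so on) and the assumption that every row has a positive sales volume are hypothetical.

```python
from collections import defaultdict

def add_price_and_discount(rows):
    """rows: list of dicts with keys city, sku, year, sales_rub, sales_dal.

    Assumes sales_dal > 0 in every row (hypothetical simplification).
    """
    # Price per dekaliter = sales in rubles / sales in dekaliters.
    for r in rows:
        r["price_per_dal"] = r["sales_rub"] / r["sales_dal"]

    # Baseline: maximum observed price per (city, SKU, year).
    max_price = defaultdict(float)
    for r in rows:
        key = (r["city"], r["sku"], r["year"])
        max_price[key] = max(max_price[key], r["price_per_dal"])

    # Discount = baseline price minus the price observed in the row.
    for r in rows:
        base = max_price[(r["city"], r["sku"], r["year"])]
        r["price_max"] = base
        r["discount_rub"] = base - r["price_per_dal"]
        r["discount_pct"] = 100.0 * r["discount_rub"] / base
    return rows
```

With this definition the row carrying the yearly maximum price always has a discount of zero, which is why the choice of baseline period matters.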
To summarize the beer sales data, the final database consists of the following variables: date, year, city, SKU, volume of sales in dekaliters, volume of sales in rubles, type of product, type of tare, volume of tare, brand, colour of tare, alcohol content in percent, price per dekaliter in rubles, maximum price per dekaliter during the year in rubles, discount in rubles, discount in percent.
In total there are almost 2.7 million data points for sales. The database is aggregated by city, SKU and day/week. The data are balanced, as there is information for each object in different periods of time. At first sight the data are homogeneous, because sales volume is measured uniformly in dekaliters and rubles. However, the data for different cities may differ because of city-specific macroeconomic conditions.
The results of this research could be generalized to the whole beer market in Russia. The main reason is that the collected data include key product features, a wide geographic scope and a sufficiently long period of time: 3 years and 3 months of sales for 24 cities of different size and culture and for 604 SKUs in different price segments, with different alcohol content, types of tare, etc.
The second part of the resulting database is information about weather conditions. Weather data were collected from an open source. The weather data service has been developed and maintained by the company "weather Schedule", St. Petersburg, Russia, since 2004. The company is licensed to operate in the field of hydrometeorology and related fields.
The company provides information about the actual weather observed at ground stations. Information about the actual weather comes from the international data exchange server, NOAA, USA, in SYNOP and METAR formats.
The observations in SYNOP format are received on the site eight times a day, every three hours: 0:30, 3:30, 6:30, 9:30, 12:30, 15:30, 18:30 and 21:30 UTC. The observations in METAR format come to the site once or twice an hour, about 10 minutes after the weather is observed at the weather station. Observations are made at 10'400 SYNOP weather stations, 5'400 METAR weather stations and 250 coastal stations (format KN-02).
The following weather variables were collected to analyze the relationship between weather and sales:
T: air temperature (degrees Celsius) at a height of 2 meters above the ground;
P0: atmospheric pressure at station level (millimeters of mercury);
P: atmospheric pressure at sea level (millimeters of mercury);
U: relative humidity (%) at a height of 2 meters above the ground;
FF: wind speed at an altitude of 10-12 meters above the earth's surface, averaged over a 10-minute period immediately preceding the observation period (meters per second);
Td: Dew point temperature (degrees Celsius).
Hereafter these parameters are named as follows: air temperature (degrees Celsius): Weather.T; atmospheric pressure at station level: Weather.P0; atmospheric pressure at sea level: Weather.P; relative humidity: Weather.U; wind speed: Weather.Ff; dew point temperature: Weather.Td.
Data were downloaded from the source for each city independently. Then all the downloaded Excel files were converted to data frames in R and combined into one data frame with the «union all» function of the «dplyr» package.
The data were aggregated to a daily basis so that they could be merged with the sales data using «dplyr» package functions. To reduce the loss of information about each variable, not only average weather characteristics were calculated but also the minimum and the maximum. According to the reviewed literature, this was necessary for building a model and identifying significant factors.
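The daily aggregation can be illustrated with a short sketch. It is written in plain Python for illustration only (the study used «dplyr» in R); the function name, the tuple layout and the output keys `T_mean`, `T_min`, `T_max` are assumptions, and only air temperature is aggregated here.

```python
from collections import defaultdict
from statistics import mean

def aggregate_daily(observations):
    """observations: iterable of (city, day, temperature), several per day.

    Returns {(city, day): {"T_mean": ..., "T_min": ..., "T_max": ...}}.
    """
    grouped = defaultdict(list)
    for city, day, temp in observations:
        grouped[(city, day)].append(temp)
    return {
        key: {"T_mean": mean(vals), "T_min": min(vals), "T_max": max(vals)}
        for key, vals in grouped.items()
    }
```

The same grouping would be repeated for pressure, humidity, wind speed and dew point to obtain the mean, minimum and maximum of each variable per city and day.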
Furthermore, the weather data are balanced, as there is information for each object in different periods of time. However, the database has empty data points. To cope with this issue we used last observation carried forward (LOCF), a method of imputing missing data that maintains the sample size. This treatment is unlikely to bias the model, because the empty dates were verified and only 1 day in a month was empty. The treatment was carried out in R with the «na.locf» function of the «zoo» package.
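The idea of last observation carried forward can be shown in a few lines. The sketch below is plain Python, not the «zoo» implementation used in the study; the function name and the decision to leave leading gaps as `None` are assumptions made for illustration.

```python
def locf(values):
    """Fill None entries with the most recent non-missing value.

    Leading gaps have no earlier observation and stay None in this sketch.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled
```

For a daily series with a single missing day, the gap simply takes the value of the previous day, so the number of rows available for the merge with sales data does not change.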
Formation of the resulting database for the model. The two databases, containing beer sales data and weather data, were merged by three parameters: city, SKU and day. These are the key elements in both databases. The merge was performed in R using the «dplyr» package and the merge function. The final database consists of about 2.6 million data points and 35 variables: city, SKU, date, weather variables, information about the SKU and sales data; a glance at the dataset is given in Appendix 2.
As mentioned earlier, the results of this research could be generalized to the whole beer market in Russia, because the collected data represent the main products of the industry, cover the main geographical markets and include official observations of weather conditions.
2.3 Data analysis
Multiple regression. In line with the aim of the research, it was decided to use a multiple regression model to analyze the connection between the variables. Since we have a variety of independent variables that describe weather conditions, we need to include all those that are significant and exclude those that are not related to the dependent variable in order to obtain a more accurate model. Therefore, multiple regression was chosen as the central type of model for achieving the main goal of the research.
Multiple regression is widely used in macroeconomic calculations and is one of the most common methods in econometrics. The main goal of multiple regression is to build a model with a large number of factors while determining the impact of each of them individually, as well as their combined impact on the modeled indicator.
The multiple regression equation is given by:
$$ y = a + b_1 x_1 + b_2 x_2 + \dots + b_k x_k, \quad (1) $$
where $x_1, x_2, \dots, x_k$ are the k independent variables and $y$ is the dependent variable.
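As an illustration of how the coefficients in equation (1) can be obtained, the sketch below estimates them by ordinary least squares through the normal equations. It assumes least-squares estimation and uses plain Python only; the function name `ols` is hypothetical and this is not the estimation code used in the study.

```python
def ols(X, y):
    """Least-squares coefficients [a, b1, ..., bk] for y = a + b1*x1 + ... + bk*xk.

    X: list of observations, each a sequence (x1, ..., xk); y: list of responses.
    """
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix of the normal equations (X'X) b = X'y.
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [u - f * v for u, v in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]
```

The first returned value corresponds to the intercept a and the remaining values to b1 through bk. The sketch assumes the explanatory variables are not perfectly collinear; otherwise the normal equations have no unique solution.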
A dependent variable (y) is a variable describing a process that researchers are trying to predict or understand. Independent variables (x) are factors used to model or predict the values of the dependent variable. In a regression equation they are located to the right of the equals sign and are often called explanatory variables. The dependent variable is a function of the independent variables. Regression coefficients (b) are coefficients calculated as a result of regression analysis. A value is calculated for each independent variable that represents the strength and type of its relationship with the dependent variable (Wooldridge, J. M., & Joyner, E. 2013).
The result of the regression is the equation together with other parameters that allow researchers to evaluate the accuracy of the model. The R-squared value, also called the coefficient of determination, characterizes the quality of the obtained regression line. This quality is expressed by the degree of correspondence between the source data and the regression model (calculated data). The R-squared value lies in the interval from zero to one. If R-squared is close to one, the constructed model explains almost all the variability of the dependent variable. On the contrary, if R-squared is close to zero, the constructed model has poor quality. In our research we expect the R-squared value to be close to 0.5 to 0.6, given that the current models used for forecasting in FMCG have a forecast accuracy close to 50%. This means that the existing internal model in the FMCG sector predicts only approximately 50% of the volume correctly for a specific city and a specific SKU (stock keeping unit).
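The R-squared value discussed above can be computed directly from observed and fitted values. The following is a minimal plain-Python sketch; the function name is hypothetical and the formula is the usual one minus the ratio of the residual to the total sum of squares.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

A model that reproduces the data exactly gives 1, while a model that always predicts the sample mean gives 0, which matches the interpretation of the two ends of the interval.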
The p-value is a quantity used to test statistical hypotheses. It is the probability, assuming the null hypothesis is true, that a random variable with the given distribution takes a value at least as extreme as the observed value of the test statistic. Hypothesis testing using p-values is an alternative to the classical procedure based on the critical value of the distribution. Usually the p-value is compared with the generally accepted standard significance levels of 0.05 or 0.01.
Before building a regression model it is necessary to go through several steps to test the data. The list of requirements for the data includes several aspects. First of all, the variables must have a distribution close to normal. The dependent and independent variables must be measured on a metric scale. To construct linear regressions, the dependent and independent variables must have a linear relationship. To check for multicollinearity, we evaluated the matrix of pairwise correlation coefficients. Multicollinearity is a concept used to describe a problem where a close, though not exact, linear relationship between explanatory variables leads to unreliable regression estimates. Of course, such a dependence does not necessarily give unsatisfactory estimates. If all other conditions are favorable, i.e. the number of observations and the sample variances of the explanatory variables are large and the variance of the random term is small, quite good estimates can still be obtained. The assessment showed clearly that discount in rubles and discount in percent are correlated, so these variables will not be used in the same model. Therefore, we went through the options, including the variables one at a time and estimating the model separately. The correlation between discount in rubles and discount in percent is 0.93.
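For a coefficient whose test statistic is approximately standard normal, which is a reasonable assumption only for large samples, a two-sided p-value can be sketched as below. The function name is hypothetical, and the normal approximation is a simplification chosen for illustration; it is not stated in the source which distribution was used.

```python
import math

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, computed as erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

A statistic of about 1.96 in absolute value corresponds to the conventional 0.05 threshold, so larger absolute values lead to rejection of the null hypothesis at that level.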
Table 1 Correlation matrix analysis of sales data
================================================================================================================
                       Sales_vol  Sales_ru  PricePerDecalitre  PricePerDecalitre_max  discount  discount_procent
----------------------------------------------------------------------------------------------------------------
Sales_vol              1          0.940     -0.170             -0.170                 -0.030    0.010
Sales_ru               0.940      1         -0.010             -0.010                 -0.010    0
PricePerDecalitre      -0.170     -0.010    1                  0.990                  0.160     -0.040
PricePerDecalitre_max  -0.170     -0.010    0.990              1                      0.270     0.070
discount               -0.030     -0.010    0.160              0.270                  1         0.930
discount_procent       0.010      0         -0.040             0.070                  0.930     1
----------------------------------------------------------------------------------------------------------------
As shown in Table 1, the correlation is high between two pairs of parameters: sales volume in dekaliters and sales volume in rubles (0.94), and price per dekaliter and maximum price per dekaliter (0.99). The correlation between maximum price per dekaliter and discount is 0.27. Apart from these pairs and the two discount variables (0.93), the correlation coefficients are weak.
Table 2 Correlation matrix analysis of average weather data
=====================================================================================================================
                 Weather.T_mean   Weather.Po_mean  Weather.P_mean   Weather.U_mean   Weather.Ff_mean  Weather.Td_mean
---------------------------------------------------------------------------------------------------------------------
Weather.T_mean   1                -0.150           -0.250           -0.370           -0.070           0.870
Weather.Po_mean  -0.150           1                0.560            -0.040           -0.250           -0.200
Weather.P_mean   -0.250           0.560            1                -0.090           -0.100           -0.360
Weather.U_mean   -0.370           -0.040           -0.090           1                -0.020           -0.020
Weather.Ff_mean  -0.070           -0.250           -0.100           -0.020           1                -0.080
Weather.Td_mean  0.870            -0.200           -0.360           -0.020           -0.080           1
---------------------------------------------------------------------------------------------------------------------
Table 2 reports the correlation coefficients for the weather parameters. The coefficient between mean atmospheric pressure at station level and mean atmospheric pressure at sea level is moderately high (0.56), and the table also shows a high coefficient between mean air temperature and mean dew point temperature (0.87). The remaining weather factors have low correlation coefficients, so they can be included in one model.
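A matrix of pairwise correlation coefficients like Tables 1 and 2 can be produced with a short routine. The sketch below is plain Python and assumes Pearson correlation, which the source does not name explicitly; the function names and the rounding to three decimals are illustrative choices.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """columns: dict mapping a variable name to its list of values."""
    names = list(columns)
    return {
        a: {b: round(pearson(columns[a], columns[b]), 3) for b in names}
        for a in names
    }
```

Pairs with a coefficient close to 1 or -1 in absolute value are the candidates to keep out of the same model, as was done for the two discount variables.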
Before using multiple regression we checked whether our data are suitable for this method. First, all numeric variables were tested for normality of distribution graphically with the help of the «ggplot» package in R. Outliers were identified in order to clean the data of values that strongly influence the model but are not meaningful. It was decided to exclude the data for 1-3 January of each year, because these days are outliers, as is clearly seen in Appendix 5. These days are outliers because they are non-working days with logistics limits: warehouses are closed and transport companies are on holiday. In addition, it was necessary to clean the data of outliers generated by mistakes in the company's internal database. Although the company is large, its culture of data management and data quality control is at an early stage. So it was decided to exclude the data of several SKUs with a discount of more than 20%. It is possible that some SKUs really had a large discount; however, the company may have granted it to sell products with a short remaining shelf life. Such cases should also be excluded from the model, because they are not connected with consumer choice. As a result of this cleaning, the dataset was reduced by approximately 8%.
The multiple regression approach has a number of attractive features and drawbacks. The attractive features include simplicity compared with other methods and many practical uses. The method is widely available and has been used in many research studies. The disadvantages are lower accuracy than machine learning techniques, a list of assumptions, and sensitivity to outliers.
Our data contain different groups, such as different cities, years and products. After the geographic analysis it was obvious that different groups have different trends and relations with the variables. So it was decided to include fixed effects in the model. One advantage of including fixed effects in the analysis is that it avoids the problem connected with deviations between groups.
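The two exclusion rules described above can be expressed as a simple filter. The sketch is in plain Python with hypothetical field names (`date`, `discount_pct`), and it applies the 20% discount rule row by row, which is an assumption: the source speaks of excluding the data of several SKUs without specifying the exact granularity.

```python
from datetime import date

def clean_sales(rows, max_discount_pct=20.0):
    """Drop rows dated 1-3 January and rows whose discount exceeds the limit.

    rows: list of dicts with a datetime.date under "date" and a number
    under "discount_pct" (both field names are hypothetical).
    """
    kept = []
    for r in rows:
        d = r["date"]
        if d.month == 1 and d.day <= 3:
            continue  # holiday outliers: closed warehouses, no deliveries
        if r["discount_pct"] > max_discount_pct:
            continue  # suspected data errors or short shelf-life sell-offs
        kept.append(r)
    return kept
```

Comparing `len(rows)` before and after the filter gives the share of observations removed, which the text reports as approximately 8%.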
In many applications including econometrics and biostatistics a fixed effects model refers to a regression model in which the group means are fixed (non-random) as opposed to a random effects model in which the group means are a random sample from a population. Generally, data can be grouped according to several observed factors. The group means could be modeled as fixed or random effects for each grouping. In a fixed effects model each group mean is a group-specific fixed quantity. Considering our data we choose three variables, which should be fixed: City, Year, SKU. First of all we took into account Cities, because different regions of Russia are in completely different climatic zones. The perception of temperature, precipitation and wind power is very different in the regions of the country. For instance, the air temperature of 20 degrees below zero in St. Petersburg, the center of Northern West region, feels very differently to the same air temperature as in the Krasnodar, city in the South region in Russia. Furthermore, the differences will be in the perception of this temperature relative to other weather indicators, in particular high humidity of St. Petersburg, which distorts the feeling of temperature, making it 4-5 degrees colder than it shows on the temperature. Also the difference will be in perception of the same temperature people from the point of view of its specificity for this region. For Ural region air temperature higher than 20 degrees rated as rare weather conditions and extremely high, for the South region the same air temperature regarded as normal. Furthermore, the cities and regions are situated on different height, so the atmosphere pressure is feels differently. In addition, the average humidity rate for the regions are diverse, so the changes feels differently.
In addition, cities have specific features, which could have effect on the beer consumption. As it was mentioned earlier cities has different macroeconomics factors as level of income, the unemployment rate and other similar factors. These factors has a strong connection with the consumer choice and preferences. Other factor is education level, in literature review it was noticed that education level related to the consumption of beer and other beverages. Finally, the factor, which has the strongest relation to the volume of beer sales is demographic. According to the federal low only citizens, who reached 18 years may buy and drink alcohol beverages. Moreover, a lot of marketing companies or departments investigate the consumer market and could say the target auditory for the specific product segment. Having demographic statistics researches are capable of forecasting the number of future consumers.
In addition, information about non-working days was gathered for the observed period and included in the dataset. This variable is expected to be important because beer is positioned as a drink that people share with friends while resting and having fun, so beer consumption is assumed to rise on holidays and weekends. The variable is used as a control; the research does not aim to estimate the connection between the number of holidays and beer sales.
The model was built in RStudio using the R language, because the two major parts of the research could both be carried out in R. The first is data collection and transformation.
Multiple regression built with the lfe package (Gaure, 2013) was chosen as the method of data analysis. The package was chosen because it can estimate regressions with a large number of multilevel fixed effects, which other packages are unable to handle. The inclusion of fixed effects in the regression was deliberate: it helps avoid incorrect conclusions about the significance of a variable that arise from spurious correlation (Gaure, 2013). The models were estimated with the function «felm», the package's tool for linear models with fixed effects. The «stargazer» package was used to present the descriptive statistics and the model results. It creates publication-quality tables and saves researchers from rebuilding tables each time they modify the dataset, and it covers what is needed to produce result tables for descriptive statistics and regression models.
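What a fixed effect does to an estimate can be illustrated with a minimal sketch. The thesis itself works in R with «felm»; the Python below is only an illustration on made-up numbers, and the function names `ols_slope` and `demean_by_group` are hypothetical helpers, not part of any package. It handles a single grouping factor (city) by subtracting group means, whereas the real models absorb City, Year and SKU together.

```python
from collections import defaultdict


def ols_slope(x, y):
    """Slope of a one-regressor least-squares fit of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def demean_by_group(values, groups):
    """Subtract each group's mean from its members (within transformation)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


# Made-up data: a cold city with high baseline sales and a warm city with low
# baseline sales. Inside each city one extra degree adds one unit of sales.
city = ["A", "A", "A", "B", "B", "B"]
temp = [0.0, 2.0, 4.0, 20.0, 22.0, 24.0]
sales = [100.0, 102.0, 104.0, 70.0, 72.0, 74.0]

# Ignoring the city, the fit mixes up climate and baseline demand.
pooled_slope = ols_slope(temp, sales)

# With a city fixed effect, only variation inside each city is used.
within_slope = ols_slope(demean_by_group(temp, city),
                         demean_by_group(sales, city))

print(pooled_slope, within_slope)
```

In this toy case the pooled slope comes out negative, although temperature raises sales inside each city, while the within-city slope recovers the value of 1 built into the data. This is the kind of erroneous correlation that the fixed effects are meant to remove.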
A multiple regression with fixed effects lets us model the connection between the dependent variable, beer sales, and several independent weather variables, while taking into account the different perception of weather in different cities, different yearly trends and the different consumption of SKUs. This allows the results of the research to be generalized to a wider scope. As mentioned earlier, separate models were built for each region, on both daily and weekly data.
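Written out, a specification of this kind might look as follows. The notation is an assumption introduced here for illustration and is not taken from the original: $y_{c,s,t}$ is the sales volume of SKU $s$ in city $c$ in period $t$ (a day or a week), $\alpha_c$, $\gamma_{\mathrm{year}(t)}$ and $\delta_s$ are the City, Year and SKU fixed effects, and $x_{k,c,s,t}$ are the price, weather and non-working-day regressors with coefficients $\beta_k$.

```latex
y_{c,s,t} = \alpha_c + \gamma_{\mathrm{year}(t)} + \delta_s
          + \sum_{k} \beta_k \, x_{k,c,s,t} + \varepsilon_{c,s,t}
```

In the regional equations reported in the results, the term "fixed effects" stands for the sum $\alpha_c + \gamma_{\mathrm{year}(t)} + \delta_s$.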
Results
Descriptive statistics were generated for all observed variables. The results for the numeric variables of the database are presented in Table 3, and the main features of each variable are described below for a more detailed understanding of the dataset.
Table 3 Descriptive statistics of observed database numeric variables
==========================================================================================================
Statistic                       N        Mean     St. Dev.       Min   Pctl(25)    Pctl(75)            Max
----------------------------------------------------------------------------------------------------------
Sales_volum             2,660,513     109.240      174.811         0        7.0       126.9          1,000
Sales_rubl              2,660,513  84,884.180  139,142.800    10.684  6,003.900  98,874.900  1,677,885.000
PricePerDecalitre       2,660,513     851.983      282.677    81.915    641.600     985.333      2,499.011
PricePerDecalitre_max   2,660,513     872.803      289.625    81.915    660.300   1,013.333      2,739.091
discount                2,660,513      20.820       32.084     0.000      0.000      31.522        537.792
discount_procent        2,660,513       0.023        0.033     0.000      0.000       0.038          0.200
Weather.T_mean          2,660,513       6.487       11.659   -40.125     -1.708      16.354         34.867
Weather.Po_mean         2,660,513     748.933       10.780   705.933    742.925     756.131        784.525
Weather.P_mean          2,660,513     761.812        7.560   732.177    756.688     766.513        792.931
Weather.U_mean          2,660,513      71.499       15.698    19.250     61.000      83.750        100.000
Weather.Ff_mean         2,660,513       3.195        1.814     0.000      1.875       4.125         15.312
Weather.Td_mean         2,660,513       0.374        9.812   -43.875     -5.087       8.255         23.137
Weather.T_max           2,660,513      10.351       12.625       -37        0.6          21             39
Weather.Po_max          2,660,513     750.902       10.623   708.800    745.000     757.900        785.700
Weather.P_max           2,660,513     763.935        7.231   736.600    758.900     768.700        793.500
Weather.U_max           2,660,513      87.051       11.893        22         82          94            100
Weather.Ff_max          2,660,513       5.213        2.712         0          3           7             22
Weather.Td_max          2,660,513       2.886        9.666       -41         -3          11             28
Weather.T_min           2,660,513       2.503       11.030       -43       -4.2        11.2             29
Weather.Po_min          2,660,513     746.885       11.006   703.900    740.500     754.200        783.700
Weather.P_min           2,660,513     759.659        7.917   729.300    754.600     764.800        792.000
Weather.U_min           2,660,513      55.066       20.435         5         38          72            100
Weather.Ff_min          2,660,513       1.382        1.400         0          0           2             11
Weather.Td_min          2,660,513      -2.314       10.187       -47         -8         5.6             22
Holiday                 2,660,513       0.051        0.220         0          0           0              1
Weekend                 2,660,513       0.186        0.389         0          0           0              1
Day_Off                 2,660,513       0.217        0.412         0          0           0              1
----------------------------------------------------------------------------------------------------------
According to our database, the dependent variable, sales volume, ranges from 0.009 to 1,000 dekaliters, and the average volume of beer sales is 109 dekaliters. In addition, the data show evidence of seasonality: from May to August sales are higher than in other months, as illustrated by the diagrams in Appendix 3. These diagrams show the daily volume of beer sales together with a smoothed line, which makes the trend more obvious. Since the aim of our study is to find the connection between sales volume and weather conditions, the sales volume in rubles was not used in the model itself, only for calculating the price and discount variables.
Average air temperature ranges from -40.125 to 34.867 degrees, with a mean of 6.5, since we chose cities in different parts of Russia, where temperature differs significantly. The same pattern can be seen in the statistics of average daily air temperature in Appendix 4. Table 4 illustrates the difference between regions using average temperature as an example.
Table 4 Temperature statistics per region
Region | Weather.T_min | Weather.T_mean | Weather.T_max
South | 3.82 | 11.21 | 18.20
MC | 1.47 | 7.02 | 12.03
NW | 1.14 | 6.06 | 9.46
UP | 1.83 | 4.06 | 9.83
SFE | 4.09 | 3.18 | 8.30
Average | 0.1 | 6.45 | 11.56
Atmospheric pressure at station level, in millimeters of mercury, ranges from 705.9 to 784.5, with a mean of approximately 748. Atmospheric pressure reduced to sea level, in millimeters of mercury, ranges from 732.17 to 792.93, with a mean of approximately 761. Relative humidity ranges from 19.25% to 100%, with a mean of 71.49%. Wind speed ranges from 0 to 15.312, with a mean of 3.19. The dew point temperature ranges from -43.875 to 23.137, with a mean of 0.374. Price per dekaliter ranges from 81.915 to 2,499.0, with a mean of 851.98 rub per dekaliter. Maximum price per dekaliter ranges from 81.915 to 2,739.0, with a mean of 872.80 rub per dekaliter. The discount ranges from 0 to 537, with a mean of 20.82 rub per dekaliter. The discount in percent ranges from 0 to 20%, with a mean of 2.3%. The variables Holiday, Weekend and Day_Off are dummies: each equals 1 if the day is of the corresponding type (for example, a weekend) and 0 if it is a working day. These variables were used as controls, because non-working days have to be taken into account when building the model.
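The three dummy variables can be sketched as below. This is an illustration under stated assumptions, not the procedure used in the thesis: the function name `day_flags` and the two-date holiday list are hypothetical, and Day_Off is assumed to be 1 whenever the day is a weekend or a holiday. That reading is compatible with the means in Table 3 (0.217 is below 0.051 + 0.186, as expected if some holidays fall on weekends), but the source does not state it.

```python
from datetime import date

# Illustrative holiday list only; the thesis does not give its calendar source.
HOLIDAYS = {date(2019, 1, 1), date(2019, 1, 7)}


def day_flags(day, holidays=HOLIDAYS):
    """Return the three 0/1 dummies for one calendar day."""
    holiday = int(day in holidays)
    weekend = int(day.weekday() >= 5)  # Monday is 0, so 5 and 6 are Sat/Sun
    day_off = int(holiday or weekend)  # assumed definition: either flag is set
    return {"Holiday": holiday, "Weekend": weekend, "Day_Off": day_off}


print(day_flags(date(2019, 1, 1)))  # a Tuesday that is in the holiday list
print(day_flags(date(2019, 1, 5)))  # a Saturday
print(day_flags(date(2019, 1, 9)))  # an ordinary Wednesday
```

A production calendar would also need to handle working days moved to weekends, which this sketch ignores.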
Including the variables in a multiple regression without fixed effects did not produce meaningful results. Using a stepwise approach, we added variables one at a time to the set of independent variables until the changes became statistically insignificant. Even the best combination of variables gave an R-squared of only 0.20, which cannot be considered satisfactory, because it means that only 20% of the variation in the dependent variable is explained by the weather factors.
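For reference, the R-squared statistic quoted here is the share of variation in the dependent variable that the fitted values account for. The sketch below shows the standard formula on made-up numbers; the function name `r_squared` is a hypothetical helper, and the figures are not taken from the thesis data.

```python
def r_squared(actual, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up example: residual sum of squares 1.0 against a total of 5.0.
actual = [1.0, 2.0, 3.0, 4.0]
fitted = [1.5, 1.5, 3.5, 3.5]
r2 = r_squared(actual, fitted)
print(r2)
```

A value of 0.20 therefore means that the residual sum of squares is still 80% of the total variation in sales.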
For the multiple regression with fixed effects we used the same stepwise approach as for the simple multiple regression. First, we included all the variables in the model, then removed them one by one and compared the results of the models with a single fixed effect. Then we added further fixed effects to the model. The first fixed effect included was the city where the sales were made, since this factor was expected to be the most significant in the data: as mentioned earlier, people perceive weather differently in different regions because of the specific climate zones, closeness to seas, oceans and rivers, and other factors. The second variable added as a fixed effect was the year. It was used because the graph of beer sales in Russia (Graph) shows that sales in different years have the same seasonality but a different trend, which could be caused by changes in legislation or in company policy. SKU was the third variable inserted into the model, and it gave the best version of the model. SKU was included as a fixed effect because policy, promotion and price changes are similar within each SKU but differ significantly between SKUs. Adding more or other variables did not improve the resulting models. City, Year and SKU were therefore chosen as the fixed effects for the final models.
While building the models we found that the R-squared value for the daily data is not very high. Consequently, we decided to build additional models on weekly data. This does not conflict with the company's processes, since all operational, planning and forecasting processes are conducted on a weekly basis. Moreover, most companies in the FMCG sector build their forecasting models on a weekly basis, because daily data are too uncertain and volatile. The data were grouped by week number and year: all sales data were aggregated by summation, and the weather data were aggregated as the mean, maximum or minimum, depending on the variable.
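The weekly aggregation described above can be sketched as follows. This is a simplified illustration rather than the thesis code (which was written in R): the function name `aggregate_weekly` and the sample rows are hypothetical, ISO week numbering is an assumption, only the temperature columns are shown, and the additional grouping by city and SKU that the real data would need is left out.

```python
from collections import defaultdict
from datetime import date


def aggregate_weekly(rows):
    """Group daily rows by (ISO year, ISO week) and aggregate each column."""
    buckets = defaultdict(list)
    for row in rows:
        iso = row["date"].isocalendar()
        buckets[(iso[0], iso[1])].append(row)

    weekly = {}
    for key, days in buckets.items():
        weekly[key] = {
            # Sales are summed over the week.
            "Sales_volum": sum(d["Sales_volum"] for d in days),
            # Weather is averaged, maximised or minimised by variable type.
            "Weather.T_mean": sum(d["Weather.T_mean"] for d in days) / len(days),
            "Weather.T_max": max(d["Weather.T_max"] for d in days),
            "Weather.T_min": min(d["Weather.T_min"] for d in days),
        }
    return weekly


# Made-up daily rows: two days in one week and one day in the next.
daily = [
    {"date": date(2019, 7, 1), "Sales_volum": 10.0,
     "Weather.T_mean": 15.0, "Weather.T_max": 20.0, "Weather.T_min": 10.0},
    {"date": date(2019, 7, 2), "Sales_volum": 20.0,
     "Weather.T_mean": 17.0, "Weather.T_max": 24.0, "Weather.T_min": 11.0},
    {"date": date(2019, 7, 8), "Sales_volum": 5.0,
     "Weather.T_mean": 12.0, "Weather.T_max": 14.0, "Weather.T_min": 9.0},
]

weekly = aggregate_weekly(daily)
for key in sorted(weekly):
    print(key, weekly[key])
```

Because sales are summed while weather is averaged, weekly coefficients are on a different scale from daily ones, which should be kept in mind when the daily and weekly models are compared.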
The central region for this particular Company, where its head office is situated, is North West. Moreover, this region is the first and last step for imported and exported products, so it offers customers a better service level because of the short logistics distance.
The model equations with fixed effects of Year, SKU and City for the North West region:
Sales Volume in North West region (daily data) = fixed effects + 0.015*PricePerDecalitre + 1.051*Weather.T_mean + 0.128*Weather.Po_max - 0.436*Weather.U_max - 2.007*Weather.Ff_min - 42.567*Day_Off (2)
Sales Volume in North West region (weekly data) = fixed effects - 1.014*discount + 7.103*Weather.T_max + 11.79*Weather.Po_max + 8.927*Weather.U_max + 6.716*Weather.Ff_max + 183.475*Day_Off (3)
The results of the daily and weekly regression models for beer sales volumes are presented in Table 5. Comparing the daily and weekly models, it is clear that the relation with non-working days has the opposite sign. This can be explained by the fact that we analyze data on sales from the producer to the shops: on non-working days many shops do not carry out procurement, because usually only cashiers and other floor staff are working, while office employees are not. When the data are aggregated to a weekly basis, on the other hand, we capture the positive effect of holidays on sales volume. We observe this effect in all of the regions studied.
Table 5 Regression results for North West region
Furthermore, the daily model shows a positive relation between sales volume in dekaliters and price per dekaliter, which is not an obvious relation. A regression model does not show the direction of causality, only the association between the variables. In the North West region larger sales volumes are associated with higher prices, which does not necessarily contradict the law of demand. It could be a consequence of the limited budget for promotional activities and discounts; in addition, the company tends to use promotions for specific SKUs, new products and high-margin products. In the weekly data, sales volume is more strongly connected with the size of the discount. The logic is the same, but the sign of the relation is counterintuitive: an increase of the discount by 1 rub is associated with a decrease of sales volume by 1.01 dekaliters.
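The coefficients can be read as changes in fitted sales volume when a regressor changes and everything else, including the fixed effects, stays the same. The sketch below applies this reading to the coefficients of equation (2); the function name `predicted_change` and the scenario of a day that is 5 degrees warmer are assumptions made for illustration, and the code is Python rather than the R used in the thesis.

```python
# Coefficients of equation (2): North West region, daily data.
COEF_NW_DAILY = {
    "PricePerDecalitre": 0.015,
    "Weather.T_mean": 1.051,
    "Weather.Po_max": 0.128,
    "Weather.U_max": -0.436,
    "Weather.Ff_min": -2.007,
    "Day_Off": -42.567,
}


def predicted_change(coefficients, deltas):
    """Change in fitted sales volume when regressors move by `deltas`.

    Fixed effects cancel out, because they do not change with the weather.
    """
    return sum(coefficients[name] * change for name, change in deltas.items())


# Hypothetical scenario: the mean temperature is 5 degrees higher.
warmer_day = predicted_change(COEF_NW_DAILY, {"Weather.T_mean": 5.0})

# Same scenario, but the day is also a non-working day.
warmer_day_off = predicted_change(
    COEF_NW_DAILY, {"Weather.T_mean": 5.0, "Day_Off": 1.0}
)

print(round(warmer_day, 3), round(warmer_day_off, 3))
```

In the daily model the non-working-day effect is much larger in magnitude than the effect of a few degrees of temperature, so the combined change is negative.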
The model equations with fixed effects of Year, SKU and City and the factor variable:
Sales Volume in North West region (daily data) = fixed effects - 0.046*PricePerDecalitre + 1.046*Weather.T_mean + 0.161*Weather.Po_max - 0.405*Weather.U_max - 1.729*Weather.Ff_min - 43.697*Day_Off + as.factor(Volume tare) (4)
Sales Volume in North West region (weekly data) = fixed effects - 0.253*discount + 6.906*Weather.T_max + 11.115*Weather.Po_max + 8.676*Weather.U_max + 7.702*Weather.Ff_max + 167.857*Day_Off + as.factor(Volume tare) (5)
Table 6 Regression results for North West region with factor variable
In addition, we tried including a factor variable in the model, since we have additional data on product characteristics that could be helpful. The other categorical variables, such as type of product, type of container, container color and alcohol content, did not improve the model, whereas container volume (the Volume tare variable) increased the R-squared value by 0.01-0.02. The other regions show similar results for factor variables: apart from container volume, the characteristics do not raise R-squared noticeably. Table 6 shows the regression results for the North West region with container volume included as a factor variable.
The next region considered is Moscow Center. The situation with non-working days is the same as in the North West region, and the relation with price again differs between the daily and weekly models. Average temperature has a positive effect on sales volume in both models; in the weekly data the temperature coefficient is about six times larger than in the daily data. The same holds for Po, whose coefficient in the weekly data is roughly 260 times larger (15.431 against 0.059). In the daily data wind speed is an important variable: it leads to a decrease in sales volume.
The model equations with fixed effects of Year, SKU and City in the MC region:
Sales Volume in MC region (daily data) = fixed effects - 0.031*PricePerDecalitre + 1.039*Weather.T_mean + 0.059*Weather.Po_max - 1.781*Weather.Ff_min - 41.17*Day_Off (6)
Sales Volume in MC region (weekly data) = fixed effects + 0.146*PricePerDecalitre + 6.804*Weather.T_mean + 15.431*Weather.Po_max - 11.077*Weather.P_mean + 131.049*Day_Off (7)
The significance of the models and of particular variables is shown in Table 7. Considering that the weekly model includes only 5 variables, its R-squared of 0.525 and adjusted R-squared of 0.523 are relatively high.
Table 7 Regression results for Moscow Center region
The difference in R-squared between the models based on daily and weekly data is substantial: it rises from 0.383 to 0.532. This underlines that the model on weekly data is more suitable for forecasting and for understanding the relations between the variables.
The model equations with fixed effects of Year, SKU and City and the factor variable:
Sales Volume in MC region (daily data) = fixed effects - 0.098*PricePerDecalitre + 1.024*Weather.T_mean + 0.082*Weather.Po_max - 1.768*Weather.Ff_min - 42.49*Day_Off + as.factor(Volume tare) (8)
Sales Volume in MC region (weekly data) = fixed effects - 0.28*PricePerDecalitre + 6.571*Weather.T_mean + 14.329*Weather.Po_max - 10.090*Weather.P_mean + 114.759*Day_Off + as.factor(Volume tare) (9)
We then tried to improve the model by adding a qualitative variable, testing the qualitative variables available in the database one at a time. Only one of them, container volume (volume of tare), improved the model, and notably this holds for all regions. The difference in R-squared between the models without and with the qualitative variable is modest: it rises from 0.383 and 0.532 to 0.394 and 0.538 for the daily and weekly models respectively. In addition, the literature review noted that container volume influences consumer choice under different external factors, especially weather.
Table 5 Regression results for Moscow Center region with factor variable
The model equations with fixed effects of Year, SKU and City:
Sales Volume in South region (daily data) = fixed effects - 0.025*PricePerDecalitre + 1.444*Weather.T_mean - 0.485*Weather.P_mean - 0.222*Weather.U_mean - 0.312*Weather.Ff_mean - 27.953*Day_Off (10)
Sales Volume in South region (weekly data) = fixed effects + 0.134*PricePerDecalitre + 9.648*Weather.T_mean + 5.098*Weather.Po_max - 0.543*Weather.U_mean - 19.713*Weather.Ff_min + 124.573*Day_Off (11)
In the South region the models show that average temperature has a positive effect on the volume of beer sales. In the weekly data the relation is stronger: an increase of 1 degree Celsius in average temperature is associated with an increase of about 9.65 dekaliters in sales. Conversely, humidity and wind speed lead to a decrease in beer sales. On the basis of the weekly model, the most favorable weather for beer consumption is a warm day with low humidity and low wind speed. The influence of atmospheric pressure (in millimeters of mercury) has opposite signs in the daily and weekly models.
Table 6 Regression results for South region
In the South region models that include the factor variable, products in 0.45 and 1.3 liter containers are more popular than the other types. Keg products, with a container volume of 30 liters, show a negative relation with sales volume. The other variables keep their significance and coefficients.
Table 7 Regression results for South region with factor variable
The next observed region is Siberia and Far East.
The model equations with fixed effects of Year, SKU and City:
Sales Volume in Siberia & FE region (daily data) = fixed effects - 0.139*PricePerDecalitre + 0.45*Weather.T_mean - 0.56*Weather.Po_max + 0.601*Weather.Ff_min - 42.638*Day_Off (12)
Sales Volume in Siberia & FE region (weekly data) = fixed effects - 0.448*PricePerDecalitre + 4.748*Weather.T_mean + 7.292*Weather.Po_max - 7.583*Weather.P_mean + 100.629*Day_Off (13)
For the Siberia region the price factor found to be significant is the discount, which has an inverse relation with sales volume. As mentioned earlier, at first glance it seems illogical that sales are lower when the discount is higher, but the brewing company gives discounts in traditional trade only for specific products and for limited volumes. This is done deliberately, to save budget and for promotional purposes. Average air temperature has a positive influence on the volume of beer sales. In the weekly data the relation is stronger: an increase of 1 degree Celsius in average temperature is associated with an increase of about 4.75 dekaliters in sales.
Table 8 Regression results for Siberia & FE region
Table 9 Regression results for Siberia & FE region with factor variable
The R-squared value increases by 0.05 on average when the factor variable is added in the Siberia and Far East region, which cannot be regarded as a significant contribution. On the other hand, the levels of this variable are statistically significant, except for the 30 liter container volume, which corresponds to keg products.