What predicts real estate price better: closeness in geographic or characteristic space

Empirical modeling of the impacts of various sizes of shopping centers on the values of surrounding properties. Using a finite mixture model of heterogeneous households to delineate housing submarkets. Finding the problem of spatial autocorrelation.

Category: Economics and economic theory
Type: diploma thesis
Language: English
Date added: 02.09.2018

Posted on http://www.allbest.ru/

Final qualifying work

WHAT PREDICTS REAL ESTATE PRICE BETTER: CLOSENESS IN GEOGRAPHIC OR CHARACTERISTIC SPACE?

Kusakin Danil Nikolaevich


Abstract

This project studies the behavior of residential property sellers on the secondary real estate market. It is widely known that, when pricing a property for sale, a seller takes into account the property's own characteristics, the characteristics of its surroundings, and the current average market price of alternatives. In this research we extend the spatial autoregression (SAR) model so that the effect of closeness on price can be studied not only in geographical space but also in characteristic space. The estimation results show that proximity in characteristic space has a greater effect on real estate prices than geographical closeness. We also find that simultaneously including weighting matrices based on geographical distance and on distance in characteristics in the spatial model increases its predictive power.

Introduction

The quality of residential property directly determines a person's everyday comfort: how long it takes to get to work, how many shops are nearby, what the crime rate in the area is, and so on. People therefore tend to be very careful when buying a new flat or house.

This market also draws a lot of attention from researchers because of its unusual pricing process. Residential properties have a complex structure of characteristics. Every flat or house has its own internal parameters (such as total area, number of rooms, number of floors in the building, etc.). At the same time, every property has a definite location that determines its external parameters (ecological factors, the crime rate of the area, the distance to the nearest supermarket or to the city center, etc.). Internal characteristics thus refer to a single flat or house, while external variables are shared by a wide range of objects.

This creates the first challenge of housing price evaluation: there is no clear answer as to which group of variables should have the greater influence on the price of a property.

Moreover, when sellers set a price they try to take the current market situation into account. For example, sellers adjust the price of their property based on the prices of nearby objects, because these have the same (or very similar) external characteristics and may therefore be considered substitutes to some extent. The farther apart two objects are, the less they tend to have in common and the less they influence each other. As a result, the prices of residential properties are not randomly distributed; for researchers this means facing the problems of autocorrelation and heteroscedasticity. This issue is widely discussed in the field, and there are several ways to account for geographical spatial autocorrelation.

Apart from geographical closeness, there is one more channel through which housing prices interact. If two houses have identical characteristics they may be considered substitutes, no matter how far apart they are geographically. There is thus another way to measure the closeness (or difference) of residential properties. By analogy with geographical space, where closeness is defined by the objects' locations, we can construct a characteristic space, where closeness means similarity of characteristics.

While geographical effects are well studied, we found no articles that study characteristic space. Nevertheless, studying these relations between objects can give a new understanding of how the market works and yield important theoretical and practical results.

The aim of this research is therefore to find out whether closeness of real estate objects in characteristic space or in geographical space has a greater influence on real estate prices.

The results are widely applicable in practice: understanding which housing characteristics have a more significant impact on price will help private owners and real estate agents set a better justified price for a flat. The new methodology also makes it possible to increase the predictive quality of models that can be used to build housing price calculators on real estate web portals.

To reach this goal, the following steps are required:

· to analyze the theoretical background; this lets us construct a good set of variables and understand which problems researchers usually face in this field and which models they use;

· to determine which variables should be included in the characteristic space; some variables are subject to restrictions that follow from common knowledge or technical limitations, which makes them inappropriate for such an analysis, so we need criteria for choosing the best characteristics;

· to find a way of constructing the characteristic space; both spaces share the same intuition (they determine the distance between objects), but the characteristic space consists of several variables (in contrast to the geographical space, which accounts only for an object's location), so we need an approach that makes them comparable;

· to develop a basic econometric model that can take both spaces into account simultaneously;

· to develop machine learning methods that choose the “best” econometric model; since we aim to build a predictive model, we need an algorithm for model optimization.

This paper consists of five sections. Section 1 provides the theoretical background and introduces common approaches to housing price evaluation. Section 2 presents the motivation of the research and the hypotheses. Sections 3 and 4 describe the dataset used in the research and introduce the model. Section 5 presents the results of the project.

1. Literature Review

Since the purpose of this project is to create a housing price model with high predictive power, it is essential to find out which parameters have been shown to matter and which models were used in previous works.

A common approach to housing price modeling is hedonic regression, where the price of an object depends on its own internal and external parameters. In these models researchers usually separate the two groups of variables to reflect the features of the data described above. For example, among internal characteristics the vital factors determining real estate value are living area, age of the building, materials used in decoration, and the number of rooms, bathrooms and dressing rooms (Des Rosiers et al., 1996; Hoen et al., 2015).

However, a much larger number of articles investigate the effect of external characteristics. For instance, researchers pay attention to the impact of local facilities: Brasington (2009) and Chugunov (2013) found that the quality of the nearest school (average student scores, teacher/pupil ratio and expenditure per pupil) is capitalized into housing prices.

Dapaah (2010), Des Rosiers et al. (1996) and Shady et al. (2014) explored the effect of shopping centers. All of these works found a clear relation between the nearest shopping mall and real estate prices, and in some of them the effect was non-linear: there is a critical radius within which a shopping mall has a negative impact on the price of residential properties, and a positive influence beyond it. This can be explained by the fact that people are willing to pay more to live closer to a shopping mall because it gives them easier access (it reduces the cost and time of getting there), but at short distances shopping malls create negative externalities (air pollution, traffic jams, noise). Sirpal (1994) studied the impact of shopping center size on residential property values. He collected data for several areas with the same “quality” characteristics (such as crime rate or racial composition) to obtain a more homogeneous sample and to hold other external effects constant. The results show that the size of a shopping center has a positive effect on the dependent variable.

Bowes and Ihlanfeldt (2001) investigated the effect of rail transit stations: they argue that opening a new station creates negative externalities for residents of nearby houses and may increase the neighborhood crime rate, so a new transit station can reduce the prices of nearby houses. However, the importance of these effects varies with the distance from downtown and the median income of the neighborhood.

By contrast, Osland et al. (2016) pointed out that previous works may have overestimated the effect of some socioeconomic neighborhood characteristics on real estate value. They constructed a zone-specific model to take into account random area-specific effects. After estimation, the authors conclude that most socioeconomic neighborhood parameters have a relatively modest impact on the dependent variable. This contradiction demonstrates the importance of the right methodology and assumptions and of considering the specifics of the data. This idea is strongly supported by Belasco et al. (2012), who argue that a “good” model can produce more convincing results on relatively small data than “regular” models applied to a more detailed and comprehensive database.

Another popular line of research in this area is the influence of environmental quality. Many works study ecological factors; for example, it was found that Chicagoans are ready to pay more for residential property in city areas with lower air pollution (Chattopadhyay, 1999). Katysheva and Hakimova (2012) found an effect of the concentration of carbon monoxide in the air and of the distance to the nearest factory on real estate prices; however, there was no relation between the concentration of other contaminants (nitrogen oxide and nitrogen dioxide) and housing prices. Other authors analyze the effects of ecology-related characteristics: among the most popular topics are water resources (Lansford, Jones, 1995; Young, Teti, 1984) and parks (Hoshino, Kuriyama, 2010); researchers usually find nonlinear effects that depend on the distance between the object under study and the residential property. Hoen et al. (2015) proposed an alternative way to measure the quality of the surroundings: the authors explored the influence of new nearby wind energy facilities; however, they could not find a stable statistical effect of wind turbines in either the post-construction or the post-announcement/pre-construction period.

One of the most difficult problems in modeling residential property prices is spatial autocorrelation (SA). Some papers that pay particular attention to the problem are summarized in Table 1.

Table 1. Finding the problem of spatial autocorrelation

| Article | Models | Comments and Results |
|---|---|---|
| Basu, Thibodeau (1998) | Semilog hedonic house price equation and a spherical autocorrelation function | Found strong evidence of spatial autocorrelation in transaction prices within submarkets. Correlation from district characteristics. Kriged EGLS predictions are more accurate than OLS in six of eight submarkets, while OLS has smaller prediction errors in submarkets where the residuals are spatially uncorrelated. |
| Helbich et al. (2014) | OLS, Spatial Autoregression, Spatial 2SLS, Mixed geographically weighted regression | Having provided evidence that global specifications are not fully capable of modelling spatial heterogeneity (SH). The empirical model comparison reveals that, independent of the model, ignoring SH always leads to a lower model fit and worse prediction accuracies. |
| Brasington (2009) | Hedonic price estimators corrected for spatial autocorrelation | Each house price influences other nearby house price […], biased and inconsistent OLS estimates. |

Table 1 suggests that spatial autocorrelation is a problem in all the housing markets studied. There are several causes: the “snob” effect (some households are willing to pay more just to live in a premium area) (Can, Megbolugbe, 1997), and the natural historical development of each area, due to which all houses in a particular district share the same external characteristics (such as big shopping centers, accessibility of services, parks and so on) that houses in other parts of the city lack. Moreover, apart from these natural causes of SA there are econometric ones: because of model misspecification or omitted variables we cannot correctly estimate the unobservable effects of a given area, which is why the results of such price modeling will be incorrect (McMillen, 2010; Osland, 2013).

Ignoring spatial autocorrelation leads to biased and inconsistent parameter estimates and incorrect confidence intervals, which lowers explanatory quality, increases the risk of type I error and reduces predictive quality (Anselin, 2002; Basu, Thibodeau, 1998; Devaux, Dube, 2016; Haining, 2009).

The problem of spatial autocorrelation was first described by Cliff and Ord (1968); however, this branch of study developed rapidly in the 1990s (Devaux, Dube, 2016). The purpose of spatial modelling is not only to capture the dependence between variables, but also to take into account the relation between different observations of a single variable.

Nowadays there are two main approaches to accounting for spatial dependence: the spatial lag model and the spatial autoregressive error model. The differences between these models are not large, and the choice depends on the modelling strategy and the preferred way of specifying spatial correlation.

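A general specification of the spatial autoregressive error model is (Anselin, 2003):

$$P = X\beta + \varepsilon, \qquad \varepsilon = \lambda W \varepsilon + u, \qquad (1)$$

where: P is a vector of prices;

W is the weight matrix;

$\lambda$ is a spatial autoregressive parameter;

X is a vector of object's characteristics;

$\beta$ is a vector of characteristics effects;

$\varepsilon$, u are vectors of errors.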

In this type of model we explain the housing price by the variables that are most important a priori and leave the spatial effect to the residuals. However, this model has some drawbacks. First, we have to determine the elements of the weighting matrix, where $w_{ij}$ indicates whether houses i and j belong to neighboring areas/zones (as in Osland et al., 2016); this requires either a clear natural division of the city into such zones or a grouping of all objects into neighbor groups. Moreover, one of the assumptions of this model is that the residuals have to be uncorrelated, which means that it is “not correct in all cases where spatially autocorrelated residuals are detected” (Osland, 2010).

Another way to account for spatial dependence is the spatial lag model. Its main idea is that “near objects” are more strongly correlated than “distant objects”. A general specification of this model can be expressed as:

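$$P = \rho W P + X\beta + u, \qquad (2)$$

where: P is a vector of prices;

$\rho$ is the spatial autoregressive parameter;

W is a weighting matrix;

X is a vector of object's characteristics;

$\beta$ is a vector of characteristics effects;

u is an error term.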
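Solving the spatial lag specification for P makes the feedback between prices explicit. The lines below are a standard textbook derivation rather than a result of this work; they write $\rho$ for the coefficient on WP (the symbol is an assumption of this note) and assume that $I - \rho W$ is invertible, which holds, for example, for a row-normalized W with $|\rho| < 1$:

```latex
\begin{aligned}
P &= \rho W P + X\beta + u \\
(I - \rho W)\,P &= X\beta + u \\
P &= (I - \rho W)^{-1}\,(X\beta + u)
\end{aligned}
```

Every price therefore depends on the characteristics and errors of all connected objects. Because WP is itself a function of u, it is correlated with the error term, which is why applying OLS directly to the lag model gives biased and inconsistent estimates.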

To express the main idea of the model, the weighting matrix W is constructed so that the farther apart two objects are geographically, the smaller the corresponding weight $w_{ij}$; objects that do not interact can be given a zero value in W. The term WP is then responsible for spatial dependence and may capture all the unobservable interactions between objects that we want to evaluate. WP can also be read as the vector that contains, for every object i, the weighted average price of the other objects. The parameter $\rho$ (the coefficient on WP) reflects the spatial correlation (Wall, 2004). If $\rho = 0$, model (2) is a standard linear regression model; otherwise SA is present, and the estimated value of the parameter shows how strong the relation between objects is.

A vital step of spatial autoregressive modeling is the choice of the functional form of the elements of W. The matrix is not determined automatically, so it has to be constructed manually. At the first step we calculate a distance matrix D, where each element $d_{ij}$ is the distance between objects i and j. For a multidimensional parameter (such as the geographical location of an object) a distance metric has to be chosen. Researchers usually use the Euclidean distance because it is simple and has a natural practical interpretation. However, Lu et al. (2014) and Shim et al. (2014) showed that non-Euclidean metrics can sometimes improve model fit and provide additional useful insights into the nature of house price variation.
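The construction of D, W and WP described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the specification estimated in this work: the inverse-distance form of the weights, the optional cutoff, the row normalization and all function names are choices made for the example, and the coordinates and prices are invented.

```python
import math

def distance_matrix(points):
    """Euclidean distance d_ij between every pair of points."""
    return [[math.dist(p, q) for q in points] for p in points]

def weight_matrix(dist, cutoff=None):
    """Row-normalized inverse-distance weights (an assumed functional form).

    The diagonal is zero. Pairs farther apart than `cutoff` get weight 0,
    i.e. they do not interact. Pairs at zero distance also get weight 0 to
    avoid division by zero; this is a simplification, since flats in the
    same building would need separate handling.
    """
    n = len(dist)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or dist[i][j] == 0:
                continue
            if cutoff is not None and dist[i][j] > cutoff:
                continue
            w[i][j] = 1.0 / dist[i][j]
        row_sum = sum(w[i])
        if row_sum > 0:
            w[i] = [x / row_sum for x in w[i]]
    return w

def spatial_lag(w, prices):
    """WP: for each object, the weighted average price of the other objects."""
    return [sum(w_ij * p for w_ij, p in zip(row, prices)) for row in w]

# Hypothetical coordinates (km) and prices per m2 (thousand rubles).
points = [(0.0, 0.0), (0.1, 0.0), (0.5, 0.0)]
prices = [50.0, 60.0, 40.0]
W = weight_matrix(distance_matrix(points))
WP = spatial_lag(W, prices)
```

With row normalization every row of W sums to one, so each element of WP is a weighted average in which nearer objects count more; other weighting schemes (for example exponential decay or k nearest neighbors) fit the same structure.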

The choice between these two types of model mostly depends on the purposes of the research; however, the assumption of uncorrelated errors makes model (1) less universal than model (2). Furthermore, Anselin and Lozano-Gracia (2008) compared these two ways of accounting for spatial autocorrelation and concluded that the spatial lag model is more effective for studying the housing market, because it makes it possible to account for unobservable characteristics that cannot be evaluated in other ways.

As we can see, a lot of research has been done in this field. Different types of models were developed to reflect the features of the market and to avoid certain econometric problems. The analysis of the theoretical background shows that previous works concentrated on the geographical distance between residential properties only. However, when valuing a property, a seller plausibly also compares its characteristics with those of alternatives and adjusts the price based on this type of closeness, so there should be correlation effects between objects in characteristic space as well. Accounting for this type of interaction makes housing market analysis more accurate and the results more consistent, which is why the study of characteristic space is an important point in this field.

2. Research question and hypotheses

The absence of studies on the interactions between residential properties in characteristic space can be considered a gap in the field, so in this project we develop a methodology for constructing and estimating a characteristic space in order to gain a deeper understanding of how the market works and how its processes are organized.

To understand why characteristic space should be studied, we first have to understand how the market works and why spatial autocorrelation occurs. At the first step, a seller sets the price according to the characteristics of the property (both internal and external). Then he or she adjusts the price according to the current market situation by analyzing the prices of other objects for sale; the closer these other objects are, the more they should affect the price being set. For example, suppose we want to sell our house and there are two alternative houses for sale on the market: one is 100 meters away from ours and the other is 500 meters away. Each of these alternatives should affect the price of our house, but the second is farther away, so it should have a lower impact. Price adjustments and interactions of this kind are a cause of geographical autocorrelation.

Nevertheless, geographical correlation is not the only problem that occurs on the market. We can also define a characteristic space that includes the characteristics of the residential property. This space is constructed and “works” similarly to geographical space; the only difference is how distance is understood in each space. An analogous example of interactions among characteristics: if we sell a one-bedroom house, a two-bedroom house is a closer substitute for it than a three-bedroom one. The two-bedroom house is therefore closer than the three-bedroom one in characteristic space, so it will have a greater effect on the price of our house.
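One possible way to turn this notion of closeness into a number is sketched below. It is only an illustration: standardizing each characteristic and taking the Euclidean distance is an assumption of the example, not necessarily the construction used later in this work, and the function names and the three flats are hypothetical.

```python
import math
import statistics

def standardize(rows):
    """Scale every column to zero mean and unit standard deviation so that
    characteristics measured in different units become comparable."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    stdevs = [statistics.pstdev(c) for c in columns]
    return [
        tuple((x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stdevs))
        for row in rows
    ]

def characteristic_distance(a, b):
    """Euclidean distance between two standardized characteristic vectors."""
    return math.dist(a, b)

# Hypothetical flats: (number of rooms, total area in m2).
flats = [(1, 35.0), (2, 50.0), (3, 70.0)]
z = standardize(flats)
d_12 = characteristic_distance(z[0], z[1])  # one-room vs two-room flat
d_13 = characteristic_distance(z[0], z[2])  # one-room vs three-room flat
```

Here `d_12` is smaller than `d_13`, matching the intuition that the two-room flat is the closer substitute. Without standardization the area column (tens of m2) would dominate the number of rooms, which is why some rescaling step is needed before distances in several characteristics can be combined.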

The analysis of dependences in characteristic space is therefore an important step in studying the housing market, because it may give practitioners new knowledge that allows them to forecast housing prices more accurately.

Moreover, most previous papers studied European and American housing markets, and far fewer examined the Russian market, which differs from them in some respects. The main difference is the prevalence of mass housing construction in Russia, whereas in other countries private houses are widespread. When dealing with mass construction we should take a closer look at the structure of the data on internal characteristics, because houses built at the same time and located close to each other tend to have the same internal features, which may cause autocorrelation in this group of parameters as well. Analyzing the Russian market also makes it impossible to include characteristics such as the availability of a swimming pool or a garage, the number of bathrooms and others, because mass housing and private houses have different sets of parameters.

The research question of the project is whether taking spatial autocorrelation in characteristic space into account increases the predictive quality of housing price models.

At this stage we can formulate the following hypotheses:

· H1: accounting for spatial autocorrelation increases the predictive power of the model.

Previous articles showed that spatial models make it possible to account for some unobservable effects, which leads to more accurate prediction, so we can assume that the same procedure will improve our model as well. On the one hand, this hypothesis seems trivial; on the other hand, a wrong specification of how the SA is structured may lead to overfitting, so that the prediction for a test sample differs significantly from the real data (e.g., Basu and Thibodeau (1998) showed that accounting for SA reduces prediction error in the submarkets where it was found, but in the submarkets where the residuals were uncorrelated the OLS estimator gave better results).

· H2: closeness in characteristic space has a greater influence on residential property prices than closeness in geographical space.

We assume that when people buy a house or a flat they first set criteria for its quality (such as a minimum total area or number of rooms), then compare the alternatives that suit them, and only after that choose the option with the best geographical parameters. In our view, then, geography comes after characteristics.

3. Data

To study the determinants of housing prices, data on both characteristic and geographical space are required. To collect information on the first group of parameters we use the Metrosphera.ru website. This web portal was chosen for its reliability and popularity: it has about 80,000-85,000 visitors per day.

The dataset contains information on all sale listings published between 20.10.2014 and 01.02.2015 in Perm, Russia. Perm is the 13th largest city in Russia and has approximately 1 million residents. There were no significant macroeconomic structural breaks in this period, so there is no need to estimate a time effect.

A special feature of these data is that we observe the asking price declared in the listing rather than the actual sale price. Asking and actual sale prices may differ, so in this project we deal mostly with the supply side, because buyers' behavior is unobservable to us. Besides prices, the database contains information on such property characteristics as address, city area, number of rooms, floor, total number of floors in the building, building material and building type, and a dummy for a real estate agent.

To add geographical coordinates, we took GIS data on all buildings in Perm and matched the coordinates to each property by the address of the building. This allows us to calculate the geographical distance between all properties in the dataset. The GIS data also contain the full list of firms with coordinates, so we can calculate the distance between houses and other facilities (e.g., the nearest school or the nearest bus stop) if necessary. As an example, we added the distance to the nearest supermarket and to the nearest hypermarket, because most previous articles found a stable statistical effect of these variables on housing prices.

We conducted an initial data analysis to exclude the most obvious outliers. For example, one flat was reported to be on the 106th floor, although the tallest building in Perm has just 27 floors. We also excluded observations whose living or kitchen area was larger than the total area (mistakes made when filling in the form) and set limits on the total area: a minimum of 12 m2 (the technical standard) and a maximum of 400 m2 (the biggest flat in Perm). After that the sample contains about 14,600 observations.
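The cleaning rules listed above can be written as a single filter. The sketch below is only an illustration: the thresholds come from the text, while the field names, the dictionary representation of a listing and the sample records are hypothetical, and missing values are not handled.

```python
MAX_FLOORS_IN_PERM = 27   # tallest building mentioned in the text
MIN_TOTAL_AREA = 12.0     # m2, technical standard
MAX_TOTAL_AREA = 400.0    # m2, biggest flat in Perm

def is_plausible(flat):
    """Return True if a listing passes the basic consistency checks."""
    if flat["floor"] > MAX_FLOORS_IN_PERM:
        return False
    # Living or kitchen area cannot exceed the total area.
    if flat["living_area"] > flat["total_area"]:
        return False
    if flat["kitchen_area"] > flat["total_area"]:
        return False
    return MIN_TOTAL_AREA <= flat["total_area"] <= MAX_TOTAL_AREA

# Invented listings: one valid record and three that violate one rule each.
listings = [
    {"floor": 4, "total_area": 52.0, "living_area": 31.0, "kitchen_area": 8.5},
    {"floor": 106, "total_area": 45.0, "living_area": 28.0, "kitchen_area": 7.0},
    {"floor": 2, "total_area": 30.0, "living_area": 33.0, "kitchen_area": 6.0},
    {"floor": 1, "total_area": 9.0, "living_area": 6.0, "kitchen_area": 2.0},
]
clean = [flat for flat in listings if is_plausible(flat)]
```

Keeping the thresholds as named constants makes the exclusion criteria easy to report and to change if, for instance, a taller building appears in the city.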

Our primary dataset contained information on 11 types of building material, but a more detailed analysis showed that most of them have almost the same technical characteristics (such as durability or soundproofing), so they were merged into two large groups: brick (52.2% of the sample) and panel (47.8% of the sample) buildings.

Another important variable is the type of building. First, it is strongly correlated with the age of the construction; second, it determines the general quality of the property (total area, layout, ceiling height, etc.). Our sample contains the following types:

1. “Lenin's project” (built 1920-1932);

2. “Stalin-era” building (built 1930-1960s);

3. “Khrushchev-era” building (built 1957-1973);

4. “Brezhnev-era” building (built 1972-1985);

5. Grey panel (built 1978-1990);

6. “Small family” house (built 1980-1987);

7. Improved space planning (built 1985-2000);

8. Individual project (2000 to the present).

Table 2. Main descriptive statistics of the sample (number of observations = 14593)

| Variable, units | Mean | St.d. | Min | Max |
|---|---|---|---|---|
| Price, thous. rub. | 2966.19 | 1729.98 | 295 | 27800 |
| Price per m2, thous. rub. | 56.72 | 11.75 | 7.88 | 112.12 |
| Number of rooms | 2.02 | 0.93 | 1 | 7 |
| Number of floors in the building | 8.87 | 5.04 | 2 | 26 |
| Floor of the flat | 4.66 | 3.82 | 1 | 25 |
| Total area, m2 | 52.49 | 24.95 | 12 | 388 |
| Living area, m2 | 31.49 | 16.37 | 6 | 280 |
| Kitchen area, m2 | 8.53 | 4.27 | 1 | 160.6 |
| Distance to the nearest supermarket, km | 0.19 | 0.14 | 0.00 | 1.59 |
| Distance to the nearest hypermarket, km | 6.62 | 4.84 | 0.02 | 18.73 |

As Table 2 shows, the cleaned dataset contains no observations that obviously contradict common knowledge, so we consider it reliable.

The project is based on the estimation of a predictive model, so we need two samples: a training sample and a test sample. We cannot test the prediction quality of a model on the data used to estimate it, because a specification may explain the behavior of a particular dataset well but fail to predict values for new data, in which case the model cannot be used for practical purposes.

To estimate the model we need to construct a weighting matrix of size n × n, where n is the number of observations. As the sample size increases, the time required for estimation grows rapidly, and estimating a model on a large dataset becomes practically infeasible, which is why we use part of the initial sample to simplify the calculations. To form the training sample, we randomly selected 25% of the initial sample. In articles with predictive models researchers usually make the test sample smaller than the training one to have more data for estimation and to obtain more consistent results. We cannot use the same approach here, because spatial models contain the WP component: with samples of different sizes this component would have different scales because of the matrix multiplication, so we set the size of the test sample equal to that of the training sample. Moreover, enlarging the validation sample does not shrink the training sample in our case, because we have more data than the program can handle, so equal sample sizes are not a problem. We therefore have 3596 observations in both the training and the test sample. To make sure that these samples are random, we compared their main descriptive statistics (Table 3); the comparison gives no indication that the samples differ from one another.
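The sampling step and the cost of the weighting matrix can be illustrated as follows. This is a sketch under assumptions: that the two samples are disjoint, that the matrix is stored densely with 8-byte floats, and that a fixed seed is used for reproducibility are all choices of the example, and the function names are hypothetical.

```python
import random

def dense_matrix_bytes(n, bytes_per_element=8):
    """Memory needed for a dense n x n weighting matrix (8-byte floats assumed)."""
    return n * n * bytes_per_element

def split_equal(n_total, sample_size, seed=0):
    """Draw two disjoint random index samples of equal size from range(n_total)."""
    if 2 * sample_size > n_total:
        raise ValueError("not enough observations for two disjoint samples")
    rng = random.Random(seed)
    chosen = rng.sample(range(n_total), 2 * sample_size)
    return sorted(chosen[:sample_size]), sorted(chosen[sample_size:])

# Sample sizes taken from the text.
train_idx, test_idx = split_equal(n_total=14593, sample_size=3596)

full_gb = dense_matrix_bytes(14593) / 1e9    # about 1.7 GB for the full sample
sample_gb = dense_matrix_bytes(3596) / 1e9   # about 0.1 GB for one subsample
```

Because the matrix grows with the square of n, moving from 3596 to 14593 observations multiplies its size by roughly 16, which is consistent with the decision to work with subsamples.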

Table 3. The comparison of descriptive statistics of samples

| Variable, units | Total sample (number of observations = 14593) | Training sample (number of observations = 3596) | Test sample (number of observations = 3596) |
|---|---|---|---|
| Price, thous. rub. | 2966.2 (1728.9) | 3002.7 (1862.2) | 2950.9 (1721.4) |
| Price per m2, thous. rub. | 56.7 (11.8) | 56.9 (11.9) | 56.6 (11.6) |
| Number of rooms | 2.01 (0.92) | 2.00 (0.93) | 2.02 (0.93) |
| Number of floors in the building | 8.86 (5.04) | 8.86 (4.97) | 8.91 (5.15) |
| Floor of the flat | 4.66 (3.82) | 4.66 (3.86) | 4.69 (3.83) |
| Total area, m2 | 52.5 (24.9) | 52.7 (25.7) | 52.4 (24.9) |
| Living area, m2 | 31.5 (15.8) | 31.4 (17.2) | 31.1 (15.1) |
| Kitchen area, m2 | 8.53 (4.27) | 8.59 (4.90) | 8.48 (4.09) |
| Distance to the nearest supermarket, km | 0.19 (0.15) | 0.19 (0.15) | 0.20 (0.15) |
| Distance to the nearest hypermarket, km | 6.62 (4.84) | 6.65 (4.85) | 6.62 (4.85) |

Table cells contain means; standard deviations are in parentheses.

As we can see, the main descriptive statistics do not differ significantly across the samples, so the random samples can be considered representative of the total one. The average flat has 2 rooms and a total area of about 53 m2, is located in a 9-floor building and costs approximately 3 million rubles. Price, total area and living area have relatively high variation; this may be caused by the distribution of properties across the city and by differences in building type.

4. Methodology

An implicit goal of the work is to construct a model with high predictive power; to do that, we have to determine a “good” set of variables. We use the logarithm of the price per m2 as the dependent variable. Price per m2 is the main indicator for both the supply and the demand side, so it is more comparable across flats than the absolute price; the logarithmic transformation expresses the influence of the parameters on the price per m2 in percentage terms, which is a common approach in this field. The independent variables are the following: total area, number of rooms, a dummy for the first floor, house floor, dummies for city areas (7 in total), dummies for house type (8 in total), dummies for building material, distance to the nearest supermarket, squared distance to the nearest supermarket, and the number of supermarkets within a certain radius (500 m, 1000 m, “walking” distance).

Total area, number of rooms, the first-floor dummy, house floor, house type and building material are considered to be among the most important determinants of a flat's price in Russia; they are listed on housing web portals, so we include them in the model. Some other important parameters (such as kitchen area) were excluded from the analysis for one of two reasons: they were highly correlated with other variables, or there were too many missing values in the data, so including them would lead to less reliable results. Perm is divided into seven large areas that differ from one another in location, availability of facilities, status and other parameters, so including area dummies lets us account for some unobservable characteristics and avoid biased estimates. As one of the most important external housing characteristics we include the distance to the nearest supermarket, because most previous articles found an effect of this variable; the squared distance is used to capture nonlinear effects, and the split of shops into super- and hypermarkets acts as a proxy for shop size. The number of supermarkets within a given radius is another important variable, because it reflects the general convenience and quality of the area, which may be a determining factor when choosing a flat. Including counts for several radii is an alternative way to capture nonlinear effects. The “walking” distance describes the closest, most easily reachable objects; it was calculated as the average distance to a bus stop and in our case equals 238 m.

The list of independent variables is clearly rather parsimonious, and including other variables may improve model quality. However, this project aims to develop a new methodology rather than to estimate “the best” model. The approach introduced here has great potential for further market analysis and is not limited to the chosen set of variables.

The predictive power of different specifications will be compared using the following criteria (formulas are introduced in table 4):

1. Standard Forecast Error;

2. Mean Squared Error;

3. Root Mean Squared Error;

4. Mean Absolute Error;

5. Mean Absolute Percentage Error.

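Table 4. Formulas of the prediction quality criteria

| Criterion | Formula |
|---|---|
| Standard Forecast Error (SE) | $\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(e_i-\bar{e})^2}$ |
| Mean Squared Error (MSE) | $\frac{1}{n}\sum_{i=1}^{n} e_i^2$ |
| Root Mean Squared Error (RMSE) | $\sqrt{\frac{1}{n}\sum_{i=1}^{n} e_i^2}$ |
| Mean Absolute Error (MAE) | $\frac{1}{n}\sum_{i=1}^{n} \lvert e_i \rvert$ |
| Mean Absolute Percentage Error (MAPE) | $\frac{1}{n}\sum_{i=1}^{n} \left\lvert \frac{e_i}{y_i} \right\rvert$ |

where: $e_i = y_i - \hat{y}_i$;

$\bar{e} = \frac{1}{n}\sum_{i=1}^{n} e_i$;

$y_i$ is the real value of the dependent variable;

$\hat{y}_i$ is the predicted value of the dependent variable;

$n$ is the number of observations.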
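The criteria listed above can be computed with a few lines of code. The following is a minimal sketch: the function name `forecast_errors` is hypothetical, SE is assumed to be the sample standard deviation of the forecast errors, and MAPE is kept as a fraction rather than multiplied by 100.

```python
import math

def forecast_errors(actual, predicted):
    """Return SE, MSE, RMSE, MAE and MAPE for paired actual/predicted values."""
    n = len(actual)
    errors = [y - y_hat for y, y_hat in zip(actual, predicted)]
    mean_error = sum(errors) / n
    mse = sum(e * e for e in errors) / n
    return {
        # Assumption: SE is the sample standard deviation of the forecast errors.
        "SE": math.sqrt(sum((e - mean_error) ** 2 for e in errors) / (n - 1)),
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        # MAPE is kept as a fraction (not multiplied by 100).
        "MAPE": sum(abs(e / y) for e, y in zip(errors, actual)) / n,
    }

# Toy values on the scale of the log price per m2 (made up for illustration).
metrics = forecast_errors([4.0, 4.1, 3.9, 4.2], [4.1, 4.0, 3.9, 4.0])
```

RMSE is the square root of MSE by construction, so the two always rank models in the same order; MAE and MAPE are less sensitive to a few large errors.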

The research methodology will be based on a hedonic price model, where a price of a particular real estate object depends on its internal and external characteristics simultaneously. A basic price model is a regular linear regression model:

$$p_i = X_i \beta + \varepsilon_i,$$

where: $p_i$ is the asking price for object i;

$X_i$ is a vector of characteristics of object i and its surroundings;

$\beta$ is a vector of characteristic effects;

$\varepsilon_i$ is an error term.

Since spatial autocorrelation (SA) was found in all previous articles, we suppose that our dataset suffers from the same problem. There are two approaches to testing for spatial interactions between objects: Moran's Index and estimating a regression that controls for spatial effects (Devaux, Dube, 2016).

Moran's Index can be determined as (Osland et al., 2016):

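$$I = \frac{n}{\sum_{i}\sum_{j} w_{ij}} \cdot \frac{\sum_{i}\sum_{j} w_{ij}\,(x_i-\bar{x})(x_j-\bar{x})}{\sum_{i}(x_i-\bar{x})^2},$$

where: $n$ is the number of objects;

$w_{ij}$ is a spatial weight;

$x_i$ is the value of the variable of interest;

$\bar{x}$ is the mean value of the variable of interest.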

The null hypothesis of Moran's test is that objects are randomly distributed in space, so if we reject H0, it is necessary to include spatial correlation in the models.

Verifying spatial relations through a regression and the significance of the spatial correlation parameter is less universal, because the result depends on the chosen specification and variables, so this type of analysis can be less reliable. In this project we therefore use Moran's I only to establish whether the SA problem is present.
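Moran's Index is straightforward to compute directly from its definition. The sketch below is an illustration only: the function name `morans_i` and the toy data are hypothetical, and the code returns the statistic itself without the significance test.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square matrix of spatial weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / total_weight) * cross / sum(d * d for d in dev)

# Toy example: four flats on a line, only adjacent flats are neighbours.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
clustered = morans_i([1.0, 1.0, 3.0, 3.0], w)    # similar prices side by side
alternating = morans_i([1.0, 3.0, 1.0, 3.0], w)  # high and low prices alternate
```

In this toy case the clustered prices give a positive index (1/3) and the alternating prices give a negative one (-1); values close to zero would be consistent with a random spatial distribution.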

In this research we study the effects of two different spaces, so we need to specify the general spatial lag model (equation 2) more precisely. The new econometric equations of the housing price are:

$$p_i = \rho_g\,(W_g P)_i + X_i \beta + \varepsilon_i, \qquad (5)$$

$$p_i = \rho_c\,(W_c P)_i + X_i \beta + \varepsilon_i, \qquad (6)$$

$$p_i = \rho_g\,(W_g P)_i + \rho_c\,(W_c P)_i + X_i \beta + \varepsilon_i, \qquad (7)$$

where $p_i$ is the asking price for object i;

$X_i$ is a vector of characteristics of object i and its surroundings;

$\beta$ is a vector of characteristic effects;

$P$ is the vector of objects' prices;

$\rho_g$ is the parameter of spatial correlation of prices in geographical space;

$\rho_c$ is the parameter of spatial correlation of prices in characteristic space;

$W_g$ is a weighting matrix where each element is a function of the geographic coordinates of object i and other objects;

$W_c$ is a weighting matrix where each element is a function of the characteristics of property i and other objects;

$\varepsilon_i$ is an error term.

On the one hand, it may seem that the general model (7) is a priori better than models (5) and (6), because it accounts for the effects of both spaces and therefore carries more information. However, this is not always true: if there is no SA in one of the spaces but we still include its effect in the model, this may cause overfitting, and we will get a “low bias - high variance” model with low prediction quality.

Since the aim of the research is to find out which of the spaces (geographical or characteristic) has a greater influence on the real estate asking price, we have to compare the predictive power of equations (5)-(7) using the criteria introduced above.
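How models (5)-(7) nest inside each other can be shown with a small sketch. It only evaluates the systematic part of the equations for given parameter values; estimation itself is not shown. The function names `spatial_lag` and `fitted_prices` and all numbers are hypothetical and serve illustration only.

```python
def spatial_lag(weights, prices):
    """Return W P: for each object, the weighted sum of the prices of all objects."""
    return [sum(w * p for w, p in zip(row, prices)) for row in weights]

def fitted_prices(x_rows, beta, prices, w_geo, w_char, rho_g=0.0, rho_c=0.0):
    """Systematic part of models (5)-(7); the error term is left out."""
    lag_g = spatial_lag(w_geo, prices)
    lag_c = spatial_lag(w_char, prices)
    return [rho_g * lg + rho_c * lc + sum(b * x for b, x in zip(beta, row))
            for lg, lc, row in zip(lag_g, lag_c, x_rows)]

# Toy data: three flats, a constant and total area as regressors (values made up).
prices = [4.0, 4.2, 3.8]
x_rows = [[1.0, 50.0], [1.0, 60.0], [1.0, 40.0]]
beta = [3.0, 0.01]
w_geo = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
w_char = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]

geo_only = fitted_prices(x_rows, beta, prices, w_geo, w_char, rho_g=0.1)
both = fitted_prices(x_rows, beta, prices, w_geo, w_char, rho_g=0.1, rho_c=0.05)
```

Setting the characteristic-space parameter to zero gives the geographical model, setting the geographical parameter to zero gives the characteristic model, and keeping both gives the general model, so the comparison of predictive power is a comparison of nested specifications.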

To construct the characteristic space, we use variables from the same list that we used for the objects' characteristics. If we construct a weighting matrix based on a variable with low variance, the corresponding component of models (5)-(7) will tend to be similar to the original variable, which leads to a multicollinearity problem. Estimation of a model with perfect multicollinearity is impossible (the correlated variables have to be excluded), and a model with partial multicollinearity becomes unstable and sensitive to the inclusion or exclusion of observations. To solve this problem, we include such variables in the specification only once (because the absolute and the weighted values of such a parameter are very close to each other). We estimate absolute values only; this choice is based on economic intuition: when people buy a residential property, they first pay attention to its own characteristics and only then compare it with other objects, so a model with absolute values should be more effective than a model with weighted characteristics. The price equation can therefore be written as:

$$p_i = \rho\,(W P)_i + (X^{h}_i,\; X^{l}_i)\,\beta + \varepsilon_i, \qquad (8)$$

where $p_i$ is the asking price for object i;

$X^{h}_i$ is a vector of characteristics of object i with high variation;

$X^{l}_i$ is a vector of characteristics of object i with low variation;

$\beta$ is a vector of characteristic effects;

$P$ is the vector of objects' prices;

$\rho$ is the parameter of spatial correlation;

$W$ is a weighting matrix based on $X^{h}$;

$\varepsilon_i$ is an error term.

There are two important issues concerning the proper distance metric. First, there are several ways to measure geographical distance (in kilometers, in steps or time required to reach a place, etc.), whereas there is no common approach to measuring distance in a characteristic space. There is a theory of closeness in discrete-continuous space that was developed for clustering objects, so in this project we try to apply this theory to our particular case and test the relevance of different metrics. The second important question is the choice of the functional form of the weighting matrix in the spatial autoregression model. This function directly determines which objects we consider close and which we do not, so the predictive quality of the models depends on it, and we have to pay close attention to it.

We use the distance in kilometers as the metric of geographical distance because it is the most widespread approach and gives a more understandable and universal interpretation. Metrics such as the time required to reach a place are less appropriate, because the availability of some facilities (for example, public transport) differs between city areas, so we do not use them in this study.

A common approach to calculating the weighting matrix W is to take the element-wise inverse of the distance matrix D, $w_{ij} = 1/d_{ij}$. In this research this approach cannot be applied, because the sample contains objects with the same address and objects with the same value of at least one characteristic, so the distance between them is equal to zero. To solve this problem, the elements of the weighting matrix are calculated using the formula:

,

where $w_{ij}$ is the weight of the influence of the price of flat j on the price of flat i;

$d_{ij}$ is the distance between flats i and j;

$d_{\max}$ is the maximum distance between flats in the sample;

$\alpha$ is the parameter of the space shrinkage.
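Why inverse-distance weights fail for flats at the same address, and how a bounded weight avoids the problem, can be illustrated with a short sketch. The functional form used below, w_ij = (1 - d_ij / d_max) ** alpha, is only an assumed stand-in built from the same ingredients (the pairwise distance, the maximum distance and a shrinkage parameter); it should not be read as the formula used in this work. Setting the diagonal to zero is a further assumption, and the function name `bounded_weights` is hypothetical.

```python
def bounded_weights(dist, alpha=0.5):
    """Weights from a distance matrix that stay finite when a distance is zero.

    Assumed illustrative form: w_ij = (1 - d_ij / d_max) ** alpha for i != j,
    and w_ii = 0 so that an object's own price does not enter its spatial lag.
    """
    n = len(dist)
    d_max = max(max(row) for row in dist)
    if d_max == 0:
        raise ValueError("all distances are zero")
    return [[0.0 if i == j else (1.0 - dist[i][j] / d_max) ** alpha
             for j in range(n)] for i in range(n)]

# Flats 0 and 1 share an address, so d = 0 and the inverse 1 / d is undefined.
distances_km = [[0.0, 0.0, 1.0, 4.0],
                [0.0, 0.0, 1.0, 4.0],
                [1.0, 1.0, 0.0, 3.0],
                [4.0, 4.0, 3.0, 0.0]]
w = bounded_weights(distances_km, alpha=0.5)
```

With this form, flats at the same address receive the maximum weight of 1 instead of an infinite one, and the most distant pair receives a weight of 0; changing `alpha` changes how quickly the weight decays between these two extremes.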

At the first step we estimate models (5)-(7) with this parameter equal to ½ to carry out an initial analysis of the models. Then we vary the parameter, trying to improve model quality.

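To calculate a distance in characteristics we use variables that have different measurement units and scales, so we cannot include their absolute values in one space. To solve this problem, we scale the variables used for space construction to the range from 0 to 1 using the formula:

$$\tilde{x}_i = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}},$$

where: $\tilde{x}_i$ is the scaled variable;

$x_i$ is the variable of interest, and $x_{\min}$ and $x_{\max}$ are its minimum and maximum values in the sample.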

This scaling formula is used because it is a linear transformation: it maps the variables to the range from 0 to 1 while preserving the relative distances between the values of the initial variable.
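Scaling a variable linearly to the range from 0 to 1 is usually done by min-max scaling. A minimal sketch, with a hypothetical function name `min_max_scale` and made-up values:

```python
def min_max_scale(values):
    """Linearly rescale values to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant variable cannot be min-max scaled")
    return [(v - lo) / (hi - lo) for v in values]

rooms = min_max_scale([1, 2, 3, 5])        # -> [0.0, 0.25, 0.5, 1.0]
area = min_max_scale([30.0, 52.5, 120.0])  # total area, m2
```

After scaling, a difference of one room and a difference of several square meters are expressed on the same unit-free scale, so distances based on different characteristics become comparable.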

The characteristic space can be introduced as a sum of weighting matrices multiplied by the vector of prices. The problem is that some variables have a greater effect on the housing price than others, so we also need to assign a weight to each variable in the sum to get a more accurate prediction. In other words, in the model we introduce the characteristic space component as:

$$W_c P = \sum_{j} a_j W_j P,$$

where $P$ is the vector of objects' prices;

$W_c$ is the characteristic space weighting matrix;

$W_j$ is a weighting matrix based on variable j;

$a_j$ is the weight of variable j in the characteristic space.

There are no prior assumptions about the weight of each variable in the characteristic space, so the weights have to be estimated in the model. We estimate two more specifications, one including the characteristic space without any restrictions and one with the sum of the weights constrained to 1, and compare their predictive quality to choose the better one.
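Because the characteristic space component is a weighted sum, each variable contributes its own spatial lag of prices, and the weights act as coefficients on these lags. The sketch below illustrates this with made-up matrices and weights; the function names `spatial_lag` and `characteristic_lag` are hypothetical, and the weights are simply given rather than estimated.

```python
def spatial_lag(weights, prices):
    """Return W P: for each object, the weighted sum of the prices of all objects."""
    return [sum(w * p for w, p in zip(row, prices)) for row in weights]

def characteristic_lag(matrices, variable_weights, prices):
    """Return sum_j a_j * (W_j P), the characteristic space component."""
    lags = [spatial_lag(m, prices) for m in matrices]
    return [sum(a * lag[i] for a, lag in zip(variable_weights, lags))
            for i in range(len(prices))]

# Toy data: weighting matrices for two characteristics of three flats (made up).
prices = [4.0, 4.2, 3.8]
w_rooms = [[0.0, 1.0, 0.5], [1.0, 0.0, 0.5], [0.5, 0.5, 0.0]]
w_area = [[0.0, 0.2, 0.8], [0.2, 0.0, 0.4], [0.8, 0.4, 0.0]]
component = characteristic_lag([w_rooms, w_area], [0.3, 0.7], prices)
```

Since the component is linear in the weights, the separate lags can enter a regression as ordinary regressors; constraining the weights to sum to 1 then becomes a linear restriction on their coefficients.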

5. Results of the research

Before estimating the models, we carried out Moran's test to check for spatial autocorrelation of the residential property objects. The test rejects the null hypothesis of a random distribution of objects in geographical space at the one per cent significance level. This tells us that at least in geographical space we face the SA problem, so we should use spatial models to estimate the housing price more accurately. We cannot test for SA in the characteristic space in the same way, because this space consists of several variables, while the test is conducted for one parameter only. To overcome this difficulty, we estimate a linear model and an autoregressive model for the geographical space and test the significance of the spatial autocorrelation parameter. If this estimate is significant, the chosen specification reflects the spatial relationships between the objects, so we can use it to estimate the correlation in the characteristic space as well.

At the first step we estimated five models: [1] a standard linear regression; [2] a spatial lag model that includes the geographical space; [3] a spatial lag model that includes the characteristic space; [4] a spatial lag model that includes the characteristic space with the weights of the variables in the space constrained to sum to one; [5] a spatial lag model that includes both the geographical and the characteristic space. At this stage we set the parameter of the space shrinkage (equation 9) equal to ½. The estimation results are presented in table 5.

Table 5. Results of estimation

| Variable | [1] | [2] | [3] | [4] | [5] |
|---|---|---|---|---|---|
| Dummy for the first floor | -0.0247*** (0.0076) | -0.0215*** (0.0073) | 0.0035 (0.0132) | 837.3*** (17.12) | 0.0024 (0.0127) |
| House floor | 0.0007 (0.0009) | 0.0008 (0.0007) | 0.0089*** (0.0031) | 226.0*** (3.599) | 0.0081*** (0.0029) |
| Number of rooms | -0.0773*** (0.00470) | -0.0770*** (0.00451) | -0.0591*** (0.00580) | -61.93*** (9.665) | -0.0570*** (0.00558) |
| Total area | 0.0007*** (0.000195) | 0.0006*** (0.000187) | -0.0026*** (0.000306) | 11.87*** (0.473) | -0.0027*** (0.000294) |
| Dummies for city area | + | + | + | + | + |
| Material (brick) | 0.0303*** (0.00595) | 0.0254*** (0.00571) | 0.0209*** (0.00578) | 2.154 (9.679) | 0.0173*** (0.00555) |
| Parameters of closeness to the nearest super- and hypermarkets | + | + | + | + | + |
| const | 4.073*** (0.0153) | 3.360*** (0.0428) | 4.715*** (0.120) | -11851.1*** (28.86) | 4.126*** (0.120) |
| Geographical space | | 0.0000866*** (0.00000488) | | | 0.0000860*** (0.00000499) |
| Characteristic space: house floor | | | 0.0000214*** (0.00000762) | 0.579*** (0.00831) | 0.0000185** (0.00000733) |
| Characteristic space: number of rooms | | | -0.00000603 (0.00000434) | 0.0374*** (0.00725) | -0.00000400 (0.00000418) |
| Characteristic space: total area | | | -0.0000756*** (0.00000628) | 0.255*** (0.00961) | -0.0000754*** (0.00000603) |
| Characteristic space: number of supermarkets in 500 m radius | | | -0.00000183 (0.00000372) | 0.0810*** (0.00608) | 0.00000455 (0.00000359) |
| Characteristic space: number of supermarkets in 1000 m radius | | | 0.0000159*** (0.00000322) | 0.0476*** (0.00533) | -0.00000224 (0.00000327) |
| Num. of obs. | 3596 | 3596 | 3596 | 3596 | 3596 |
| Num. of parameters | 24 | 25 | 29 | 28 | 30 |
| Log-likelihood | 1613.9 | 1765.9 | 1743.3 | -24954.1 | 1887.0 |
| AIC | -3179.8 | -3481.8 | -3428.5 | 49964.1 | -3713.9 |
| BIC | -3031.3 | -3327.1 | -3249.1 | 50137.4 | -3528.3 |

In all models the dependent variable is the logarithm of the price per m2, and the list of control variables is identical;

Standard errors (robust) in parentheses;

Significance level: * p < 0.1, ** p < 0.05, *** p < 0.01;

Base category: house type - “Khrushchev-era” building; material - panel; city area - Sverdlovsky.

As table 5 shows, a comparison of models [1] and [2] indicates that the parameter of spatial autocorrelation in geographical space is significant. First, this means that the specification captures spatial interactions between the residential property objects and can be used for further analysis; second, adding the SA parameter improves model quality according to the log-likelihood, AIC and BIC.
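The information criteria in table 5 follow from the log-likelihood, the number of parameters and the number of observations through the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. A short sketch with the reported values for models [1] and [2] (the function names are hypothetical):

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# (log-likelihood, number of parameters) as reported in table 5; n = 3596.
reported = {"[1]": (1613.9, 24), "[2]": (1765.9, 25)}
criteria = {name: (aic(ll, k), bic(ll, k, 3596))
            for name, (ll, k) in reported.items()}
```

These definitions reproduce the reported values up to rounding (for example, -3179.8 and -3031.3 for model [1]). Lower values are better, and BIC penalizes each additional parameter more heavily than AIC at this sample size.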

To construct the characteristic space, five variables were taken: house floor, number of rooms, total area, and the number of supermarkets within 500 m and 1000 m radii. People may regard these variables as important characteristics of housing, so differences in them might matter; in addition, the preliminary analysis showed that these variables have relatively high variation, so the simultaneous inclusion of an absolute value (as part of X) and a weighted value (as part of the characteristic space) does not lead to multicollinearity, and we can obtain reliable results with them. The analysis of model [3] shows that some of the variables in the characteristic space are significant, which supports our hypothesis that autocorrelation exists in the characteristic space of the housing market (at least among these variables). According to the criteria in table 5, model [2] is slightly better than model [3]; this may be because we included too many variables in the characteristic space, some of which are insignificant, while AIC and BIC penalize additional variables. However, the difference between these models is not large and may be caused by random effects in the sample. Moreover, the aim of the project is to construct a good predictive model, so we rely mostly on predictive quality.

The coefficients on the variables in the characteristic space can be interpreted as the weights of these variables in the constructed space. In model [4] we constrain the sum of the coefficients in the characteristic space, because this allows an economic interpretation of the value of each characteristic. However, the estimation shows that all coefficients and model quality criteria in model [4] differ substantially from the corresponding estimates in models [1]-[3]. The reason is that the unconstrained estimates are very small in absolute value, and forcing them to sum to one changes them artificially, so both the explanatory (table 5) and the predictive (table 6) quality of the model fall sharply. Another reason for the large difference between model [4] and the previous ones is that the estimated coefficients of the characteristic space variables combine the spatial autocorrelation parameter and the weight parameter, so these effects cannot be separated in this model. Since the interpretation of the coefficients is not so important for us, we do not impose this constraint in the further analysis.

Accounting simultaneously for autocorrelation in geographical and characteristic space in specification [5] gives a considerable improvement in model quality; furthermore, variables in both spaces are significant, so this specification reflects our assumptions about housing price formation best.

The prediction error criteria of models [1]-[5] are shown in table 6.

Table 6. Criteria of a prediction error

| Criterion | [1] | [2] | [3] | [4] | [5] |
|---|---|---|---|---|---|
| SE | 0.159 | 0.154 | 0.153 | 2630.7 | 0.148 |
| MSE | 0.025 | 0.024 | 0.024 | 696780.3 | 0.022 |
| RMSE | 0.159 | 0.154 | 0.153 | 2630.9 | 0.148 |
| MAE | 0.112 | 0.106 | 0.106 | 2120.6 | 0.101 |
| MAPE | 0.028 | 0.027 | 0.027 | 530.1 | 0.026 |

As noted above, the model with the constraint (model [4]) has much worse prediction results than all the other models, so we exclude it from the further analysis.

At first sight, models [1]-[3] and [5] seem to have almost identical prediction errors. However, such small differences in absolute values follow from the nature of the dependent variable (its mean is 4.0176), so even small changes matter. For example, the mean absolute error of model [5] is approximately 10% lower than that of model [1]. Because the dependent variable is a logarithm, an absolute error of 0.112 corresponds to an error of roughly 12% of the price and an error of 0.101 to roughly 11%, so for an average flat costing about 3 million rubles the spatial model is more accurate than the linear model by roughly 30-40 thousand rubles.

