Review of localization methods and data analysis package

Using the Duranton-Overman test to study detailed location patterns of the forestry and agricultural industries in Russia. Data analysis package. The first article on geocoding. Methods for measuring localization. Distance-based localization measurement.

Category: Programming, computers and cybernetics
Type: coursework
Language: English
Date added: 11.02.2017


Russia is a country with a vast territory and significant reserves of natural resources. At present, however, it faces difficult conditions: the sanctions imposed against Russia make it necessary to reduce dependence on imports. This research concerns the Russian agricultural and forestry sectors. These two sectors were chosen because Russia has the largest forest resources in the world (20.1% of the world stock) and income from the agricultural sector is a major part of GDP (14%). Improving the economic efficiency of these sectors can therefore greatly help the country's economy as a whole.

One of the most important characteristics of the economic landscape is the geographical concentration of economic activity. To date, no research has tried to measure industrial localization in Russia, either in the forestry and agricultural industries or in any other industry. Interest in cluster analysis is now rising because new ways are needed to support firms in Russia's stagnant economy.

The problem is that it is hard for new firms to understand which location is best for starting their business, and why. This research will help new producers solve this problem by means of localization measurement. The problem is important in the agricultural and forestry industries because Russia has the greatest forest reserves and a large amount of fertile land, located mostly in the southern and central parts of the country.

The thesis of this work is that there is a connection between localization and the economic indicators of a firm. The research question is: how does localization influence the economic indicators of a firm?

To answer this question, it is necessary to collect data on the location of every firm in these industries for more than one period. Localization is then measured from these data using two indexes.

There are two main indexes. The first is the Ellison-Glaeser index, which has been used in many works to measure localization. The second is the Duranton-Overman index, which is more complex but can give more reliable results with some data. Both indexes are popular in research on localization.


The main aim of the theoretical part of this work is to analyze the existing ways of assessing the localization of industries and to examine the software package used for data analysis. To achieve these two goals a large amount of literature was studied. It is divided into three parts. The first part consists of articles that explain how to use the R project. R is a program for statistical and econometric analysis. It was chosen because it is multifunctional and can process large amounts of data much faster than other programs. The second part of the literature is about geocoding, which is a very specific task for analytical programs. This literature helps to understand what geocoding is and how to do it correctly. The third part of the literature review is about localization. This literature answers the questions of how localization can be measured and which measurement idea is the most appropriate.


There is a large body of literature about the R project. For this work, information was taken about the types of objects used in R and how to work with them. The necessary objects were vectors, lists, functions and expressions. Information was also studied about data frames and object classes in R, as well as about evaluation, function construction and specific econometric tools. All of this information was taken from the R Language Definition (R Core Team 2014).

This work also required information about graphics and about specific ways of analyzing large data sets represented as lists. It was taken from “R: A Language for Data Analysis and Graphics” (Ihaka and Gentleman 2009). “Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data” is another useful article. It helped to understand how to get the necessary data from the web. For this task the author wrote a script that automates the data mining process, and this article contains helpful information about data mining and script writing in R.


To write this work it was necessary to understand what geocoding is and to look at successful examples of work in which it was used. The first article about geocoding is “Geocoding and Monitoring of US Socioeconomic Inequalities in Mortality and Cancer Incidence: Does the Choice of Area-based Measure and Geographic Level Matter?: The Public Health Disparities Geocoding Project” (Krieger 2002). First of all, the author defines geocoding: “Geocoding (sometimes called forward geocoding) uses a description of a location, most typically a postal address or place name, to find the geographic coordinates of spatial reference data such as building polygons, land parcels, street addresses, postal codes (e.g. ZIP codes, CEDEX) and so on. Geocoding facilitates spatial analysis using Geographic Information Systems and Enterprise Location Intelligence systems” (Krieger 2002). In this article the author analyzed the US public health system and tried to find out which area-based measures should be used and at which level of geography. Krieger looked for correlations between these measures and the mortality rate or cancer incidence.

The second article is “Monitor Social Inequalities in Low Birth Weight and Geocoding Project (US)” (Subramanian and Carson 2003). The aim of this study was “To determine which area based socioeconomic measures can meaningfully be used, at which level of geography, to monitor socioeconomic inequalities in childhood health in the US” (Subramanian and Carson 2003). This article gave the necessary information about cross-sectional analysis and geocoding in the socioeconomic sphere.

The third article, by Gaidet, Iverson, Takekawa and Newman, was published in the International Journal of Health Geographics. It is also connected with the US health system, but not directly. The work is about the coordinates of people determined by automated geocoding. “Automated geocoding is a method used to assign geographic coordinates to an individual based on their street address. This method often relies on street centerline files as a geographic reference” (Gaidet et al. 2011). The method has one serious problem, called positional error in the geocoded point, and the authors tried to find out how to evaluate this error and how to fix it. To carry out the analysis the researchers used the R project and found a new geocoding method that uses residential property parcel data. This article was also useful for this paper because of its interesting geocoding idea and its geocoding analysis in R.


This is the largest and most important part of the review. First of all, it was necessary to look at the most common methods of localization measurement. For this purpose one of the most famous articles about localization was read: “Geographic Concentration as a Dynamic Process” (Dumais, Ellison, and Glaeser 2002). The researchers worked with data from the CBLRD (Census Bureau's Longitudinal Research Database). These data contain information about US manufacturing industries, and by analyzing them the authors tried to understand the dynamic process of firm concentration. They tried to find the reasons for the localization or dispersion of firms in the USA and to conclude whether this process is stable or not. They conclude that the level of industry agglomeration remains fairly stable, but that there is great variation in the location of these agglomerations. The authors also analyzed the birth of new firms and the closure of old firms and tried to find the reasons for these decisions.

The first idea of the Ellison-Glaeser index is the raw concentration measure

$$G_{it} = \sum_{s} \left(s_{ist} - x_{st}\right)^2$$

where $s_{ist}$ is the share of industry $i$'s time $t$ employment located in state $s$, and $G_{it}$ is the sum of squared deviations of the industry's state employment shares from a measure, $x_{st}$, of the states' shares of employment in the average industry.

They tried to find a correlation between location and producers' activities. The final point of the article was a test of Marshall's hypotheses of industry agglomeration: 1) agglomeration saves transport costs; 2) it allows labor market pooling; 3) it facilitates intellectual spillovers. All three points were confirmed by the empirical analysis, but labor mix was the most important variable motivating firms to agglomerate. In this article the authors use a special index to measure localization, the Ellison-Glaeser index:

$$\gamma = \frac{\sum_i (S_i - X_i)^2 - \left(1 - \sum_i X_i^2\right)\sum_j Z_j^2}{\left(1 - \sum_i X_i^2\right)\left(1 - \sum_j Z_j^2\right)}$$

Here $S_i$ is the share of the industry's employment in area $i$, $X_i$ is the share of total employment in area $i$, and the $\{Z_j\}$ are the sizes of the plants $j$ of the industry, expressed as shares of industry employment. This idea of localization measurement is still used in modern articles.
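The definitions of $S_i$, $X_i$ and $Z_j$ can be turned into a short calculation. The following is a minimal sketch, assuming the standard Ellison-Glaeser form in which the raw concentration $\sum_i (S_i - X_i)^2$ is corrected by the plant Herfindahl index $\sum_j Z_j^2$. The function name and the input layout (employment counts per area and per plant) are illustrative assumptions, not taken from the original scripts.

```python
def ellison_glaeser_index(industry_emp_by_area, total_emp_by_area, plant_sizes):
    """Ellison-Glaeser index for one industry (illustrative sketch).

    industry_emp_by_area: industry employment in each area i
    total_emp_by_area:    total employment in each area i (same order)
    plant_sizes:          employment of each plant j of the industry
    """
    industry_total = sum(industry_emp_by_area)
    overall_total = sum(total_emp_by_area)
    plants_total = sum(plant_sizes)

    # Convert raw counts into the shares S_i, X_i and Z_j.
    s = [e / industry_total for e in industry_emp_by_area]
    x = [e / overall_total for e in total_emp_by_area]
    z = [e / plants_total for e in plant_sizes]

    raw_concentration = sum((si - xi) ** 2 for si, xi in zip(s, x))
    herfindahl = sum(zj ** 2 for zj in z)
    sum_x_squared = sum(xi ** 2 for xi in x)

    numerator = raw_concentration - (1 - sum_x_squared) * herfindahl
    denominator = (1 - sum_x_squared) * (1 - herfindahl)
    return numerator / denominator


# Hypothetical data: two equally sized areas, ten equal plants.
concentrated = ellison_glaeser_index([10, 0], [50, 50], [1] * 10)
even = ellison_glaeser_index([5, 5], [50, 50], [1] * 10)
```

In this toy example the industry located entirely in one area gets a clearly positive index, while the evenly spread industry gets a slightly negative one, because the index subtracts the concentration that would arise by chance from the plant size distribution.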

After understanding the idea of the EG index, it was important to look at different applications of this index and, for example, to understand the influence of localization on the labor force. “Localization of Knowledge and the Mobility of Engineers in Regional Networks” (Almeida and Kogut 1999) provides this information. It gives a definition of localization, which is sometimes called agglomeration: “the benefits that firms obtain by locating near each other ('agglomerating'). This concept relates to the idea of economies of scale and network effects. As more firms in related fields of business cluster together, their costs of production may decline significantly (firms have competing multiple suppliers; greater specialization and division of labor result). Even when competing firms in the same sector cluster, there may be advantages because the cluster attracts more suppliers and customers than a single firm could achieve alone.”

Another important thing to understand is why localization can be good or bad for an industry, and “Economics of Agglomeration. Cities, Industrial Location and Regional Growth.” (Belleflamme, Picard, and Thisse 2000) answers this question. The main idea of the article is that knowledge, once generated, spills over imperfectly among nations and firms. The researchers claim that since labor and institutional networks vary by region, there should be regional differences in the localization of spillovers. The authors also analyze the relationship between the movement of primary patent holders and the location of technological knowledge by investigating patent citations of important semiconductor innovations. They found that the localization of knowledge is specific to certain regions (particularly Silicon Valley) and that the degree of localization varies across regions. By analyzing information on the movements of patent holders between firms, the researchers show empirically that the mobility of engineers between firms affects the local transfer of knowledge. The movement of knowledge is embedded in regional labor networks. This is one of the first articles about localization; since then the theory of localization measurement has advanced considerably.

After looking at the older but still applied methods of localization analysis, it is time to look at the modern approach. The article “Testing for Localization Using Micro-Geographic Data” (Duranton and Overman 2005) is the first in which a new approach to localization measurement was used. In this article the researchers worked with UK data on industries from different sectors. They tried to measure industrial localization and analyzed industrial clusters using a genuinely new idea. The idea is that firms can be not only localized (located at a small distance from each other) or dispersed (with a great distance between different businesses in one sector), but also randomly distributed. The authors tested the hypothesis of non-random distribution on an exhaustive UK data set. The data were represented with codes for each type of industry in the UK. These codes were taken from the classification of economic activities called NACE. NACE is also used in Russia, but the Russian codes are not completely the same as the EU NACE codes. The result of this test was significant for localization theory.

The Duranton-Overman criterion measures the density of distances between firms in one sector, and localization can be measured from this density:

$$\hat{K}(d) = \frac{1}{N(N-1)h} \sum_{j_1=1}^{N-1} \sum_{j_2=j_1+1}^{N} f\left(\frac{d - d_{j_1,j_2}}{h}\right)$$

Here, $N$ is the number of firms, $d_{j_1,j_2}$ is the distance between firms $j_1$ and $j_2$, $h$ is a bandwidth parameter and $f$ is a Gaussian kernel.

A second work, “Exploring the Detailed Location Patterns of U.K. Manufacturing Industries Using Microgeographic Data” (Duranton and Overman 2008), continues this line of research. In this article the authors worked with new UK data (2008) and tried to confirm or refute the results of the first work. They compare the localization of continuing establishments with that of “entrants” and “exiters”, and of foreign-owned with domestic firms in the same industry. The researchers also studied colocalization between vertically linked industries. All of the new hypotheses were confirmed except the difference between foreign-owned and domestic firms (no difference was found), and all the previous hypotheses were also confirmed.

After the analysis of the new methods of localization measurement, it is logical to look at how these methods can be combined. The article “An Anatomy of the Geographical Concentration of Canadian Manufacturing Industries” (Behrens and Bougna 2015) demonstrates how the DO and EG ideas can be combined for localization measurement. In this article the authors analyzed the localization of Canadian industry using detailed micro-geographic data (the coordinates of each firm) and panel data for 10 years (from 2000 to 2010). They studied localization and concluded that, depending on the year, 40 to 60% of industries were localized, with a downward trend of localization in Canada. The researchers also tested the idea that new firms can show a different locational trend, but this hypothesis was refuted. The authors used both the Ellison-Glaeser index and the Duranton-Overman approach to analyze industrial localization in Canada. This is one of the most recent articles about localization, and the structure of the present work is close to that of Behrens and Bougna's research.


The first and perhaps most famous work about localization was written by Ellison and Glaeser. The calculation of the EG index was presented in the previous section, so this section presents only the results of works about localization. The main goal of their work was to develop a formal test for localization measurement and then use this test to demonstrate industrial localization. Their test compares observed localization with a random distribution of firms: if the test statistic is higher than the reference value, the distribution of firms is not random and there is industrial localization. This test was applied to 459 different industries in the USA: 446 of them were localized and only 13 were more evenly distributed than they would be under a random distribution.

According to their work, the most localized industries include:

1) tobacco products

2) textile mill products

3) leather and leather products.

Among the least localized industries are:

1) Rubber and miscellaneous plastics

2) Fabricated metal products

3) Printing and publishing

4) Furniture and fixtures

For the agricultural and forestry industries, this work finds an average level of localization (significantly higher than a random distribution, but not as strong as in the most highly localized industries).

Another work was written by Duranton and Overman. This research was based on UK data and takes a different theoretical approach (a different localization index was used). The main results of this work are:

1. 52% of 4-digit NACE industries show significant localization at the 5% significance level and 24% show dispersion

2. In most industries localization takes place at distances of less than 50 kilometers

3. There is strong skewness in the degree of localization across industries

4. Industries at the 3-digit NACE level show a tendency to localize at the regional level, whereas 4-digit industries are simply localized, without reference to the region

As for the agricultural and forestry sectors, they are among the most dispersed industries, in contrast to machinery for textile and leather production, which are the most localized industries.

In 2008 Duranton and Overman wrote their second work, in which they looked at the colocalization of industry pairs. Among the interesting results of this colocalization analysis, the forestry industry is colocalized with the sale of cutlery and tools and with wood production.

Another interesting work was written by Kentaro Nakajima, Yukiko Saito and Iichiro Uesugi. They measured firm localization in Japan, and their work is very similar to that of Duranton and Overman but uses data from Japan. They found that half of the 561 four-digit industries are localized and that the distance of localization is less than 40 km. Their results were very similar to those obtained from the UK data. As for the sectors explored in our work, forestry and agriculture are both dispersed industries with a low localization index.

One more interesting work about localization was written by Stefania Vitali, Mauro Napoletano and Giorgio Fagiolo, who carried out a cross-country analysis. They look at industry localization in Belgium, Italy, Germany, France, the UK and Spain. The main results of this work are:

1. In all countries except Belgium, the Duranton-Overman criterion showed that 50% of sectors are localized (in Belgium the share of localized sectors is less than half)

2. For all countries except Italy, the EG index showed stronger localization than the DO index

3. They presented a list of countries by level of industrial localization, from the most localized to the least localized: Spain, France, Germany, Italy, the UK and Belgium

4. The most localized industries by both the DO and EG indexes are: silk-type weaving, manufacturing of jewelry, manufacturing of carpets and rugs, other textile weaving, manufacturing of machinery for textiles, and manufacturing of knitted and crocheted pullovers.

This work contains no information about the forestry or agricultural sectors in Europe.

The last work in this section is the study by Behrens and Bougna of Canadian industry localization. They worked with data covering a 9-year period, and the main results of their work are:

1. Depending on the year considered, about 40-60% of industries are localized in Canada.

2. Localization in Canada decreased from 2001 to 2009.

3. The most localized industries are textiles and those related to the extraction and processing of natural resources.

4. Despite a general decreasing trend in overall Canadian industrial concentration, some localized industries became even more localized from 2001 to 2009.

5. The agricultural and forestry sectors are dispersed sectors with low localization indexes.

All of this information was used to achieve the main aims of the theoretical part of this work: an analysis of the existing ways of assessing the localization of industries and an examination of the software package used for data analysis.


After the theoretical material was analyzed, the method of data analysis and localization measurement was chosen. This stage of the work can also be divided into two parts.

2.1 DATA

First of all, the data were collected from Ruslana. Ruslana is part of the Bureau van Dijk database. The Bureau van Dijk database contains information about companies all over the world, and Ruslana provides information about Russian, Ukrainian and Kazakh companies. Companies were selected according to the Russian NACE codes for the agricultural and forestry industries. There was information about 500,000 firms that had operated for 5 years, and for each firm Ruslana provides 463 unique variables.

All information about the companies was first stored in text format. Users can download no more than 2500 firms at a time, so all the data were stored in many .txt files in Unicode encoding: 23 .txt files for the forestry industry and 185 for the agricultural industry.

At first R refused to open the files because of the file format or encoding, so I used two Visual Basic scripts to resave the files in the proper format (.csv). The first script collects all the .txt files in one Excel workbook. The second script saves each sheet of the Excel workbook as a .csv file with a comma separator.
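The same conversion can be done without Excel. The following is a minimal sketch, assuming the exported files are tab-delimited and encoded as UTF-16 (the source says only "Unicode", so both the encoding and the delimiter are assumptions and may need adjusting); the function name is illustrative.

```python
import csv


def txt_to_csv(src_path, dst_path, src_encoding="utf-16", src_delimiter="\t"):
    """Re-save one delimited text export as a comma-separated UTF-8 file.

    The default encoding and delimiter are assumptions about the export
    format, not facts taken from the original files.
    Returns the number of rows written (header included).
    """
    rows_written = 0
    with open(src_path, newline="", encoding=src_encoding) as fin, \
         open(dst_path, "w", newline="", encoding="utf-8") as fout:
        reader = csv.reader(fin, delimiter=src_delimiter)
        writer = csv.writer(fout)  # comma separator, quotes fields when needed
        for row in reader:
            writer.writerow(row)
            rows_written += 1
    return rows_written
```

Calling this function once per downloaded file replaces both the "collect into a workbook" and the "save each sheet" steps, since each .txt file maps directly to one .csv file.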

At the next stage all the files were opened in R and the necessary variables were extracted from each of them. These variables were: INN (Taxpayer Identification Number), OKPO (National Classification of Enterprises and Organizations), postcode, city and address. For this task I wrote a simple R script, which is presented in the applications (script №3).

All the data were first cleaned of unnecessary variables, and only then were the cleaned files combined. Only active firms were kept (variable status). This order of work (first clean, then merge) was chosen because when I tried to merge first, the computer reported a shortage of operating memory. After cleaning, all the files were merged into one big .csv file and saved. For this task I used a simple R script, which is presented in the applications (script №4).
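The clean-then-merge order can be sketched as a streaming operation that never holds more than one row in memory. This is an illustration only, not the author's script №4: the column names and the value `"Active"` for the status variable are hypothetical placeholders for whatever the Ruslana export actually uses.

```python
import csv

# Hypothetical column names; the real export headers may differ.
KEEP_COLUMNS = ["INN", "OKPO", "postcode", "city", "address"]


def clean_rows(rows, keep=KEEP_COLUMNS, status_field="status", active_value="Active"):
    """Yield only active firms, reduced to the columns needed later."""
    for row in rows:
        if row.get(status_field) == active_value:
            yield {name: row.get(name, "") for name in keep}


def clean_and_merge(csv_paths, out_path, keep=KEEP_COLUMNS):
    """Clean each input file row by row and append it to one output file."""
    with open(out_path, "w", newline="", encoding="utf-8") as fout:
        writer = csv.DictWriter(fout, fieldnames=keep)
        writer.writeheader()
        for path in csv_paths:
            with open(path, newline="", encoding="utf-8") as fin:
                writer.writerows(clean_rows(csv.DictReader(fin), keep=keep))
```

Because each file is filtered before it is written to the merged output, the 463 original variables are never loaded together, which is the reason the text gives for cleaning before merging.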

I also tried to summarize each variable of the data, but the only variable summaries that give useful information about firms for this research are firm size and firm status (active or not). Information about these variables can be found in the applications.

After that the data were divided into small samples (2500 firms in each), because Google can process only 2500 units per day. For Google geocoding I took a script from my academic supervisor, because the research laboratory in which she works had already written work connected with geocoding and localization analysis. This script is presented in the applications (script №5).
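Splitting the firm list into daily samples is a simple slicing operation. A minimal sketch, with the sample size of 2500 taken from the text and the function name chosen for illustration:

```python
def batches(items, size=2500):
    """Split a list into consecutive chunks of at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical example: 6001 firm identifiers give three samples.
sample_sizes = [len(chunk) for chunk in batches(list(range(6001)))]
```

Each chunk can then be passed to the geocoding script on a separate day or from a separate machine.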

Because of this severe restriction it was impossible for one person to geocode all the data using one computer or one IP address. I therefore used a private VPN client and also asked my colleagues from the university to help with geocoding on their computers.

Unfortunately, Google does not return coordinates for every firm. Nearly a quarter of the firms that reported an address were left without coordinates, and for these firms I used the Yandex geocoding tools. This script can be found in the applications (script №6).

After this work was finished, the files produced by Google geocoding were collected and merged into one file using script №4. Then the two big files, the Yandex geocoding file and the Google geocoding file, were merged into one big file.
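The combination of the two geocoding results can be thought of as a lookup with a fallback. The sketch below assumes each result set is a dictionary keyed by a firm identifier such as the INN, with `None` where a service returned nothing; this layout and the source labels are assumptions made for illustration and are not taken from scripts №4 to №6.

```python
def merge_geocodes(primary, fallback):
    """Prefer coordinates from `primary`; use `fallback` where they are missing.

    Both arguments map a firm identifier to a (latitude, longitude) tuple,
    or to None when the service returned no coordinates.
    """
    merged = {}
    for firm_id in sorted(set(primary) | set(fallback)):
        if primary.get(firm_id) is not None:
            merged[firm_id] = (*primary[firm_id], "primary")
        elif fallback.get(firm_id) is not None:
            merged[firm_id] = (*fallback[firm_id], "fallback")
        else:
            merged[firm_id] = (None, None, "missing")
    return merged


# Hypothetical identifiers and coordinates.
google_results = {"7701": (55.75, 37.62), "2301": None}
yandex_results = {"2301": (45.04, 38.98)}
combined = merge_geocodes(google_results, yandex_results)
```

Keeping a label for the source of each coordinate makes it possible to check later whether firms geocoded by the fallback service behave differently in the analysis.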

As a result of geocoding, every firm received its unique geographic identifiers, which were then added to all the variables received from Ruslana. At the next stage the variables necessary for localization analysis were selected: four-digit NACE code, latitude, longitude and number of employees during the last year.


It is natural to start with a visual analysis of the data to understand where most of the firms are located.

The first picture illustrates the location of firms in the forestry industry. There were a lot of firms in the agricultural sector (462,732 geocoded firms), and I divided NACE division 01 into five groups:

1. Plant growing (011 NACE)

2. Stock raising (012 NACE)

3. Growing of crops combined with farming of animals (mixed farming) (013 NACE)

4. Provision of services to crop production, landscape gardening and livestock, except veterinary services (014 NACE)

5. Hunting and breeding of wild animals, including rendering of services in these areas (015 NACE)

As can be seen from the coordinates plotted on the map, most of the firms are located in the southern European part of Russia. The pictures for all the named industries are quite similar, which suggests that both industries need the same climatic conditions and perhaps the same type of labor resources.

There are two main methods of localization analysis, which were described in the literature review section. The first method is the EG (Ellison-Glaeser) index. This method is very popular but has drawbacks:

1. It is sensitive to the choice of geographical units

2. It does not depend on the relative position of firms, which is what shows localization

3. Our data are micro-geographic in nature (coordinates of each firm), so this index does not use all the benefits of the collected data.

The second criterion is the Duranton-Overman index

$$\hat{K}(d) = \frac{1}{N(N-1)h} \sum_{j_1=1}^{N-1} \sum_{j_2=j_1+1}^{N} f\left(\frac{d - d_{j_1,j_2}}{h}\right)$$

which measures the density of bilateral distances $d$ between firms. (Here, $N$ is the total number of firms, $d_{j_1,j_2}$ is the distance between firms $j_1$ and $j_2$, $h$ is a bandwidth parameter and $f$ is a Gaussian kernel.) This index is more appropriate for our data because it uses all the benefits of knowing the exact location of every firm. The graphs show unweighted K-densities against distance $d$ for the agricultural and forestry industries.
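The unweighted K-density can be evaluated directly from the list of bilateral distances. The following is a minimal sketch, assuming the normalization by $N(N-1)h$ with the sum taken over each unordered pair once; the function names are illustrative, and the sketch omits any correction for the boundary at zero distance.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the kernel f."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def k_density(distances, n_firms, d, h):
    """Unweighted K-density at distance d.

    distances: bilateral distances, one per unordered pair of firms
    n_firms:   number of firms N (so len(distances) == N * (N - 1) / 2)
    h:         bandwidth parameter
    """
    total = sum(gaussian_kernel((d - dij) / h) for dij in distances)
    return total / (n_firms * (n_firms - 1) * h)


# Hypothetical example: three firms with pairwise distances in kilometers.
pair_distances = [12.0, 40.0, 45.0]
grid = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
curve = [k_density(pair_distances, 3, d, h=5.0) for d in grid]
```

Evaluating the function on a grid of distances, as in the last line, gives the curve that is plotted against $d$.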

This index works with the kernel density of distances across all firms. Therefore, in this work the distance between each pair of firms was calculated, and the resulting distance distribution was compared with a random one. A second type of measurement was also made using the number of employees of each firm, which gave a weighted version of the first index.
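Pairwise distances can be computed from latitude and longitude with the haversine great-circle formula. This is a sketch under the assumption of a spherical Earth with a mean radius of 6371 km; the source does not state which distance formula was actually used.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2.0) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the valid range.
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def pairwise_distances(coords):
    """Distances for every unordered pair of (latitude, longitude) points."""
    return [haversine_km(*p, *q) for p, q in combinations(coords, 2)]
```

For N firms this produces N(N-1)/2 distances, so for very large industries the computation is usually done on a sample of firms or in blocks.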

In the weighted version, $e_i$ and $e_j$ are the employment levels of plants $i$ and $j$, which are used as weights on each pairwise distance. This index shows the distribution of pairwise distances between employees in each industry.
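A weighted K-density can be sketched as a kernel density in which each pair of plants contributes in proportion to a weight built from $e_i$ and $e_j$. The source does not show how the two employment levels are combined, so the sketch below takes the combining rule as a parameter; the default (the product $e_i e_j$) is an assumption, and an additive rule $e_i + e_j$ can be passed instead.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the kernel f."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def weighted_k_density(pairs, d, h, combine=lambda ei, ej: ei * ej):
    """Employment-weighted K-density at distance d (illustrative sketch).

    pairs:   list of (distance, e_i, e_j), one entry per unordered pair
    combine: how the two employment levels form the pair weight;
             the product used by default is an assumption of this sketch
    """
    weights = [combine(ei, ej) for _, ei, ej in pairs]
    total_weight = sum(weights)
    weighted_sum = sum(
        w * gaussian_kernel((d - dij) / h)
        for w, (dij, _, _) in zip(weights, pairs)
    )
    return weighted_sum / (h * total_weight)


# Hypothetical pairs: two small plants close together, two large ones far apart.
example_pairs = [(0.0, 1.0, 1.0), (100.0, 10.0, 10.0)]
near = weighted_k_density(example_pairs, d=0.0, h=1.0)
far = weighted_k_density(example_pairs, d=100.0, h=1.0)
additive_far = weighted_k_density(example_pairs, 100.0, 1.0, combine=lambda ei, ej: ei + ej)
```

In the example the density at 100 km dominates because the pair of large plants carries most of the weight, which is the sense in which the weighted index describes distances between employees rather than between plants.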

In the second step each plant in the industry was given random coordinates, and the same indexes were calculated as in the first step. This assignment of random coordinates and evaluation of the indexes was repeated 1000 times, which yields a set of 1000 estimated values of the K-density at each distance d.
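The simulation loop can be sketched as repeated random draws of locations followed by the same statistic as in the first step. The source does not say from which set the random coordinates are drawn; the sketch assumes they are sampled without replacement from a pool of candidate sites (for example, the locations of all firms in the data), which is an assumption of this illustration. The statistic is passed in as a function so that any K-density routine can be plugged in.

```python
import random


def simulate_counterfactuals(n_firms, candidate_sites, statistic, n_sims=1000, seed=0):
    """Run Monte Carlo simulations of randomly located industries.

    n_firms:         number of plants in the industry being tested
    candidate_sites: pool of (latitude, longitude) sites to draw from
                     (the choice of pool is an assumption of this sketch)
    statistic:       function mapping a list of sites to the quantity of
                     interest, e.g. K-density values on a distance grid
    Returns a list with one statistic value per simulation.
    """
    rng = random.Random(seed)  # fixed seed makes the runs reproducible
    results = []
    for _ in range(n_sims):
        sample = rng.sample(candidate_sites, n_firms)  # without replacement
        results.append(statistic(sample))
    return results
```

With `n_sims=1000` and a statistic that returns the K-density on a grid of distances, the result is the set of 1000 simulated values at each distance described in the text.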

The third step was to build a confidence band to determine whether an industry is localized or dispersed. For this, the results from the second step are compared with the results from the first step. To construct the confidence band we take the 95th percentile of the simulated K values as the upper bound and the 5th percentile as the lower bound. Estimated K-densities from the first step that fall between the 5th and 95th percentiles of the randomly generated values can be called random.
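The band construction amounts to taking percentiles across simulations separately at each distance. A minimal sketch, assuming linear interpolation between order statistics (other percentile conventions give slightly different bounds) and illustrative function names:

```python
def percentile(sorted_values, q):
    """Percentile q (0 to 100) of an already sorted list, linear interpolation."""
    position = (len(sorted_values) - 1) * q / 100.0
    low = int(position)
    high = min(low + 1, len(sorted_values) - 1)
    fraction = position - low
    return sorted_values[low] + fraction * (sorted_values[high] - sorted_values[low])


def confidence_band(simulated, lower_q=5, upper_q=95):
    """Pointwise band from simulated K-density curves.

    simulated: list of curves, each a list of K values on the same distance grid
    Returns (lower, upper), two lists with one value per grid distance.
    """
    lower, upper = [], []
    for k in range(len(simulated[0])):
        column = sorted(curve[k] for curve in simulated)
        lower.append(percentile(column, lower_q))
        upper.append(percentile(column, upper_q))
    return lower, upper
```

The observed K-density at a given distance is then treated as random when it lies between the two returned bounds for that distance.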

In the fourth step we define localized and dispersed industries by comparing the values from the first step with the confidence band. If $\hat{K}(d) > \overline{K}(d)$ (the 95th percentile, the upper border of the band) for at least one $d$ from 0 to 1000 kilometers, and $\hat{K}(d)$ never lies below $\underline{K}(d)$ (the 5th percentile, the lower border of the band), then the industry is globally localized at the 5% confidence level. If $\hat{K}(d) < \underline{K}(d)$ for at least one $d$ from 0 to 1000 kilometers, and $\hat{K}(d)$ never lies above $\overline{K}(d)$, then the industry is globally dispersed at the 5% confidence level.

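Global localization and dispersion indexes can also be defined. With $\overline{K}(d)$ and $\underline{K}(d)$ denoting the upper and lower borders of the confidence band,

$$\Gamma(d) = \max\left(\hat{K}(d) - \overline{K}(d),\; 0\right)$$

$$\Psi(d) = \max\left(\underline{K}(d) - \hat{K}(d),\; 0\right) \quad \text{if } \sum_{d} \Gamma(d) = 0, \text{ and } 0 \text{ otherwise}.$$

The last point is the strength of industry localization or dispersion, which can be measured as the sum $\sum_d \Gamma(d)$ for localization and the sum $\sum_d \Psi(d)$ for dispersion.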
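The indexes can be computed from the observed K-density and the band on a common distance grid. The sketch below assumes the usual Duranton-Overman convention that dispersion is counted only when no localization is detected, and it derives the label from the two sums; both the function name and the label strings are illustrative.

```python
def localization_indices(k_observed, lower, upper):
    """Strength of localization and dispersion on a common distance grid.

    k_observed: observed K-density values
    lower:      lower border of the confidence band at the same distances
    upper:      upper border of the confidence band at the same distances
    Returns (gamma, psi, label).
    """
    # Localization: total amount by which the observed curve exceeds the band.
    gamma = sum(max(k - u, 0.0) for k, u in zip(k_observed, upper))
    if gamma > 0:
        psi = 0.0  # dispersion is counted only when there is no localization
    else:
        psi = sum(max(low - k, 0.0) for k, low in zip(k_observed, lower))

    if gamma > 0:
        label = "localized"
    elif psi > 0:
        label = "dispersed"
    else:
        label = "random"
    return gamma, psi, label


# Hypothetical curves on a three-point grid.
result = localization_indices([1.0, 5.0, 1.0], [0.0, 0.0, 0.0], [2.0, 2.0, 2.0])
```

Larger values of the first sum indicate stronger localization and larger values of the second indicate stronger dispersion, so the two numbers can be used to rank industries.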


The three key results that answer the research question are as follows:

1. The localization indexes of both industries are significantly different from random values at the 4-digit NACE level.

2. There is significant localization in the agricultural sector.

3. There is significant dispersion in the forestry sector.

These results hold for both the unweighted and the employment-weighted DO indexes.

Nearly 65% of 4-digit industries in the agricultural sector and 35% in the forestry sector are localized at the 5% confidence level. 20% of the agricultural sector and 49% of the forestry sector were found to be dispersed at the same confidence level.

Comparing the results of this research with other studies, we can say that in Russia the agricultural industry is not as strongly dispersed as it is in Canada or the UK (0.067), while the forestry industry seems to be much more strongly dispersed (0.037) in these countries at the four-digit NACE level.

As for agriculture, according to our research the agricultural sector in Russia seems to be localized. In the UK, however, it is dispersed (0.049). Behrens and Bougna obtain the same result for the Canadian data.

This analysis was carried out for a 3-year period: localization in the agricultural sector only grew, and dispersion in the forestry sector decreased a little.

As for the influence of employment weights, they do not change the result significantly.

All the tables with the estimated coefficients are presented in the applications.



This research used the distance-based Duranton-Overman test to study detailed location patterns of the forestry and agricultural industries in Russia. It used recent data (2013-2014) and is the first research about localization in the agricultural and forestry industries in Russia. The main result is a set of indexes and a localization map that shows the places with the greatest and the lowest concentration.

Nearly 65% of 4-digit industries in the agricultural sector and 35% in the forestry sector are localized at the 5% confidence level. 20% of the agricultural sector and 49% of the forestry sector were found to be dispersed at the same confidence level.

The conclusion about the localization index of the forestry sector seems fairly obvious, because forestry is a sector in which raw materials are taken from nature. There is no need for concentration, because the costs of having more than one producer in one forest area exceed the profits. For comparison, in the agricultural sector different farms can buy seeds in bulk by cooperating with each other, which saves money and increases profit. They can also pool money to buy modern agricultural machines, and the same could be done in the forestry sector. In the forestry sector, however, the costs will exceed these gains because firms compete for a limited resource (forest), whereas in the agricultural sector the resource is not so limited and its use can become more effective as the number of nearby firms grows.

It is also hard to say whether localization is good or bad in the agricultural sector. In Russia agricultural firms are concentrated mostly in the southern part of the country because of the weather conditions. To some extent, localization in Russia is a result of the country's geographic location, and it is hard to say which is better, concentration of firms or “dispersion”. To answer this question, data from several periods are used. Indexes will be calculated for each period, and the correlation between these indexes and firms' economic indicators will be calculated. Most likely, localization has a positive impact on firms' economic indicators in the agricultural sector and a negative impact in the forestry sector.

However, some unexpected results may appear. They may be connected with a high level of competition in the agricultural sector that outweighs the positive effect of localization. The research may also show a positive influence of localization in the forestry sector, because in Russia wood is not a strictly limited resource, and the example given above about modern machines shared by several entrepreneurs may outweigh the negative effect.

The results of this research can be used by the government. They may help it to understand in which regions it is better to develop the agricultural and forestry industries, based on the concentration of producers. They may also be used by entrepreneurs in the studied sectors to choose the best location for their business. Put simply, they can look at the research results and understand whether they should work near other firms or whether it is better not to open new production near existing producers.

However, there are several ways in which this research can be improved in future work. The first is to look at “marginal localization” (the change in economic indicators when one new producer joins an existing cluster of producers) and find the optimal localization. This result would help to distribute production across the country in the most effective way. Secondly, it would be very useful to compare the results of this research with those for other countries and use data from both to test the hypothesis that the geographical features of a country have a significant influence on localization. The same research can be done for other industries and other countries, because there is still little research on localization.



List of variables from Ruslana

[1] "Mark"

[2] "Company.Name"

[3] "National.Identification.number..OKPO.NIN."

[4] "Tax.number..INN.Tax."

[5] to [44]: names are blank in the source listing

[45] "Profit.Margin...2013"

[46] "Profit.Margin...2012"

[47] "Profit.Margin...2011"

[48] "Profit.Margin...2010"

[49] "Profit.Margin...2009"

[50] "Profit.Margin...2008"

[51] "Profit.Margin...2007"

[52] "Profit.Margin...2006"

[53] "Profit.Margin...2005"

[54] "Profit.Margin...2004"

[55] "Number.of.Employees.2013"

[56] "Number.of.Employees.2012"

[57] "Number.of.Employees.2011"

[58] "Number.of.Employees.2010"

[59] "Number.of.Employees.2009"

[60] "Number.of.Employees.2008"

[61] "Number.of.Employees.2007"

[62] "Number.of.Employees.2006"

[63] "Number.of.Employees.2005"

[64] "Number.of.Employees.2004"

[65] to [324]: names are blank in the source listing
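
The year-suffixed columns in this list, such as `Profit.Margin...2013` and `Number.of.Employees.2013`, can be split into an indicator name and a year before per-period indexes are computed. The Python sketch below is one possible way to do that. The function name `split_year_column` and the regular expression are assumptions made for illustration, not part of the Ruslana export or of the author's code.

```python
import re

# A column name ending in a four-digit year, e.g. "Profit.Margin...2013".
# The dots separating the name from the year are dropped.
_YEAR_COLUMN = re.compile(r"^(?P<name>.*?)\.*(?P<year>\d{4})$")

def split_year_column(column):
    """Return (indicator, year) for a year-suffixed column, or None otherwise."""
    match = _YEAR_COLUMN.match(column)
    if match is None or not match.group("name"):
        return None
    return match.group("name"), int(match.group("year"))

columns = ["Mark", "Company.Name", "Profit.Margin...2013", "Number.of.Employees.2004"]
for col in columns:
    print(col, "->", split_year_column(col))
```

Columns without a trailing year, such as `Mark` or `Company.Name`, return `None` and can be treated as firm-level attributes that do not change between periods.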
