# System of automatic collection of scientific articles

## The characteristics and features of the topic model in the form of stochastic matrices, their purpose and application. Creation and distinctive features of the new algorithm to build a Sub-hierarchy Galois, the specificity of the visualization tools.

Рубрика | Программирование, компьютеры и кибернетика |

Вид | курсовая работа |

Язык | английский |

Дата добавления | 21.06.2016 |

Размер файла | 2,5 M |

### Отправить свою хорошую работу в базу знаний просто. Используйте форму, расположенную ниже

Студенты, аспиранты, молодые ученые, использующие базу знаний в своей учебе и работе, будут вам очень благодарны.

Размещено на __http://www.allbest.ru/__

Contents

__Introduction____1. Sources overview____2. Theoretical basis____2.1. Algorithms____2.1.1.____PLUTON____2.1.2.____CERES____2.1.3.____ARES____3.____Research____3.1.____Analogs overview____3.2.____New algorithm____3.3.____Implementation details____3.3.1.____Architecture____3.3.2.____Interface and visualization____4.____Conclusion____Bibliography__

# Introduction

The topic of the master thesis is directly connected with first year master's course work. Previous year's topic was “Automatic system for collection of metadata of scientific articles from ACM Digital Library” and it was devoted to data gathering, building of topic models for collections of articles from recommender system related conferences and representation of topic models in a form of concept lattices. So, to make such studies more automated several program modules were created and some of external existing solutions were applied. Topic models in a form of stochastic matrices were converted to binary matrices and were used as formal contexts for concept lattices where documents represents object of formal context and topics are attributes. As a final result concept lattices were built and formal contexts were saved in .csv files. However, some of lattices were quite large and chaotic, so it was hard to interpret. So, as a continuation of this research, was decided to try out to interpret it in form of Galois Sub-hierarchy (GSH). But, already after short elementary analysis of that field it was found out that there are not so many algorithms for building of Galois Sub-hierarchy so far. Consequently, it pushed to creation of new algorithm for building of GSH, which maybe a good alternative and overcome some of existing algorithms by some aspects. Moreover, creation of visualization tool of built GSH which will apply all algorithms was marked as a task for this master thesis. A goal for this year is to research opportunities for usage of Galois Sub-hierarchies for compact topic modeling representation. Finally, the set of tasks for this year was the following:

*1. **Explore and analyze theoretical basis on mentioned theme;*

*2. **Make an attempt for efficient GSH building algorithm creation;*

*3. **Implement existing GSH building algorithms;*

*4. **Develop software for visualization of topic models by GSH;*

The structure of this work is the following: after this chapter, called “Introduction”, the main part follows. In Section 1 a review of related resources on domains of this research is given. Then, Section 2 provides the definitions of specific terms and some theoretical basis. Section 3 contains a more detailed description of the aims and tasks of this course project and all the details of research process and implementation. Finally, the chapter called “Conclusion” summarizes the main findings and announces results.

# 1. Sources overview

As far as during the previous year a lot of time has been spent on discovering of theoretical basis including FCA, this year it was necessary to get into Galois Sub-hierarchy topic. Actually, there is no so many articles were found devoted to GSH and its applications especially in Russian. There was no decent source of information found about GSH building algorithms in Russian. That fact again confirms topicality of this work. The main article that introduced the domain and main existing algorithm was [2]. In general it describes the results of experiment of calculating the performance of the different implementations of three different algorithms and developed in different languages. Those three algorithms are: PLUTON, CERES, ARES. Moreover, it provides some information about GSH theory in general and its connection to FCA.

Thus, as previous work showed the main drawback of concept lattices is that the number of concepts may be of much larger size than the relation (or even exponential in the size of the relation). That makes concept lattice hard interpretable as it was already mentioned. One of the options for dealing with this problem is to use a polynomial-size representation of the lattice which preserves the most pertinent information. One of the approaches, which have proved useful in practice, is to restrict the lattice to the concepts which introduce a new object or a new property. This idea is the basis for two very similar structures called the Galois Sub-hierarchy and the Attribute Object Concept poset (AOC-poset) [3].

As the size of the input may still be large, naturally it is important to have efficient Galois Sub-hierarchy-building algorithms to work with. There are several efficient Galois Sub-hierarchy-building algorithms, with very different algorithmic strategies, and with theoretical worst-time complexity analyses which are difficult to compare. stochastic algorithm matrix Galois

# 2. Theoretical basis

This section provides the main terms necessary to understanding of GSH algorithms.

In FCA, a formal context is a triple K = (G, M, I) where G and M are sets (objects and attributes respectively) and I is a binary relation, i.e., I ? G Ч M. Figure 1(left) shows context K = ({1, 2, 3, 4, 5, 6}, {a, b, c, d, e, f, g, h}, I)

a |
b |
c |
d |
e |
f |
g |
h |
||

1 |
X |
X |
X |
X |
|||||

2 |
X |
X |
X |
X |
X |
||||

3 |
X |
X |
X |
X |
X |
||||

4 |
X |
X |
|||||||

5 |
X |
X |
|||||||

6 |
X |
X |
|||||||

Figure 1. Binary relation of a context K

For a set A ? G of objects, we define A` := {m ? M|gIm for all g ? A} (the set of attributes common to the objects in A). Correspondingly, for a set B ? M, we define B` := {g ? G|gIm for all m ? B} (the set of objects which have all attributes in B). Then, a formal concept of the context (G, M, I) is a pair (A, B) with A ? G, B ? M, A` = B and B` = A. A is called the extent and B the intent of the concept (A, B).

Figure 2. Concept lattice of a context K

The concepts are called the object concepts of *o*, and the concepts are called the attribute concepts. The object concept which corresponds to object *o*, *г**o*, is the smallest concept with *o* in its extent, and dually, the attribute concept which corresponds to attribute *a*, *µ**a*, is the greatest concept with *a* in its intent. The Galois Sub-hierarchy is the sub-order of the lattice made out of the set and the restriction of the lattice order to that set [2]. For example, on Figure 2 concept lattice of context K is depicted. For comparison, Figure 3 shows the Galois Sub-hierarchy of the context introduced in Figure 1. It is noticeable that on GSH diagram empty concepts are omitted that already simplify the diagram even for quite small context. Moreover, there are also terms of simplified extent and intent and simplified concept form. In [2] was proposed to use the notation (4,e) instead of (14,de), and use the terms simplified intent (for (4)) and simplified extent (for (e)), as well as simplified concept (for (4, e)). In other words, they use an origin objects and attributes that “produce” the same concept as a simplified extent and intent correspondingly.

Figure 3. GSH of a context K

## 2.1. Algorithms

### 2.1.1. PLUTON

**So, the first algorithm PLUTON includes three sub-algorithms: TomThumb, ToLinext and ToGSH**. TomThumb is in charge for producing of an ordered list of the simplified extents and intents [3]. This ordered list maps to a linear extension of the Galois Sub-hierarchy. Algorithm ToLinext then searches the ordered list to merge pairs consisting of a simplified extent and a simplified intent pertaining to the same concept, in order to reconstruct the elements of the Galois Sub-hierarchy. Algorithm ToGSH is then used to compute the edges of the Hasse diagram (transitive reduction) of the Galois Sub-hierarchy.

Furthermore, down below you can see pseudocode of PLUTON on Figure 4.

Figure 4. PLUTON's pseudocode

However, as this algorithm consists of several sub-algorithms it is feasible to examine these parts on several snippets. On Figure 5 pseudocode of TomThumb is depicted. As you can see it returns ordered list of the simplified extents and intents but in reversed order. It is necessary for ToLinext input.

Figure 5. TomThumb's pseudocode

Figure 6. ToLinext's pseudocode

ToLinext is presented on Figure 6. It is necessary to clarify that method mergePossible() in line 15 checks whether *item* and *item2* is a pair of object and attribute and if intersection of *(item, item2)* *= 1*. Then if *item*'s set of attributes <= than attributes set of every object from *item2*'s object set or the same but vice versa with attributes instead of objects in case if *item* is an attribute.

Figure 7. ToGSH's pseudocode

Method *getParentType(c, c2)* in ToGSH sub-algorithm defines if *c2* concept linked with *c* and at the same time is a parent concept for *c*. If extent of *c* is a subset of *c2*'s extent then parent type of *c2* for *c* is *extent*. If first condition does not perform then the same check proceed with intents of these two concepts. In case if both conditions does not succeed, method returns null that means that there is no connection between these concepts at all.

Actually, [4] says that complexity of all mentioned algorithms is *O(n*^{3}*)* where *n* stands for the number of objects and attributes in the input relation. However, if we'll take a look on other researches that estimate complexity partially the final result differs. Overall complexity of PLUTON was estimated step by step. Tom Thumb's time complexity is analyzed as in *O(|**I**|)*. A brute force implementation of ToLinext has a complexity in *O((|O| + |A|)*^{3}* )*. L. Fura in [7] evaluates the complexity of ToGSH as *O((|O| + |A|)*^{2}* Ч max(|O|, |A|)*^{2}* )*

### 2.1.2. CERES

**Second GSH building algorithm that based on the **size of columns. The basic idea is to look on the size of columns in decreasing order by number of crosses. That is done to generate the concepts of *C*^{A }by decreasing extent side. The strategy is then to compute *C*^{A} by groups of concepts which have the same extent and adding concepts of *C*^{O} \ *C*^{A} to the GSH under construction, when their intent is covered by the intents of the *C*^{A} concepts previously computed. Extent inclusion is used to compute edges during a top-down traversal of the current Hasse diagram. Simplified extents and intents of *C*^{O} concepts are computed by propagation after edge construction [2].

Down below the pseudocode of CERES is shown. As it has been said there is no much information and implementation about GSH algorithms, so this pseudocode was written during that research.

Figure 8. CERES's pseudocode

Time complexity of this algorithm is in *O** **(|O| Ч (|O| + |A|)*^{2}*)** *was defined in the work of H. Leblanc - [6].

### 2.1.3. ARES

**Last but not least is an incremental kind of algorithm - ARES. Idea in the background is to** add a new formal object *o* with its own attributes *Ao = o'* and then inspect concepts pairwise. Let *C* be the current (explored) concept. One after another pairing up with all the rest concepts in GSH and checking it for several scenarios. First one is when both concepts are equal by intent (*C*'s intent is exactly *o'*). Another variant is that one concept includes other that means that it is more common (*C*'s intent is strictly included in *o'*). Next are the opposite situation (*C*'s intent includes *o'*) and the last one when concepts don't have a full inclusion. Consequently, after every comparison Hasse diagram for the Galois Sub-hierarchy either remains unchanged or modifies it and at the same time removes transitivity edges as necessary. Meanwhile, when simplified intents are modified, the algorithm checks if for a concept both extent and intent are empty, and if so, the concept is removed. [2]

For a better explanation, ARES's pseudocode has been divided into several parts. According to [5], A is called a new class, in contrast to [2] where they call it a new object with its set of attribute which is above was called *(o, o'). *Thus, further in pseudocode the new concept will be defined as A.

· LookFor&BindSuperClasses recognizes A's superclasses and binds A to its immediate superclasses.

· BindSubClasses binds A to its immediate subclasses.

Figure 9. ARES's pseudocode Part 1.

· SH, holds already visited H_{i} classes which are A's superclasses, as well as the factorizing classes (obviously A's superclasses) built up in the previous steps. SH's initial value is empty.

· EmptyClasses, the set of the non-meaningful classes which set of declared properties has been cleared out. EmptyClasses' initial value is empty.

The algorithm LookFor&BindSuperClasses visits H_{i} going down from ?, following a linear extension of > H_{i}. By ?, [5] means a root concept that is located on top of concept lattice and looks like {1, 2, 3, 4, 5, 6} {} for context K, for example. However ? it will be not included in final GSH as empty concept. During this visit, A's property set is compared with the set of properties of the visited class that above was mentioned just as current concept *C*. The whole exploration builds SH, the set of A's superclasses in *gsh*. Then A is |if needed| created and bound to its direct superclasses (Create&BindSH). DeleteEmptyClasses ends and deletes each class of the set EmptyClasses, since these classes are non-meaningful, and at this point do not declare any more properties [5].

Figure 10. ARES's pseudocode LookFor&BindSuperClasses.

Figure 11. ARES's pseudocode Visit(C. A).

At the moment, when a class C is visited, all its superclasses have already been visited. *V**isit* deals with the easy cases when either C is A or C is a superclass of A, and calls Extract whenever a factorizing class is needed.

Figure 12. ARES's pseudocode Extract(C).

The Extract algorithm creates a factorizing class C' and inserts C' into the hierarchy. This factorizing class either is A or is stored inside SH. Given a class C, we consider the set S of classes which are C's superclasses while belonging to SH, and we call Sups(C, SH) the minimal elements of S [5].

Figure 13. ARES's pseudocode Create&BindSH.

At the moment, when Create&BindSH is called from LookFor&BindSuperClasses, SH variable stores all A's superclasses in the current graph. If A has not already been found, it must be created and bound together with its immediate superclasses. Function Min(E) returns the minimal classes of set E.

ARES ends with BindSubClasses that binds A to its immediate subclasses. A class T is a subclass of A when Properties(A) is included in Properties(T ). T is an immediate subclass if none of T 's superclasses is itself a subclass of A.

The time complexity of ARES is in *O** **(|O| Ч |** **(w Ч a + m))*, where *w* is the width of the Galois Sub-hierarchy (i.e. the maximum number of pairwise non-comparable elements), *a* is the maximum size for an intent and *m* the number of edges of the Hasse diagram [5].

# 3. Research

## 3.1. Analogs overview

Besides review of the literature according to the domain of this research there also was some investigation on applications that build GSH, and visualize it. However, there were no a lot of alternatives and analogs of this project. The first, more or less similar development was AOCPosetBuilder. In contrast to this project some disadvantages were discovered:

· No visualization. Only export to .dot files. Moreover, with increase of context and GSH respectively, visualization in .dot file format seems to be ineffective (see Figure 14). It does not fit on the screen so it is necessary to scroll it, and edges between nodes are quite messy. Another option is XML file that is not much more comfortable to use.

Figure 14. Dot file visualization example

· Poor documentation. For instance, that project offers different formats of input for context such as XML, CSV, SLF and so on, but there are no requirements for that formats. How should content looks like. It was necessary to make a few attempts to upload CSV file. Nevertheless, big variety of input formats can be marked also as an advantage.

Another project that could be qualified as an analog is Galicia project. It has a lot of graphical features. However, a lot of advantages on visualization side such as 3D, 2D diagram, reduced view, zoom, rotation of 3D model and so on are compensated by lack of algorithmic component. It uses only CERES algorithm for GSH building and generally developers claims that Galicia focuses more on concept lattices than GSH. Moreover, it does not support CSV as an input format.

Figure 15. Galicia visualization example

Consequently, one of the tasks of master thesis project is to develop an application that will support CSV format (requirement as a consequence of course project), more GSH building algorithms and will provide a proper visualization for GSH.

## 3.1 New algorithm

The initial idea of this research was based on the fact that there were no so many experiments and researches made on this field. Old articles did not cover all the algorithms because they have not been invented yet. Moreover, more important influence made an absence of object concepts in the Galois Sub-hierarch, for instance one of the first explorations made by Godin [8] builds only the attribute elements of the Galois Sub-hierarchy.

Thus, it was decided to make an algorithm that will cover this breach and will include both attributes and objects concepts in the GSH. At the same time it should be more or less effective and be able to beat existing algorithms on some points.

So, as an input for our algorithm we have formal contexts stored in .csv files that was a result of previous year work. As a first step it is necessary to find object concepts and attribute concepts as well. Afterwards, all repetitive concepts that have equal intent and extent are excluded so only single instance of every concept stays in collection. Next it is necessary to define inclusions and relationships between concepts that will help to identify the structure of GSH and edges.

There is only short description of an algorithm's idea that was provided above. Furthermore some of the implementation details will be provided. Approximate scheme of the algorithm is shown on the Figure 16.

Looking ahead, should be stressed that as programming language to implement algorithms and build visualization software the Java language was selected. There are some of the reasons in favor of Java:

1. High performance of applications written in Java;

2. Cross-platform;

3. A large number of libraries;

4. Reliability.

Figure 16.Scheme of the algorithm

Obviously, to have a profit in performance it was decided to use a multithreading approach. So, collection of the concepts on documents and topics was decided to make in parallel threads. The process of concepts extraction from the formal context is the following:

1) In cycle taking every row of formal again and save indexes of elements of this row that equal to 1.

2) Then in nested loop over context again row by row looks for the documents that have ones on the same indexes. Afterwards, these elements remembered as a part of extent for the current document of outside loop. Therefore after end of the cycle we have object concept for every document. It is necessary to make a stress that special class Concept was developed as a convenient storage. It have a two List objects from standard Java library where one stores indexes of documents from the intent of the concept and another have indexes of topics from the extent. Moreover, to simplify detection of equality between two concepts the order of traverse was implemented in the way so indexes will be put in the increasing order. Thus, there will be no ({1,2,3},{B}) and ({2,1,3}, {B}) concepts.

3) The same thing made similarly for topic but both cycles traversed by the columns of formal context this time.

4) In first versions the filter for equal concepts detection was implemented in inefficient and undigested way. So in lots of cycles collection with document concepts was compared pairwise with all other concepts, then the same thing with topic concepts and so on. But, then it was simplified with HashTable standard class. After some deep research of the possibilities for this class it was found that it cannot use arrays for the key but collections. So our List<Integer> objects with extents from the Concept class that was already mentioned were used as key in pair key-value. Obviously, List<Integer> object with intent stays for the value. Then, all concepts were put in the HashTable one after another, so when current concept had equal key it was just ignored.

Complexity of described part of the algorithm is measured as *O(|G| **Ч [|M| + |G| Ч |M|])* for object concepts that in the end transforms to *O(|G|*^{2}* Ч** |M|)* and similarly for attribute concepts *O(|M**| Ч [|**G| + |M**| Ч |**G|]) =>* O(|M|^{2} *Ч** *|G|). That's in total gives us *O(|G|*^{2}* Ч |M|** + *(|M|^{2} *Ч** *|G|*) => O(|M**| Ч |**G| **Ч** **[|G| + |M|])*, where |G| and |M| are set of objects and attributes respectively*.*

As a result we have all concepts for the current formal context but we don't know relationships between them. For the next step, it was suggested to adapt cover graph algorithm to set inclusions and define edges of graph which was proposed in [9] by Obiedkov S. for building of graph diagram of concept lattice. However, later on it was decided that this algorithm quite complicated in our case. Moreover, after some experiments it was noticed that concepts with bigger extent's size are usually located on the top of GSH, in another words they were above concepts with smaller extents. So, it was decided to sort collection of concepts by decreasing size of extent and then compare every concept with others by their extents pairwise. As a consequence, of different attempts to use that knowledge, some patterns were observed. Thus, algorithm for graph making was made and its pseudocode is presented on a snippet below (Figure 17). It was decided to call newly invented algorithm - Ra, as abbreviation from Rational algorithm, and it is also Egyptian god that will support the notion how all previous algorithms were named, for instance Ares is a name of ancient Greek god of war.

Probably, should be stressed the order of how collection with concepts is traversed. Firstly, RA was implemented with unsophisticated approach where concept after concept was taken and compared with all the rest concepts. Thus, that led to increase of its complexity because of lots of redundant checks that were made when concepts with the same extent size was compared with each other with no need because they obviously cannot form an edge. Later on that evolved to the comparison of concept only with concepts that have bigger extent's size. Indices of concepts list when extent's size of the concept changes is stored in an array. Thus, when concept list is traversed concepts from one extent's size group are compared with previous size groups. Demonstrative example of this principle is depicted on Figure 17.

Figure 17. Example of comparison order.

Moreover, groups of the previous extent's sizes are traversed in order of increase to avoid creation of transitivity edge. Let's show this case on the example from context K (Figure 1). Suppose that all concepts are known and already represented in a decreasing order:

1. ({2,3,6}, {a,h}) - simplified form (6, ah)

2. ({1,2,5}, {c}) - simplified form (, c)

3. ({1,2,3}, {b}) - simplified form (, b)

4. ({1,4,5}, {d}) - simplified form (, d)

5. ({1,5}, {c,d}) - simplified form (5, )

6. ({1,4}, {d,e}) - simplified form (4, e)

7. ({2,3}, {a,b,g,h}) - simplified form (, g)

8. ({3}, {a,b,f,g,h}) - simplified form (3, f)

9. ({2}, {a,b,c,g,h}) - simplified form (2, )

10. ({1}, {b,c,d,e}) - simplified form (1, )

For example, on the current step concept number 8 is compared with others. If the order of comparison was a simple up to bottom it was firstly compared with concept number one and that edged would have been created. Because, method isEdgeTransitive(see Figure 18) checks whether there is an edge among previous edges with child concept = current explored child concept (3,f), that have current potential parent concept (6,ah) as a child concept. In case if potential parent concepts are traversed in order of decreasing extent's size, concept (3,f) will be firstly compared with concept number 7, and that will be a correct edge, and later comparison with concept number 1 will deny edge creation because it would see among previous edges, edge ((, g) - (3,f)) and edge ((, g) - (6,ah)).

Obviously, method isAbove (see Figure 18) compare concepts by extent size. If CHILD's extent is a subset of PARENT's extent it is possible to make an edge between them if it would not be transitive.

Figure 18. RA graph building part pseudocode

Complexity of RAGraphBuilding was computed as *O(|C|**Ч** log|C|)* where *C* is the number of concepts. In addition, time complexity of isEdgeTransitive was estimated to be *O(|C*^{2}*|)* as in the worst case amount of edges is *C*^{2}

## 3.3 Implementation details

### 3.1.1. Architecture

Since, the project turned out into tons of code, packages and classes, class diagram can't fit on one interpretable screen. Nevertheless, the most significant snippet is depicted on Figure 19.

Figure 19. Class diagram's snippet

However, main conception of project's architecture can be examined on screenshot down below.

Figure 20. Class diagram's snippet

Data representation module is responsible for conversion of csv file's data into application objects and for export to dot files. Class Context is implemented to store csv file's content in a form appropriate for the subsequent processing. Moreover, there also core classes for Concept and Edge representation, which are finally used to establish the GSH object in form of List of Concepts and List of Edges.

For representation of Concept in a form of intent and extent, plus also simplified extent and intent was used a standard java collections ArrayList of Integer type, where each element is number of either object or attribute. Quite obvious, that class Edge is represented by two concept: child and parent.

Algorithmic module includes abstract class AbstractAlgorithm and 4 classes that extend it, Pluton, Ares, Ceres and Ra. During invocation of any of the algorithms, class GSHBuilder that takes an object of algorithm's type in constructor is responsible for making of GSH object. Then it is passed to gui module's class of JApplet that uses mxGraph library for visualization of ready GSH. Information on visualization provided in details in the next section.

### 3.1.2. Interface and visualization

For implementation of application's GUI was used standard Java Swing library. Whole interface is implemented in galois.gui package. Class GSHBStart is responsible for main application's window and extends a JFrame class. Appearance of main application is depicted on Figure 21. Classes ContextTable, ContextTableController and ContextTableModel were implemented to process context representation, reading of context from files and storage, where ContextTable extends standard JTable from Java Swing, and ContextTableModel extends DefaultTableModel.

Figure 21. Main application window

Afterwards, it was necessary to implement GSH visualization, so choice was made in favor of JGraph and JGrapht stack, where

JGraphT:

· is focused on data structures and algorithms.

JGraph:

· is focused on rendering and GUI-based editing.

The two libraries are complementary and can be used together via the JGraphModelAdapter provided by JGraphT; this adapter allows a JGraphT graph data structure to be used as the model being viewed and controlled via JGraph. These two libraries fit perfectly to needs of that project.

Thus, JGraphApplet class that extends JApplet standard class became a container for JGraphT's graph visualized by JGraph. However, during application development it was noticed that JGraph was update to mxGraph. Moreover, that update brought possibility to use mxGraph java library as an independent tool for graph modeling as well as for visualization. On Figure 22 a visualized GSH example is shown. That graph supports zooming, and nodes replacements, any other actions are prohibited to prevent initial GSH changes.

Figure 22. GSH visualization example

Besides main features, it was decided to implement generation of random context by the number of objects and attributes and amount of relations. Just to simplify different experiments. For the same purpose was developed the possibility to create an empty context and opportunity to change initial context just by clicking on cells. Finally, needs to be mentioned that feature of export to .dot file was implemented as well.

# 4. Conclusion

Classification of the massive documents collection is a palpitating problem of Data mining and Information Retrieval domain. This research was dedicated to the classification of RecSys conferences devoted to the recommender system from ACM Digital Library using different approaches. ACM Digital Library is a rather popular digital library that attracts many students and researchers for a big variety of resources and articles. In previous year's course work, as a final result concept lattices were built and formal contexts were saved in .csv files. However, some of lattices were quite large and chaotic, so it was hard to interpret.

So, this year it was decided to try Galois Sub-hierarchy approach that should have decreased the number of concepts, by omitting empty concepts and make topic model more interpretable. However, after exploration of GSH-building algorithms and domain as a whole, it was noticed that besides lack of information, analogs there are not too much algorithms and most of them are slightly outdated. That encouraged to creation of new algorithm. More precisely tasks of this research were the following:

1. Explore and analyze theoretical basis on mentioned theme;

2. Make an attempt for efficient GSH building algorithm creation;

3. Implement existing GSH building algorithms;

4. Develop software for visualization of topic models by GSH;

For the time being all objectives were completed. 4 algorithms including new algorithm that was called Ra were implemented and integrated in the Java application for visualization. According to established goal developed algorithm showed a better time on several examples of input data. Moreover, developed visualization seems to be more comfortable than dot file format that most of analogs use.

Figure 23. Experiments

Those experiments that were managed to be done already showed that Ra overcomes old algorithms due to more effective and modern implementation approach. Nevertheless, research and improvement of new algorithm and software will not stop, because it is necessary to perform more experiments to evaluate the value of new algorithm more thoroughly.

# Bibliography

1) Blei D. M. Probabilistic Topic Models, Communications of the acm, vol. 55, no. 4, pp. 77-84, 2014.

2) A. Berry, A. Sigayret. Performances of Galois Sub-hierarchy-building Algorithms. Formal Concept Analysis, volume 4390 of the series Lecture Notes in Computer Science, pp 166-180, 2007.

3) A. Berry, M. Huchard, M. Mcconnell R., A. Sigayret, J. Spinrad. Efficiently Computing a Linear Extension of the Sub-Hierarchy of a Concept Lattice. B. Ganter;R. Godin. ICFCA'05: International Conference on Formal Concept Analysis, Feb 2005, Lens (France), Springer-Verlag, pp.208-222, 2005, Lecture Notes in Computer Science.

4) A.Berry, M. Huchard, A. Napoli, A. Sigayret. Hermes: an efficient algorithm for building Galois sub-hierarchies. Laszlo Szathmary, Uta Priss. CLA'2012: 9th International Conference on Concept Lattices and Applications, Oct 2012, Fuengirola (Mґalaga), Spain.Universidad de Malaga, pp.21-32, 2012.

5) H. Dicky, C. Dony, M. Huchard, and T. Libourel. ARES, un algorithme d'ajout avec restructuration dans les hiґerarchies de classes. Proc. of LMO'94, L'Objet, pages 125-136, 1994.

6) H. Leblanc. Sous-hiґerarchies de Galois: un mod`ele pour la construction et l'ґevolution des hiґerarchies d'objets. PhD thesis, Univ. de Montpellier II, 2000.

7) L. Fura, G. Laplace, A. Le Provost, and G. Perrot. Algorithme de construction d'une sous-hiґerarchie de Galois. Technical report, Universitґe de Montpellier II, 2005.

8) R. Godin and T.-T. Chau. Comparaison d'algorithmes de construction de hiґerarchies de classes. L'Objet, 5(3/4), 1999.]

9) S.A. Obiedkov, Algorithms and Methods of Lattice Theory and Their Application in Machine Learning. PhD Thesis, 2003

Размещено на Allbest.ru

### Подобные документы

Basic assumptions and some facts. Algorithm for automatic recognition of verbal and nominal word groups. Lists of markers used by Algorithm No 1. Text sample processed by the algorithm. Examples of hand checking of the performance of the algorithm.

курсовая работа [22,8 K], добавлен 13.01.2010Lists used by Algorithm No 2. Some examples of the performance of Algorithm No 2. Invention of the program of reading, development of efficient algorithm of the program. Application of the programs to any English texts. The actual users of the algorithm.

курсовая работа [19,3 K], добавлен 13.01.2010Базовые разделы BIOS и основные доступные возможности для его настройки: Standard CMOS Features, Advan-ced BIOS Features, Chipset features setup и Integrated Peripherals. Настройки, определяющие быстродействие компьютера, режимы работы его компонентов.

статья [17,4 K], добавлен 03.04.2010IS management standards development. The national peculiarities of the IS management standards. The most integrated existent IS management solution. General description of the ISS model. Application of semi-Markov processes in ISS state description.

дипломная работа [2,2 M], добавлен 28.10.2011Theoretical aspects of the application digital education resources in teaching computer science according to the capabilities of electronic programs. Capabilities of tools Microsoft Office and Macromedia Flash. Application of the program Microsoft Excel.

контрольная работа [1,5 M], добавлен 07.07.2013Процессоры Duron на ядре Spitfire (Model 3), Morgan (Model 7), Applebred (Model 8), Mobile Duron Camaro. Схема материнской платы EP-8KHAL+. Микросхема "Северный мост". Звуковой чип ALC201A. Конфигурация системной памяти. Регулятор заглушки шины RT9173.

курсовая работа [3,6 M], добавлен 26.03.2013Program automatic system on visual basic for graiting 3D-Graphics. Text of source code for program functions. Setting the angle and draw the rotation. There are functions for choose the color, finds the normal of each plane, draw lines and other.

лабораторная работа [352,4 K], добавлен 05.07.2009GPSS (General Purpose System Simulation) как язык для имитационного моделирования, его принципы и используемые методы, инструменты и средства. Метод построения модели с помощью GPSS, порядок составления блок-схемы данного процесса. Листинг модели.

курсовая работа [32,1 K], добавлен 20.12.2013Review of development of cloud computing. Service models of cloud computing. Deployment models of cloud computing. Technology of virtualization. Algorithm of "Cloudy". Safety and labor protection. Justification of the cost-effectiveness of the project.

дипломная работа [2,3 M], добавлен 13.05.2015Понятие стратегического планирования, разработка схем программных блоков и основной программы. Структурная схема имитационной модели, создание модели на языке моделирования General Purpose Simulation System. Математическое описание моделируемой системы.

дипломная работа [2,6 M], добавлен 12.08.2017