Recommendation System for Travellers Based on TripAdvisor.com Data. Bachelor’s thesis

Recommender system approaches: core algorithms and respective applications. Overview of popular travel recommender system approaches. Major recommender system issues and their common solutions. Matrix factorisation and k-nearest-neighbours models.


To begin with, the FunkSVD method functions according to the following list of settings and model features (a minimal sketch of the corresponding training procedure is given after the list):

a) Accounts for the independent user and item biases;

b) Employs the L1 regularisation method;

c) Minimises the regularised squared prediction error by means of the SGD learning algorithm;

d) Predicts the deviation from the mean rating value and then adds that same mean rating value back to obtain the prediction of the user's actual rating;

e) Sets the predicted rating to the training dataset's global mean rating in cases when a given user or item from the test set is not present in the training set;

f) Updates both user and item matrix values simultaneously (Koren et al., 2009).
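A minimal sketch of this training loop follows; the helper and variable names are assumptions rather than the thesis implementation, and the snippet uses the squared (L2) penalty most commonly paired with this model (an L1 penalty would only change the regularisation terms):

```python
import numpy as np

def funk_svd(ratings, n_users, n_items, n_factors=7,
             lr=0.005, reg=0.005, n_epochs=50, seed=0):
    """Biased matrix factorisation trained with SGD (FunkSVD-style).

    `ratings` is a list of (user_index, item_index, rating) triples.
    Shown with the squared (L2) penalty; an L1 penalty would replace the
    reg * parameter terms with reg * np.sign(parameter).
    """
    rng = np.random.default_rng(seed)
    mu = np.mean([r for _, _, r in ratings])          # global mean rating
    bu = np.zeros(n_users)                            # user biases
    bi = np.zeros(n_items)                            # item biases
    P = rng.normal(0, 0.1, (n_users, n_factors))      # user factor matrix
    Q = rng.normal(0, 0.1, (n_items, n_factors))      # item factor matrix

    for _ in range(n_epochs):
        for u, i, r in ratings:
            e = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])    # prediction error
            bu[u] += lr * (e - reg * bu[u])
            bi[i] += lr * (e - reg * bi[i])
            pu_old = P[u].copy()
            P[u] += lr * (e * Q[i] - reg * P[u])          # user and item factors
            Q[i] += lr * (e * pu_old - reg * Q[i])        # are updated together
    return mu, bu, bi, P, Q

def predict(mu, bu, bi, P, Q, u=None, i=None):
    """Predict a rating; unknown users/items fall back to the global mean."""
    if u is None or i is None:
        return mu
    return mu + bu[u] + bi[i] + P[u] @ Q[i]
```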

Secondly, the SVD++ model is a direct extension of the FunkSVD algorithm and therefore shares all of the features listed above, with one key addition to the calculation of predicted ratings. Specifically, the users' implicit feedback is accounted for as well: the users' factor vectors are complemented by an additional term per item, which takes the value of 1 if a given user has rated a certain item, regardless of the actual rating value, and 0 if the user has not rated that item (Koren, 2008).
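For reference, the resulting prediction rule from Koren (2008) can be written as follows, where N(u) is the set of items rated by user u (the 0/1 indicator just described), p_u and q_i are the explicit user and item factors, and the y_j are the implicit item factors:

```latex
\hat{r}_{ui} = \mu + b_u + b_i + q_i^{\top}\Bigl(p_u + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j\Bigr)
```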

Thirdly, the NMF model, which is an alternative to the FunkSVD algorithm, also employs the SGD optimisation procedure to minimise the regularised squared prediction error. However, the integral difference of the NMF model is that it restricts the factorised user and item matrices, which form the basis for the rating predictions, to non-negative values only, filtering out the negative ones altogether (Luo et al., 2014).
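Luo et al. (2014) derive a dedicated non-negative update rule for this purpose; purely as an illustration of the constraint itself (and not of their specific update), a projected SGD step that clips negative factor values back to zero could look like this:

```python
import numpy as np

def sgd_step_nonnegative(P, Q, u, i, e, lr=0.005, reg=0.005):
    """One SGD step followed by a projection onto non-negative values.

    Illustrative only: a plain projected-gradient step, not the specific
    non-negative update derived by Luo et al. (2014).
    """
    pu_old = P[u].copy()
    P[u] += lr * (e * Q[i] - reg * P[u])      # e is the current prediction error
    Q[i] += lr * (e * pu_old - reg * Q[i])
    np.maximum(P[u], 0.0, out=P[u])           # clip negative user factors to zero
    np.maximum(Q[i], 0.0, out=Q[i])           # clip negative item factors to zero
```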

And lastly, the k-nearest-neighbours techniques, namely the item-based one as well as the regular and weighted user-based ones, were trained on the same data using the cosine distance measure rather than its Euclidean alternative, since cosine distance is better suited to the highly sparse and high-dimensional configuration of the sample dataset (Cacheda et al., 2011). Starting with the regular user-based k nearest neighbours: since it was impossible to test the algorithm by asking users to rate newly recommended items and to compute the precision and recall metrics, only users with 4 or more ratings were kept in the dataset, amounting to a total of 2256 users. Then, for each user, a pseudo-random sample of 3 of their rated items was retained, while the other ratings were set to 0, and similar users were found based on those three remaining ratings. Subsequently, the ratings that had initially been set to 0 were predicted as the average of the ratings on the specific attraction posted by the 10 most similar users. If none of the most similar users had rated that particular attraction, the predicted rating was calculated as the average rating across the most similar users. The weighted user-based nearest neighbour differs from the regular one in that each rating posted by a similar user is multiplied by the inverse distance between that user and the target user, and the sum of the weighted ratings is then divided by the total weight to obtain the predicted rating.
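A minimal sketch of this user-based procedure (regular and weighted) is given below. It assumes a dense user-item matrix with 0 for missing ratings; the helper names and the handling of ties are assumptions, not the thesis implementation.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two rating vectors (0 marks a missing rating)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom else 0.0

def predict_user_based(R, target, hidden_items, k=10, weighted=False):
    """Predict the target user's hidden ratings from the k most similar users.

    R is a dense user x item matrix with 0 for missing ratings; `hidden_items`
    are the indices whose ratings were masked out for testing.
    """
    base = R[target].copy()
    base[hidden_items] = 0.0                        # keep only the base sample
    sims = np.array([cosine_sim(base, R[v]) if v != target else -np.inf
                     for v in range(R.shape[0])])
    neighbours = np.argsort(sims)[-k:]              # k most similar users

    preds = {}
    for item in hidden_items:
        raters = [v for v in neighbours if R[v, item] > 0]
        if raters:
            ratings = np.array([R[v, item] for v in raters])
            if weighted:                            # weight by the inverse cosine distance
                w = 1.0 / (1.0 - sims[raters] + 1e-9)
                preds[item] = float(np.average(ratings, weights=w))
            else:
                preds[item] = float(ratings.mean())
        else:                                       # no neighbour rated this attraction:
            block = R[neighbours]                   # fall back to the neighbours' mean rating
            preds[item] = float(block[block > 0].mean())
    return preds
```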

Turning briefly to the execution of the fifth and last machine learning algorithm tested within this thesis, the item-based k nearest neighbours, it used a transposed rating matrix in which rows represent items and columns represent users. For each user, only their rated items were selected, their ratings were set to 0 and similar items were found. From the rating vectors of those similar items, the selected user's ratings were extracted and their average was used as the predicted rating. If the user had not rated any of the similar items, their rating of the item in question was predicted as that item's average rating, with the user's own input still set to 0.
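A corresponding sketch of the item-based variant, reusing numpy and the cosine_sim helper from the previous snippet, could look as follows (again an assumption about the implementation rather than a reproduction of it):

```python
def predict_item_based(R, user, item, k=10):
    """Predict `user`'s rating of `item` from the k most similar items.

    Works on the transposed matrix (items x users); the target user's own
    rating of `item` is zeroed out before similarities are computed.
    """
    R_T = R.T.copy()
    R_T[item, user] = 0.0                           # hide the rating being predicted
    sims = np.array([cosine_sim(R_T[item], R_T[j]) if j != item else -np.inf
                     for j in range(R_T.shape[0])])
    neighbours = np.argsort(sims)[-k:]              # k most similar items
    rated = [j for j in neighbours if R_T[j, user] > 0]
    if rated:                                       # average of the user's ratings on similar items
        return float(np.mean([R_T[j, user] for j in rated]))
    item_row = R_T[item]                            # fall back to the item's own average rating
    return float(item_row[item_row > 0].mean())
```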

Finally, as the second research hypothesis is concerned with showcasing the tourism-domain benefits of the latent-factor-based semantic text classification technique known as Latent Dirichlet Allocation (LDA), the statistical natural language processing tools provided by the two Python libraries Gensim and the Natural Language Toolkit (NLTK) were adapted for the goals of the current thesis. In detail, the LDA approach consists of two consecutive stages: firstly, pre-processing of the textual review data and, secondly, the actual execution of the algorithm (Jelodar et al., 2019).

Starting with the pre-processing stage, the reviews on each unique tourist sight were first concatenated into one single text, formally called a document, yielding a total of 975 texts, the same as the number of unique London attractions within the data sample. As the next, data-cleaning step, a series of actions was performed on this textual data: all stop words (or terms) were removed; every word was stemmed to its basic form by stripping prefixes and suffixes; and the stems that appeared fewer than 10 times within a given document were removed. Lastly, every document was represented as a so-called Bag of Words (BoW), in which each stemmed word is stored as a tuple of its word id and the number of times it occurs within that specific document (Guo et al., 2016).
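A minimal sketch of this pipeline with NLTK and Gensim follows; the toy `documents` list and the choice of the Snowball stemmer are assumptions, as the exact stemmer used is not named above:

```python
from collections import Counter

from nltk.corpus import stopwords                 # requires nltk.download("stopwords")
from nltk.stem.snowball import SnowballStemmer
from gensim.corpora import Dictionary

stop_words = set(stopwords.words("english"))
stemmer = SnowballStemmer("english")

def preprocess(text):
    """Lower-case the text, drop stop words and stem the remaining tokens."""
    tokens = [w for w in text.lower().split() if w.isalpha()]
    return [stemmer.stem(w) for w in tokens if w not in stop_words]

def frequent_stems(stems, min_count=10):
    """Keep only the stems that occur at least `min_count` times in the document."""
    counts = Counter(stems)
    return [s for s in stems if counts[s] >= min_count]

# Toy stand-in for the 975 concatenated review texts (one per attraction).
documents = [
    "The museum exhibit was fascinating and the galleries were beautiful.",
    "Lovely market with great food stalls and a nice bar nearby.",
]

tokenised = [frequent_stems(preprocess(doc)) for doc in documents]

dictionary = Dictionary(tokenised)                       # stem -> integer id
corpus = [dictionary.doc2bow(doc) for doc in tokenised]  # BoW: (word_id, count) tuples
```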

The core stage, the execution of the LDA algorithm itself, unfolds in the following series of consecutive steps (Blei et al., 2003); a schematic sketch of the resulting sampling loop is given after the list:

a) Every term of every document is randomly assigned to one of the k topics.

b) For the document currently being processed, the algorithm assumes that its topic assignments are incorrect, while the topic assignments for the rest of the document base are correct.

c) The fraction of the document's terms that are assigned to each topic is computed.

d) The fraction of a specific word's assignments to a specific topic is computed across the whole document base.

e) The two proportions, specified at the third and fourth steps, are multiplied, and their product is taken to denote the probability of the term being assigned to each topic.

f) This five-step process is repeated until a steady pattern of topic assignments is established.
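A schematic sketch of this loop is given below. Practical libraries such as Gensim estimate LDA with (online) variational inference rather than an explicit per-term loop, so this is only an illustration of steps a)-f); the small smoothing constants alpha and beta stand in for the Dirichlet priors used by real implementations and are assumptions.

```python
import numpy as np

def lda_gibbs(docs, n_topics, vocab_size, n_iters=10, alpha=0.1, beta=0.01, seed=0):
    """Schematic sampler following steps a)-f) above.

    `docs` is a list of documents, each a flat list of word ids
    (not the (id, count) BoW tuples from the pre-processing sketch).
    """
    rng = np.random.default_rng(seed)
    doc_topic = np.zeros((len(docs), n_topics))      # per-document topic counts
    topic_word = np.zeros((n_topics, vocab_size))    # corpus-wide word-topic counts
    topic_totals = np.zeros(n_topics)
    assignments = []

    # a) random initial topic for every term of every document
    for d, doc in enumerate(docs):
        z = rng.integers(n_topics, size=len(doc))
        assignments.append(z)
        for w, t in zip(doc, z):
            doc_topic[d, t] += 1
            topic_word[t, w] += 1
            topic_totals[t] += 1

    # f) cycle until the topic assignments settle into a steady pattern
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                t = assignments[d][n]
                # b) treat the current assignment as wrong and remove it
                doc_topic[d, t] -= 1; topic_word[t, w] -= 1; topic_totals[t] -= 1
                # c) share of this document's terms under each topic
                p_doc = (doc_topic[d] + alpha) / (len(doc) - 1 + n_topics * alpha)
                # d) share of this word's assignments to each topic, corpus-wide
                p_word = (topic_word[:, w] + beta) / (topic_totals + vocab_size * beta)
                # e) their product gives the re-assignment probabilities
                p = p_doc * p_word
                t = int(rng.choice(n_topics, p=p / p.sum()))
                assignments[d][n] = t
                doc_topic[d, t] += 1; topic_word[t, w] += 1; topic_totals[t] += 1
    return doc_topic, topic_word
```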

To find the optimal number of topics to train the model on, the coherence score was used, as it has been shown to have the highest correlation with coherence ratings given by humans (Röder et al., 2015).
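A minimal sketch of this selection step with Gensim, reusing `corpus`, `dictionary` and `tokenised` from the pre-processing sketch above, could look as follows (the c_v measure follows Röder et al., 2015):

```python
from gensim.models import LdaModel, CoherenceModel

scores = {}
for k in range(2, 21):                                   # 2 to 20 topics
    lda = LdaModel(corpus=corpus, id2word=dictionary,
                   num_topics=k, passes=10, random_state=0)
    cm = CoherenceModel(model=lda, texts=tokenised,
                        dictionary=dictionary, coherence="c_v")
    scores[k] = cm.get_coherence()

best_k = max(scores, key=scores.get)                     # topic count with the highest coherence
```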

Ultimately, the goal is to propose the LDA topic modelling algorithm as a supplementary latent-factor-based method that alleviates the cold start problem and enables diversity of recommendations. To this end, the number of reviews for each attraction is normalised and then multiplied by its rating to produce a popularity score.

Then, in order to compile a non-trivial recommendation list out of the least reviewed, yet most highly rated, of London's sights, the normalised values are inverted to produce an inverse popularity score.

Thus, when a user picks their preferred categories or, in other words, the newly discovered topics, the recommendations will be sorted by descending popularity (or by inverse popularity if they choose to see the least known yet highly rated of London's sights). Furthermore, if a user selects not one but several categories, then, in order to first present them with the attractions where all of the chosen topics are most equally present and have the highest probability scores, the following formula will be applied to sort the sights in descending order of its output:

topic distribution coefficient = min probability / max probability;

where: min probability is each attraction's smallest probability score across the selected topics;

max probability is the highest probability across the selected topics.

To produce the final value that will be used to sort the recommended attraction list, the topic distribution coefficient will be multiplied either by popularity or by inverse popularity, based on the user's choice.
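A minimal sketch of this final ranking step is given below; the max-normalisation of the review counts, the reciprocal reading of the inversion and the min/max form of the topic distribution coefficient are assumptions made for illustration:

```python
import numpy as np

def rank_attractions(n_reviews, ratings, topic_probs, selected_topics,
                     least_known=False):
    """Rank attractions for the topics a user has selected.

    n_reviews and ratings hold one value per attraction; topic_probs is an
    attractions x topics matrix of LDA probabilities.
    """
    normalised = n_reviews / n_reviews.max()        # every attraction has reviews in this dataset
    popularity = normalised * ratings
    if least_known:                                 # least reviewed yet highly rated first
        popularity = (1.0 / normalised) * ratings

    probs = topic_probs[:, selected_topics]
    if len(selected_topics) > 1:
        coef = probs.min(axis=1) / probs.max(axis=1)    # 1.0 when the topics are equally present
    else:
        coef = probs[:, 0]                              # single topic: its probability
    score = coef * popularity
    return np.argsort(score)[::-1]                      # attraction indices, best first
```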

5. Description of the results

5.1 Matrix factorisation models

First of all, the matrix factorisation algorithms were trained and tested to reveal the parameter settings that yield the most accurate performance according to the major accuracy metric, MAE.

To begin with, the first of the matrix factorisation models to be trained and assessed was the FunkSVD algorithm. This unsupervised learning model was optimised, firstly, over the number of factors and the value of the regularisation term and then, with those two parameters fixed, over a range of different learning rates. A bullet-point summary of the key findings of the FunkSVD evaluation is provided below (see appendix 1):

a) The higher the value of the regularisation term (with any number of features from 2 to 50), the higher the MAE value;

b) The smallest MAE value of 0.698 was yielded with 7 factors and regularisation parameter of 0.005 when learning rate was fixed at 0.001;

c) The best learning rate with the optimal number of 7 factors and the regularisation parameter of 0.005 was revealed at the training mark of 50 epochs and was equal to 0.005. The MAE at this optimal learning rate was shown to be 0.688, which is the best result across all iterations of parameter tuning.

Secondly, the next matrix factorisation model to be trained and assessed was the SVD++ algorithm, which, just as FunkSVD, was optimised through regularised SGD. Similarly to FunkSVD, this model was tuned, firstly, over the number of factors and the value of the regularisation term and then, with those two parameters fixed, over a range of different learning rates. A bullet-point summary of the key results of the SVD++ evaluation is provided below (see appendix 2):

a) The best MAE value of 0.699 was recorded by a model configuration of 3 factors with the regularisation parameter of 0.005 and the learning rate of 0.001;

b) Thus, the SVD++ algorithm has shown slightly worse performance than the FunkSVD method, with the difference in the MAE value amounting to 0.011.

Lastly, the final matrix factorisation model to be trained and assessed was the NMF algorithm. Unlike the previous two methods, this unsupervised learning model only allows for non-negative factor values, which most probably contributed substantially to its much lower overall performance, according to the MAE accuracy metric, compared with the other evaluated matrix factorisation models. A bullet-point summary of the key results of the NMF evaluation is provided below (see appendix 3):

a) The NMF algorithm has shown overall higher MAE values across all the tested numbers of factors than the FunkSVD and SVD++ algorithms, its lowest MAE value being 0.862;

b) When trained with the parameters that yielded the best performance of the FunkSVD algorithm (7 factors with the regularisation and learning rate of 0.005), the NMF model produced an MAE value of 1.432.

5.2 K-nearest-neighbours models

The first KNN model to be implemented was user-based filtering with 10 neighbours. With a base sample of 3 ratings (i.e. the sample of each user's ratings used to find the 10 most similar users), which required keeping only the users with 4 or more ratings in the dataset (2256 users), the MAE amounted to 0.692. When only users with 3 or more ratings and base samples of size 2 were used (3910 users), the MAE amounted to 0.730, and almost the same result of 0.725 was obtained when similarity was calculated from a single rating per user (8807 users with 2 or more ratings). These results are summarised in Table 1. Thus, the lowest MAE produced by the user-based KNN is 0.692 (Table 1).

Table 1 - Performance results of the regular user-based KNN model

Base sample of ratings | Number of users | MAE
1 | 8807 | 0.725
2 | 3910 | 0.730
3 | 2256 | 0.692

After weights were introduced to the algorithm in an attempt to improve its performance, the MAE values increased for each of the three base-sample groups (Table 2). When the similarity of users was calculated based on one rating, the MAE amounted to 1.125. The base samples of 2 and 3 ratings showed better performance, yielding MAE values of 0.749 and 0.782 respectively. Thus, with its best rating prediction deviating from the actual ratings by 0.749 on average, the weighted user-based algorithm based on cosine distance proved inferior to the regular user-based KNN on the given dataset.

Table 2 - Performance results of the weighted user-based KNN model

Base sample of ratings | Number of users | MAE
1 | 8807 | 1.125
2 | 3910 | 0.749
3 | 2256 | 0.782

Since each item was rated at least 4 times, the item-based KNN, unlike the user-based one, did not require a base sample of ratings to calculate the MAE and could therefore be tested on the whole dataset. For each item, the similar ones were computed based on at least three ratings, with the rating posted by the user who is receiving recommendations set to 0. When the user-item matrix was transposed and the algorithm run, the MAE value amounted to 0.699. Even though this is 0.007 points higher than that of the best-performing user-based KNN, the algorithm has the advantage of using the whole dataset, including the users who rated only one item, and its MAE is a stable value, since its calculation did not rely on a random sample, which leads to different MAE values depending on which items were randomly selected to compute the distances at each iteration.

5.3 LDA model

To derive the best value for the main LDA hyperparameter, the number of topics, coherence scores were calculated for models with 2 to 20 topics. Even though it is possible to produce coherent word-to-topic allocations for more than 20 topics, the primary purpose of this analysis was to derive a number of groups of attractions that would be presented to a new user upon their first login to the system. Thus, in order to avoid overwhelming them with a long list of attraction groups, allocations into more than 20 topics were not tested. The number of training iterations was set to 10 during the coherence score computation due to limited computational resources and the fact that higher numbers of iterations generally tend to improve rather than degrade model performance.

The best coherence score of 0.535 was shown by the model with 16 topics, with the 17-topic model also scoring above 0.53 but slightly lower (Figure 1). However, since the other models with more than 10 topics did not perform as well, the third-best-performing model was chosen. The selected model divided the documents into 6 topics and produced a coherence score of 0.519. The reason for choosing it over those with 16 and 17 topics was to avoid inundating users with a large number of highly specific location groups. Moreover, as TripAdvisor offers 15 categories that the attractions fall into, and the goal of applying LDA in this paper was to produce a smaller number of topics without losing their representativeness, 6 topics were considered the optimal choice.

Figure 1 - Coherence scores at different numbers of topics

After selecting the most appropriate and high performing number of topics, various numbers of training iterations (i.e. passes) from 10 to 50 were tested and the one producing the highest coherence score selected. Increasing the number of passes to 50 improved the score by 0.039 points and, therefore, it was used to train the model (Figure 2).

Figure 2 - Coherence scores at different numbers of passes

The word allocations produced by the algorithm are presented in Table 3. The words are listed in the descending order of the weights of their contribution to the topics. The initial model output contained stems that were then manually completed to represent actual words occurring most frequently in each of the documents allocated to a particular topic. The topics were indexed from 0 to 5 and the names were assigned to them by the authors of this thesis based on their perception of the words in the model output.

Table 3 - Allocation of nouns according to different topics

TOPIC 0 - Landmarks: church, square, statue, memorial, palace, gin, Westminster, monument, guard, architecture

TOPIC 1 - Art: museum, exhibit, tour, galleries, art, information, guide, display, collect, fascinating

TOPIC 2 - Tours: tour, bridge, guide, Thames, tower, river, fan, ticket, stadium, informative

TOPIC 3 - Food: market, restaurant, stall, eat, bar, store, service, beer, buy, train

TOPIC 4 - Nature: garden, cafe, animals, children, kid, relax, green, play, Greenwich, canal

TOPIC 5 - Performing arts: theatre, seat, play, perform, venue, music, bar, ticket, product, stage

Next, a model containing only adjectives and adverbs as well as their comparative and superlative forms was trained in order to derive travel styles that would describe the users. Based on the coherence scores for the numbers of topics from 2 to 20, the most highly interpretable model was the one with 4 topics. However, the scores produced by each of these models were significantly lower than those of the models trained at the previous step, with the lowest and highest coherence scores amounting to 0.267 and 0.378 respectively. The model did not produce an interpretable result (Table 4).

Table 4 - Allocation of adjectives and adverbs according to different topics

TOPIC 0: local, green, central, quiet, atmospheric, clean, cafe, south, fresh, nearby

TOPIC 1: informative, knowledgeable, helpful, brilliant, top, expensive, enjoyable, interactive, own, able

TOPIC 2: historical, British, famous, modern, memorial, original, national, queen, royal, brilliant

TOPIC 3: comfortable, theatrical, helpful, musical, west, funny, fabulous, every, intimate

Based on the output of the two models, the one trained without excluding any parts of speech was selected to be used in the recommender system. The distribution of the topics produced by the LDA model among the categories of attractions presented on TripAdvisor is shown in Appendix 5. The categories offered on TripAdvisor with the numbers of sights that belong to them in the collected dataset are listed in Appendix 6.

Most of the sights classified as Art by the model fall into the TripAdvisor Museums and Sights & Landmarks categories. Attractions marked as Tours are represented in 12 out of all 15 of the TripAdvisor categories, with the majority of them falling into the Sights & Landmarks group. Landmarks are also represented mostly among Sights & Landmarks while Food attractions cover 12 categories with the top four most populated ones being Shopping, Sights & Landmarks, Other and Food & Drink. Sights marked as Nature are mostly categorised by TripAdvisor as Nature & Parks and Sights & Landmarks. Finally, the majority of attractions grouped into Performing arts fall into the Concerts & Shows category (see appendix 6).

These distributions were derived by assigning each attraction with one topic that had the highest probability as given in the model output. However, some attractions are characterised by high probability of two or more different topics and thus can be categorised into different attraction groups (see appendix 7). This feature will be used in the recommender system to provide users who select more than one attraction group at the login screen with recommendations that fit all or at least two of the chosen categories at the same time, while those who select just one attraction type will be presented with the locations that have the highest probability of falling into the chosen topic. These lists will be ordered based on the popularity scores as presented in the methodology.

5.4 Discussion of the results

The interpretation of the results in relation to the research questions and hypotheses is the following. First, FunkSVD was expected to show better performance on the TripAdvisor rating dataset than SVD++ and NMF. While the best-performing instances of the latter models produced MAE values of 0.699 and 0.862 respectively, the lowest MAE produced by FunkSVD amounted to 0.688, which confirms the first part of the first hypothesis.

Secondly, FunkSVD was expected to perform better than the user- and item-based k nearest neighbours, which are the common benchmark algorithms for comparing collaborative filtering methods. Three different KNN models were tested and their MAE values were calculated. The weighted user-based KNN showed the worst performance, with its lowest MAE being 0.749. The item-based KNN produced a mean error of 0.699, falling behind the regular user-based KNN by 0.007 points. However, even the best result of 0.692, shown by the user-based KNN model with a base sample of 3 ratings, is still slightly worse than the error of 0.688 yielded by FunkSVD. Thus, the second part of the first hypothesis is confirmed, proving FunkSVD to be the best-performing model on the TripAdvisor attraction ratings dataset out of the six selected recommender algorithms.

Finally, Latent Dirichlet Allocation for topic modelling made it possible to uncover a total of 6 coherent latent topics in the reviews, which supplemented the rating prediction. The final range of six topics is the following: “Landmarks”, “Art”, “Tours”, “Nature”, “Food” and “Performing arts”. While half of these attraction categories, namely “Art”, “Landmarks” and “Performing arts”, correspond to the similar TripAdvisor groups of “Museums”, “Sights & Landmarks” and “Concerts & Shows” respectively, the other three types are distributed between up to 12 different categories from the TripAdvisor website: “Tours” is the most diverse category and includes, in small proportions, “Museums”, “Fun & Games” and “Transportation”; “Food” is comprised primarily of the “Food & Drink” and “Shopping” categories; and “Nature” largely consists of “Nature & Parks”, “Fun & Games” and “Zoos & Aquariums”. The highest coherence score of 0.535 was shown by the 16-topic model, which can be supported by the fact that TripAdvisor also divides the sights into 15 categories. However, since the goal of this study was to produce a smaller number of groups that would be presented to a new user at the login screen with minimal loss in algorithm performance, 6 topics were considered the optimal choice. Thus, it can be concluded that the application of LDA helped produce a smaller number of attraction categories, while the loss in the coherence score amounted to only 0.016 points.

To sum up, the best-performing collaborative filtering algorithm, able to produce the most accurate predictions of user ratings on the TripAdvisor dataset of attractions in London, UK, is the FunkSVD matrix factorisation model. Incorporating the semantic analysis performed by the LDA algorithm made it possible to group the attractions into a smaller number of categories than that presented on the platform. This classification will help mitigate the cold start problem by offering new users a choice of their most preferred attraction types and then presenting the corresponding attractions sorted either by descending popularity or by inverse popularity, based on the user's preference.

Speaking in detail about the prototype of the application interface that allows a user to interact with the recommendation system: first of all, on the starting screen, the user is presented with six options for different types of attractions according to the results of the LDA model, namely “Landmarks”, “Art”, “Food”, “Nature”, “Tours” and “Performing Arts” (see appendix 7). The user can either select just one category and be presented with a list of attractions that score the highest in that particular topic, or choose a combination of several categories, up to all six of them, in which case the output is a list of tourist sights sorted from those where the topic probabilities are most evenly distributed across the chosen categories to those where one of the topics prevails. In addition, the user is also provided with a tick option for sorting the chosen type of tourist attractions by the inverse of their popularity, as opposed to the standard sorting by decreasing popularity, which ranks the least reviewed but highly rated item recommendations first. Secondly, after the preliminary stage of pre-filtering the attractions according to the type that the new user has chosen (content-based filtering), the new user receives a list of recommended items, any of which they can rate by clicking on the icon to the right of every attraction, thereby creating their profile and writing their ratings into the system's database (see appendix 8). And finally, after the core stage of employing the FunkSVD model to predict the new user's ratings on the basis of at least one attraction rating (collaborative filtering), the user is transferred to the third and final screen, which displays the list of recommended attractions ranked according to the highest predicted ratings and which can likewise be rated by the user in the same way as on the second screen (see appendix 9). The user can also navigate to the tab that contains the items which they have previously rated, displayed with their respective actual rating values (see appendix 10).

For clarity, it is important to note that, unlike the LDA method, which has been performed offline and only once, the core prediction algorithm of the trained FunkSVD model has to compute the rating predictions for every new user in real time, based on the newly added data of the new user's first rated attraction(s) and then each time a user adds a new rating.
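One possible way to serve such real-time predictions without refitting the whole model, sketched below purely as an implementation assumption rather than a procedure taken from the thesis, is to keep the learned item parameters fixed and run a few SGD passes only over the new user's ratings:

```python
import numpy as np

def fold_in_new_user(new_ratings, mu, bi, Q, n_factors=7,
                     lr=0.005, reg=0.005, n_epochs=50):
    """Fit a bias and a factor vector for a single new user.

    The global mean mu, item biases bi and item factors Q from the trained
    FunkSVD model stay fixed; only the new user's parameters are learned
    from their first rated attraction(s).
    """
    bu, pu = 0.0, np.zeros(n_factors)
    for _ in range(n_epochs):
        for i, r in new_ratings:                    # (item_index, rating) pairs
            e = r - (mu + bu + bi[i] + pu @ Q[i])
            bu += lr * (e - reg * bu)
            pu += lr * (e * Q[i] - reg * pu)
    preds = mu + bu + bi + Q @ pu                   # predicted rating for every attraction
    return np.argsort(preds)[::-1], preds           # ranking for the recommendation screen
```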

Conclusion

This research thesis has set out to explore the unique application benefits of the machine learning models based on the latent factors for the development of a recommender system in the tourism domain. As the promising results of such factorisation algorithms had been extensively demonstrated on the film-related data, the present research was dedicated to adapting those models to a much less explored domain of travel recommendations. Thus, in order to achieve this objective, the TripAdvisor platform, which is rightfully considered to be the most visited travel-related online resource in the world, currently enjoying more than 490 million active users every month, was chosen as the only source of data for the construction of a sample dataset.

In the process of conducting a comparison study of the target matrix factorisation models against each other as well as against the similarly efficient collaborative filtering algorithms of the nearest neighbours, the research's core hypothesis, which postulated the superiority of the FunkSVD factorisation method in accurately predicting single numerical user ratings for London's tourist attractions, has been successfully confirmed. In addition, the LDA technique has discovered a more concise and representative range of generalised topics for the types of London's landmarks and experiences as compared to TripAdvisor's respective default categories. Moreover, this newly discovered classification has been adapted to pre-filter recommendations for the system's new users according to their preferences regarding the different types of attractions. Thus, the LDA machine learning model has yielded very fruitful results, which have been successfully incorporated into the recommender system to address the cold start problem.

When critically assessing the employed methodology and the obtained results, it should be pointed out that the study compared only a limited range of six collaborative filtering algorithms on the present travel data, while essentially relying on a single point of comparison in the form of the MAE accuracy metric. Furthermore, the proof of the usefulness and applicability of the LDA results admittedly hinges on the model's inherently high efficiency on any type of semantic data. This, however, does not discredit the algorithm's findings, as the method is meant for generalised prediction of a text's main topics, a process that can only be optimised rather than evaluated in terms of prediction accuracy.

The major limiting factors of the performed research study have been the computational constraints of the available resources, which noticeably restricted the volume of user-item data that could be parsed from the TripAdvisor platform and also ruled out some model configurations that could not be properly assessed within this study. Thus, it is left for future research to attempt to increase not so much the size of the sampled dataset as the density of the user-item matrix, for instance by scraping the review data of the most prolific users via the TripAdvisor API, as well as to test other, more computationally demanding configurations of the matrix factorisation algorithms, such as optimising the factorisation models with the Alternating Least Squares (ALS) learning method. Another significant limiting factor was the inability to test the proposed system online in order to collect the data necessary to assess the real performance of the model through the precision and recall metrics. Even though real-world testing was simulated by eliminating parts of the collected ratings, calculating distances based on the remaining ones and predicting the excluded ratings, calculating precision and recall in this way would have led to unreliable results, as it would have required considering only users who rated at least 8 or 10 items, of which there were very few.

Some of the future improvements of the presently developed recommender system prototype are proposed to be the following. Firstly, the present recommender system could be turned into a hybrid one by incorporating the LDA results further into the recommendation algorithm, for instance by recommending items whose topic distributions are similar to those of the items the user has already rated highly (content-based filtering), in addition to the core collaborative filtering algorithm of the FunkSVD matrix factorisation model. Secondly, to further hybridise the model, introducing content-based filtering on the attraction data and adding a set of optimised weights to each model's output might increase recommendation accuracy and diversity. Thirdly, it would also be interesting to extend the system's capabilities with an option for optimally mapping a potential tour across London based on the user's personalised recommendation list of tourist landmarks and experiences. Lastly, and perhaps most importantly, it is paramount for the future development of the prototype to launch the travel recommender system online in its current state for actual new users to test, as this will enable the application of relevance measures such as precision and recall and allow subsequent enhancement of the predictive algorithm.

Reference list

1. Adomavicius, G., & Tuzhilin, A. (2005). Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6), 734-749.

2. Ahani, A., Nilashi, M., Othman, I., Sanzogni, L., & Weaven, S. (2019a). Market segmentation and travel choice prediction in Spa hotels through TripAdvisor's online reviews. International Journal of Hospitality Management. International Journal of Hospitality Management, 80, 52-77.

3. Ahani, A., Nilashi, M., Yadegaridehkordi, E., Sanzogni, L., Tarik, A., Knox, K., Sarminah, S., & Othman, I. (2019b). Revealing customers' satisfaction and preferences through online review analysis: The case of Canary Islands hotels. Journal of Retailing and Consumer Services, 51, 341-343.

4. Al Mamunur, R., Istvan, A., Cosley, D., Lam, S. K., McNee, S. M., Konstan, J. A., & Riedl, J. (2002). Getting to Know You: Learning New User Preferences in Recommender Systems. Proceedings of the 7th International Conference on Intelligent User Interfaces, 1-9.

5. Ali, F., Kwak, K., & Kim, Y. (2016). Opinion mining based on fuzzy domain ontology and Support Vector Machine: A proposal to automate online review classification. Applied Soft Computing, 47, 235-250.

6. Baltrunas, L., Makcinskas, T., & Ricci, F. (2010). Group recommendations with rank aggregation and collaborative filtering. Proceedings of the 4th ACM conference on Recommender systems, 119-126.

7. Billsus, D., & Pazzani, M. (1998). Learning Collaborative Information Filters. Proceedings of the Fifteenth International Conference on Machine Learning, 46-54.

8. Blei, D. M., Ng, A. Y., & Jordan M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3(5), 993-1022.

9. Bobadilla, J., Ortega, F., Hernando, A., & Gutiérrez, A. (2013). Recommender systems survey. Knowledge-Based Systems, 46, 109-132.

10. Bottou, L. (2010). Large-Scale Machine Learning with Stochastic Gradient Descent. In Lechevallier Y., & Saporta, G. (Eds). Proceedings of COMPSTAT'2010 (pp. 177-186). Springer-Verlag Berlin Heidelberg.

11. Breese, J., Heckerman, D., & Kadie, C. (1998). Empirical Analysis of Predictive Algorithm for Collaborative Filtering. Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, 43-52.

12. Burke, R. (2002). Hybrid Recommender Systems: Survey and Experiments. User Modeling and User-Adapted Interaction, 12(4), 331-370.

13. Cacheda, F., Carneiro, V., Fernández, D., & Formoso, V. (2011). Comparison of collaborative filtering algorithms: Limitations of current techniques and proposals for scalable, high-performance recommender systems. ACM Transactions on the Web, 5(1), 1-33.

14. Chang, Y., Ku, J., & Chen, C. (2017). Social media analytics: Extracting and visualizing Hilton hotel ratings and reviews from TripAdvisor. International Journal of Information Management, 48, 263-279.

15. Chen, Y., Wu, C., Xie, M., & Guo, X. (2011). Solving the Sparsity Problem in Recommender Systems Using Association Retrieval. Journal of Computers 6(9), 1896-1902.

16. Cheng, A., Chen, Y., Huang, Y., Hsu, W., & Liao, H. (2011). Personalized travel recommendation by mining people attributes from community-contributed photos. Proceedings of the 19th ACM International Conference on Multimedia, 83-92.

17. Díez, J., Pérez-Núñez, P., Luaces, O., Remeseiro, B., & Bahamonde, A. (2020). Towards Explainable Personalized Recommendations by Learning from Users' Photos. Information Sciences, 520, 416-430.

18. Esmaeili, L., Mardani, S., Golpayegani, S., & Madar, Z. (2020). A Novel Tourism Recommender System in the Context of Social Commerce. Expert Systems with Applications, 149, 1-11.

19. Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861-874.

20. Fuchs, M., & Zanker, M. (2012). Multi-criteria Ratings for Recommender Systems: An Empirical Analysis in the Tourism Domain. In C. Huemer & P. Lops (Eds.), E-Commerce and Web Technologies (Vol. 123, pp. 100-111). Springer Berlin Heidelberg.

21. Gavalas, D., Konstantopoulos, C., Mastakas, K., & Pantziou, G. (2014). Mobile recommender systems in tourism. Journal of Network and Computer Applications, 39, 319-333.

22. Ge, M., Delgado, C., & Jannach, D. (2010). Beyond accuracy: Evaluating recommender systems by coverage and serendipity. Proceedings of the 4th ACM Conference on Recommender Systems, 257-260.

23. Golbandi, N., Koren, Y., & Lempel, R. (2011). Adaptive bootstrapping of recommender systems using decision trees. Proceedings of the 4th International Conference on Web Search and Web Data Mining, 595-604.

24. Goldberg, D., Nichols, D., Oki, B. M., & Terry, D. (1992). Using collaborative filtering to weave an information tapestry. Communications of the ACM, 35(12), 61-70.

25. Gorrell, G. (2006). Generalized Hebbian Algorithm for Incremental Singular Value Decomposition in Natural Language Processing. Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, 430-451.

26. Grčar, M., Mladenić, D., Fortuna, B., & Grobelnik, M. (2006). Data Sparsity Issues in the Collaborative Filtering Framework. In Nasraoui O., Zaïane O., Spiliopoulou M., Mobasher B., Masand B., & Yu P.S. (Eds). Advances in Web Mining and Web Usage Analysis (LNCS Vol. 4198, pp. 58-76). Springer, Berlin, Heidelberg.

27. Guo, Y., Barnes, S., & Jia, Q. (2016). Mining Meaning from Online Ratings and Reviews: Tourist Satisfaction Analysis Using Latent Dirichlet Allocation. Tourism Management. 59, 467-483.

28. Hamel, S., & Robino, D. (2019). Global Destination Cities Index Report 2019.

29. Herlocker, J., Konstan, J., Terveen, L., & Riedl, J. (2004). Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems, 22(1), 5-53.

30. Hernández del Olmo, F., & Gaudioso, E. (2008). Evaluation of recommender systems: A new approach. Expert Systems with Applications, 35(3), 790-804.

31. Hofmann, T. (2004). Latent semantic models for collaborative filtering. ACM Transactions on Information Systems, 22(1), 89-115.

32. Hu, Y., Koren, Y., & Volinsky, C. (2008). Collaborative Filtering for Implicit Feedback Datasets. Proceedings of the 8th IEEE International Conference on Data Mining, 263-272.

33. Jannach, D., Zanker, M., & Fuchs, M. (2014). Leveraging multi-criteria customer feedback for satisfaction analysis and improved recommendations. Information Technology & Tourism, 14(2), 119-149.

34. Jelodar, H., Wang, Y., & Yuan, C. (2019). Latent Dirichlet allocation (LDA) and topic modeling: models, applications, a survey. Multimedia Tools and Applications, 78, 15169-15211.

35. Kaminskas M., & Bridge, D. (2016). Diversity, Serendipity, Novelty, and Coverage: A Survey and Empirical Analysis of Beyond-Accuracy Objectives in Recommender Systems. ACM Transactions on Interactive Intelligent Systems, 7(1), 2-42.

36. Kbaier, M., Masri, H., & Krichen, S. (2017). A Personalized Hybrid Tourism Recommender System. 2017 IEEE/ACS 14th International Conference on Computer Systems and Applications (AICCSA), 244-250.

37. Kesorn, K., Juraphanthong, W., & Salaiwarakul, A. (2017). Personalized Attraction Recommendation System for Tourists through Check-in Data. IEEE Access, 5, 26703-26721.

38. Khusro, S., Ali, Z., & Ullah, I. (2016). Recommender Systems: Issues, Challenges, and Research Opportunities. In K.J. Kim & N. Joukov (Eds.), Information Science and Applications (ICISA) 2016 (LNEE Vol. 376, pp. 1179-1189). Springer Singapore.

39. Koren, Y. (2008). Factorization meets the neighborhood: A multifaceted collaborative filtering model. Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, 426-434.

40. Koren, Y. (2010). Factor in the Neighbors: Scalable and Accurate Collaborative Filtering. ACM Transactions on Knowledge Discovery from Data, 4(1).

41. Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix Factorization Techniques for Recommender Systems. Computer, 42(8), 30-37.

42. Lika B., Kolomvatsos K., & Hadjiefthymiades S. (2014). Facing the cold start problem in recommender systems. Expert Systems with Applications, 41(4), 2065-2073. doi: 10.1016/j.eswa.2013.09.005

43. Lin, C. (2007). Projected Gradient Methods for Nonnegative Matrix Factorization. Neural Computation, 19(10), 2756-2779.

44. Logesh, R., Subramaniyaswamy, V., Vijayakumar, V., & Li, X. (2019). Efficient User Profiling Based Intelligent Travel Recommender System for Individual and Group of Users. Mobile Networks and Applications, 24(3), 1018-1033.

45. Lops, P., de Gemmis, M., Semeraro, G. (2011). Content-based Recommender Systems: State of the Art and Trends. In Ricci F., Rokach L., Shapira B., Kantor P. (Eds). Recommender Systems Handbook (Ch. 3, pp. 73-105). Springer, Boston, MA.

46. Lu, J., Wu, D., Mao, M., Wang, W., & Zhang, G. (2015). Recommender system application developments: A survey. Decision Support Systems, 74, 12-32.

47. Lu, L., Medo, M., Yeung, C., Zhang, Y., Zhang, Z., & Zhou, T. (2012). Recommender systems. Physics Reports, 519(1), 1-49.

48. Lucas, J., Luz, N., Moreno, M., Anacleto, R., Figueiredo, A., & Martins, C. (2013). A hybrid recommendation approach for a tourism system. Expert Systems with Applications, 40(9), 3532-3550.

49. Luo, X., Zhou, M., Xia, Y., & Zhu, Q. (2014). An Efficient Non-Negative Matrix-Factorization-Based Approach to Collaborative Filtering for Recommender Systems. IEEE Transactions on Industrial Informatics, 10(2), 1273-1284.

50. Meehan, K., Lunney, T., Curran, K., & Mccaughey, A. (2013). Context-Aware Intelligent Recommendation System for Tourism. Proceedings of the 11th IEEE International Conference on Pervasive Computing and Communications, 328-331.

51. Pantano, E., Priporas, C., & Stylos, N. (2017). 'You will like it!' Using open data to predict tourists' response to a tourist attraction. Tourism Management, 60, 430-438.

52. Parikh, V., Keskar, M., Dharia, D., & Gotmare, P. (2018). A Tourist Place Recommendation and Recognition System. 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT), 218-222.

53. Park, S. T., & Chu, W. (2009). Pairwise preference regression for cold-start recommendation. Proceedings of the 3rd ACM Conference on Recommender Systems, 1-8.

54. Paterek, A. (2007). Improving Regularized Singular Value Decomposition for Collaborative Filtering. Proceedings of the ACM Press KDD Cup and Workshop, 5(8), 39-42.

55. Piatetsky-Shapiro, G. (2007). Interview with Simon Funk. ACM SIGKDD Explorations Newsletter, 9, 38-40.

56. Portugal, I., Alencar, P., Cowan, D. (2018). The use of machine learning algorithms in recommender systems: A systemic review. Expert Systems with Applications, 97, 205-227.

57. Ranjbar, M., Moradi, P., Azami, M., & Jalili, M. (2015). An imputation-based matrix factorization method for improving accuracy of collaborative filtering systems. Engineering Applications of Artificial Intelligence, 46, 58-66.

58. Rendle, S. (2012). Factorization Machines with libFM. ACM Transactions on Intelligent Systems and Technology, 3(3), 1-22.

59. Renjith, S., & Anjali, C. (2014). A Personalized Mobile Travel Recommender System Using Hybrid Algorithm. 2014 First International Conference on Computational Systems and Communications (ICCSC), 12-17.

60. Rich, E. (1979). User Modeling via Stereotypes. Cognitive Science, 3(4), 329-354.

61. Röder, M., Both, A., & Hinneburg, A. (2015). Exploring the Space of Topic Coherence Measures. Proceedings of the Eighth ACM International Conference on Web Search and Data Mining - WSDM '15, 399-408.

62. Roopesh, L., & Tulasi, B. (2018). A Survey of Travel Recommender Systems. International Journal of Computer Sciences and Engineering, 6(9), 1-7.

63. Salakhutdinov, R., & Mnih, A. (2007). Probabilistic Matrix Factorization. Proceedings of the 20th International Conference on Neural Information Processing Systems, 1257-1264.

64. Sarwar, B., Karypis, G., Konstan, J., & Reidl, J. (2001) Item-based collaborative filtering recommendation algorithms. Proceedings of the 10th International Conference on World Wide Web, 285-295.

65. Shani, G., & Gunawardana, A. (2011). Evaluating Recommendation Systems. In Ricci F., Rokach L., Shapira B., & Kantor P. (Eds). Recommender Systems Handbook (Ch. 8, pp. 257-297). Springer Boston MA.

66. Sharma, R., & Singh, R. (2016). Evolution of Recommender Systems from Ancient Times to Modern Era: A Survey. Indian Journal of Science and Technology, 9(20), 1-12.

67. Shaw, G., Xu, Y., & Geva, S. (2010). Using Association Rules to Solve the Cold-Start Problem in Recommender Systems. In M.J. Zaki, J.X. Yu, B. Ravindran, & V. Pudi (Eds). Advances in Knowledge Discovery and Data Mining (LNCS Vol. 6118, pp. 340-347). Springer Berlin Heidelberg.

68. Shi, Y., Larson, M., & Hanjalic, A. (2014). Collaborative Filtering beyond the User-Item Matrix: A Survey of the State of the Art and Future Challenges. ACM Computing Surveys, 47(1), 1-45.

69. Sinha, R., & Swearingen, K. (2002). The Role of Transparency in Recommender Systems. Proceedings of the CHI EA `02 Conference on Human Factors in Computing Systems, 830-831.

70. Sokolova, M., & Japkowicz, N., & Szpakowicz, S. (2006). Beyond Accuracy, F-Score and ROC: A Family of Discriminant Measures for Performance Evaluation. In Sattar A., Kang B. (Eds.). AI 2006: Advances in Artificial Intelligence (LNCS Vol. 4304, pp. 1015-1021). Springer, Berlin, Heidelberg.

71. Wang, Y., Chan, S., & Ngai, G. (2012). Applicability of Demographic Recommender System to Tourist Attractions: A Case Study on Trip Advisor. Proceedings of the 2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, 97-101.

72. Xu, Z. (2014). Trip similarity computation for context-aware travel recommendation exploiting geotagged photos. Proceedings of the 2014 IEEE 30th International Conference on Data Engineering Workshops, 330-334.

73. Yang, X., Guo, Y., Liu, Y., & Steck, H. (2014). A Survey of Collaborative Filtering Based Social Recommender Systems. Computer Communications, 41, 1-10.

74. Zhang, Z. K., Liu, C., Zhang, Y., & Zhou, Z. (2010). Solving the cold-start problem in recommender systems with social tags. Europhysics Letters, 92(2), 1-6.

75. Zhou, K., Yang, S., & Zha, H. (2011). Functional Matrix Factorizations for Cold-Start Recommendation. Proceedings of the 34th International ACM SIGIR Conference on Research and development in Information Retrieval, 315-324.

Appendix 1

Evaluation of the ratings prediction accuracy of the FunkSVD algorithm

Table 1: Comparison of MAE at n-factors

Iteration | Number of factors | Regularisation parameter | MAE
0 | 7 | 0.005 | 0.69767
1 | 10 | 0.005 | 0.6977
2 | 5 | 0.005 | 0.698
3 | 4 | 0.005 | 0.698
4 | 6 | 0.005 | 0.698
5 | 32 | 0.005 | 0.698
6 | 9 | 0.005 | 0.698
7 | 23 | 0.005 | 0.698
8 | 30 | 0.005 | 0.698
9 | 3 | 0.005 | 0.698
10 | 11 | 0.005 | 0.698
11 | 12 | 0.005 | 0.698
12 | 8 | 0.005 | 0.698
13 | 13 | 0.005 | 0.698
14 | 16 | 0.005 | 0.698
15 | 18 | 0.005 | 0.698

Table 2 - Comparison of MAE at different learning rates (a 7-factor model with the regularisation parameter of 0.005)

Learning rate | MAE
0.005 | 0.688
0.010 | 0.697
0.015 | 0.704
0.020 | 0.706
0.025 | 0.713
0.030 | 0.714
0.035 | 0.712
0.040 | 0.714
0.045 | 0.718
0.050 | 0.718
0.055 | 0.718
0.060 | 0.722
0.065 | 0.725
0.070 | 0.724
0.075 | 0.722
0.080 | 0.722
0.085 | 0.725
0.090 | 0.723

Appendix 2

Evaluation of the ratings prediction accuracy of the SVD++ algorithm

Iteration | Number of factors | Regularisation parameter | Learning rate | MAE
0 | 3 | 0.005 | 0.001 | 0.699
1 | 3 | 0.005 | 0.001 | 0.700
2 | 4 | 0.010 | 0.001 | 0.700
3 | 3 | 0.020 | 0.001 | 0.700
4 | 4 | 0.015 | 0.001 | 0.700
5 | 3 | 0.010 | 0.001 | 0.700
6 | 4 | 0.025 | 0.001 | 0.701
7 | 4 | 0.020 | 0.001 | 0.701
8 | 3 | 0.015 | 0.001 | 0.701
9 | 3 | 0.040 | 0.001 | 0.701
10 | 22 | 0.045 | 0.050 | 0.701
11 | 3 | 0.025 | 0.001 | 0.701
12 | 4 | 0.035 | 0.001 | 0.701
13 | 3 | 0.030 | 0.001 | 0.701
14 | 4 | 0.005 | 0.001 | 0.701
15 | 4 | 0.040 | 0.001 | 0.701
16 | 5 | 0.035 | 0.001 | 0.701
17 | 3 | 0.035 | 0.001 | 0.701
18 | 5 | 0.005 | 0.001 | 0.701
19 | 3 | 0.045 | 0.001 | 0.701
20 | 5 | 0.025 | 0.001 | 0.701

Appendix 3

Evaluation of the ratings prediction accuracy of the NMF algorithm

Iteration | Number of factors | MAE
0 | 2 | 0.862
1 | 3 | 0.869
2 | 4 | 0.916
3 | 5 | 0.934
4 | 36 | 0.941
5 | 6 | 0.950
6 | 28 | 0.957
7 | 7 | 0.963
8 | 39 | 0.965
9 | 8 | 0.974
10 | 31 | 0.979
11 | 9 | 0.990
12 | 33 | 0.991
13 | 29 | 0.995
14 | 38 | 0.997
15 | 10 | 0.999

Appendix 4

Distribution of the LDA topics across TripAdvisor categories

Topic | Category | Number of attractions
Art | Concerts & Shows | 1
Art | Museums | 141
Art | Nature & Parks | 1
Art | Other | 2
Art | Sights & Landmarks | 36
Art | Traveller Resources | 7
Art | Zoos & Aquariums | 1
Food | Casinos & Gambling | 11
Food | Concerts & Shows | 3
Food | Events | 3
Food | Food & Drink | 28
Food | Fun & Games | 3
Food | Museums | 3
Food | Nature & Parks | 2
Food | Other | 34
Food | Shopping | 54
Food | Sights & Landmarks | 41
Food | Transportation | 3
Food | Traveller Resources | 9
Nature | Classes & Workshops | 1
Nature | Events | 2
Nature | Fun & Games | 9
Nature | Museums | 4
Nature | Nature & Parks | 83
Nature | Other | 10
Nature | Sights & Landmarks | 34
Nature | Zoos & Aquariums | 3
Performing arts | Concerts & Shows | 114
Performing arts | Museums | 3
Performing arts | Nature & Parks | 1
Performing arts | Other | 1
Performing arts | Sights & Landmarks | 4
Performing arts | Traveller Resources | 1
Landmarks | Concerts & Shows | 1
Landmarks | Food & Drink | 5
Landmarks | Museums | 5
Landmarks | Nature & Parks | 6
Landmarks | Other | 5
Landmarks | Sights & Landmarks | 191
Tours | Classes & Workshops | 1
Tours | Concerts & Shows | 1
Tours | Events | 4
Tours | Food & Drink | 2
Tours | Fun & Games | 10
Tours | Museums | 7
Tours | Nature & Parks | 3
Tours | Other | 6
Tours | Sights & Landmarks | 60
Tours | Transportation | 8
Tours | Traveller Resources | 3
Tours | Water & Amusement Parks | 2

Appendix 5

Representation of TripAdvisor categories within the discovered topics

Appendix 6

Numbers of tourist attractions according to different categories

Category | Number of attractions
Sights & Landmarks | 366
Museums | 163
Concerts & Shows | 120
Nature & Parks | 96
Other | 58
Shopping | 54
Food & Drink | 35
Fun & Games | 22
Traveller Resources | 20
Transportation | 11
Casinos & Gambling | 11
Events | 9
Zoos & Aquariums | 4
Classes & Workshops | 2
Water & Amusement Parks | 2

Appendix 7

First screen of the recommender system app

Appendix 8

Content-based filtering recommender output for a combination of the “Nature” and “Food” categories

Appendix 9

