Recommendation System for Travellers Based on TripAdvisor.com Data. Bachelor’s thesis

Recommender system approaches: core algorithms and respective applications. Overview of popular travel recommender system approaches. Major recommender system issues and their common solutions. Matrix factorisation and k-nearest-neighbours models.


Moreover, there are again two general classes of social recommendation approaches: the matrix factorisation based one, which combines user-user social trust data with user-item feedback history, and the nearest neighbour based one, which traverses the network of users' direct and indirect friends to provide the additional advantage of a social neighbourhood. When it comes to specific models that are commonly trained on such diverse and abundant data, previous research has conveniently identified which state-of-the-art algorithms had been used to incorporate side information in memory-based (cosine vector similarity, k-nearest neighbours), model-based (Bayesian network model, matrix factorisation model) and graph-based (random walk) collaborative filtering approaches, as well as to consider interaction-based information by means of tensor factorisation, factorisation machines and graph-based approaches (Shi et al., 2014). It is not surprising that, when it comes to efficiently storing in memory and manipulating huge arrays of data, the matrix factorisation algorithms prove to be superior and, in particular, manage to excel at both the item rating prediction and item list recommendation tasks (Yang et al., 2014).

Not least important is to point out the primary challenges that are associated with the current social recommender system tasks: firstly, trust- and distrust-based social recommendation of potential friends, products and other content; secondly, group recommendation for multiple people looking to choose a single activity, destination, etc; and thirdly, long tail recommendation, which refers to recommending items with low popularity - crucial for an effective recommender system (Shi et al., 2014).

Finally, Pantano, Priporas and Stylos (2016) emphasise that, when choosing a specific recommendation system algorithm for computing item rating predictions, researchers should take into account the specifics of the domain's informational context (what the available data sources are; how the information on users and items is organised there), the type of data that is available in this particular domain (numerical, strings, mixed, etc.), as well as the maximum acceptable level of computational cost (including the speed of programme execution).

2.3 Overview of popular travel recommender system approaches

The central aim of the majority of tourism-related recommendation systems is to provide users with suggestions of relevant travel destinations and tourist attractions, otherwise commonly known as Points of Interest (POIs). Although there exist other, more sophisticated systems that offer users travel route recommendations and even personalised trip planning services, in line with the objectives of the present research paper, only the algorithms that specialise in recommending POIs are going to be considered (Gavalas et al., 2014).

The broad variety of popular recommender system frameworks, which are employed for the application in the travel domain, can be most conveniently represented as the following categories according to both the type of approach that they employ and the type of data that they are based on (Roopesh & Tulasi, 2018):

a) Context aware systems that rely on constantly gathering contextual information from a person's device, web browser or social media, thereby enabling a live update of the user's data on his or her current location, time and day of the week, current season and weather conditions, etc. The combination of such diverse data on a person's context provides the basis for presenting the user with, for instance: recommendations for a number of tourist attractions, based on their working hours, shortest user travel paths and sentiment scores from social media (Meehan et al., 2013); or, based on the current weather data and the person's travel history in the form of geographically tagged photos, recommendations of a range of similar-looking attractions in a different city that were shared by other users on photo sharing web sites (Xu, 2014). The core difficulties with context-aware systems are the high intensity of ongoing per user computations as well as the high complexity of the server-side software architecture enabling the continuous, repeated parsing of massive volumes of online data;

b) Social network-based recommenders, which function on the basis of users' social profile information from Twitter, Facebook, Instagram and Flickr among many others, exploit the existing social trust relationships between online users in order to make appropriate recommendations: for instance, suggesting to a user a list of attractions based on his or her own as well as friends' check-in data shared on their Facebook pages (Kesorn et al., 2017); or applying the data of users' social trust relations in order to predict tourism-related customer purchase behaviour with regard to specific tour packages and hotel bookings (Esmaeili et al., 2020). The principal challenge with this approach is the high level of data sparsity, which can be slightly brought down by extending the scope of the recommendation base to the less relevant travel preferences of the user's friends of friends;

c) The hybrid filtering approach is also very frequently employed for developing recommender systems in the travel domain, as the collaborative and content-based filtering techniques prove to be inadequate on their own and underperform primarily due to the high levels of data sparsity on travel-related platforms. For instance, one study designed a hybrid system by combining multiple similarity measures, namely, the Tanimoto coefficient and the Euclidean distance measure for the collaborative and content-based constituent filtering techniques respectively (Kbaier et al., 2017). In another study a clustering algorithm and the associative classification method were sequenced together in order to group users based on demographic data and make predictions of POI ratings for these groups (Lucas et al., 2013);

d) The demographic filtering approach makes use of demographic user characteristics, such as age, gender, travel region, general travel style, purpose of the trip, travel companions, etc., and is extensively applied as a means of overcoming the cold start problem, which is characterised by the absence of any previously rated items for a new user (Wang et al., 2012). Such application is justified by the fact that different demographic groups enjoy different aspects of their travel experiences, thus naturally forming a unique travel style per group in accordance with these preferences (Fuchs & Zanker, 2012). Since this approach is not entirely self-sufficient, in practical application research it is most often developed as a supporting part of either a hybrid (Kbaier et al., 2017; Renjith & Anjali, 2014) or a context-aware travel recommender system (Cheng et al., 2011).
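To make the two similarity measures named under the hybrid approach concrete, the following minimal Python sketch computes the Tanimoto (Jaccard) coefficient over sets of rated items and converts a Euclidean distance between rating vectors into a bounded similarity score. The function names and the combination of the two measures are illustrative assumptions and are not taken from Kbaier et al. (2017):

```python
import math

def tanimoto(a, b):
    # Tanimoto (Jaccard) coefficient between two sets of rated items:
    # the size of the intersection over the size of the union.
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def euclidean_similarity(u, v):
    # Turn the Euclidean distance between two feature vectors
    # into a similarity score in (0, 1]; identical vectors give 1.0.
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))
    return 1.0 / (1.0 + distance)
```

A hybrid system would then weight or mix the two scores; the weighting scheme itself is domain-specific.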

When it comes to the research that has been conducted specifically on the data from the TripAdvisor platform, there has already been a diverse range of research papers published, studying and mining the useful patterns in the users' trip behaviour data as well as applying that travel-related data for the purposes of developing a travel recommender system.

To begin with, a number of published papers have been dedicated to exploring structured customer feedback in the form of the multi-criteria ratings of hotel reviews, which cover all of the essential aspects of the users' hotel experiences, from satisfaction with specific experiences, such as the first front desk interaction at check-in, the room quality and business services of Wi-Fi access, to the overall hotel qualities of location, cleanliness and value for money (Fuchs & Zanker, 2012). The recognised recommendation potential of these multi-criteria hotel ratings from TripAdvisor has proven to be of deep interest to multiple research teams. Perhaps the groundwork for the exploration of the multi-criteria tourist ratings was laid in the paper by Fuchs and Zanker (2012), who applied a number of multiple linear regression models to examine the influence of different criterion ratings on the users' overall hotel satisfaction rating across 4 tourist market segments, which were grouped according to the personal user information explicitly specified as part of the TripAdvisor hotel reviews. As a side note, only recently have several researchers teamed up to exploit the multi-criteria user ratings, specifically of the Canary Islands hotels and of US spa hotels, in order to create SOM clusters of users with different overall satisfaction levels and to ultimately apply CART models for the prediction of the users' travel behaviour concerning the choice of hotels (Ahani et al., 2019a; Ahani et al., 2019b). In continuation, two years later an extended research team of Jannach, Zanker and Fuchs (2014) managed to produce high accuracy recommendations by employing their previous findings on segment-specific hotel satisfaction factors as well as by successfully experimenting with the single-rating matrix factorisation-based algorithm called SVD.
Furthermore, Zheng (2017) has proposed a novel multi-criteria recommendation technique of “Criteria Chains”, which assumes a dependent relationship between multi-criteria ratings and, with the help of the context-aware biased matrix factorisation algorithm, evaluates the user rating for each criterion in the context of the previously predicted ones.

Another notable, however much less common approach towards TripAdvisor data exploration has to do with the analysis and processing of textual user reviews. On the one hand, the extensively researched area of sentiment analysis, also known as opinion mining, has not seen much general algorithmic improvement in recent years. For instance, a novel combination of the Fuzzy Domain Ontology with the commonly employed method of the Support Vector Machine was proposed for the goals of text classification in the paper by Ali, Kwak and Kim (2016); however, it did not firmly succeed in improving the performance of a simple Support Vector Machine model and resulted in a 10 per cent drop in the recall measure, despite a similar degree of improvement in the precision and accuracy metrics. On the other hand, the research focus has been shifting towards the other Natural Language Processing techniques of semantic analysis, such as the association rule-based and the topic model-based ones, the latter of which have recently gained much popularity in the tourism domain and started to be extensively featured for the purposes of data mining and pre-processing of textual reviews. For instance, by far the most common of such topic segmentation methods has been the Latent Dirichlet Allocation model (LDA), which uses an unsupervised learning algorithm in order to identify and label the most ubiquitous underlying topics present in huge volumes of unstructured textual data (Blei et al., 2003). LDA is highly efficient, as it can be adapted for the cases of extremely sparse, disarrayed datasets of textual reviews, which makes it perfect for use on TripAdvisor data.
In the specific research examples this model has been applied with the goal of effectively extracting from the TripAdvisor hotel reviews the most frequent word dimensions (also known as topics) that signify the degree of consumer satisfaction, in order to identify the most important hotel features according to customers (Guo et al., 2017).

Other instances of already explored and collectively unrelated approaches towards the study of the TripAdvisor data within the context of the recommender system research include the following:

a) Applying the probabilistic classifier techniques, such as the Naive Bayes and Support Vector Machines, to the TripAdvisor attraction ratings in order to make user rating predictions with the additional help of the demographic data specified on user profile pages. Ultimately, the performance results did not show significant improvements after the application of demographic filtering, which was mostly due to the inherently high sparsity of the travellers' profile data (Wang et al., 2012);

b) Creating a novel and highly effective hybridisation method, which was directly inspired by the weighted and mixed ones, in order to combine the TripAdvisor item recommendations from some of the top performing location-based context-aware recommender systems (Logesh et al., 2019);

c) Applying the deep learning techniques of constructing artificial neural networks to the TripAdvisor photo databases in order to recommend POIs, based on how likely a given user is to be the author of a given POI's photos that were taken and shared by other users (Díez et al., 2020).

2.4 Major recommender system issues and their common solutions

Although recommender systems can often face a whole range of different issues, such as the problem of providing users with accurate but not diverse enough recommendations (or vice versa), the problems of scalability and latency, and the problems of shilling attacks and user privacy, most of these problems are likely to be either encountered and dealt with post factum, that is, well after the launch and first test trials of a specific system, or not at all (Khusro et al., 2016). Interestingly enough, there are only a handful of major problems that every single effective recommender system has to eventually overcome, the two most prominent of which are the cold start problem and the general issue of high data sparsity.

Speaking of the former first, the cold start problem occurs upon the system's online launch and in all types of recommender system algorithms, especially in the collaborative filtering approach, and denotes the inability of the system to make relevant item predictions for a new user, for the reason of only having received the user's scant input preferences and no other history of ratings. The instances when the problem of a cold start is very likely to occur are often divided into the three following categories: (a) recommending items for new users with little or no preference history record, (b) continuing to produce relevant recommendations for the existing users, while at the same time updating the database with new items, and lastly, (c) recommending newly added items to newly connected users (Lika et al., 2014).

This research area has already seen a significant number of various experimental methods that have been suggested for the goal of avoiding, or at least softening, the impact from the cold start problem. Understandably, most of the respective research was conducted with the focus of solving the more pressing new-user instance of the cold start problem, rather than the new-item one. The specific researched solutions often fall into one of the following groups:

a) Introducing the preparatory stage of the initialisation of new users in the form of a brief interview process that is controlled by a so-called bootstrapping algorithm, which is developed to adapt to the user's choices in order to elicit the most informative responses; for instance, asking users to rate a short list of items that are representative of different groups of user preferences, that way quickly and accurately identifying the preference type of each new user (Golbandi et al., 2011; Zhou et al., 2011);

b) Establishing a range of user categories and simply making the new users decide on their own with which one of these categories they choose to associate themselves. Although this technique may seem to be the easiest in placing a new user profile in the context of the existing database, without having to make the user answer too many questions or to rate any items, this approach often cannot produce effective results and should only be applied in specific domains where it is not expected of users to be associating themselves with more than a single category at a time (Al Mamunur et al., 2002);

c) Employing the context-aware recommender systems that are most commonly based on the social trust networks data of individual users to exploit the established relationships between users and/or items; for instance, collecting data from the platforms that allow the annotation of items by the assignment of social tags for the purposes of quicker topic identification and more effective opinion sharing (Zhang et al., 2010);

d) Deriving from data patterns a range of association rules for user preferences in order to further extrapolate the new user profiles from the starting input data; for instance, guessing the new users' additional topics of interest by traversing the topic associations of other user profiles (Shaw et al., 2010);

e) Developing standalone predictive algorithms, which most commonly exploit supplementary information on item contents and/or user demographic characteristics, as an addition to a recommender system in order to generate more accurate recommendations for new users; for instance, by constructing a regression model over pairs of user and item features for each one of the new users in order to predict their item ratings (Park & Chu, 2009).

However, the most prominent methodological conclusion that is almost unanimously shared by these research papers is the clear superiority of the hybrid system approach in dealing with the cold start (Lika et al., 2014). In this view, hybridisation is often used to apply a particular content analysis methodology to item descriptions, in order to recommend relevant items based on little initial input data, when the new user's ratings have not yet been accumulated, that way balancing the drawbacks of collaborative filtering with the advantages of content-based filtering and vice versa.

As for the other issue of high data sparsity, the collaborative filtering algorithms are once again the most exposed to this problem. In collaborative recommender systems user profiles are most often represented as vectors of users' item ratings, which form a user-item (or consumer-product) interaction matrix. Due to the simple fact that an overwhelming majority of users at any point in time have rated only a limited number of items, the user-item matrix is always subject to having zero (or missing) values for those items that the users have not yet rated. This natural occurrence, commonly referred to as data sparsity, was observed to pose an acute problem of drastically reducing the accuracy of the system's recommendations, the reason being that sparsity of data not only weakens the correlation between any given pair of potentially similar users, but also makes a strong correlation between users an essentially unreliable measure. From this perspective, the problem of a cold start can be viewed as a specific case of the sparsity problem, in which virtually all elements of a new row or column in the user-item interaction matrix contain missing values. As a result, specifically in the cases when the technique of collaborative filtering is used without any assistance from other methods, extremely high levels of data sparsity tend to deprive the collaborative approach of any positive impact on the quality of recommendations (Chen et al., 2011).
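The degree of sparsity described above can be quantified directly as the fraction of empty cells in the user-item interaction matrix. A minimal Python sketch, assuming the common convention that 0 stands for a missing rating:

```python
def sparsity(matrix):
    # Fraction of unrated (zero) cells in the user-item interaction matrix:
    # real-world recommendation datasets routinely exceed 0.99.
    total = sum(len(row) for row in matrix)
    missing = sum(1 for row in matrix for value in row if value == 0)
    return missing / total
```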

The research field has seen a number of very diverse solutions to the data sparsity problem, most of which, unsurprisingly, bear a strong resemblance to, and a positive impact on, the cold start solutions. In spite of the wide range of researched solutions, a few groups of the most commonly used sparsity-alleviating techniques have still been identified, as follows:

a) Reducing the dimensionality of the user-item interaction data in order to generate a much denser, more concise interaction matrix of only the top most prolific users with the largest numbers of item ratings. There are many different techniques that can be applied to achieve the reduction of data dimensions: from the simple statistical methods, such as creating clusters of either items or users to base the predictions on, to the more eccentric techniques, such as the probabilistic topic evaluation model of Latent Dirichlet Allocation (LDA) and the information retrieval technique of Latent Semantic Indexing (LSI) from the domain of natural language processing (Grčar et al., 2006; Blei et al., 2003). However, the most widely recognised and, arguably, the most common dimension reduction technique is that of matrix factorisation, which has proven to be extremely efficient in handling large databases without any significant data losses. On the one hand, the possible downside of such a simplification approach can be a noticeable loss in recommendation accuracy; on the other hand, depending on the chosen technique and the way of its integration, the result may be an increase in the recommender system's performance (Bobadilla, Ortega, Hernando & Gutiérrez, 2013).
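As a rough illustration of how matrix factorisation compresses a sparse interaction matrix into dense low-dimensional factors, the following Python sketch implements a plain stochastic-gradient-descent factorisation in the style of the "Funk SVD" family; the hyperparameter values are arbitrary illustrative choices:

```python
import random

def factorise(R, k=2, steps=1000, lr=0.02, reg=0.02, seed=0):
    # Approximate R ~ P * Q^T using only the observed (non-zero) cells,
    # via stochastic gradient descent on the regularised squared error.
    rng = random.Random(seed)
    n, m = len(R), len(R[0])
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n)]  # user factors
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(m)]  # item factors
    for _ in range(steps):
        for u in range(n):
            for i in range(m):
                if R[u][i] == 0:
                    continue  # skip missing ratings
                pred = sum(P[u][f] * Q[i][f] for f in range(k))
                err = R[u][i] - pred
                for f in range(k):
                    pu, qi = P[u][f], Q[i][f]
                    P[u][f] += lr * (err * qi - reg * pu)
                    Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q
```

A missing rating R[u][i] can then be predicted as the dot product of the user factor P[u] and the item factor Q[i].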

b) Representing the user-item interaction matrix as a graph of global similarity between users, where nodes refer to users and edges denote the degree of similarity between a given pair of users, in order to predict the potential interest in an item (or even the item rating) of a particular user, based on the length of a path directed from that user to the user who has already rated the item in question. Most commonly such an approach requires a preliminary stage of creating for each user a bipartite graph, which establishes connections between a given user and the items that he or she has rated. The issues of this approach are often the low interpretability of user similarity measures as well as the prohibitive computational intensity on a large enough dataset (Chen et al., 2011);
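A minimal sketch of the preliminary bipartite-graph stage, with a breadth-first path search between users standing in for the more elaborate graph-based similarity measures discussed by Chen et al. (2011); shorter user-to-user paths suggest higher similarity:

```python
from collections import deque

def bipartite_graph(ratings):
    # Build adjacency lists linking each user to the items he or she rated;
    # nodes are tagged ('u', ...) or ('i', ...) to keep the two sides apart.
    g = {}
    for user, item in ratings:
        g.setdefault(('u', user), set()).add(('i', item))
        g.setdefault(('i', item), set()).add(('u', user))
    return g

def path_length(g, src, dst):
    # Breadth-first search for the shortest path between two nodes;
    # returns None when the nodes are not connected at all.
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for neighbour in g.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None
```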

c) Transitioning to an item-based collaborative filtering algorithm, in order to move away from seeking a given user's similar neighbours and instead focus on the item-item similarity according to a given user's ratings. Such an alternative to the user-based collaborative filtering enables the algorithm to rely on the preferences of the specific user who is making a recommendation query, instead of relying on the sparse item ratings of other users. Among the most resource efficient and most common of such methods are cosine-based and correlation-based similarity, whose similarity measures are then often used as weights for determining a weighted sum of the k nearest items (Sarwar et al., 2001).
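The cosine-based variant with a weighted sum over the k most similar items can be sketched in Python as follows; this is a simplified illustration of the idea described by Sarwar et al. (2001), not a faithful reproduction of their algorithm:

```python
import math

def cosine(u, v):
    # Cosine similarity between two item columns (0 marks an unrated cell).
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict_rating(R, user, item, k=2):
    # Weighted average of the user's own ratings on the k items most
    # similar to the target item, with the similarities as weights.
    target_col = [row[item] for row in R]
    sims = []
    for j in range(len(R[0])):
        if j != item and R[user][j] != 0:
            col = [row[j] for row in R]
            sims.append((cosine(target_col, col), R[user][j]))
    sims.sort(reverse=True)
    top = sims[:k]
    denom = sum(abs(s) for s, _ in top)
    return sum(s * r for s, r in top) / denom if denom else 0.0
```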

2.5 Performance assessment metrics: definitions and applications

Ever since the dawn of the recommender system research, the assessment of the recommendation quality as well as of the system's overall performance has quickly become a vital part of the field. Thus, over the years the recommender systems' performance evaluation metrics have been developed into the following four broad classes: (a) the accuracy metrics for the ratings prediction, such as the Mean Absolute Error (MAE), the Root Mean Square Error (RMSE) and the Coverage measure; (b) the relevance metrics for the set of recommendations, such as the Precision, Recall and Receiver Operating Characteristic (ROC); (c) the recommendation ranking metrics, such as the Half Life Utility (HLU) and the Discounted Cumulative Gain (DCG); and last, but not least, (d) the recommendation variety metrics, such as the diversity and the novelty of the recommended items (Hernández del Olmo & Gaudioso, 2008).

Traditionally, as a means of later assessing any of the quality aspects of a recommender system's performance in the context of a research study, a user dataset is divided in advance into the following two sets: a training set, which is meant for the purpose of training the machine learning model, and a test set, which is meant for the purpose of evaluating the performance of the already trained model, with the dataset's division most commonly conducted in the optimal proportion of 80 to 20 per cent respectively (Cacheda et al., 2011).
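A minimal sketch of such an 80/20 hold-out split over (user, item, rating) records; the fixed seed is only there to make the split reproducible:

```python
import random

def train_test_split(interactions, test_fraction=0.2, seed=42):
    # Shuffle the interaction records and hold out the last
    # test_fraction of them as the test set.
    rng = random.Random(seed)
    shuffled = list(interactions)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]
```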

Accuracy prediction metrics are certainly the most frequently featured in the recommendation system research literature. In accordance with the presently obsolete basic supposition that a successful recommender system is the one that most accurately predicts user preferences, many of the previously published research papers had the intention of constructing algorithms that would provide more and more accurate user recommendations and of evaluating such models accordingly (Shani & Gunawardana, 2011).

The core performance metrics, which are most frequently used to measure the accuracy of predicted ratings in the context of a user study, include the two closely related metrics of the Root Mean Squared Error (or RMSE, for short) and its common alternative, the Mean Absolute Error (or MAE, for short):

$$\mathrm{RMSE} = \sqrt{\frac{1}{\|R\|}\sum_{(u,i)\in R}\left(r_{ui}-\hat{r}_{ui}\right)^{2}}, \qquad \mathrm{MAE} = \frac{1}{\|R\|}\sum_{(u,i)\in R}\left|r_{ui}-\hat{r}_{ui}\right|$$

where: R - the user-item interaction matrix;

||R|| - denotes the matrix size (i.e. the number of ratings);

$r_{ui}$ - the user's actual item ratings;

$\hat{r}_{ui}$ - the user's predicted item ratings (Yang et al., 2014).

Both of these metrics solely depend on the magnitude of prediction errors. However, the key difference between the two is that, unlike the MAE metric, the RMSE one tends to penalise more heavily the models with only a few instances of large prediction errors, instead preferring the models with a uniform level of prediction errors. In light of this fact, it is also important to add that, in the case when the test set has an unbalanced distribution of item ratings, these metrics are very likely to get skewed by the rating prediction errors of the most frequently rated items. For this reason it is advisable to calculate RMSE and MAE for every item separately and then compute an Average RMSE and Average MAE over all items. Similarly, if, in the case of an unbalanced user distribution, the goal is to determine what level of recommendation accuracy a randomly chosen user is likely to receive, the Average RMSE and Average MAE metrics should be calculated over all users (Hernández del Olmo & Gaudioso, 2008).
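A minimal Python sketch of the two error metrics over aligned lists of actual and predicted ratings; the per-item or per-user averages mentioned above follow by grouping the rating pairs accordingly before applying these functions:

```python
import math

def rmse(actual, predicted):
    # Root Mean Squared Error: penalises large errors more heavily.
    return math.sqrt(sum((a - p) ** 2
                         for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    # Mean Absolute Error: treats all error magnitudes uniformly.
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```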

The most common general task of a given recommender system is to provide any user with a top-k list of a certain fixed number k of item recommendations that were predicted to be the most reflective of the user's tastes. Thus, in order to measure the relevance of such a list of recommended items for every single user, other special metrics are applied instead; the most widespread ones are those of Recall and Precision. Recall (also known as sensitivity, hit rate or the true positive rate) is a metric that represents the fraction of the total number of relevant items that were actually recommended to the user u and is obtained by dividing the number of relevant elements N(k, u) present in the top-k list by the total number of relevant items N(u) (relevance is established, for example, if a rating exceeds a certain cut-off value) (Yang et al., 2014):

$$\mathrm{Recall}@k(u) = \frac{N(k,u)}{N(u)} = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}$$

where: True Positives - the number of relevant items that were recommended;

False Negatives - the number of relevant items that were not recommended (Sokolova et al., 2006).

Precision is a metric that represents the fraction of the total number of recommended items that were actually relevant for the user u and is calculated with the same number of relevant elements N(k, u) from the top-k list in its numerator and the total number of recommendations k of the top-k list as its denominator (Yang et al., 2014):

$$\mathrm{Precision}@k(u) = \frac{N(k,u)}{k} = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}}$$

where: True Positives - the number of relevant items that were recommended;

False Positives - the number of irrelevant items that were mistakenly recommended (Sokolova et al., 2006).

In simpler terms, both of these metrics evaluate a recommender algorithm's ability to identify the user-relevant items that are present in the test dataset. While this is precisely what the recall metric captures, the precision metric also measures the system's capacity to draw relevant items in relation to the system's errors of recommending useless items (Shani & Gunawardana, 2011).

For the reason of producing a more balanced evaluation of a model's performance based on the abovementioned metrics of recall and precision, a separate set of indicators called F measures has been derived in order to simultaneously track the behaviour of both of these metrics. The most extensively applied of the F measures has been the F1 score, which assigns equal weights to both the precision and recall and can be easily computed as the harmonic mean of these two metrics:

$$F_{1} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where: Precision - the metric showing the fraction of the total number of recommended items that were actually relevant for the user;

Recall - the metric showing the fraction of the total number of relevant items that were actually recommended to the user (Sokolova et al., 2006).

Conveniently enough, the F1 measure also serves as a single-number summary of the precision-recall curve, which is helpful when trying to assess the quality of an algorithm's performance over a range of recommendation list lengths, in order to determine the most favourable one (Sokolova et al., 2006).
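The three relevance metrics can be computed together from a top-k list and the set of truly relevant items, as in the following illustrative sketch:

```python
def precision_recall_f1(recommended, relevant):
    # recommended: the top-k list; relevant: the user's truly relevant items.
    rec, rel = set(recommended), set(relevant)
    true_positives = len(rec & rel)
    precision = true_positives / len(rec) if rec else 0.0
    recall = true_positives / len(rel) if rel else 0.0
    # Harmonic mean of precision and recall (0 when both are 0).
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```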

However, there exists another popular alternative for evaluating the accuracy of a recommendation list, namely, the Receiver Operating Characteristic analysis (or ROC, for short) and its respective curve, which plots the recall (known as the true positive rate) against the false alarm ratio (also known as the fall-out or the false positive rate):

$$\mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP}+\mathrm{TN}}$$

where: False Positives - the number of irrelevant items that were mistakenly recommended;

True Negatives - the number of irrelevant items that were correctly not recommended (Fawcett, 2006).

The focus of the ROC curves is to reflect the proportion of unwanted items that have still been put on the list of user recommendations. Thus, the central goal of the ROC analysis and the ROC optimisation is to return every relevant item without returning any of the irrelevant ones (Fawcett, 2006).
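A single operating point of the ROC curve for a fixed-length recommendation list can be sketched as follows; sweeping the list length k then traces out the full curve:

```python
def roc_point(recommended, relevant, catalogue):
    # One ROC operating point for a fixed-length recommendation list:
    # TPR (recall) on the y-axis, FPR (fall-out) on the x-axis.
    rec, rel = set(recommended), set(relevant)
    irrelevant = set(catalogue) - rel
    tp = len(rec & rel)
    fp = len(rec & irrelevant)
    tpr = tp / len(rel) if rel else 0.0
    fpr = fp / len(irrelevant) if irrelevant else 0.0
    return tpr, fpr
```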

The choice between the two aforementioned options of the recommendation list accuracy metrics should primarily be decided upon the domain of the recommender's implementation as well as upon the goals of the system's application. The general shared agreement among researchers is that in the simple case of wishing to recommend to users as many relevant items as possible, the precision-recall curve should be sufficient for such a task. However, especially if the system gets implemented in a business setting, for instance in the domain of e-commerce, and holds as its core aim the maximisation of the number of new purchases and the minimisation of the marketing costs of maintaining the recommender algorithm, then the ROC curves, which are known for their widespread application in the cost/benefit decision analysis, should certainly be chosen over the precision-recall ones for their crucial ability of tracking the system's mistakes of recommending useless items (Shani & Gunawardana, 2011).

In the common instance of assessing the precision-recall or ROC curves for several test users when every user will be presented with a fixed number of k recommendations, the appropriate strategy for the recommendation set relevance evaluation is to calculate the precision and recall metrics at each number k of recommendation list lengths for each user, and then compute the average precision and recall at each number k of recommendation list lengths. The analogous approach can be taken in regard to the construction of an averaged ROC curve, every operating point of which will correspond to a different number of recommendations provided to users (Shani & Gunawardana, 2011).

As the length of the recommendation list increases, every next recommended item tends to lose relevance to the user more quickly. For this reason, when the number k of recommended items is quite large, it is advisable to introduce measures that reflect the quality of the recommendation list ranking. Two of the most frequently employed ranking metrics are the Half-Life Utility (HLU) and the Normalised Discounted Cumulative Gain (NDCG).

The Half-Life Utility metric assumes that a user's interest in items decays at an exponential rate as the user moves from the topmost item down the list of recommendations, with the probability of the user actually proceeding to the next item halving every alpha - 1 positions:

HLU_u = \sum_{j} \frac{\max(r_{u,i_j} - d,\ 0)}{2^{(j-1)/(\alpha-1)}}

where: r_{u,i_j} - the true rating of user u for the item i_j;

d - the neutral ('don't care') rating, frequently chosen to be 0;

j - the 1-based rank at which the item i_j appears;

\alpha - the half-life, i.e. the rank on the recommendation list at which there is a 50 per cent chance the user will still review that item (Herlocker et al., 2004).
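The Half-Life Utility is straightforward to compute in pure Python; the default values of `d` and `alpha` below are illustrative, not prescribed by the source:

```python
def half_life_utility(ranked_true_ratings, d=0.0, alpha=5):
    """Half-Life Utility for one user.
    ranked_true_ratings[j-1] is the user's true rating for the item
    recommended at 1-based rank j; d is the neutral rating and alpha
    the half-life rank at which interest has decayed by 50 per cent."""
    return sum(max(r - d, 0.0) / 2 ** ((j - 1) / (alpha - 1))
               for j, r in enumerate(ranked_true_ratings, start=1))

# With alpha=2 the discount halves at every rank: 4/1 + 2/2 + 1/4
u = half_life_utility([4, 2, 1], d=0.0, alpha=2)
```

Higher utility is obtained when the items the user truly rated highly sit near the top of the list, which is exactly the ranking behaviour the metric rewards.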

The Normalised Discounted Cumulative Gain, on the other hand, assumes that the relevance gain reduces logarithmically as a user moves down the list of recommendations. The main difficulty in computing NDCG, though, is the requirement to know the true user ratings for every single item on the recommendation list:

NDCG_u@k = \frac{1}{IDCG_u@k} \sum_{i=1}^{k} \frac{r_{u,p_i}}{\log_2(i+1)}

where: p_1, ..., p_k - the ranked list of recommended items;

r_{u,p_i} - the true rating of user u for the item p_i that was ranked in position i;

IDCG_u@k - the maximum attainable gain value for user u, obtained with the optimal re-ordering of the k items in p_1, ..., p_k (Baltrunas et al., 2010).
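A minimal sketch of the NDCG computation, assuming the true ratings for the ranked items are known as the formula requires (function names are mine):

```python
import math

def dcg_at_k(true_ratings_in_rank_order, k):
    """Discounted cumulative gain: gains decay with log2 of the position."""
    return sum(r / math.log2(i + 1)
               for i, r in enumerate(true_ratings_in_rank_order[:k], start=1))

def ndcg_at_k(true_ratings_in_rank_order, k):
    """DCG of the produced ranking normalised by the best attainable DCG."""
    ideal = dcg_at_k(sorted(true_ratings_in_rank_order, reverse=True), k)
    return dcg_at_k(true_ratings_in_rank_order, k) / ideal if ideal > 0 else 0.0
```

A list already sorted by true rating scores exactly 1.0, and any misordering pushes the score below 1, which makes the metric easy to interpret across users and list lengths.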

Over the years the fundamental definition of a successful, useful recommendation has grown in complexity, shifting from a core focus on accuracy to concepts like diversity, novelty and even serendipity (the degree of surprise or unexpectedness) of recommendations (Ge et al., 2010). According to the commonly accepted definitions, the novelty of a recommendation reflects either how generally unpopular, yet still relevant, a recommended item is for a user or, in calculable terms, the degree of difference between the recommended items. Since items with low popularity scores are quite unlikely to be familiar to potential users, the novelty of a given item is most typically calculated as the inverse of its popularity, which is often measured simply as the total number of ratings that particular item has received. The diversity of a recommendation list, in its turn, can be defined as the degree to which the recommended items differ from one another, whether by belonging to different item types or by varying across other characteristics. As it is impossible to state with absolute certainty which few topics a user is most interested in at any specific time, ensuring that the recommendation list is comprised of items that collectively cover a wide spectrum of the user's preferences greatly increases the system's chances of matching the user's current needs. This can be achieved by optimising the diversity of the recommendation list, which can be measured in terms of either item features or item content features, such as the general item types, subtypes, topics, etc. (Kaminskas & Bridge, 2016). The single most common metric used to assess the degree of recommendation diversity is the Coverage measure, calculated as a simple fraction in the following way:

Coverage@k = \frac{|I_k|}{|I|}

where: k - the length of the top-k lists of user recommendations;

|I_k| - the total number of distinct items appearing in the top-k places of all user recommendation lists;

|I| - the total number of items that are available for being potentially recommended.
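The Coverage fraction reduces to a few lines of Python; the attraction names below are invented for illustration:

```python
def coverage_at_k(rec_lists, catalogue_size, k):
    """Fraction of the full item catalogue appearing in any user's top-k list."""
    distinct = set()
    for items in rec_lists.values():
        distinct.update(items[:k])
    return len(distinct) / catalogue_size

# Two users, a catalogue of 10 items: top-2 lists cover {"tower","eye","shard"}
recs = {"u1": ["tower", "eye", "tate"], "u2": ["tower", "shard", "eye"]}
cov = coverage_at_k(recs, catalogue_size=10, k=2)
```

A value near zero signals that the system keeps recycling the same handful of popular items, which is precisely the failure mode discussed next.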

Thus, a small Coverage value often means that only the most popular items actually end up on the user recommendation lists, which typically also inflates the accuracy figures, rendering the respective metrics unreliable on their own. For this reason a solid recommendation quality can only be achieved at high levels of both accuracy and coverage. Lastly, the ubiquitous use of the Coverage diversity measure alongside the common metrics of recommendation accuracy indicates its utmost importance in achieving a balanced performance evaluation of any given recommender system (Lu et al., 2012).

3. Statement of the research question

The two research questions, of the core and the supplementary focus respectively, that set the direction for the efforts of the current thesis are stated as follows:

- What is the highest-performing collaborative filtering algorithm among the matrix factorisation models, able to produce the most accurate recommendations based on the TripAdvisor data of user ratings for London's tourist attractions?

- Does the text recognition technique of latent Dirichlet allocation reveal coherent categories of user preferences that can be employed for the content-based filtering of new users, based on the TripAdvisor data of user reviews for London's tourist attractions?

First and foremost, in order to approach this research question, the present thesis sees it necessary to narrow down the focus and to only consider the specific case of TripAdvisor data on a single city that currently ranks among the most popular tourist destinations. Hence, the case has been made for the city of London, UK, which, according to Mastercard's annual report on the Global Destination Cities Index, was forecast to overtake Paris in overnight international visitors in 2019 and once again rank in 2nd place, behind only Bangkok (Hamel & Robino, 2019). In addition, the current research has settled on developing a system, based on single-rating user reviews of tourist attractions, that would simply recommend to UK travellers the Places of Interest (otherwise known as PoIs), namely the landmarks and experiences available in London. This decision has been informed by the fact that, as this paper previously reviewed, extensive research has already been performed on many of the other types and thematic categories of TripAdvisor data, for instance: the multi-criteria user ratings and textual reviews for various hotels (Chang et al., 2017; Ahani et al., 2019b); the demographic information of users, explicitly stated in their TripAdvisor profiles (Wang et al., 2012); as well as the users' photos of PoIs attached to their reviews (Diaz et al., 2020).

Furthermore, in view of the previously reviewed academic literature, covering some of the more recent and promising developments in the research field of recommender system methodologies, and taking into account the matrix configuration and the inherent sparseness of the travel-related data planned to be collected from TripAdvisor within the present computational and temporal constraints, the thesis justifies the key choice of measuring and comparing the performance of, specifically, a range of latent factor models for the purpose of answering the abovementioned research question. As has already been discussed in the literature review, factorisation models have been shown to be among the more computationally accessible and efficient recommendation algorithms when applied to huge and sparse user-item interaction matrices, which makes them well suited to the proposed case of the TripAdvisor review data on the London PoIs.

When it comes to the choice of the performance evaluation metrics, first and foremost, from the two reviewed accuracy metrics for rating predictions this research has settled on the MAE metric, as it has a more uniform error calculation approach, treating singular instances of large prediction errors much more leniently than its closest counterpart, the RMSE metric (Hernández del Olmo & Gaudioso, 2008).
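The leniency difference between the two metrics is easy to see numerically: a single large miss moves RMSE far more than MAE. The ratings below are invented purely for illustration:

```python
import math

def mae(true, pred):
    """Mean Absolute Error: every error contributes linearly."""
    return sum(abs(t - p) for t, p in zip(true, pred)) / len(true)

def rmse(true, pred):
    """Root Mean Squared Error: squaring amplifies large errors."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true))

true_r = [4, 4, 4, 4]
pred_r = [4, 4, 4, 0]   # three perfect predictions, one large miss
# the single outlier gives MAE = 1.0 but RMSE = 2.0
```

With identical inputs RMSE comes out twice as large here, which illustrates why MAE is the more forgiving choice when occasional large errors should not dominate the evaluation.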

Thus, the respective research hypotheses have been specified to be the following:

- The application of the collaborative filtering method of SVD to the numerical ratings will lead to the best results in the core accuracy metric of MAE, compared against the other factorisation methods of SVD++ and Non-Negative Matrix Factorisation as well as against the other common collaborative filtering methods of user-based and item-based KNN, as SVD has been shown to outperform other CF algorithms on sparse rating datasets in other domains such as film recommendations (Billsus & Pazzani, 1998; Cacheda et al., 2011; Koren et al., 2009; Luo et al., 2014);

- The application of the topic modelling technique of Latent Dirichlet Allocation to the textual reviews will reveal a more concise spectrum of the types of attractions enjoyed by users, compared to the 15 types specified as the default ones on the TripAdvisor platform, without much loss in the coherence of the uncovered topics, as the technique has proved effective at uncovering latent topic dimensions (Blei et al., 2003).

Consequently, in line with the research question analysis and the testing of its hypotheses, the main objective of the present thesis is to explore and demonstrate the benefits of the latent factor models for producing user recommendations in the tourism domain, by constructing a fully functional recommender system on the core basis of a top-performing collaborative filtering factorisation algorithm.

Finally, as a means of achieving this objective in a step-by-step process, the list of specific tasks has been proposed to be the following:

a) Firstly, to web-scrape user ratings and reviews data from the TripAdvisor platform, specifically on the landmarks and experiences of London, UK, with the primary help of the Python library requests and the parser library BeautifulSoup, and to describe and visualise the results of the respective data analysis, mainly in order to assess the degree of the data sparsity problem and make appropriate adjustments;

b) Secondly, to apply the previously reviewed range of collaborative filtering algorithms to the user-item interaction matrix of single numerical ratings, and to evaluate and compare the performance of the resulting predictive algorithms on the basis of the commonly used accuracy metric of MAE, thereby justifying the ultimate choice of the specific recommender algorithm (or algorithms) producing the most accurate rating predictions for the unknown test users;

c) Thirdly, to apply the topic recognition algorithm of Latent Dirichlet Allocation to the array of users' textual reviews for all of London's tourist attractions, in order to discover a list of attraction types that is, according to the UK travellers, more relevant and precise than the general one provided on the TripAdvisor platform;

d) Fourthly, to enhance the travel recommender system's performance by executing the following tasks: solving the cold-start problem for actual new users by pre-filtering them according to the newly discovered list of attraction types; and ensuring the diversity of user recommendations by penalising the recommendation algorithm for suggesting the most popular attractions;

e) And lastly, to develop a fully-fledged practical solution for travel recommendations in the form of an interactive online service application, on the core basis of a single recommendation algorithm which not only showed the highest accuracy of ratings predictions, but also ranked first according to the relevancy metrics of users' recommendation lists.
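The diversity step of task (d) can be sketched as a simple re-ranking that subtracts a popularity penalty from each predicted rating; the penalty weight, item names and counts below are hypothetical tuning choices, not values taken from the thesis:

```python
def rerank_with_popularity_penalty(scored_items, popularity, weight=0.1):
    """Re-rank (item, predicted_rating) pairs, demoting very popular items.
    popularity[item] is e.g. its total rating count on the platform."""
    max_pop = max(popularity.values())
    return sorted(scored_items,
                  key=lambda pair: pair[1] - weight * popularity[pair[0]] / max_pop,
                  reverse=True)

# A marginally lower-rated long-tail attraction overtakes the blockbuster one:
scores = [("big_ben", 4.50), ("hidden_gem", 4.45)]
pops = {"big_ben": 10000, "hidden_gem": 50}
ranked = rerank_with_popularity_penalty(scores, pops)
```

Because the penalty is proportional to normalised popularity, the most-reviewed attractions are nudged down just enough for comparable long-tail items to surface, directly improving the Coverage measure discussed earlier.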

4. Methodology

In line with the single core objective of the current thesis, namely exploring and demonstrating the benefits of latent factor models for producing user recommendations in the tourism domain, the exact factorisation methods to be employed in the present study have previously been reviewed on the basis of their successful applications in other domains of research, especially film recommendations. Specifically, the present thesis has chosen a total of three factorisation models to be considered as the potential core filtering algorithm of the future recommender system, namely SVD, SVD++ and Non-Negative Matrix Factorisation. Three primary reasons informed this choice. Firstly, these are among the few models suitable for the present case of extremely sparse tourism-related data, which can only be effectively represented as a user-item interaction matrix. Secondly, of the more complex factorisation techniques, these three were tested to be the only ones accessible enough in terms of the computational and temporal constraints of the present thesis. Thirdly, and not least importantly, these three models were assumed to be sufficient as they have been the most common ones featured in previous recommender studies in other domains, especially film recommendations, where they have been extensively and successfully tested against other collaborative filtering algorithms (Billsus & Pazzani, 1998; Cacheda et al., 2011; Koren et al., 2009; Luo et al., 2014).
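To illustrate the latent-factor idea these three models share, the following is a minimal pure-Python sketch of FunkSVD-style stochastic gradient descent on observed ratings only; the hyperparameters and the toy data are illustrative, and the thesis itself relies on the Surprise library's implementations rather than this sketch:

```python
import random

def funk_svd(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=300):
    """Fit user factors P and item factors Q so that dot(P[u], Q[i]) ~ r,
    iterating SGD updates over the observed (user, item, rating) triples."""
    random.seed(0)  # deterministic initialisation for reproducibility
    P = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):  # gradient step with L2 regularisation
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

# Tiny toy matrix: two users, two attractions, four observed ratings
ratings = [(0, 0, 5), (0, 1, 1), (1, 0, 4), (1, 1, 2)]
P, Q = funk_svd(ratings, n_users=2, n_items=2)
train_mae = sum(abs(r - predict(P, Q, u, i)) for u, i, r in ratings) / len(ratings)
```

Crucially for the sparse TripAdvisor matrix, the loop touches only the observed entries, so the cost grows with the number of ratings rather than with the full user-item grid.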

In addition, in order to test whether the top-performing latent factor model also proves superior within this study, other classical collaborative filtering methods of comparable prediction accuracy and computational complexity have been chosen to be evaluated on the same data alongside the latent factor models. Specifically, these are the two closely related user-based and item-based k-nearest-neighbours algorithms, which are often compared with, and were even proposed to be merged with, the abovementioned matrix factorisation methods as a factorisation of a neighbourhood model (Koren, 2010).


To start with the data collection process, first and foremost, the mining of the TripAdvisor data had to be done without the official API, for the primary reason that access to the TripAdvisor API, regardless of the purpose (whether consumer analysis, academic research or business application), would have had to be requested and paid for. Thus, the data collection consisted of parsing user review data into a sample of, for the most part, 150 reviews for each of the 975 unique landmarks and experiences available for the city of London, UK on the TripAdvisor platform, which amounted to a little over 42 per cent of the total population of 2318 respective tourist attractions. It is important to note that the representativeness of the sample was supported by the fact that user reviews on TripAdvisor are sorted from most to least recent by default, which made it possible to collect reviews from the various types of travellers who posted over the span of the whole past year. In that way, the collection of an unwanted, skewed sample of season-specific and traveller-type-specific review data was successfully avoided. The only data collection tool employed to partially automate the process was a straightforward web-scraping programme, written in the Python programming language with the sole help of the BeautifulSoup library and the Selenium web-testing framework. As a result, the two datasets of user reviews, one containing the numerical ratings of users and the other comprising the respective textual reviews, were represented as two matrices reflecting every single interaction (or the absence of one) between the 47908 unique users and the 975 unique attractions.
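The parsing step can be illustrated with the standard library alone (the thesis itself used BeautifulSoup and Selenium); the HTML structure and class names below are hypothetical stand-ins, not TripAdvisor's actual markup:

```python
from html.parser import HTMLParser

class ReviewParser(HTMLParser):
    """Toy extractor of (rating, text) pairs from review-like markup.
    The class names 'rating' and 'text' are invented for this sketch."""
    def __init__(self):
        super().__init__()
        self.reviews = []
        self._field = None

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        if cls in ("rating", "text"):
            self._field = cls  # remember which field the next text node fills

    def handle_data(self, data):
        if self._field == "rating":
            self.reviews.append({"rating": int(data), "text": ""})
        elif self._field == "text" and self.reviews:
            self.reviews[-1]["text"] = data.strip()
        self._field = None

html = ('<div class="review"><span class="rating">5</span>'
        '<p class="text">Great view!</p></div>')
parser = ReviewParser()
parser.feed(html)
```

Each parsed review then contributes one cell to the numerical ratings matrix and one to the textual reviews matrix described above.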

Turning now to the core process of developing the chosen predictive algorithms, first and foremost, the sample dataset containing the users' numerical single ratings for London's landmarks and experiences was separated into training and test sets in the commonly recommended proportion of 80 to 20 per cent respectively (Cacheda et al., 2011). For the purposes of training and analysing the performance of the six pre-selected machine learning models, the employed toolset comprises two sets of add-on Python packages from the SciPy Toolkits: the sklearn library and the library collectively entitled Surprise, a rough abbreviation of the Simple Python RecommendatIon System Engine. The two libraries of Surprise and sklearn allow incorporating all of the necessary model features of the previously reviewed matrix factorisation and k-nearest-neighbours algorithms respectively. Subsequently, the present section presents all of the selected models, accompanied by a brief specification of their exact model features and modes of application, the purpose and advantages of which have already been established and discussed in detail in the literature review section of the present thesis.

