# Data Mining Principles TA Session 7 (February 26, 2021)

## Agenda

Association Rules

Recommender Systems

## Association Rules

Association rules, or affinity analysis, are designed to find general association patterns between items in large databases

An association rule captures a case where the conditional probability of purchasing product A, given that you also purchase product B, is high: much higher than the unconditional probability of purchasing product A

## Key Metrics

Lift

Support

Confidence

## Lift in AR

The **lift** value is a measure of the importance of a rule. The lift value of an association rule is the ratio of the confidence of the rule to the expected confidence of the rule

The expected confidence of a rule is defined as the product of the support values of the rule body and the rule head divided by the support of the rule body
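
Write the rule as $A \Rightarrow B$ (body $A$, head $B$). Let $\operatorname{supp}(X)$ be the fraction of groups containing every item in $X$, and let $A \cup B$ be the combined itemset (notation assumed here). The definitions then combine as follows:

$$
\begin{aligned}
\operatorname{conf}(A \Rightarrow B) &= \frac{\operatorname{supp}(A \cup B)}{\operatorname{supp}(A)} \\
\operatorname{conf}_{\text{exp}}(A \Rightarrow B) &= \frac{\operatorname{supp}(A)\,\operatorname{supp}(B)}{\operatorname{supp}(A)} = \operatorname{supp}(B) \\
\operatorname{lift}(A \Rightarrow B) &= \frac{\operatorname{conf}(A \Rightarrow B)}{\operatorname{conf}_{\text{exp}}(A \Rightarrow B)} = \frac{\operatorname{supp}(A \cup B)}{\operatorname{supp}(A)\,\operatorname{supp}(B)}
\end{aligned}
$$

The expected confidence reduces to the support of the head. This is the confidence the rule would have if body and head occurred independently, so lift compares the observed co-occurrence with what independence predicts.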

Lift value > 1 - positive effect

Lift value < 1 - negative effect

Lift value = 1 - no effect
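
Below is a minimal Python sketch of the lift computation. It assumes each group (basket) is a Python `set` of item names. The `support` and `lift` helpers and the toy `baskets` data are hypothetical, for illustration only:

```python
def support(itemset, transactions):
    """Fraction of transactions (groups) that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def lift(body, head, transactions):
    """Lift of body => head: rule confidence divided by expected confidence.

    Assumes the body occurs in at least one transaction.
    """
    body, head = set(body), set(head)
    confidence = support(body | head, transactions) / support(body, transactions)
    # supp(body) * supp(head) / supp(body) simplifies to supp(head)
    expected_confidence = support(head, transactions)
    return confidence / expected_confidence


# Toy data (hypothetical): each set is one shopping basket
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "jam"},
    {"milk"},
    {"butter", "milk"},
]

print(round(lift({"bread"}, {"butter"}, baskets), 3))  # 1.111 -> positive effect
print(round(lift({"jam"}, {"butter"}, baskets), 3))    # 0.0   -> negative effect
```

Bread and butter co-occur in 40% of the baskets. Independence would predict 0.6 × 0.6 = 36%, which gives a lift of about 1.11.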

## Support in AR

The **support** of an association rule is the percentage of groups that contain all of the items listed in that association rule. The percentage value is calculated from among all the groups that were considered

The support of a rule is the percentage equivalent of a/b, where the values are:

a: The number of groups containing all the items that appear in the rule

b: The total number of all the groups that are considered

You can specify that only rules that achieve a certain minimum level of support are included in your mining model. This discards rules backed by too few groups to be reliable and keeps the result meaningful
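
Here is a sketch of support (the a/b ratio) combined with a minimum-support filter over item pairs. It again assumes each group is a Python `set`. `support`, `frequent_pairs`, and the `baskets` data are illustrative names. Restricting the search to pairs is a simplification, since algorithms such as Apriori also grow larger frequent itemsets:

```python
from itertools import combinations


def support(itemset, transactions):
    """Share of groups containing every item in itemset (a / b)."""
    a = sum(set(itemset) <= t for t in transactions)  # groups with all the items
    b = len(transactions)                             # all groups considered
    return a / b


def frequent_pairs(transactions, min_support):
    """Keep only item pairs whose support reaches min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for pair in combinations(items, 2):
        s = support(pair, transactions)
        if s >= min_support:
            result[pair] = s
    return result


# Toy data (hypothetical): each set is one shopping basket
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "jam"},
    {"milk"},
    {"butter", "milk"},
]

print(frequent_pairs(baskets, min_support=0.3))
# {('bread', 'butter'): 0.4, ('butter', 'milk'): 0.4}
```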

## Confidence in AR

The **confidence** of an association rule is a percentage value that shows how frequently the rule head occurs among all the groups containing the rule body. The confidence value indicates how reliable this rule is

The higher the value, the more likely the head items occur in a group if it is known that all body items are contained in that group

Thus, the confidence of a rule is the percentage equivalent of m/n, where the values are:

m: The number of groups containing the joined rule head and rule body

n: The number of groups containing the rule body
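
The m/n ratio translates directly into code. This sketch assumes set-valued groups. The `confidence` helper and the toy data are hypothetical:

```python
def confidence(body, head, transactions):
    """m / n: groups containing body and head, over groups containing the body."""
    body, head = set(body), set(head)
    n = sum(body <= t for t in transactions)           # groups containing the rule body
    m = sum((body | head) <= t for t in transactions)  # groups containing body and head
    return m / n if n else 0.0


# Toy data (hypothetical): each set is one shopping basket
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "jam"},
    {"milk"},
    {"butter", "milk"},
]

print(confidence({"jam"}, {"bread"}, baskets))            # 1.0
print(round(confidence({"bread"}, {"jam"}, baskets), 3))  # 0.333
```

Confidence is not symmetric. Every basket with jam also contains bread, but only one of the three baskets with bread contains jam.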

## Limitations

Association rules can give you some useful insights, but they are limited to comparisons between pairs or small sets of products

The algorithm also remains fairly slow by modern standards

## Recommender System

The construction of systems that support users in their (online) decision making is the main goal of the field of recommender systems

In particular, the goal of recommender systems is to provide easily accessible, high-quality recommendations for a large user community

They are everywhere: Amazon, Netflix, Google, etc.

Basic idea - if users shared the same interests in the past (if they viewed or bought the same books, for instance), they will also have similar tastes in the future

So, if, for example, user A and user B have a purchase history that overlaps strongly and user A has recently bought a book that B has not yet seen, the basic rationale is to propose this book also to B
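
The rationale can be sketched with purchase histories stored as sets. Measuring overlap with Jaccard similarity (shared items divided by all items either user bought) is an assumption made here. The function name and data are hypothetical:

```python
def recommend_from_overlap(target, histories):
    """Recommend items bought by the user whose history overlaps most with target's."""
    mine = histories[target]

    def overlap(other):
        # Jaccard similarity between two purchase histories (one possible choice)
        theirs = histories[other]
        union = mine | theirs
        return len(mine & theirs) / len(union) if union else 0.0

    best = max((u for u in histories if u != target), key=overlap)
    return sorted(histories[best] - mine)  # items the best match has that target lacks


histories = {
    "A": {"book1", "book2", "book3", "book4"},
    "B": {"book1", "book2", "book3"},
    "C": {"book5"},
}

print(recommend_from_overlap("B", histories))  # ['book4']
```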

## Key Idea

## Types of Algorithms

Algorithms that employ usage data are called **collaborative filtering**

Algorithms that use content metadata and user profiles to calculate recommendations are called **content-based filtering**

A mix of the two types is called **hybrid recommenders**

## Collaborative Filtering

Collaborative filtering is a family of algorithms with multiple ways to find similar users or items and multiple ways to calculate a rating based on the ratings of similar users

Two approaches - memory-based approach and model-based approach

Memory-based approach - find similar users, using techniques such as cosine similarity and Pearson correlation, and take the weighted average of their ratings
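
One common way to write that weighted average uses the following notation, introduced here for illustration. $N_i(u)$ is the set of users similar to user $u$ who have rated item $i$, $r_{v,i}$ is user $v$'s rating of $i$, and $\operatorname{sim}(u,v)$ is the cosine or Pearson similarity:

$$
\hat{r}_{u,i} = \frac{\sum_{v \in N_i(u)} \operatorname{sim}(u,v)\, r_{v,i}}{\sum_{v \in N_i(u)} \lvert \operatorname{sim}(u,v) \rvert}
$$

Variants are also widely used that subtract each user's mean rating before averaging and add the target user's mean back afterwards.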

Model-based approach - fit a machine learning model (e.g. matrix factorization, clustering, deep learning) to the rating data

## Memory-based Approach

User-based CF - a subset of appropriate users is chosen based on their similarity to the active user, and a weighted aggregate of their ratings is used to generate predictions for the active user at run-time
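
Below is a minimal NumPy sketch of user-based prediction. It assumes the ratings are held in a users-by-items array with `np.nan` for missing entries, that cosine similarity is computed over co-rated items, and that the neighbourhood consists of the `k` most similar users who rated the target item. All names and ratings are hypothetical:

```python
import numpy as np


def cosine_sim(x, y):
    """Cosine similarity between two users over the items both have rated."""
    mask = ~np.isnan(x) & ~np.isnan(y)
    if not mask.any():
        return 0.0
    a, b = x[mask], y[mask]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def predict_user_based(R, user, item, k=2):
    """Predict R[user, item] as a similarity-weighted average over k neighbours."""
    candidates = [v for v in range(R.shape[0])
                  if v != user and not np.isnan(R[v, item])]
    neighbours = sorted(((cosine_sim(R[user], R[v]), v) for v in candidates),
                        reverse=True)[:k]
    num = sum(s * R[v, item] for s, v in neighbours)
    den = sum(abs(s) for s, _ in neighbours)
    return num / den if den else np.nan


nan = np.nan
R = np.array([          # rows = users, columns = items (hypothetical ratings)
    [5, 4, nan, 1],
    [4, 5, 4, 1],
    [1, 1, 2, 5],
    [5, 5, 5, nan],
], dtype=float)

print(round(predict_user_based(R, user=0, item=2), 2))
```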

Item-based CF - a memory-based algorithm that explores the relationship between items as a function of how users have rated them
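
The item-based variant compares item columns rather than user rows. This sketch assumes the same NaN-for-missing users-by-items layout and plain cosine similarity over users who rated both items. It predicts a user's rating from that user's own ratings of the `k` most similar items. Names and data are hypothetical:

```python
import numpy as np


def item_sim(R, i, j):
    """Cosine similarity between item columns i and j over users who rated both."""
    mask = ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])
    if not mask.any():
        return 0.0
    a, b = R[mask, i], R[mask, j]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def predict_item_based(R, user, item, k=2):
    """Predict R[user, item] from the user's ratings of the k most similar items."""
    rated = [j for j in range(R.shape[1]) if j != item and not np.isnan(R[user, j])]
    neighbours = sorted(((item_sim(R, item, j), j) for j in rated), reverse=True)[:k]
    num = sum(s * R[user, j] for s, j in neighbours)
    den = sum(abs(s) for s, _ in neighbours)
    return num / den if den else np.nan


nan = np.nan
R = np.array([          # rows = users, columns = items (hypothetical ratings)
    [5, 4, nan, 1],
    [4, 5, 4, 1],
    [1, 1, 2, 5],
    [5, 5, 5, nan],
], dtype=float)

print(round(predict_item_based(R, user=0, item=2), 2))
```

Item-to-item similarities change slowly compared with a single user's neighbourhood, which is one reason item-based CF is often preferred for large catalogues.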

## User-Item Matrix

- A user-item (U-I) matrix is a matrix that encodes the individual preferences of users for items in a collection, for use in recommender systems
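
Here is a sketch of building such a matrix from (user, item, rating) triples with NumPy. Using `np.nan` to mark "not rated" is an assumption of this sketch (some libraries use 0 or a sparse format instead). The function name and ratings are hypothetical:

```python
import numpy as np


def build_user_item_matrix(ratings):
    """Turn (user, item, rating) triples into a users x items matrix; NaN = not rated."""
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    u_idx = {u: n for n, u in enumerate(users)}
    i_idx = {i: n for n, i in enumerate(items)}
    R = np.full((len(users), len(items)), np.nan)
    for u, i, r in ratings:
        R[u_idx[u], i_idx[i]] = r
    return R, users, items


ratings = [("alice", "Dune", 5), ("alice", "Emma", 2),
           ("bob", "Dune", 4), ("carol", "Emma", 5)]
R, users, items = build_user_item_matrix(ratings)
print(users, items)  # ['alice', 'bob', 'carol'] ['Dune', 'Emma']
print(R)
```

Most cells in a real U-I matrix are empty, which is where the data sparsity challenge of CF comes from.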

## Content-based Approach

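- Content-based systems describe items by their content and build a profile of each user's preferences; the recommendation task then consists of determining the items that best match the user's preferences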

Content analyzer - when information has no structure (e.g. text), some kind of pre-processing step is needed to extract structured relevant information

Profile learner - this module collects data representative of the user preferences and tries to generalize this data, in order to construct the user profile

Filtering component - this module exploits the user profile to suggest relevant items by matching the profile representation against that of items to be recommended
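
The three modules can be sketched end to end under strong simplifying assumptions. The content analyzer turns text into binary bag-of-words vectors. The profile learner averages the vectors of items the user liked. The filtering component ranks unseen items by cosine similarity to that profile. The catalog, its descriptions, and all function names are hypothetical:

```python
import numpy as np

# Hypothetical unstructured item descriptions
catalog = {
    "Dune": "desert planet politics science fiction",
    "Foundation": "galactic empire politics science fiction",
    "Emma": "romance england comedy of manners",
    "Neuromancer": "cyberpunk hacker science fiction",
}


def analyze(catalog):
    """Content analyzer: turn free-text descriptions into binary bag-of-words vectors."""
    vocab = sorted({w for text in catalog.values() for w in text.split()})
    vecs = {}
    for name, text in catalog.items():
        words = set(text.split())
        vecs[name] = np.array([w in words for w in vocab], dtype=float)
    return vocab, vecs


def learn_profile(liked, vecs):
    """Profile learner: average the vectors of the items the user liked."""
    return np.mean([vecs[name] for name in liked], axis=0)


def recommend(profile, vecs, exclude, top_n=2):
    """Filtering component: rank unseen items by cosine similarity to the profile."""
    def cos(a, b):
        d = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b / d) if d else 0.0
    scores = {n: cos(profile, v) for n, v in vecs.items() if n not in exclude}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


vocab, vecs = analyze(catalog)
profile = learn_profile(["Dune"], vecs)
print(recommend(profile, vecs, exclude={"Dune"}, top_n=1))  # ['Foundation']
```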

## Similarity metrics

- Cosine similarity

- Pearson’s correlation
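
The practical difference between the two metrics shows up with a generous rater and a strict rater who share the same relative preferences. Pearson's correlation is cosine similarity applied after subtracting each vector's mean, so it ignores the offset. The vectors below are hypothetical:

```python
import numpy as np


def cosine(x, y):
    """Cosine of the angle between two rating vectors."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


def pearson(x, y):
    """Pearson correlation: cosine similarity of the mean-centred vectors."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc)))


generous = np.array([3.0, 4.0, 5.0])
strict = np.array([1.0, 2.0, 3.0])
print(round(cosine(generous, strict), 3))   # 0.983
print(round(pearson(generous, strict), 3))  # 1.0
```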

## Model-based CF

Matrix factorization

Clustering

Deep learning

## Model-based CF (clustering)

In this strategy, similar users are clustered into segments and the similarity between the target user and a user segment is calculated

For each segment, an aggregate profile, consisting of the average rating for each item in the segment, is computed, and predictions are made using the aggregate profile rather than individual profiles

To make a recommendation for a target user u and target item i, a neighbourhood of user segments that have a rating for i and whose aggregate profile is most similar to u is chosen

A prediction for item i is made using the k nearest segments and associated aggregate profiles, rather than the k nearest neighbors
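
Here is a minimal sketch of the segment-based strategy. Several choices are assumptions made only for illustration: plain k-means for clustering, initial centroids taken from the first rows, missing ratings filled with each user's mean *only* for the clustering step, and cosine similarity over co-rated items. Function names and ratings are hypothetical:

```python
import numpy as np


def kmeans(X, k, iters=20):
    """Plain k-means; the first k rows serve as initial centroids (a simplification)."""
    centroids = X[:k].copy()
    for _ in range(iters):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dists, axis=1)
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = X[labels == c].mean(axis=0)
    return labels


def segment_profiles(R, labels, k):
    """Aggregate profile per segment: mean rating of each item among members who rated it."""
    profiles = np.full((k, R.shape[1]), np.nan)
    for c in range(k):
        members = R[labels == c]
        for i in range(R.shape[1]):
            col = members[:, i]
            if (~np.isnan(col)).any():
                profiles[c, i] = np.nanmean(col)
    return profiles


def predict_from_segments(R, profiles, user, item, k=1):
    """Weighted average of the item's rating in the k segment profiles most similar to the user."""
    def cos(a, b):
        mask = ~np.isnan(a) & ~np.isnan(b)
        if not mask.any():
            return 0.0
        x, y = a[mask], b[mask]
        d = np.linalg.norm(x) * np.linalg.norm(y)
        return float(x @ y / d) if d else 0.0
    # Only segments whose aggregate profile has a rating for the target item
    cands = [c for c in range(len(profiles)) if not np.isnan(profiles[c, item])]
    top = sorted(((cos(R[user], profiles[c]), c) for c in cands), reverse=True)[:k]
    den = sum(abs(s) for s, _ in top)
    return sum(s * profiles[c, item] for s, c in top) / den if den else np.nan


nan = np.nan
R = np.array([          # rows = users, columns = items (hypothetical ratings)
    [5, 5, 1, nan],
    [1, 1, 5, 5],
    [4, 5, 1, 1],
    [2, nan, 4, 5],
    [5, 4, nan, 2],
    [1, 2, 5, 4],
], dtype=float)

X = np.where(np.isnan(R), np.nanmean(R, axis=1, keepdims=True), R)  # for clustering only
labels = kmeans(X, k=2)
profiles = segment_profiles(R, labels, k=2)
print(labels)  # [0 1 0 1 0 1]
print(predict_from_segments(R, profiles, user=0, item=3))
```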

## Model-based CF (matrix factorization)

- One model-based approach to collaborative recommendation that has proven very successful in recent years is matrix factorization based on singular value decomposition (SVD) and its variants
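
Below is a minimal sketch of one simple SVD-based variant. It fills missing ratings with item means, keeps the top `rank` singular values, and reads predictions off the low-rank reconstruction. The imputation step and all names are assumptions made for illustration, and the sketch assumes every item has at least one rating. The factorization methods described by Koren et al. instead fit latent factors only to the observed ratings, with regularization:

```python
import numpy as np


def svd_predict(R, rank=2):
    """Impute gaps with item means, then return the rank-`rank` SVD reconstruction."""
    item_means = np.nanmean(R, axis=0)                 # assumes every item has a rating
    filled = np.where(np.isnan(R), item_means, R)
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    # Users and items become rank-dimensional latent factor vectors
    return U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank, :]


nan = np.nan
R = np.array([          # rows = users, columns = items (hypothetical ratings)
    [5, 5, 1, nan],
    [4, 5, 1, 1],
    [1, 1, 5, 5],
    [2, nan, 4, 5],
], dtype=float)

P = svd_predict(R, rank=2)
print(round(P[0, 3], 2))  # predicted rating of user 0 for item 3
```

Keeping only a few singular values forces users and items into a small shared latent space. That constraint is what lets the model generalize to the empty cells.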

## Challenges of CF

Totally new users (cold start)

Outliers (grey sheep)

Manipulations with reviews

Data sparsity

## Sources

- **Business Data Science: Combining Machine Learning and Economics to Optimize, Automate, and Accelerate Business Decisions** (Taddy)

- **Practical Recommender Systems** (Falk)

- **Recommender Systems: An Introduction** (Zanker et al.)

- **Recommender Systems Handbook** (Ricci et al.)

- **Various Implementations of Collaborative Filtering** (Grover)

- **Matrix Factorization Techniques for Recommender Systems** (Koren et al.)