Collaborative Filtering in python

This repository implements a Collaborative Filtering algorithm.

Supported models

User-based Collaborative Filtering
- Uses pearson correlation coefficient to measure similarity between users.
  - significance level is measured by p-value assuming normal distributions of both users ' scores. If p-value is larger than 0.1, computed similarity is regarded as invalid and assign 0.0 as a similarity.
  - If a score matrix(dictionary) is sparse, this algorithm is likely to just assign average score of a user's ratings as a predicted rating.
  - Specifically, predicted rating ( $R_{ij}$ ) is the sum of mean rating of user i ( $\overline{R_{i}}$ ) and the weighted average of neighbor users' rating ( $N_i = \{R_{i n_1}, R_{i n_2},..., R_{i n_M}\}$ ) where M is predefined as 'num_neighbors'. Weights for each ratings ( $S_{i} = \{ S_{i n_1}, ..., S_{i n_M} \}$ ) are predefined similarity measures (in this case, pearsonR). For details, see this post.
- similarity measures currently supported are 'pearsonR' or 'pearsonR+'
  - pearsonR : measures pearson correlation between user and neighbor's ratings of common items. Predicted ratings could be negative.
  - pearsonR+ : uses pearsonR but disregards negative correlation(-1~0) and set it to zero in such cases. This ensures predicted ratings are within proper ranges.

Usage

random_user = list('abcdefghijklmnopqrstuvwxyz')
# ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
random_item = [str(i) for i in range(20)]
# ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19']

random_score_dict = {}
for user in random_user :
  for item in random_item :
    # rating sparsity is 50%
    if np.random.uniform()<0.5 : continue
    # rating range is in [0,5]
    random_score_dict[f"{user}_{item}"] = round(np.random.uniform()*5,2)

model = CF(
user_list = random_user, 
item_list = random_item, 
score_dict = random_score_dict,
sim_metric = "pearsonR+",
method="user_based",
num_neighbors=5
)

predicted, error = model.complete_for(['a_{}'.format(i) for i in range(20)])
# predict all ratings for user 'a'
# {'a_6': 1.8476923076923077, 'a_8': 1.8476923076923077, 'a_10': 1.8476923076923077, 'a_12': 1.8476923076923077, 'a_13': 1.8476923076923077, 'a_15': 1.8476923076923077, 'a_19': 1.8476923076923077}

predicted, error = model.complete()
# predict ratings for all non-evaluated pairs considering all users and items 
# {'a_6': 1.8476923076923077, 'a_8': 1.8476923076923077, 'a_10': 1.8476923076923077, 'a_12': 1.8476923076923077, 'a_13': 1.8476923076923077, 'a_15': 1.8476923076923077, 'a_19': 1.8476923076923077, 'b_1': 3.211642392121491, 'b_2': 3.55875, 'b_6': 4.00375, 'b_7': 1.6737499999999998, 'b_10': 2.7856227678465375, 'b_11': 3.24875, 'b_12': 3.1574999999999998, 'b_13': 3.01875, 'b_14': 2.57375 ...}

Disclaimer

Please use this implementation as a casual reference. For academic or industrial use, you might need to modify formulae as your work requires.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
README.md		README.md
collaborative_filtering.py		collaborative_filtering.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Collaborative Filtering in python

Supported models

Usage

Disclaimer

About

Uh oh!

Releases

Packages

Languages

DaehanKim/collaborative_filtering_python

Folders and files

Latest commit

History

Repository files navigation

Collaborative Filtering in python

Supported models

Usage

Disclaimer

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages