A NOVEL ANIME RECOMMENDATION SYSTEM USING HYBRID-BASED APPROACH

Seitbekov S. Kultyshev E.
Cite as:
Seitbekov S., Kultyshev E. A NOVEL ANIME RECOMMENDATION SYSTEM USING HYBRID-BASED APPROACH // Universum: технические науки (Universum: Technical Sciences): electronic scientific journal. 2025. 5(134). URL: https://7universum.com/ru/tech/archive/item/20072 (accessed: 05.12.2025).
DOI - 10.32743/UniTech.2025.134.5.20072

 

ABSTRACT

Anime has captivated millions of people around the world with its many genres, remarkable animation, and original storytelling. As the industry expands, viewers find it increasingly hard to choose the anime that best matches their taste. The sheer volume of readily available content makes searching taxing and can cause decision fatigue. Conventional recommendation systems that rely solely on collaborative or content-based filtering struggle to deliver highly personalized recommendations and often fail when user data are limited or titles are newly released. To close this gap, this paper proposes a hybrid anime recommendation system that combines content-based algorithms with collaborative filtering. By combining user interactions with rich anime metadata, including genres, production companies, and reviews, our system produces intelligent, highly customized suggestions. Modern machine learning techniques, including embedding-based neural networks and similarity algorithms, help us improve accuracy and user satisfaction. Dataset analysis and performance tests suggest that our approach is a significant advance in recommendation technology. The system points viewers toward hidden gems they might otherwise have missed, going beyond basic search tools to change how people discover anime. Our work offers a more direct way to explore the vast world of anime and contributes to the growing field of AI-driven recommendations.


 


Keywords: Anime, Hybrid System, Collaborative Filtering, Content-Based Filtering, Recommendation System

 

Introduction

The global anime market is projected to reach USD 56.39 billion by 2030 [1]. As the market grows, fans face an ever larger and more varied catalogue of anime, and rising global demand for anime content is expected to drive further growth. This surge is attributed to factors such as global internet availability and rising interest in video games based on anime series. In addition, technological advances and the COVID-19 pandemic have significantly changed how anime is distributed and watched, and online platforms such as Netflix have become very important to the anime industry. Collaboration between top streaming services and anime studios to license and distribute a vast range of anime titles worldwide marks a significant shift toward digital platforms in the anime sector [2], [3].

However, the extensive variety and volume of anime content introduce significant challenges for recommendation systems. This research paper focuses on these challenges, aiming to develop a novel hybrid-based anime recommendation system that can navigate the complex landscape of anime content efficiently.

Recommendation systems generally fall into three main categories [4]: collaborative filtering, which makes recommendations based solely on user–item interaction data; content-based filtering, which recommends based on user preferences, item attributes, or both; and, in between, hybrid recommendation, which uses both interaction data and user and item metadata.

The simplest way to make recommendations from a user's history or past behavior is to take the items the user has interacted with as input and recommend a list of similar content. This approach is called content-based filtering. In [5], the authors used BERT (Bidirectional Encoder Representations from Transformers) [6] to capture the contextual information embedded in text. Anime title and genre data were represented and extracted with BERT, and the results were compared with a cosine similarity approach.

Collaborative filtering and its variants are the most common recommendation algorithms. In [7]–[10], the authors developed recommendation systems using collaborative filtering, improving the accuracy and performance of standard filtering techniques.

In [7], the authors computed the distance between the target movie and every other movie and then listed the top K most similar movies using the KNN algorithm [11], with cosine similarity as the measure.

In contrast, the researchers in [8]–[10] concentrated on performance. In [8], the authors proposed a collaborative filtering recommendation algorithm based on properties, clustering, and SVD [12]; with fewer computations, the system still produces accurate, personalized recommendations efficiently.

In [9], the authors presented a novel collaborative filtering method based on dimensionality reduction, which lowers the dimension of the state space and thereby alleviates the sparsity and cold-start problems.

In [10], the authors proposed the Weighted Slope One-VU recommendation algorithm, which significantly reduces time complexity, although its prediction accuracy is slightly lower than that of SVD.

Hybrid recommendation blends collaborative and content-based filtering methods, and the authors of [13]–[16] built their systems with such techniques. For instance, [13] used a Gated Recurrent Neural Network (GRNN) [17] in conjunction with cosine similarity for its predictions; that work aims to make recommendation systems for anime streaming more reliable and practical by combining collaborative and content-based filtering to boost recommendation accuracy. The authors of [14] evaluated their proposed method against content-based filtering, item-based collaborative filtering, and user-based collaborative filtering, obtaining a result of 81%; however, the model's performance needs further optimization. In [15], researchers used a deep learning method that integrates user and anime side information and showed that incorporating side information improves recommendation results by about 5% over the Singular Value Decomposition (SVD) model.

In [16], the authors focused on semantic analysis of tweets for prediction (the content-based component). The collaborative component is more implicit: it comes from aggregated user data (tweets) rather than from a direct implementation of collaborative filtering algorithms.

The authors of [18] propose a three-pronged strategy for movie recommendations: demographic filtering, content-based filtering (including director and actor information), and collaborative filtering (item-based CF with singular value decomposition). According to their findings, combining all three techniques yields higher accuracy than using any single one. However, the demographic filter is constrained by popularity bias, so new items with low view counts may go unnoticed.

The study in [19] examines how generating or supplementing textual descriptions can help Large Language Models (LLMs) such as Falcon, Vicuna, and LLaMA improve anime recommendations. By feeding anime titles, genres, and short synopses into an LLM, the authors generate better, more context-sensitive recommendations. LLMs also provide natural-language explanations intended to increase user confidence. Although LLM-based methods can enhance content-based recommendation, especially for new or lesser-known titles, they are computationally costly and require careful prompt engineering.

In [20], the authors refine anime predictions by combining content-based filtering with an ensemble of boosting techniques (XGBoost, LightGBM, CatBoost, etc.). They build the feature set that feeds the ensemble from textual data on anime names and from user ratings, and this mixed approach achieves high accuracy and precision. However, keeping the system current as anime libraries change may require substantial feature-engineering effort.

The authors of [21] present a simple collaborative filtering method focused on user rating data from Kaggle's anime dataset. They use matrix factorization while also considering user-similarity measures such as SimRank. By drawing on users with similar rating patterns, the approach matches viewers with top anime titles. Though somewhat basic, it captures user rating correlations well but suffers from the item cold-start problem.

To capture both structural user–item relations and textual context from anime synopses, Javaji and Sarode [22] propose a new hybrid pipeline combining graph neural networks (GNN) with sentence transformer (BERT) embeddings. Using GraphSAGE to generate node embeddings, they treat anime recommendation as a link-prediction task. Although GNN-based solutions show great potential for connecting user interactions with textual data, they demand large amounts of memory and careful hyperparameter tuning, especially as new anime or users arrive.

Finally, [23] presents RikoNet, a hybrid recommendation system that combines spectral clustering on anime embeddings with autoencoders for rating prediction. Spectral clustering handles content-based expansion, and the authors find that autoencoder-based CF outperforms basic matrix factorization in capturing user preferences. Their user tests show that combining the two yields more diverse suggestions, although the results can be skewed if all of a user's recently rated anime fall into one cluster.

We want people to discover and enjoy anime they might not otherwise have found. Our goal is to build a novel hybrid recommendation system that makes finding one's next favorite anime easy and fun, and we believe it can make a real difference in how people discover new anime.

Materials and Methods

One difficulty of content-based recommendation systems is that they tend to struggle when the input anime spans many genres. For example, if a user likes “Attack on Titan”, an action and fantasy anime, the system may go on to recommend mostly action anime, missing other genres that could also interest the user. Collaborative filtering, in turn, performs poorly when recommending to new users or recommending new anime; by new users we mean users who do not yet have enough interaction history for the model to learn their preferences. This is mostly due to the system's reliance on existing user interaction data. For instance, “Jujutsu Kaisen”, which was newly released at the time, was among the top-rated anime on myanimelist.net in August 2021 [24]; if user interactions with such a title are insufficient, the system cannot accurately recommend it to other users.

To address this problem, we propose a hybrid recommendation system that combines content-based and collaborative filtering techniques, with the aim of improving both the personalization and the accuracy of anime recommendations. Figure 1 illustrates the proposed methodology.

 

 

Figure 1. Methodology for Hybrid System

 

To this end, we built an anime recommendation system that merges content-based and collaborative filtering into a hybrid recommender. We first gathered and analyzed datasets containing anime titles, user scores, user reviews, and metadata such as genre and release year.

For data fetching, we used a MacBook Air M1 with 16 GB of RAM running macOS 14.4.1, while model training was performed on a Google Colab Pro subscription with an NVIDIA A100 GPU and 128 GB of RAM, which allowed us to train the model quickly and effectively. The first stage of our study was data collection. Since we needed a considerable amount of data on different anime and on users' interactions with them, we used MyAnimeList and the Jikan API. MyAnimeList (MAL) provides a list-based system for organizing and scoring anime and manga [25]. MAL also serves as a social networking platform where fans can record and rate the anime they watch, which makes it the most suitable online anime and manga database for our needs.

Jikan is an unofficial REST API for MAL [26]; it is community-driven, open-source, and frequently updated, and we use it for convenient access to MyAnimeList's database. The name “jikan” means “time” in Japanese, a reference to the time the API saves its users. For our research, we built two datasets: anime and user scores. Because the Jikan API has not provided access to user anime lists since 2022 [27], we registered with the MyAnimeList service to obtain an API client ID and secret key for retrieving user lists.

We used the Jikan API to extract information for the Anime Dataset. This dataset is rich enough to analyze the characteristics of different anime, their ratings and popularity, and the number of viewers. For example, it lets us determine which anime has the highest rating, which genre is the most popular, how ratings are distributed, and what viewers prefer in general. Table 1 lists the fields we collected for the Anime dataset:

Table 1.

Fields extracted from the Jikan API for the Anime dataset

Field Name | Description
title | The anime’s title.
title japanese | The anime’s title in Japanese.
title synonyms | Alternative titles or synonyms for the anime.
type | The format of the anime (e.g., TV).
source | The anime’s source material.
episodes | The number of episodes.
status | The airing status of the anime (e.g., Finished, Airing).
aired | The airing dates of the anime.
premiered | The season and year in which the anime premiered.
broadcast | The broadcast schedule of the anime.
producer | The producers of the anime.
licensor | The licensors of the anime.
studio | The studio that produced the anime.
genre | The genres associated with the anime.
duration | The running length of each episode.
rating | The content rating of the anime.
score | The anime’s average score.
scored by | The number of users who scored the anime.

 

The User Score Dataset captures user behavior when interacting with anime. By examining the scores users give to different anime, we can find the most highly rated and most frequently watched titles, as well as patterns in how users select and rate shows. Table 2 lists the fields we collected for the User Score dataset.

Table 2.

Fields extracted from the MyAnimeList API for the user score dataset

Field Name | Description
User_id | Unique ID for each user.
Username | Unique username for each user.
Anime_id | Unique ID for each anime.
Anime Title | The title of the anime.
Rating | The score the user gave the anime.

 

In total, we collected 5,600 entries for the Anime Dataset and 2,432,519 entries for the User Score Dataset. This extensive collection forms the foundation of our content-based and collaborative filtering systems.

We wrote the data-fetching procedure in Python. We first load the existing anime and user data from the files AnimeList.rick and UserList.rick, respectively. Each file contains a dictionary in which each key is an anime ID or user ID and each value is another dictionary holding the relevant details. For each anime or user ID that needs detailed information, we construct the request URL for the Jikan or MAL API and make an API call. If the call fails (e.g., due to network issues or rate limits), we retry up to two times with a short delay between attempts. If the call succeeds, we parse the JSON response, extract the detailed information, and update the existing dictionary with it. Finally, we convert the data to a table in CSV format.
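The fetch-and-retry procedure described above can be sketched with the Python standard library. This is a minimal sketch under stated assumptions, not the authors' script: the Jikan v4 URL pattern, the one-second pause, and the 10-second timeout are assumptions (the paper gives none of them), and loading AnimeList.rick is left out, so the functions work on an in-memory dictionary. The `fetch` argument can be replaced with a stub, which lets the retry logic be exercised offline.

```python
import json
import time
import urllib.request
from typing import Callable, Dict, Iterable, Optional

# Assumed Jikan v4 endpoint; the paper does not state the API version or URL pattern.
JIKAN_ANIME_URL = "https://api.jikan.moe/v4/anime/{anime_id}"


def http_get_json(url: str) -> dict:
    """Fetch a URL and decode its JSON body; raises on network or HTTP errors."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def fetch_with_retry(url: str, fetch: Callable[[str], dict],
                     retries: int = 2, delay: float = 1.0) -> Optional[dict]:
    """Call `fetch` once, then retry up to `retries` more times after a short pause."""
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except Exception:  # e.g. network errors or HTTP 429 (rate limit)
            if attempt < retries:
                time.sleep(delay)
    return None  # every attempt failed


def update_details(store: Dict[int, dict], anime_ids: Iterable[int],
                   fetch: Callable[[str], dict] = http_get_json,
                   delay: float = 1.0) -> Dict[int, dict]:
    """Merge freshly fetched details into the existing {anime_id: details} dictionary."""
    for anime_id in anime_ids:
        url = JIKAN_ANIME_URL.format(anime_id=anime_id)
        payload = fetch_with_retry(url, fetch, delay=delay)
        if payload is not None:
            # Jikan v4 responses wrap the record in a top-level "data" object.
            store.setdefault(anime_id, {}).update(payload.get("data", {}))
    return store
```

Calling `update_details(existing, ids)` with the default `fetch` issues real HTTP requests, so the pause between attempts should be kept to respect Jikan's rate limits. Writing the merged dictionary to CSV could then be done with `csv.DictWriter`.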

The first preprocessing step was cleaning and preparing the data for analysis. We filtered out anime with unknown genres and anime with a popularity value of zero, because these entries would not provide meaningful insights. Table 3 shows the first seven entries, with each anime's title, score, and list of genres.

Table 3.

Cleaned Anime Dataset Sample

Title | Score | Genre
Cowboy Bebop | 8.75 | Action, Award Winning, Sci-Fi
Cowboy Bebop: Tengoku no Tobira | 8.38 | Action, Sci-Fi
Trigun | 8.22 | Action, Adventure, Sci-Fi
Witch Hunter Robin | 7.26 | Action, Drama, Mystery, Supernatural
Bouken Ou Beet | 7.01 | Action, Adventure, Fantasy
Eyeshield 21 | 7.92 | Sports
Hachimitsu to Clover | 8.0 | Comedy, Drama, Romance

 

We used the pandas library to analyze the dataset and the Plotly library to visualize it with various plots.

1) Types: To understand the distribution of anime types, we analyzed the type column. As Figure 2 shows, the most common type is TV series, with around 2,000 titles, followed by OVAs, while movies also make up a substantial share. This pattern matches industry trends and viewers' preferences in how they watch anime.

 

Figure 2. Anime Titles By Type

 

2) Most popular titles: As shown in Figure 3, titles such as Death Note and Angel Beats! remain at the top of the popularity ranking. The chart highlights the enduring popularity of classic and long-running series.

 

Figure 3. Top 15 most popular titles (a lower popularity number means a more popular anime)

 

3) Anime Score: Figure 4 shows that anime with higher scores tend to have more ratings. Popular, high-quality anime (scores above 8) often receive over 500,000 ratings, whereas anime with lower scores (below 6) usually receive far fewer. Some anime have many ratings without the highest scores, showing that popular titles can attract large audiences even without top scores. Overall, this indicates that popularity and perceived quality are closely connected in the anime community.

 

Figure 4. Anime Score vs. Number of scores

 

4) Genres: The word cloud in Figure 5 shows the most common anime genres. Action, Adventure, and Comedy are the most common, as indicated by the largest words, followed by Sci-Fi and Fantasy, which are also highly popular. A relatively large number of titles also belong to the Romance, Hentai, Drama, and Supernatural genres. Genre pairs such as “Action Adventure” and “Comedy Romance” also appear prominently; these labels overlap considerably, since a title rarely belongs exclusively to a single genre. Overall, the word cloud shows that anime offers a wide variety of genres that can attract a broad range of viewers.

 

Figure 5. Word cloud by genre

 

Results and Discussions

1) Preprocessing: MinMaxScaler [28] linearly rescales data to a designated feature range. In our research, the minimum value of the rating column is mapped to 0 and the maximum value to 1:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad (1)$$

In Formula 1, $x'$ is the scaled value, $x$ is the original value, $x_{\min}$ is the minimum value of the original data, and $x_{\max}$ is the maximum value of the original data.

Before training, we scaled the ratings and encoded the categorical data. Specifically, the ratings were scaled to the 0–1 range with MinMaxScaler according to Formula 1, and the user IDs and anime IDs were encoded as numerical (integer) values, which allowed us to build the feature matrix for training the model.
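Both preprocessing steps can be sketched in plain Python. The first function applies Formula 1 to one column, as `MinMaxScaler` does; the second maps raw IDs to contiguous indices in order of first appearance. That ordering is an assumption of this sketch; any consistent mapping to `0..n-1` works, since embedding layers only need valid row indices.

```python
from typing import Dict, List, Sequence, Tuple


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Formula 1: x' = (x - x_min) / (x_max - x_min), mapping values onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def encode_ids(raw_ids: Sequence[int]) -> Tuple[List[int], Dict[int, int]]:
    """Map raw MAL user/anime IDs to contiguous indices 0..n-1 (order of first appearance)."""
    mapping: Dict[int, int] = {}
    encoded: List[int] = []
    for raw_id in raw_ids:
        if raw_id not in mapping:
            mapping[raw_id] = len(mapping)
        encoded.append(mapping[raw_id])
    return encoded, mapping


ratings = [10, 7, 1, 4]
scaled = min_max_scale(ratings)  # 10 -> 1.0, 1 -> 0.0
user_idx, user_map = encode_ids([599001, 100924, 599001])  # -> [0, 1, 0]
```

Keeping `user_map` matters in practice: the same mapping must be reused at prediction time so that an encoded index still refers to the same user.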

2) Data Splitting: We shuffled the dataset and then split it into training and testing sets. Specifically, we built a feature matrix $X$ containing user and anime IDs and a target vector $y$ containing the scaled ratings, and held out 10,000 samples for testing the model.
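A minimal sketch of the shuffle-and-hold-out split, assuming $X$ and $y$ are plain Python sequences; the fixed seed of 42 is an assumption added for reproducibility and is not reported in the paper. The toy data below stands in for the encoded (user, anime) pairs and scaled ratings.

```python
import random
from typing import List, Sequence, Tuple


def shuffle_split(X: Sequence, y: Sequence, test_size: int = 10_000,
                  seed: int = 42) -> Tuple[List, List, List, List]:
    """Shuffle sample indices, then hold out the first `test_size` as the test set."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # assumed seed, for reproducibility
    test_idx, train_idx = indices[:test_size], indices[test_size:]
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Each row of X is (encoded_user_id, encoded_anime_id); y holds scaled ratings in [0, 1].
X = [(u, a) for u in range(200) for a in range(60)]  # 12,000 toy pairs
y = [((u + a) % 10) / 9 for u, a in X]
X_train, X_test, y_train, y_test = shuffle_split(X, y)
```

Shuffling indices rather than the two lists separately keeps each feature row aligned with its rating.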

3) Model Architecture: Our recommendation model is a neural network with embedding layers for user IDs and anime IDs. The embeddings represent the respective users and anime as latent feature vectors; the two embeddings are combined via a dot product, which aggregates the latent features before the result is passed through dense layers.
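The forward pass of such a model can be sketched without a deep learning framework. The paper does not report the framework, embedding size, or dense-layer sizes, so this sketch assumes 8-dimensional embeddings and a single dense unit with a sigmoid output; the sigmoid is our assumption, motivated by the binary cross-entropy loss on targets scaled to [0, 1]. In Keras, the same structure would typically be built from `Embedding` layers joined by a `Dot` layer and followed by `Dense` layers.

```python
import math
import random
from typing import List


class DotProductRecommender:
    """Forward pass only: user/anime embeddings -> dot product -> dense unit -> sigmoid."""

    def __init__(self, n_users: int, n_anime: int, dim: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Embedding tables: one small random vector per encoded user / anime index.
        self.user_emb: List[List[float]] = [
            [rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_users)]
        self.anime_emb: List[List[float]] = [
            [rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_anime)]
        # A single dense unit (weight and bias) applied to the dot product.
        self.w, self.b = 1.0, 0.0

    def predict(self, user_idx: int, anime_idx: int) -> float:
        u, a = self.user_emb[user_idx], self.anime_emb[anime_idx]
        dot = sum(ui * ai for ui, ai in zip(u, a))  # aggregate the latent features
        z = self.w * dot + self.b
        return 1.0 / (1.0 + math.exp(-z))  # predicted scaled rating in (0, 1)


model = DotProductRecommender(n_users=100, n_anime=50)
score = model.predict(3, 7)
```

Training would update both embedding tables and the dense weight and bias by back-propagating the loss; only prediction is shown here.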

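4) Training and Evaluation: The binary cross-entropy loss function is given by:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log(p_i) + (1 - y_i)\log(1 - p_i)\right] \qquad (2)$$

In Formula 2, $N$ is the number of samples, $y_i$ is the true label of the $i$-th sample (0 or 1), $p_i$ is the predicted probability that the $i$-th sample belongs to class 1, and $\log$ is the natural logarithm. Since our targets are min–max scaled ratings, $y_i$ here takes values in [0, 1] and acts as a soft label.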
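Formula 2 can be sketched directly in Python. The clipping constant `eps` is an assumption of this sketch (deep learning frameworks apply a similar clip internally) that keeps the logarithm finite when a prediction equals exactly 0 or 1. Because the targets are scaled ratings, `y_true` may contain fractional values.

```python
import math
from typing import Sequence


def binary_cross_entropy(y_true: Sequence[float], p_pred: Sequence[float],
                         eps: float = 1e-7) -> float:
    """Formula 2, averaged over N samples; y_true may hold soft labels in [0, 1]."""
    if len(y_true) != len(p_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total += y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return -total / len(y_true)
```

With a soft label, the loss is smallest when the prediction equals the label, so the model is still pushed toward the scaled rating.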

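The Adam optimizer update rule is given by:

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\,\hat{m}_t \qquad (3)$$

Here, in Formula 3, $\theta_{t+1}$ is the updated parameter, $\theta_t$ is the current parameter, $\eta$ is the learning rate, $\hat{m}_t$ is the bias-corrected first moment estimate, $\hat{v}_t$ is the bias-corrected second moment estimate, and $\epsilon$ is a small constant to avoid division by zero.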
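A minimal sketch of one Adam step, including the moment updates that produce the bias-corrected estimates used in Formula 3. The defaults beta1 = 0.9, beta2 = 0.999, and eps = 1e-8 are those proposed in the original Adam paper; the hyperparameters used in this study are not reported.

```python
import math
from typing import List, Sequence


class Adam:
    """Minimal Adam optimizer for a flat list of scalar parameters (Formula 3)."""

    def __init__(self, lr: float = 1e-3, beta1: float = 0.9,
                 beta2: float = 0.999, eps: float = 1e-8) -> None:
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m: List[float] = []  # first moment: running mean of gradients
        self.v: List[float] = []  # second moment: running mean of squared gradients
        self.t = 0                # time step, used for bias correction

    def step(self, params: Sequence[float], grads: Sequence[float]) -> List[float]:
        if not self.m:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        updated = []
        for i, (theta, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias-corrected first moment
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)  # bias-corrected second moment
            updated.append(theta - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


opt = Adam(lr=0.1)
new_params = opt.step([1.0, 1.0], [2.0, -2.0])  # roughly [0.9, 1.1]
```

On the first step the bias correction makes each update roughly the learning rate times the sign of the gradient, regardless of the gradient's magnitude.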

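Mean Absolute Error is given by:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \qquad (4)$$

Mean Squared Error is given by:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \qquad (5)$$

In Formulas 4 and 5, $n$ is the number of samples, $y_i$ is the actual value for sample $i$, and $\hat{y}_i$ is the predicted value for sample $i$.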
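Formulas 4 and 5 in plain Python, as a small sketch for checking metric values on a handful of predictions.

```python
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> int:
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return len(y_true)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Formula 4: average absolute difference between actual and predicted values."""
    n = _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Formula 5: average squared difference, which penalizes large errors more heavily."""
    n = _check(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
```

Because errors below 1 shrink when squared, MSE values on ratings scaled to [0, 1] are typically smaller than the corresponding MAE values.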

We compiled the model with the binary cross-entropy loss function [29] (Formula 2) and the Adam optimizer [30] (Formula 3), and used MAE and MSE [31] (Formulas 4 and 5) as evaluation metrics. Training used the following callbacks to improve performance and prevent the model from overfitting:

  1. A learning rate scheduler that dynamically adjusts the learning rate during training.
  2. A model checkpoint that keeps the optimal weights based on the validation loss.
  3. Early stopping, which halts training when validation performance stops improving and restores the optimal weights.
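The callback logic can be sketched framework-free. If the model was built in Keras (the paper does not say), the corresponding built-ins are `tf.keras.callbacks.LearningRateScheduler`, `ModelCheckpoint` with `save_best_only=True`, and `EarlyStopping` with `restore_best_weights=True`. The exponential decay schedule and the patience of 3 epochs below are hypothetical values, since the paper reports neither, and the weight snapshots are placeholder strings.

```python
from typing import List, Optional, Tuple


def lr_schedule(epoch: int, base_lr: float = 1e-3, decay: float = 0.9) -> float:
    """Hypothetical exponential decay: shrink the learning rate each epoch."""
    return base_lr * decay ** epoch


class BestWeightsEarlyStopping:
    """Tracks the best validation loss (checkpoint) and signals when to stop."""

    def __init__(self, patience: int = 3) -> None:
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.best_weights: Optional[str] = None
        self.epochs_without_improvement = 0

    def update(self, epoch: int, val_loss: float, weights: str) -> bool:
        """Record this epoch; return True when training should stop."""
        if val_loss < self.best_loss:
            # Checkpoint: remember the weights with the lowest validation loss.
            self.best_loss, self.best_epoch, self.best_weights = val_loss, epoch, weights
            self.epochs_without_improvement = 0
            return False
        self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


def simulate(val_losses: List[float],
             patience: int = 3) -> Tuple[int, int, Optional[str], List[float]]:
    """Replay validation losses; return (stop_epoch, best_epoch, restored_weights, lrs)."""
    stopper = BestWeightsEarlyStopping(patience)
    lrs: List[float] = []
    for epoch, loss in enumerate(val_losses):
        lrs.append(lr_schedule(epoch))  # learning rate the optimizer would use this epoch
        weights = f"weights@epoch{epoch}"  # stand-in for a real weight snapshot
        if stopper.update(epoch, loss, weights):
            return epoch, stopper.best_epoch, stopper.best_weights, lrs
    return len(val_losses) - 1, stopper.best_epoch, stopper.best_weights, lrs
```

For example, `simulate([0.70, 0.60, 0.55, 0.56, 0.57, 0.58, 0.50])` stops at epoch 5, after three epochs without improvement, and restores the weights from epoch 2; the final 0.50 is never reached, which illustrates the trade-off that the patience setting controls.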

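1) Preprocessing: The TF-IDF formula is given by:

$$\mathrm{tfidf}(t, d, D) = \mathrm{tf}(t, d) \cdot \log\frac{N}{\left|\{d \in D : t \in d\}\right|} \qquad (6)$$

In Formula 6, $t$ denotes the term, $d$ denotes a document, $D$ denotes the corpus, $N$ is the total number of documents in the corpus, $\mathrm{tf}(t, d)$ is the frequency of term $t$ in document $d$, and $\left|\{d \in D : t \in d\}\right|$ is the number of documents in $D$ that contain the term $t$.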

The dataset contains the title, genres, and score of each anime. First, the genres are converted into numerical vectors with TF-IDF [32] according to Formula 6. This turns text into numerical vectors while weighting each term by its importance relative to the whole corpus.
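A plain-Python sketch of Formula 6 applied to genre lists. Two choices here are assumptions: each genre label (e.g., “Award Winning”) is kept whole as a single term, and tf is the raw count. scikit-learn's `TfidfVectorizer`, a common choice for this step, instead splits text into word tokens and by default uses a smoothed idf, ln((1 + N)/(1 + df)) + 1, followed by L2 normalization, so its numbers differ from Formula 6.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Formula 6 with raw counts as tf: weight(t, d) = tf(t, d) * log(N / df(t))."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # A term present in every document gets log(1) = 0, i.e. no weight.
        vectors.append({term: count * math.log(n_docs / df[term])
                        for term, count in tf.items()})
    return vectors


# Each "document" is one anime's genre list.
genre_docs = [
    ["Drama", "Romance", "Supernatural"],  # Clannad: After Story (Table 4)
    ["Drama", "Romance", "Supernatural"],  # Kanon (Table 4)
    ["Action", "Sci-Fi"],                  # Cowboy Bebop: Tengoku no Tobira (Table 3)
]
vectors = tfidf_vectors(genre_docs)
```

The two drama titles receive identical vectors, while the action title shares no terms with them, which is exactly what the cosine similarity step then measures.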

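2) Cosine Similarity Calculation: The cosine similarity between two vectors $A$ and $B$ is given by:

$$\cos(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} \qquad (7)$$

In Formula 7, $A \cdot B$ is the dot product of vectors $A$ and $B$, and $\lVert A \rVert$ and $\lVert B \rVert$ are the (Euclidean) norms of $A$ and $B$.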

To quantify the similarity between anime titles, we compute the cosine similarity between their TF-IDF vectors. This yields a sparse similarity matrix in which the element at coordinates $(i, j)$ measures the similarity between anime $i$ and anime $j$. Figure 6 shows this sparse matrix.

 

Figure 6. Cosine Similarity Sparse Matrix
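Formula 7 and the sparse similarity matrix can be sketched over dictionary-based sparse vectors, such as TF-IDF output; only non-zero similarities are stored, keyed by (i, j). This all-pairs loop is quadratic in the number of titles and only illustrates the idea; for the full 5,600-title dataset a vectorized routine such as scikit-learn's `cosine_similarity`, which accepts sparse matrices, would be the practical choice.

```python
import math
from typing import Dict, List, Tuple

SparseVec = Dict[str, float]


def cosine(a: SparseVec, b: SparseVec) -> float:
    """Formula 7 for sparse vectors stored as {term: weight}."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector has no direction
    return dot / (norm_a * norm_b)


def sparse_similarity_matrix(vectors: List[SparseVec]) -> Dict[Tuple[int, int], float]:
    """All-pairs cosine similarity, keeping only non-zero entries keyed by (i, j)."""
    matrix: Dict[Tuple[int, int], float] = {}
    for i, vi in enumerate(vectors):
        for j, vj in enumerate(vectors):
            s = cosine(vi, vj)
            if s != 0.0:
                matrix[(i, j)] = s
    return matrix
```

Titles with no genre in common score exactly 0 and are simply absent from the dictionary, which is what makes the matrix sparse.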

 

3) Recommendation process: The process consists of the following steps:

1. Target anime identification: the index of the target anime is located in the dataset.

2. Cosine similarity score computation: the cosine similarity scores between the target anime and all other anime are calculated.

3. Filter and sort: anime with unknown scores are filtered out, and the remaining titles are sorted by similarity score together with rating.

4. Top recommendations selection: the 15 anime most similar to the target anime are selected, excluding the target anime itself.

5. Extract recommendations: the names, genres, and scores of the recommended anime are retrieved.
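The five steps above can be sketched as a single function. A dense list-of-lists similarity matrix is used for brevity, unknown scores are represented as `None`, and ties in similarity are broken by the higher score; that tie-breaking rule is our reading of “sorted by similarity score together with rating”, not a detail confirmed by the paper.

```python
from typing import List, Optional, Sequence, Tuple


def recommend_similar(target_title: str,
                      titles: List[str],
                      genres: Sequence[str],
                      scores: Sequence[Optional[float]],
                      similarity: Sequence[Sequence[float]],
                      top_n: int = 15) -> List[Tuple[str, str, Optional[float], float]]:
    # 1. Target anime identification.
    target_idx = titles.index(target_title)
    # 2. Similarity of the target to every other anime (one row of the matrix).
    row = similarity[target_idx]
    # 3. Filter out unknown scores (and the target itself), then sort by
    #    similarity, breaking ties by the higher score.
    candidates = [j for j in range(len(titles))
                  if j != target_idx and scores[j] is not None]
    candidates.sort(key=lambda j: (row[j], scores[j]), reverse=True)
    # 4. Keep the top-N most similar titles.
    top = candidates[:top_n]
    # 5. Extract name, genres, score, and similarity for each recommendation.
    return [(titles[j], genres[j], scores[j], row[j]) for j in top]
```

Excluding the target explicitly matters because every title has similarity 1.0 with itself and would otherwise always rank first.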

The embedding-based model was trained for a total of 25 epochs with a batch size of 20,000 and was then evaluated on the test set to check the quality of its predictions.

1) MSE: Figure 7 shows the MSE for the training and testing data. The MSE drops quickly at first, meaning the model learns fast early on. After about 10 epochs, the training MSE keeps decreasing while the testing MSE levels off. Because the testing MSE does not rise, the model shows no clear sign of overfitting, although additional epochs bring little further gain on unseen data.

 

Figure 7. MSE during model training

 

2) Loss: Figure 8 shows the loss for the training and testing data. The loss drops sharply in the first few epochs; afterwards, the training loss keeps decreasing, while the testing loss stays roughly constant after 10 epochs. This suggests that the model generalizes reasonably well to unseen data.

 

Figure 8. Loss during model training

 

3) MAE: Figure 9 shows the MAE for the training and testing data. The MAE follows the same pattern as the MSE and the loss: the training MAE keeps decreasing, while the testing MAE levels off after 10 epochs. This suggests that the model works well with new data without noticeable overfitting.

 

Figure 9. MAE during model training

 

To test our model, we used a similarity search to find the most relevant recommendations for the anime “Clannad”. The search measures how relevant each other anime is to the target, taking into account aspects such as genre, and returns the titles most similar to “Clannad”. Table 4 lists the results with their similarity scores and genres. The top recommendation, “Clannad: After Story”, has a high similarity score of 88.59%. This is a logical recommendation, since it is a direct continuation of the target title with the same genres: drama, romance, and supernatural. The second recommendation is “Kanon”, with a similarity score of 86.69%; it shares the same genres and has a similar plot. The last title on the list is “Clannad: Another World, Tomoyo Chapter”, with a similarity score of 79.69%, which is recommended because it is set in the same universe but tells a different storyline. These results show that the similarity search recommends relevant viewing options to the user.

Table 4.

Anime Similarity Table for Clannad

Rank | Name | Similarity | Genres
1 | Clannad: After Story | 88.59% | Drama, Romance, Supernatural
2 | Kanon | 86.69% | Drama, Romance, Supernatural
3 | Angel Beats! | 81.19% | Drama, Supernatural
4 | Clannad: Another World, Tomoyo Chapter | 79.69% | Drama, Romance

 

To find the users most similar to a given user, we computed user–user similarity based on rating patterns. Specifically, we randomly selected a user with fewer than 500 ratings, user ID 599001, and retrieved the top 7 users with a similarity score above 0.2. Table 5 presents the output of the algorithm, including user IDs and similarity scores. The most similar user to 599001 is user 100924, with a similarity score of 0.300375. The remaining users have comparable scores (between 0.275 and 0.290), indicating closely matched rating patterns. Overall, the algorithm identifies users with similar preferences based on their rating behavior.
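A sketch of this user-similarity search: cosine similarity between user vectors, excluding the user themself, keeping only scores above 0.2, and returning the top 7. The paper does not say which vectors were compared; they could be rows of the user–anime rating matrix or the learned user embeddings, so the function accepts any equal-length vectors.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot / (nu * nv)


def similar_users(target_id: int, user_vectors: Dict[int, Sequence[float]],
                  k: int = 7, threshold: float = 0.2) -> List[Tuple[int, float]]:
    """Top-k users whose cosine similarity to the target exceeds `threshold`."""
    target = user_vectors[target_id]
    scored = [(uid, cosine(target, vec))
              for uid, vec in user_vectors.items() if uid != target_id]
    scored = [pair for pair in scored if pair[1] > threshold]  # drop weak matches
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The threshold and the top-k cut act together: the threshold removes weak matches, so fewer than seven users may be returned for a user with unusual tastes.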

Table 5.

User Similarity Table for User ID 599001

Rank | User ID | Similarity
1 | 100924 | 0.300375
2 | 1251851 | 0.289962
3 | 58046 | 0.282955
4 | 493235 | 0.276667
5 | 108495 | 0.276173
6 | 536585 | 0.276103
7 | 35684 | 0.275380

 

To understand the preferences of user 599001, we examined the history of anime this user has watched and visualized the genres of those titles as a word cloud. Figure 10 shows the word cloud of this user's most frequently watched genres, in which larger words indicate more frequent genres. The largest words in the cloud are Fantasy, Action, Adventure, Romance, and Supernatural.

 

Figure 10. Word cloud by genre for user ID 599001

 

We then used our recommendation algorithm to generate top suggestions from the model, based on the user's history and the preferences of similar users. Table 6 lists the top 3 anime recommendations for user 599001. These suggestions cover diverse themes and broaden the user's viewing experience: “Akira” combines sci-fi and horror, while “Fullmetal Alchemist” and “Code Geass: Hangyaku no Lelouch” offer different takes on action and drama.
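One plausible way to turn similar users into suggestions is a similarity-weighted average of the neighbors' ratings over titles the target user has not yet rated. This is shown only as an assumption, since the paper does not spell out how the model output and neighbor preferences are aggregated. Note that a title rated by a single neighbor simply inherits that rating; a production system would likely also require a minimum number of neighbors per title.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def recommend_from_neighbors(seen: Set[int],
                             neighbors: List[Tuple[int, float]],
                             ratings: Dict[int, Dict[int, float]],
                             top_n: int = 3) -> List[Tuple[int, float]]:
    """Rank unseen anime by the similarity-weighted mean of neighbors' ratings."""
    weighted_sum: Dict[int, float] = defaultdict(float)
    weight_total: Dict[int, float] = defaultdict(float)
    for user_id, sim in neighbors:
        for anime_id, rating in ratings.get(user_id, {}).items():
            if anime_id in seen:
                continue  # only suggest titles the target user has not rated
            weighted_sum[anime_id] += sim * rating
            weight_total[anime_id] += sim
    predicted = [(a, weighted_sum[a] / weight_total[a])
                 for a in weighted_sum if weight_total[a] > 0]
    predicted.sort(key=lambda pair: (-pair[1], pair[0]))  # best first, ties by ID
    return predicted[:top_n]
```

The `neighbors` argument has the same (user ID, similarity) shape as the rows of Table 5, so the output of a user-similarity search can be passed in directly.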

Table 6.

Top 3 Anime Recommendations for User ID 599001

Rank | Anime ID | Title
1 | 96950 | Code Geass: Hangyaku no Lelouch
2 | 38079 | Akira
3 | 93001 | Fullmetal Alchemist

 

To test the content-based filtering algorithm, we again used “Clannad” as the target anime; Table 7 shows the results. The most similar title is “Clannad: After Story”, which also has a high score of 8.93. The other top recommendations, “Kanon”, “Fruits Basket”, and “Air”, share the same drama, romance, and supernatural genres.

Table 7.

Anime Recommendation for “Clannad”

Title | Genre | Score
Clannad: After Story | Drama, Romance, Supernatural | 8.93
Kanon (2006) | Drama, Romance, Supernatural | 7.95
Fruits Basket | Drama, Romance, Supernatural | 7.69
Air | Drama, Romance, Supernatural | 7.28

 

These results suggest that the algorithm works as intended and produces sensible recommendations. Notably, three of the four titles come from the same studio, Kyoto Animation [33], even though only genres were used as input features. Figure 11 shows their posters featuring the main female characters.

 

Figure 11. Anime by Kyoto Animation

 

To capture both the collective user feedback and the subtle nuances embedded in anime descriptions, we designed a hybrid system that blends collaborative filtering scores with a content-based score derived from deeper semantic modeling.

Our motivation rests on the observation that collaborative filtering performs best when there is a substantial body of user–item interactions, while content-based approaches can excel at surfacing new or rarely rated titles, especially when textual features are modeled with robust language representations. By merging the two systems, we aim to mitigate classic challenges such as sparse rating data and the so-called cold-start problem. We introduce a parameter α ∈ [0, 1] that regulates the balance between these two sources of information: a higher α leans toward collaborative signals, whereas a lower α emphasizes the semantic cues from content-based embeddings. Formally, for a user u, each anime i is assigned the hybrid score

$$S_{\text{hybrid}}(u, i) = \alpha \, S_{\text{collab}}(u, i) + (1 - \alpha) \, S_{\text{content}}(u, i) \qquad (8)$$

In Formula 8, $S_{\text{collab}}(u, i)$ is a learned embedding-based prediction for user $u$ and anime $i$, and $S_{\text{content}}(u, i)$ arises from analyzing anime titles, synopses, and other textual features with a language model such as BERT [6]. This formulation lets us tune α to optimize precision, recall, or other objectives, depending on the nature of our dataset and the needs of our users.
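
Formula 8 can be sketched for a single user as follows. The sketch assumes both component scores are already available per anime (for example, predicted ratings from the embedding model and cosine similarities from text embeddings) and min-max normalizes each over the shared items before blending, so that α weights signals on a comparable scale. The paper cites min-max normalization [28], but applying it at this step is our assumption; the anime IDs and values below are hypothetical.

```python
from __future__ import annotations


def min_max(scores: dict[int, float]) -> dict[int, float]:
    """Rescale scores to [0, 1]; a constant input maps every item to 0.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {item: 0.0 for item in scores}
    return {item: (s - lo) / (hi - lo) for item, s in scores.items()}


def hybrid_scores(
    collab: dict[int, float], content: dict[int, float], alpha: float
) -> dict[int, float]:
    """Blend per-anime scores for one user: alpha * collab + (1 - alpha) * content.

    Both inputs are min-max normalized over the anime they share first
    (an assumption of this sketch).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    shared = collab.keys() & content.keys()
    if not shared:
        return {}
    c = min_max({i: collab[i] for i in shared})
    t = min_max({i: content[i] for i in shared})
    return {i: alpha * c[i] + (1 - alpha) * t[i] for i in shared}


def top_k(scores: dict[int, float], k: int = 10) -> list[int]:
    """Anime IDs sorted by descending hybrid score, truncated to k."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy inputs for one user: predicted ratings (collaborative) and cosine
# similarities (content-based) for three hypothetical anime IDs.
collab = {1: 9.0, 2: 6.0, 3: 7.5}
content = {1: 0.20, 2: 0.90, 3: 0.55}
print(top_k(hybrid_scores(collab, content, alpha=0.2)))  # → [2, 3, 1]
print(top_k(hybrid_scores(collab, content, alpha=1.0)))  # → [1, 3, 2]
```

Normalizing first matters because raw predicted ratings (here on a 1 to 10 scale) would otherwise swamp similarities that lie between −1 and 1, whatever value α takes.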

To show the relative merits of each strategy, we first measure precision at ten (P@10) and recall at ten (R@10) on a subset of fifty users for three configurations: a purely collaborative recommender, a purely content-based recommender, and the hybrid recommender with a modest alpha of 0.2. These metrics indicate how well each system places relevant titles within a short, ranked list of ten recommendations.
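
The two metrics can be computed per user and then averaged, as in the following minimal sketch. This section does not specify what counts as a relevant title (for instance, held-out anime a user rated highly), so the relevant sets and IDs here are hypothetical inputs.

```python
from __future__ import annotations


def precision_recall_at_k(
    ranked: list[int], relevant: set[int], k: int = 10
) -> tuple[float, float]:
    """P@k and R@k for one user.

    P@k divides the hits by k even if fewer than k items were returned;
    R@k divides by the number of relevant items (0.0 when there are none).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = len(set(ranked[:k]) & relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mean_precision_recall(
    users: list[tuple[list[int], set[int]]], k: int = 10
) -> tuple[float, float]:
    """Average P@k and R@k over (ranked list, relevant set) pairs, one per user."""
    if not users:
        raise ValueError("need at least one user")
    pairs = [precision_recall_at_k(ranked, relevant, k) for ranked, relevant in users]
    n = len(pairs)
    return sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n


# Hypothetical user: 10 recommended IDs, 3 relevant IDs, 2 of which are hit.
ranked = [5, 3, 8, 1, 9, 2, 7, 4, 6, 10]
relevant = {3, 9, 42}
print(precision_recall_at_k(ranked, relevant))  # → (0.2, 0.6666666666666666)
```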

Figure 12 displays these results as a stacked bar chart, in which the blue portion of each bar represents precision and the orange portion stacked on top represents recall. The first bar, labeled “collab”, corresponds to the purely collaborative approach, in which ratings alone determine the ranking. Its stacked total is low, indicating only modest precision and recall. The second bar, labeled “content”, ignores user-specific rating histories and relies solely on textual or semantic similarity.

Its combined height is noticeably greater than that of “collab”, which indicates the value of analyzing anime descriptions and genre embeddings. Finally, the rightmost bar, labeled “hybrid”, uses α = 0.2, so textual features carry most of the weight (1 − α = 0.8) while user preference information still contributes. This last bar is the tallest overall, suggesting that, in this sample, a judicious combination of collaborative and content signals yields the best performance of the three.

Figure 12. Stacked bar chart showing precision (blue) and recall (orange) at k=10 for the collaborative, content-based, and hybrid recommendation strategies. The hybrid approach outperforms both individual methods in this sample.

Because α is a continuous parameter, it is natural to explore its full range. We therefore swept alpha from 0.0 to 1.0 in increments of 0.1, evaluating each setting on a sample of one hundred users and measuring P@10 and R@10. Each point on the curves in Figure 13 shows the performance of the system at one alpha value: the horizontal axis gives alpha, and the vertical axis gives the average metric value (precision or recall) over all tested users.
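
The sweep can be organized as in the following minimal sketch. `evaluate` stands for a caller-supplied function returning mean P@10 and R@10 for a given alpha (for example, built from the hybrid scoring and metric sketches). Choosing the "best balance" by the harmonic mean (F1) of precision and recall is our assumption, since the section does not name a selection criterion. The evaluation curve used below is synthetic and is not the paper's data.

```python
from __future__ import annotations

from typing import Callable


def sweep_alpha(
    evaluate: Callable[[float], tuple[float, float]], steps: int = 10
) -> list[tuple[float, float, float]]:
    """Evaluate alpha = 0/steps, 1/steps, ..., 1.0; return (alpha, P, R) rows.

    Alpha is computed as n / steps rather than by repeatedly adding 0.1,
    which avoids accumulating floating-point error across the grid.
    """
    return [(n / steps, *evaluate(n / steps)) for n in range(steps + 1)]


def best_alpha(rows: list[tuple[float, float, float]]) -> float:
    """Pick the alpha with the highest harmonic mean (F1) of precision and recall."""

    def f1(p: float, r: float) -> float:
        return 2 * p * r / (p + r) if p + r > 0 else 0.0

    return max(rows, key=lambda row: f1(row[1], row[2]))[0]


# Synthetic evaluation curve for illustration only; it is not the paper's data.
def toy_evaluate(alpha: float) -> tuple[float, float]:
    return 0.30 - 0.2 * abs(alpha - 0.3), 0.25 - 0.15 * abs(alpha - 0.3)


rows = sweep_alpha(toy_evaluate)
print(len(rows), best_alpha(rows))  # → 11 0.3
```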

Figure 13 shows that, on our dataset, alpha values of about 0.1 to 0.2 usually provide the best balance of precision and recall. When alpha is set too low (toward 0.0), the recommendations approach those of the purely content-based approach and offer insufficient personalization.

Figure 13. Precision and recall at k=10 measured as alpha shifts from 0.0 (purely content-based) to 1.0 (purely collaborative). Each point shows the average performance for a set of one hundred users 

As alpha drifts higher (approaching 1.0), the system reverts to a purely collaborative style and loses some of the semantic breadth provided by our textual embeddings. These observations underscore the value of blending both signals in moderate proportion. Of course, the exact optimal alpha for a given platform can shift over time if user preferences or the anime catalog evolve, but our results consistently highlight that combining collaborative signals with deeper language-based analysis achieves stronger overall performance.

Conclusion

We have introduced a novel hybrid recommendation system that combines the rating patterns of a varied user population with textual descriptions of anime captured by language-model embeddings. Our experiments suggest that neither content-driven approaches nor collaborative filtering alone can fully capture the complexity of anime discovery. Combined through a weighted score, however, the two approaches complement each other and, in our tests, produce recommendations with higher precision and recall than either method on its own.

On the positive side, the hybrid approach can handle cold starts more gracefully, recommending meaningful titles even when a user has a sparse history or an anime has just been released. Additionally, the ability to tune α provides flexibility: if the user has watched many shows, the system can lean more on collaborative signals, whereas for a new user the content-based embeddings can take the lead. On the other hand, the method requires more computational effort than a single-source system, since both collaborative embeddings and textual representations must be precomputed, stored, and combined at query time. Furthermore, if the anime metadata is incomplete or the user ratings are biased, the resulting hybrid scores may reflect those shortcomings.

One natural extension is to introduce dynamic alpha schedules that adapt automatically to the quantity and quality of user interaction data, the typical content patterns in the user’s watch history, or even real-time contextual signals. We also plan to incorporate additional side information, such as user reviews or social network connections, to refine both the content-based and collaborative components. While BERT embeddings already capture subtle textual relationships, domain-specific language models might further improve the representation of anime synopses, especially if trained or fine-tuned on user-generated descriptions, fan wikis, or forum discussions. More broadly, we anticipate that deeper personalization will require architectures that adapt more quickly to time-sensitive behavior, so that recommendations stay fresh and relevant as new seasons or spin-off series are released.
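
The dynamic alpha schedule mentioned above is future work and is not specified in the paper. As a purely hypothetical illustration of the idea stated earlier in this conclusion (lean on collaborative signals for users with long histories and on content for new users), α could grow with the number of rated titles. The function name, the saturating form n / (n + h), and the default bounds below are all assumptions.

```python
def alpha_for_user(
    n_ratings: int,
    alpha_min: float = 0.1,
    alpha_max: float = 0.8,
    half_point: int = 20,
) -> float:
    """Hypothetical dynamic alpha schedule based on rating-history length.

    New users (n_ratings == 0) get alpha_min, so content-based embeddings lead;
    alpha rises smoothly toward alpha_max as the history grows and sits exactly
    halfway between the two bounds when n_ratings == half_point.
    """
    if n_ratings < 0 or half_point <= 0:
        raise ValueError("n_ratings must be >= 0 and half_point > 0")
    if not 0.0 <= alpha_min <= alpha_max <= 1.0:
        raise ValueError("need 0 <= alpha_min <= alpha_max <= 1")
    growth = n_ratings / (n_ratings + half_point)  # in [0, 1), saturating
    return alpha_min + (alpha_max - alpha_min) * growth


for n in (0, 5, 20, 200):
    print(n, round(alpha_for_user(n), 3))
```

A saturating curve keeps α strictly below alpha_max, so content signals never vanish entirely even for very active users; a real deployment would tune or learn these bounds rather than fix them by hand.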

Our results indicate that carefully balancing content-based signals with collective user feedback can improve recommendation quality. With its variety of genres, styles, and narrative themes, anime highlights the need for adaptable systems that can handle both popular hits and more specialized niches. By combining large-scale user feedback with powerful text-based modeling, we obtain a method that remains robust as the catalog expands rapidly and viewer tastes change. We believe that continuous improvement and data-driven adaptability will help maintain a high level of user satisfaction, creating richer discovery opportunities for anime enthusiasts around the world.

References:

  1. Yahoo Finance, Global Anime Market Report (2022 to 2030), https://finance.yahoo.com/news/global-anime-market-report-2022-100300202.html
  2. Mordor Intelligence, Anime Market Size & Share Analysis, https://www.mordorintelligence.com/industry-reports/anime-market
  3. Grand View Research, Anime Market Size & Trends, https://www.grandviewresearch.com/industry-analysis/anime-market
  4. A Singhal, P Sinha and R Pant, Int. J. Comput. Appl. 180, 17–22, (2017). https://doi.org/10.5120/ijca2017916055
  5. C G Reswara, J Nicolas, M Ananta and F I Kurniadi, Proc. 2023 4th Int. Conf. Artif. Intell. Data Sci. (AiDAS), 109–113, (2023). https://doi.org/10.1109/AIDAS60501.2023.10284693
  6. J Devlin, M W Chang, K Lee and K Toutanova, (2019). https://doi.org/10.48550/arXiv.1810.04805
  7. M Gupta, A Thakkar, Aashish, V Gupta and D P S Rathore, Proc. Int. Conf. Electron. Sustain. Commun. Syst. (ICESC), 415–420, (2020). https://doi.org/10.1109/ICESC48915.2020.9155879
  8. W Hong Xia, Proc. 2019 IEEE 4th Int. Conf. Big Data Anal. (ICBDA), 431–435, (2019). https://doi.org/10.1109/ICBDA.2019.8713205
  9. H Zarzour, Z Al-Sharif, M Al-Ayyoub and Y Jararweh, Proc. 9th Int. Conf. Inf. Commun. Syst. (ICICS), 102–106, (2018). https://doi.org/10.1109/IACS.2018.8355449
  10. J Zhang, Y Wang, Z Yuan and Q Jin, Tsinghua Sci. Technol. 25, 180–191, (2020). https://doi.org/10.26599/TST.2018.9010118
  11. P Cunningham and S J Delany, ACM Comput. Surv. 54, 1–25, (2021). https://doi.org/10.1145/3459665
  12. Z Zhang, (2015). https://doi.org/10.48550/arXiv.1510.08532
  13. V Prakash, S Raghav, S Sood, M Pandey and M Arora, Proc. 2022 4th Int. Conf. Adv. Comput. Commun. Control Netw. (ICAC3N), 718–723, (2022). https://doi.org/10.1109/ICAC3N56886.2022.10074101
  14. Q Pu and B Hu, Proc. 2023 Int. Conf. Ambient Intell. Knowl. Informatics Ind. Electron. (AIKIIE), 1–5, (2023). https://doi.org/10.1109/AIKIIE60097.2023.10389982
  15. Nuurshadieq and A T Wibowo, Proc. 2020 3rd Int. Semin. Res. Inf. Technol. Intell. Syst. (ISRITI), 62–67, (2020). https://doi.org/10.1109/ISRITI51436.2020.9315363
  16. S Bansal, C Gupta and A Arora, Proc. 2016 9th Int. Conf. Contemp. Comput. (IC3), (2017). https://doi.org/10.1109/IC3.2016.7880220
  17. J Chung, C Gulcehre, K Cho and Y Bengio, (2014). https://doi.org/10.48550/arXiv.1412.3555
  18. N K Rao, N P Challa, S S Chakravarthi and R Ranjana, Proc. 2022 4th Int. Conf. Invent. Res. Comput. Appl. (ICIRCA), 711–716, (2022). https://doi.org/10.1109/ICIRCA54612.2022.9985512
  19. A Agarwal and S Sharma, Proc. 2023 16th Int. Conf. Develop. eSyst. Eng. (DeSE), 870–875, (2023). https://doi.org/10.1109/DeSE60595.2023.10468757
  20. S P Vaddineni, U K Pillarisetty, K Ram, Y Rayapati and S Shareefunnisa, Proc. 2024 8th Int. Conf. Invent. Syst. Control (ICISC), 223–229, (2024). https://doi.org/10.1109/ICISC62624.2024.00045
  21. A S Girsang, B Al Faruq, H R Herlianto and S Simbolon, J. Phys. Conf. Ser. 1566, 012057, (2020). https://doi.org/10.1088/1742-6596/1566/1/012057
  22. S R Javaji and K Sarode, (2023). https://doi.org/10.48550/arxiv.2310.04878
  23. B Soni, D Thakuria, N Nath, N Das and B Boro, (2021). https://doi.org/10.48550/arxiv.2106.12970
  24. MyAnimeList, Jujutsu Kaisen Page https://myanimelist.net/anime/40748/Jujutsu_Kaisen
  25. MyAnimeList Web Site, https://myanimelist.net/
  26. Jikan Web Site, https://jikan.moe/
  27. Jikan API https://docs.api.jikan.moe/#tag/users/operation/getUserAnimelist
  28. Towards Data Science, Everything You Need to Know About Min-Max Normalization in Python, https://towardsdatascience.com/everything-you-need-to-know-about-min-max-normalization-in-python-b79592732b79
  29. Towards Data Science, Cross-Entropy Loss Function https://towardsdatascience.com/cross-entropy-loss-function-f38c4ec8643e
  30. Towards Data Science, The Math behind Adam Optimizer https://towardsdatascience.com/the-math-behind-adam-optimizer-c41407efe59b/
  31. Towards Data Science, Comparing robustness of MAE, MSE and RMSE https://towardsdatascience.com/comparing-robustness-of-mae-mse-and-rmse-6d69da870828
  32. Capital One, Understanding TF-IDF for Machine Learning https://www.capitalone.com/tech/machine-learning/understanding-tf-idf/
  33. Kyoto Animation Web Site, https://www.kyotoanimation.co.jp/en/
Author information

Master's student, Kazakh-British Technical University, Kazakhstan, Almaty

PhD Candidate, Kazakh-British Technical University, Kazakhstan, Almaty

The journal is registered with the Federal Service for Supervision of Communications, Information Technology and Mass Media (Roskomnadzor), registration number ЭЛ №ФС77-54434, dated 17.06.2013.
Founder of the journal: LLC «МЦНО» (ООО «МЦНО»).
Editor-in-chief: Marina Yurievna Zvezdina (Звездина Марина Юрьевна).