Automatic Categorization Based on Manual Categorization

This mechanism combines manual and automated processing of data.
The process begins with manually assigning categories to customer responses, creating what’s known as a training dataset.
Based on this dataset, the YourCX platform automatically builds a language model that learns to recognize patterns between the content of responses and their assigned categories.

Preparing a Language Model for Training

In practice, the process starts with manually labeling a defined number of customer responses, creating a representative training set.
The system then uses machine learning algorithms to build a categorization model by analyzing linguistic structures and keywords associated with each category.

A few best practices for preparing training data:

Provide diverse examples for each category.
Ensure categories are clearly distinct in meaning, so the algorithm can assign each response to the appropriate category.
If the categories overlap too much, the model may assign multiple categories to the same response.
Balance the dataset across categories and avoid large differences in the number of examples per category.
Ideally, provide at least 400 responses per category to ensure model reliability.
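
The size and balance guidelines above can be checked before training. The following is a minimal Python sketch, not part of the YourCX platform. The function name check_training_set, the (response, category) pair format, and the max_ratio imbalance limit are assumptions made for illustration. Only the 400-response guideline comes from this article.

```python
from collections import Counter

MIN_PER_CATEGORY = 400  # guideline quoted in the article


def check_training_set(labeled, min_per_category=MIN_PER_CATEGORY, max_ratio=3.0):
    """Return a list of warnings about a manually labeled dataset.

    `labeled` is an iterable of (response_text, category) pairs.
    `max_ratio` is a hypothetical limit on largest / smallest category size.
    """
    counts = Counter(category for _, category in labeled)
    if not counts:
        return ["dataset is empty"]

    warnings = []
    # Flag every category that falls short of the recommended size.
    for category, n in sorted(counts.items()):
        if n < min_per_category:
            warnings.append(f"{category}: only {n} responses (target {min_per_category})")

    # Flag a large gap between the biggest and smallest category.
    largest, smallest = max(counts.values()), min(counts.values())
    if largest / smallest > max_ratio:
        warnings.append(
            f"imbalanced: largest category is {largest / smallest:.1f}x the smallest"
        )
    return warnings
```

An empty result means the dataset meets both checks. Any warning points to a category that may need more manually labeled examples.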

Once the model is trained, it can be applied automatically to new customer responses.
Each new comment is analyzed by the model, which assigns it to one or more predefined categories.
The platform also displays a quality score for the trained model, on a scale from 0 to 1.
In the example shown, the model achieved a score of 0.98, indicating very high accuracy.
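
This article does not state how the quality score is computed. As one illustration of a score on a 0 to 1 scale, the sketch below computes plain accuracy on manually labeled responses held back from training, that is, the share of responses for which the predicted category matches the manual one. The function name and sample labels are hypothetical, and the platform's actual metric may differ.

```python
def accuracy(manual, predicted):
    """Fraction of responses whose predicted category equals the manual one."""
    if len(manual) != len(predicted):
        raise ValueError("both lists must have the same length")
    if not manual:
        raise ValueError("at least one labeled response is required")
    matches = sum(m == p for m, p in zip(manual, predicted))
    return matches / len(manual)


# Hypothetical held-out labels and model predictions.
manual = ["price", "delivery", "delivery", "support"]
predicted = ["price", "delivery", "support", "support"]
print(accuracy(manual, predicted))  # 3 of 4 match, so 0.75
```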

Managing Categorization Models in YourCX

The YourCX platform allows you to manage categorization models directly within the admin panel.
There, users can view detailed information about each available model, including:

Model type
Number of categorized responses
Language
Creation date
Number of categories used

Model Optimization

Models in YourCX can also be further trained (fine-tuned).
If you notice issues with classification quality for a particular category, you can improve the training set for that category by:

Importing real customer responses that match the category, using the data importer
Automatically generating sample responses using a built-in language model to enrich the dataset for that category

The YourCX platform includes built-in response generation tools, allowing you to generate any number of diverse training examples for each category to improve model performance.

You can learn more in the article:
"How to Start Using Automatic Categorization in 9 Steps and Save Up to 60 Hours per Month!"

Survey-Specific and Custom Categorization Models

Each survey question with manual categorization automatically generates a dedicated categorization model.
This model learns from the manually assigned categories and then categorizes new incoming responses automatically.

However, if you want to categorize answers across multiple questions or surveys, it’s best to use custom models.
These allow you to create a general-purpose categorization model that works:

In any language
Across any set of questions in multiple surveys

This enables consistent categorization of responses across different projects, making it easier to compare results across areas.

Customizing the Categorization Model

Each model includes technical settings that can be adjusted to better suit your needs:

Select the base language model
Set the number of training epochs to improve fit with your dataset
Use a weighted or averaged loss function to guide the training process
Define the minimum category size, below which categories are ignored during training
Set confidence thresholds, so that a category is assigned only if the model's predicted probability reaches a defined level

After modifying these settings, you will need to retrain the model if you changed training-related settings, or re-run the categorization process if you changed assignment-related settings.
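
To illustrate the difference between the two kinds of settings, here is a minimal Python sketch of one training-related setting (minimum category size) and one assignment-related setting (confidence threshold). The function names, the data formats, and the example probabilities are assumptions made for illustration. They do not describe the platform's internal implementation.

```python
from collections import Counter


def drop_small_categories(labeled, min_size):
    """Training-related: ignore categories with fewer than `min_size` examples.

    `labeled` is a list of (response_text, category) pairs.
    """
    counts = Counter(category for _, category in labeled)
    return [(text, cat) for text, cat in labeled if counts[cat] >= min_size]


def assign_categories(probabilities, threshold):
    """Assignment-related: keep categories whose probability meets the threshold.

    `probabilities` maps category name to the model's predicted probability.
    """
    return sorted(cat for cat, p in probabilities.items() if p >= threshold)


# Hypothetical model output for a single customer response.
scores = {"price": 0.91, "delivery": 0.62, "support": 0.08}
print(assign_categories(scores, threshold=0.5))  # ['delivery', 'price']
print(assign_categories(scores, threshold=0.8))  # ['price']
```

In this sketch, raising the threshold changes which categories are assigned without touching the trained model, so only the categorization step has to be repeated. Changing the minimum category size changes the data the model learns from, which is why it calls for retraining.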