Using Similar Items Detector to solve Association Problems

Rulex generates description-based and sales-based replacement rules with the Similar Items Detector task.

This task uses description-based matching, which can be used with newly introduced items and helps solve cold start problems.

Prerequisites

you must have created a flow;
the required datasets must have been imported into the flow;
the data used for the analysis must have been well prepared;
a Frequent Itemsets Mining task must be present in the flow and provide input data for the Similar Items Detector.

Additional tabs

The results of the task are displayed in two separate tabs:

The Replacement rules tab displays the generated replacement rules, where:
- Rule Replacement ID: the sequential ID number for replacement rules.
- Category:
- Replaced item ID: IDs of replaced items
- Replacing item ID: IDs of replacing items
- Similarity score:

The Results tab displays details on the execution of the analysis, where:
- Task Identifier: the ID code for the task, internally used by the Rulex engine.
- Task Name: the name of the task.
- Elapsed time (sec): the time required for the latest computation (in seconds).
- Number of generated replacement rules: the number of replacement rules generated by the task.

Procedure

Drag the Similar Items Detector task onto the stage.
Connect a task that contains frequent itemsets to the new task.
Double click the Similar Items Detector task. The left-hand pane displays a list of all the available attributes in the dataset, which can be ordered and searched as required.
To generate description-based replacement rules, click on the Text based matching tab and configure the options as described in the table below.
To generate sales-based replacement rules, click on the Sales based matching tab and configure the options as described in the table below.
Save and compute the task.

Name	Description
Similar Items Detector options
Text based matching options
Category attribute	Select the attribute that represents the category from the drop-down list. This can be used to match only descriptions that belong to the same category.
Description attribute	Select the attribute that represents the description from the drop-down list, which will be used for text matching.
Word separator	Select how words are separated from one of the following possibilities: Space, Tab, or Newline.
Minimum word length	Words that are shorter than the value entered here will not be used for text matching. This helps to eliminate words such as the, a, one, at, etc.
Minimum unadjusted similarity cosine	The minimum similarity of pure text matching, without considering Preferential requirements attributes. Entering 1 means the text must be identical; entering 0 means no text match is required.
Case sensitive matching	If selected, the upper or lower case will be taken into consideration when matching text.
Item key attributes	Drag and drop the nominal attributes that uniquely identify the item from the Attributes list. Instead of manually dragging and dropping attributes, they can be defined via a filtered list.
Preferential requirements attributes	Drag and drop the attributes which will influence the similarity score when they match. When they match, a weight is added to the similarity score. This weight is defined in the Preferential requirements weights. These attributes could, for example, define brand, packaging or size. Instead of manually dragging and dropping attributes, they can be defined via a filtered list.
Ignored char list	Select the characters you want to eliminate from text matching.
Preferential requirements weights	The weight awarded to matching Preferential requirements attributes.
Sales based matching options
Takes also sales data into account	Select this option to include sales data in the task execution.
Minimum alternativeness coefficient	The degree of alternativeness between the purchase of two items: 1 (max) if they are never sold together, 0 (min) if one item is always sold with the other one. If a pair of items does not reach the Minimum alternativeness coefficient, the corresponding replacement rule is discarded.
Minimum volume replacement score	The minimum percentage of orders in which a replaced item is expected to be replaceable by the replacing item. If this minimum threshold is not satisfied by a replacement rule, it is discarded.
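
As a rough illustration of how the options above could interact, the following Python sketch computes a word-count cosine similarity between two descriptions, adds a weight for each matching preferential attribute, and derives an alternativeness coefficient from order data. It is not the Rulex implementation: the function names, the additive weighting, and the alternativeness formula (1 minus the share of co-occurring orders relative to the less frequently sold item) are assumptions chosen only to match the behavior described in the table.

```python
import math
from collections import Counter


def tokenize(text, separator=" ", min_word_length=3,
             ignored_chars="", case_sensitive=False):
    """Split a description into words (hypothetical helper)."""
    # Drop the characters listed in the "Ignored char list" option.
    text = "".join(ch for ch in text if ch not in ignored_chars)
    if not case_sensitive:
        text = text.lower()
    # Discard words shorter than the "Minimum word length" option.
    return [w for w in text.split(separator) if len(w) >= min_word_length]


def cosine_similarity(words_a, words_b):
    """Cosine of the angle between two word-count vectors."""
    counts_a, counts_b = Counter(words_a), Counter(words_b)
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_score(item_a, item_b, preferential_attrs, weight, min_unadjusted):
    """Return the adjusted score, or None if the pure text match is too weak."""
    unadjusted = cosine_similarity(tokenize(item_a["description"]),
                                   tokenize(item_b["description"]))
    if unadjusted < min_unadjusted:
        return None
    # Assumption: each matching preferential attribute adds the same weight.
    bonus = sum(weight for attr in preferential_attrs
                if attr in item_a and item_a[attr] == item_b.get(attr))
    return unadjusted + bonus


def alternativeness(orders, item_a, item_b):
    """Assumed formula: 1 if never sold together, 0 if one always implies the other."""
    with_a = sum(1 for order in orders if item_a in order)
    with_b = sum(1 for order in orders if item_b in order)
    together = sum(1 for order in orders if item_a in order and item_b in order)
    if min(with_a, with_b) == 0:
        return 1.0
    return 1.0 - together / min(with_a, with_b)


if __name__ == "__main__":
    a = {"description": "Green apple juice", "brand": "X"}
    b = {"description": "Red apple juice", "brand": "X"}
    print(similarity_score(a, b, ["brand"], weight=0.1, min_unadjusted=0.5))
    print(alternativeness([{"a", "c"}, {"b"}, {"a", "b"}], "a", "b"))
```

In this sketch a pair would be kept as a replacement rule only if `similarity_score` returns a value and, when sales data is taken into account, `alternativeness` is at least the configured minimum.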