Reshaping Datasets to Wide Format

The reshape to wide pivoting operation reshapes a dataset by:

  • transforming a key attribute into a new set of attributes for each row

  • creating a new column for each distinct value of the transformed key attribute.

Consequently the number of keys in the dataset will be decreased, which is often a prerequisite for merging datasets.


Prerequisites


Procedure

  1. Drag and drop the Reshape To Wide task onto the stage.

  2. Connect a task that contains the attributes you want to transform to the Reshape To Wide task.

  3. Double click the Reshape To Wide task. The left-hand pane displays a list of all the available attributes in the dataset, which can be ordered and searched as required. Select the attributes that will be displayed in the final table (othnames), by checking the corresponding box (by default all the attributes are included). Notice that, since the number of rows in decreased by the Reshape To Wide task, if you check an attribute without dragging and dropping it into the Widened attributes list, in the final table you will have only the value contained in the first line for each value of the Key Attributes

  4. Configure the task options as described in the table below.

  5. Save and compute the task.

Reshape to Wide options

Name

Description

Key attributes

Drag and drop the attributes that will be used as a key to identify each group of records. A record/row will be created for each distinct set of values of the key attributes. Instead of manually dragging and dropping attributes, they can be defined via a filtered list.

Long attributes

Drag and drop the attributes that will be become column headers in the wide format. Instead of manually dragging and dropping attributes, they can be defined via a filtered list.

Note that the combination of Key attributes and Long attributes should be a unique key in the original table. If this is not the case, you have to group according to Key attributes and Long attributes in a Data Manager before applying Reshape To Wide. 

Widened attributes

Drag and drop the attributes that will become column values in the resulting data table. Instead of manually dragging and dropping attributes, they can be defined via a filtered list.

Fill missing values

If selected, zeros will be inserted in every empty cell of the wide columns in the new reshaped table (empty cells in the final table correspond to the combinations of key/long attributes not present in the long dataset).

Remove prefix from widened attribute name

If selected, the prefix "Value" is removed from widened attributes to avoid having the same initial attribute name repeated for every new column in the table. 

By default, the new columns are named Value(long_value) where long_value is a possible value for the long attribute.


Example

The following example uses the Northwind_orderdetails dataset.

Description

Screenshot

  • After having imported a file onto the stage, drag a Data Manager task onto the stage to check the imported data: we want to reshape the Order ID to wide and to have all the corresponding values in one row. If we temporarily group the Order ID values, we will notice that we have 830 different IDs.

  • Add a Reshape to Long task onto the stage and link it to the Data Manager.

  • Double-click on the task to open it and set the reshaping parameters.

  • In the example here, we want our dataset to be reshaped so that we have a row with each order and its quantity.

  • Drag the OrderID attribute onto the Key attribute box.

  • Drag the ProductID attribute onto the Long attribute box.

  • Drag the Quantity attribute onto the Widened attribute box.

  • Uncheck the Unit Price and Discount attributes, as they are not necessary for our analysis.

  • Save and compute the task.

  • Drag a Data Manager onto the stage and link it to the Reshape to Wide task.

  • Double-click on the Data Manager to open it and see the results.

  • As we can see, the new datasets contains 830 records, one for each Order ID.