Import from Word File task
You can import data stored in a Word file, whether or not it has a table layout.
You can do it by either:
Dragging the Word file directly onto the Stage, creating an Import from Word File task.
Dragging an Import from Word File task onto the Stage: this gives you more control over the import, as you can:
Import a single file: only one file is imported, specifying the datasheet from which information will be taken.
Import multiple files together: in this case the files are concatenated to form a single table. This means that all files imported together must have the same structure.
A preview of the imported table is always displayed in the task.
Prerequisites
you must have created a flow;
if you are importing multiple files, they must have the same structure.
Procedure
Drag an Import from Word File task onto the Stage, or drag the file to import onto the Stage.
Double-click the task to open it.
Select whether you want to use a Saved source or a Custom source.
From the drop-down list, choose whether to import your file from your computer (Local Filesystem) or from a Remote Filesystem.
If you are importing from a Remote Filesystem, choose it from the list and then click on the pencil button to set the connection information required (only if you are using a Custom source). The tables are loaded in the Files tab.
If you are using a Local Filesystem, click on the Select button and choose the path where the file is stored.
Click on the Add new path button, located next to the Select button, to add paths where other files to import are stored. You can add as many paths as you want.
Click on the X button, located next to the Select button, to remove the corresponding path.
Click on the Delete all paths button, located under the Add new path button, to remove all the paths you have added.
Choose the resources to import by clicking the Select button in the Path 1 section.
Click on the Add new path button, located next to the Select button, to add a new path for a new resource. You can add as many paths as you want.
Click on the X button, located next to the Select button, to remove the corresponding path.
Click on the Delete all paths button, located under the Add new path button, to remove all the paths you have added.
In the Concatenation type box, select either:
Detach to keep the imported files, or sheets from the same file, separate, or
Concatenate if you want to merge them. You must then specify the concatenation type:
- Inner concatenation includes only attributes that exist in both tables.
- Outer concatenation (default) includes all the attributes of both tables in the final table, filling in any missing values if necessary.
In the Match Column by box, select whether you want the columns to be matched by their Name or Position.
In the Retrieve Dataset From box, select the MS Word layout in which the data to import is stored. You can choose between the following options:
Text: the task directly imports the selected MS Word file with a free text layout. Headers must be located in the first row of the MS Word file. Important: the data must be formatted consistently, using a single shared data separator, which you then specify in the Parsing Options tab to complete the import operation.
Table: the task directly imports data from a table within an MS Word file. The first row of the table contains the headers for the attributes. The subsequent rows of the table contain the corresponding values for each attribute.
Basically, a spreadsheet containing the MS Word file’s table is created and ready to be processed with the subsequent tasks.
Click on the Word Configuration tab and set the Parsing options and the Import options, as displayed in the table below.
Save and compute the task.
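The difference between the Inner and Outer concatenation types in the procedure above can be illustrated with a small sketch. This is not how the task is implemented: the `concatenate` function, the `how` parameter, and the sample tables are hypothetical, and tables are modeled simply as lists of dictionaries whose columns are matched by name.

```python
from typing import Optional

# A row maps attribute names to values; None marks a missing value.
Row = dict[str, Optional[str]]


def concatenate(first: list[Row], second: list[Row], how: str = "outer") -> list[Row]:
    """Stack two tables row-wise, matching columns by name (illustrative only)."""
    first_cols = list(first[0]) if first else []
    second_cols = list(second[0]) if second else []
    if how == "inner":
        # Inner: keep only the attributes that exist in both tables.
        columns = [c for c in first_cols if c in second_cols]
    elif how == "outer":
        # Outer: keep every attribute; values absent from a table become None.
        columns = first_cols + [c for c in second_cols if c not in first_cols]
    else:
        raise ValueError(f"unknown concatenation type: {how}")
    return [{c: row.get(c) for c in columns} for row in first + second]


# Hypothetical tables read from two different files.
customers_a = [{"name": "Ada", "city": "Turin"}]
customers_b = [{"name": "Bob", "age": "41"}]

inner = concatenate(customers_a, customers_b, how="inner")
outer = concatenate(customers_a, customers_b, how="outer")
```

Here `inner` keeps only the shared `name` column, while `outer` keeps `name`, `city`, and `age`, with `None` standing in for the values a file does not provide. Matching by position is not shown.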
Parsing and import options
Setting options | Description |
---|---|
Parsing options | Here you can set:
|
Import options |
|
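To see why the Text layout needs a single shared data separator, consider the following minimal sketch. It assumes the free text has already been extracted from the Word file as a string; the `parse_text` function, the `;` separator, and the sample content are hypothetical and only mirror the rule that the first row holds the headers.

```python
import csv
import io


def parse_text(text: str, separator: str = ";") -> list[dict[str, str]]:
    """Split separator-delimited text into rows, using the first row as headers."""
    reader = csv.DictReader(io.StringIO(text), delimiter=separator)
    return [dict(row) for row in reader]


# Hypothetical free-text content: headers in the first row, one record per line.
sample = "name;city\nAda;Turin\nBob;Genoa\n"
rows = parse_text(sample)
```

If some lines used a different separator, they would not be split into the expected columns, which is why the separator must be consistent throughout the file.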
Focus on the Case Sensitive checkbox
We encourage you not to select the Case Sensitive checkbox, as it has a significant impact on the data analysis.
If the Case Sensitive checkbox is selected, the number of distinct values in a column can increase, causing a slight difference in the data analysis.
In fact, if we have two strings, 'Word' and 'word', they are considered two distinct values. This means that a function written for the string 'word' is not valid for the string 'Word'.
It can also affect attributes: if we want to apply a function to the $"Word" attribute and we type $"word" in the formula bar, an error occurs during the computation process.
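The effect of case sensitivity on distinct values can be sketched in a few lines. This is only an illustration of the general behavior, not of how the task compares strings internally; the sample values and the `matches_word` helper are hypothetical.

```python
# Hypothetical column values that differ only by letter case, plus one other value.
values = ["Word", "word", "WORD", "text"]

# Case-sensitive comparison: every spelling counts as its own distinct value.
distinct_case_sensitive = set(values)

# Case-insensitive comparison: spellings are folded to a common form first.
distinct_case_insensitive = {v.casefold() for v in values}


def matches_word(value: str, case_sensitive: bool) -> bool:
    """Check a value against 'word', with or without case sensitivity."""
    target = "word"
    if case_sensitive:
        return value == target
    return value.casefold() == target.casefold()
```

With case sensitivity, the column holds four distinct values and `matches_word("Word", case_sensitive=True)` is false; without it, the column holds two distinct values and the same check is true.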