Importing Data from a Text File
In the Factory, you can import data directly from a text file and define the basic parsing options. You can do this by either:
Dragging the text file directly onto the Stage, which automatically creates an Import from Text File task.
Dragging an Import from Text File task onto the Stage. This approach gives you finer control over the import, as you can:
Import a single file, specifying the parsing and import options for the specific file;
Import multiple files together, concatenating them into a single table. This means that all files imported together must have the same structure.
A preview of the imported table(s) is always displayed.
Prerequisites
You must have created a flow;
If you are importing multiple files, they must have the same structure.
Procedure
Drag an Import from Text File task onto the Stage.
Double-click the task to open the task menu.
Select whether you want to use a Saved or a Custom source.
From the drop-down list, choose whether to import the file from your computer (Local) or from a remote connection.
If you are importing from a Remote Filesystem, choose it from the list and then click the pencil button to enter the required connection information (only if you are using a Custom source). The tables are loaded in the Files tab.
If you are using a Local Filesystem, drag the text files onto the file area or click the Select File button and choose the path.
Click the Concatenation Type drop-down arrow and choose Inner or Outer concatenation.
Inner concatenation: the final table includes only the attributes that exist in all of the imported tables.
Outer concatenation: the final table includes all the attributes, and any values that a table does not provide are filled in as missing.
Click the Match Column by drop-down arrow and choose whether to match columns by their Name or by their Position.
Click on the Text Configuration tab and set the Parsing options and Import options, as shown in the table below.
Save and compute the task.
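The difference between Inner and Outer concatenation, and between matching columns by Name or by Position, can be illustrated with a small Python sketch. This is not how Rulex implements the task: the function name `concatenate`, its parameters, and the representation of a table as a `(column_names, rows)` pair are assumptions made only for this example.

```python
# Illustrative sketch only, not the Rulex implementation.
# A table is modelled as (column_names, rows), where each row is a list
# aligned with column_names.

def concatenate(tables, how="outer", match_by="name", missing=None):
    """Stack tables vertically with inner or outer concatenation."""
    keyed = []
    for names, rows in tables:
        # Columns are identified by their name or by their position.
        keys = list(range(len(names))) if match_by == "position" else list(names)
        keyed.append((keys, names, rows))

    # Collect every column key in first-seen order, remembering its label.
    all_keys, labels = [], {}
    for keys, names, _ in keyed:
        for key, name in zip(keys, names):
            if key not in labels:
                labels[key] = name
                all_keys.append(key)

    if how == "inner":
        # Keep only the columns present in every table.
        out_keys = [k for k in all_keys if all(k in keys for keys, _, _ in keyed)]
    elif how == "outer":
        # Keep every column; gaps are filled with the missing marker.
        out_keys = all_keys
    else:
        raise ValueError("how must be 'inner' or 'outer'")

    out_rows = []
    for keys, _, rows in keyed:
        index = {k: i for i, k in enumerate(keys)}
        for row in rows:
            out_rows.append(
                [row[index[k]] if k in index else missing for k in out_keys]
            )
    return [labels[k] for k in out_keys], out_rows


first = (["id", "city"], [[1, "Genoa"]])
second = (["id", "price"], [[2, 9.5]])

print(concatenate([first, second], how="outer", match_by="name"))
# (['id', 'city', 'price'], [[1, 'Genoa', None], [2, None, 9.5]])
print(concatenate([first, second], how="inner", match_by="name"))
# (['id'], [[1], [2]])
print(concatenate([first, second], how="outer", match_by="position"))
# (['id', 'city'], [[1, 'Genoa'], [2, 9.5]])
```

In this sketch, matching by position takes the column labels from the first table that has a column at that position. How the actual task names such columns is not stated here.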
Parsing and Import options
Settings options | Option | Description |
---|---|---|
Parsing options | Data separators | The character that separates values: comma, semicolon, space, tab, or other. |
Parsing options | Number separators | The thousands separator and the decimal separator. |
Parsing options | Missing string | Enter the string that marks missing values; it will be removed from the imported dataset. |
Parsing options | Text delimiter | Select ' or " if these symbols have been used as string delimiters. They will not be included in the imported file. For example, the string "apartment" will be imported as apartment. This option removes all instances of the text delimiter in the string, not only the opening and closing symbols. The only exception is a delimiter preceded by a backslash. For example, "ad\"cb" will be imported as ad"cb, while "ad"cb" will be imported as adcb. Values with string delimiters have the nominal data type, and removing the delimiters does not change it. For example, "3" will be imported as 3, but will remain a nominal value instead of being converted to an integer. |
Parsing options | Use contiguous separators as a single one | Select the check box to make the parser treat any group of adjacent separators as a single separator. For example, with the comma as separator, the string '1,2,,,3' will be parsed as 1, 2, 3 if the option is selected, and as 1, 2, '', '', 3 if it is not. |
Import options | Start importing from line | The number of the line from which the import will start. |
Import options | Stop importing at line | The number of the line at which the import will end. Leave the value 0 if you want the whole dataset to be imported. |
Import options | Get names from line | The number of the line from which the column names will be taken. |
Import options | Get types from line | The number of the line from which the attribute types will be taken. |
Import options | Remove empty rows | Select the check box to remove empty rows from the imported dataset. |
Import options | Remove empty columns | Select the check box to remove empty columns from the imported dataset. |
Import options | Strip spaces | Select this option to remove the spaces surrounding strings. For example, the string " class " will be imported as "class". |
Import options | Add an attribute containing filename | Select this option to add an extra column containing the name of the file to the dataset. |
Import options | Use old computation data if the source file is not available | If selected, data from the previous computation will be used if the source table is not available. |
Import options | Continue the execution if the file is missing | If selected, computation of the task continues even if the selected source files are not available. |
Import options | Wait until the target file is present | If selected, Rulex polls the target file at the specified frequency (sleeptime) until it is available. |
Import options | Number of records to preview | The number of records that the table preview will display. |
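The Text delimiter, Use contiguous separators as a single one, and Strip spaces options can be mimicked with a short Python sketch that reproduces the examples in the table. It is a simplified illustration, not the Rulex parser: the function names and parameters are hypothetical, the order in which the options are applied is an assumption, and separators that appear inside delimited strings are not handled.

```python
# Illustrative sketch only, not the Rulex parser.
import re


def strip_text_delimiters(value, delimiter='"'):
    """Remove every delimiter, except those preceded by a backslash."""
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value) and value[i + 1] == delimiter:
            out.append(delimiter)  # escaped delimiter: keep it, drop the backslash
            i += 2
        elif ch == delimiter:
            i += 1  # unescaped delimiter: drop it
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def split_fields(line, separator=",", merge_contiguous=False):
    """Split a line, optionally treating runs of separators as one."""
    if merge_contiguous:
        return re.split("(?:" + re.escape(separator) + ")+", line)
    return line.split(separator)


def parse_line(line, separator=",", delimiter='"',
               merge_contiguous=False, strip_spaces=False):
    """Apply the three options to one line; every field stays a string."""
    fields = split_fields(line, separator, merge_contiguous)
    fields = [strip_text_delimiters(f, delimiter) for f in fields]
    if strip_spaces:
        fields = [f.strip() for f in fields]
    return fields


print(strip_text_delimiters('"ad\\"cb"'))            # ad"cb
print(strip_text_delimiters('"ad"cb"'))              # adcb
print(parse_line("1,2,,,3", merge_contiguous=True))  # ['1', '2', '3']
print(parse_line("1,2,,,3"))                         # ['1', '2', '', '', '3']
print(parse_line(' "class" ,"3"', strip_spaces=True))  # ['class', '3']
```

The last call returns the value 3 as the string '3', which mirrors the rule that delimited values remain nominal after the delimiters are removed.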