Importing Data from a Json File

You can import data directly from a Json file, specifying a specific file.

You can import Json files in two different ways:

  • By dragging and dropping the Json source file directly onto the stage.

  • By using the Import from Json file task. The advantage of using the task is that you can configure the import options and also specify whether you want to import single or multiple files:

    • Single file: only one file is imported, specifying the file from which information will be taken.

    • Multiple files: in this case the files are concatenated to form a single table. Consequently all files imported together must have the same structure.


Prerequisites

  • You must have created a flow;

  • If you are importing multiple files, they must all have the same structure.


Procedure

  1. Drag the Import from Json File task onto the stage.

  2. Double click the task.

  3. Select whether you want to use a Saved or a Custom source.

  4. Choose from the drop down list if you want to import the file from your computer (Local) or from a remote filesystem.

    1. If you are importing from a Remote Filesystem, choose it from the list and then click on the pencil button to set the connection information required (only if you are using a Custom source). The tables are loaded in the Files tab.

    2. If you are using a Local Filesystem, click on the Select File button and choose the path where the file is stored.

  5. Click on the Add new path button, located next to the Select button, to add new paths where other files to be imported are stored. You can add as many paths as you want.

  6. Click on the X button, located next to the Select button, to cancel the corresponding path.

  7. Click on the Delete all paths button, located under the Add new path button, to cancel all the inserted paths.

  8. Choose the resources to import by clicking the Select button in the Path 1 section.

  9. Click on the Add new path button, located next to the Select button, to add a new path for a new resource. You can add as many paths as you want.

  10. Click on the X button, located next to the Select button, to cancel the corresponding path.

  11. Click on the Delete all paths button, located under the Add new path button, to cancel all the inserted paths.

  12. To import a single Json file:

    • In the Options tab either click Select file (filename) to browse to the Json file you want to import, and click Open, or manually enter the name of the file in the corresponding edit box: the Table preview pane displays the data that will be imported into Rulex, and is dynamically updated each time you change any of the available options.

    • Configure the options as explained in the Single file options below.

  13. To import multiple files click on the Advanced tab and configure the options as explained in the Multiple file options below.

  14. Save and compute the task.

Single file options:

Parameter Name

Description

Number separators

Select the separators used for Thousands (thousep) and Decimals (decsep) from the corresponding drop-down lists.

Missing string

Specify a character (or string of characters), which are used to indicate missing values.

Key for types

Enter the name of the attribute that defines the data type of the columns in the JSON file.

Remove empty rows

If selected empty rows are automatically deleted.

Remove empty columns

If selected empty columns are automatically deleted.

Strip spaces

If selected, all spaces are removed from strings. For example, the string ” age ” is then imported as “age”.

Compress white spaces

Select this option to remove extra consecutive spaces from within strings. For example the string "university    program" would be imported as "university program".

Add an attribute containing filename

If selected an attribute column is added to the table with the name of the file.

Case sensitive

Select the checkbox if you want the upper cases to be considered different from lower cases. Read below all the consequences on your data if this checkbox is selected.

Use old computation data if source file is not available

If selected data from the previous computations will be used if the source table is not available.

Continue the execution if the file is missing

If selected, computation of the task continues, even if the selected source files are not available.

Add group columns

If selected, a column is added containing additional information on the group tags belong to within the dataset (for example, 1,1,1,2,2,3,3 if there are 3 possible groups).
This additional information enables Rulex to recreate the nested structure during export operations.

Turn off smart type recognition

If selected, prevents automatic recognition of data types, leaving the generic nominal type. This option is useful when manual identification is preferable, for example when there is the risk of a code being misinterpreted as a date.

However, if data types have been specifically defined in incoming MS Excel files, these data types will be maintained, even when this option has been selected.

Wait until the target file is present

If selected, Rulex polls the target file with the frequency specified (sleeptime) until it is available.

Number of records to preview

Specify how many records the table preview will display.

Multiple file options:

Drag and drop files to concatenate

Drag and drop the required files here. All imported files must have the same structure.

The list of selected files correspond to the filelist parametric option.

Select files/directories

If you are importing data from a remote connection, click this option to select the files and directories from which you want to import data.

Concatenation type

Select the required concatenation type, which may be:

  • Inner (0), where only attributes present in both tables are included in the final merged table

  • Outer (1), where all attributes are copied, filling in any missing values if necessary.

Match columns by

Select whether you want to match columns by:

  • Position (0), to select the required column according to the position in the sheet

  • Name (1), to select the required column according to its name.


Focus on the Case Sensitive checkbox

We encourage you not to select the Case Sensitive checkbox, as it has a significant impact on the data analysis.

If the Case Sensitive checkbox is selected, the number of distinct values in a column can increase, causing a slight difference in the data analysis.

In fact, if we have two strings, 'Word' and 'word', they will be considered as two distinct values. This means that, if you write a function valid for the string 'word', it won't be valid for the string 'Word' too.

It might cause consequences also on attributes, because if we want to apply a function to the $"Word" attribute and we type $"word" in the formula bar, an error occurs during the computation process.