The Data Science Engine Step integrates machine learning and predictive modeling into the Kyvos Reporting Query Object. This step allows you to perform data transformations using Data Science engines, which can enhance your data by adding variables, cleansing data, or making predictions.

Overview

The Data Science Engine step requires a connection to be established to any of the supported Data Science Engines, such as R or Python. Refer to the Connections section to understand how to connect to a Data Science Environment. The step sends your data to the Data Science Engine to perform tasks like training machine learning models, predicting outcomes, and transforming data using advanced algorithms.

The Data Science Engine step can be placed before or after adding any other transformation step.

Adding the Data Science Engine step at the Query Object level helps when predictions on your data add new variables and columns to tables. For example, in a market basket analysis, the clusters that form may require new columns and variables in the table. Because these are created during data preparation, such algorithms need to be defined at the Query Object level.
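As a minimal sketch of how a clustering algorithm can add a new column, the R snippet below runs k-means on a small in-memory table and appends the cluster number to each row. The table, the column names (SPEND, VISITS, CLUSTER) and the choice of k-means are assumptions made for illustration only. How such a script is wired to the step's data placeholders is not shown here.

```r
# Hypothetical customer table; in a real step the rows would come from the Query Object.
customers <- data.frame(
  SPEND  = c(10, 12, 11, 95, 100, 98),
  VISITS = c(1, 2, 1, 9, 10, 9)
)

# Fit two clusters; nstart = 10 repeats the random initialisation for a stable result.
set.seed(42)
myModel <- kmeans(customers, centers = 2, nstart = 10)

# The cluster assignment becomes a new column in the table.
customers$CLUSTER <- myModel$cluster
print(customers)
```

The first three rows and the last three rows are far apart, so they should fall into two different clusters. The cluster numbers themselves (1 or 2) are arbitrary.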

You can also perform data cleansing and other Data Science engine related transformation tasks by creating a script at the Query Object level.
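A cleansing task can be as small as filling in missing values. The sketch below assumes a hypothetical table with an ID column and a TEMP column, and replaces missing TEMP values with the column mean. It only illustrates the kind of logic such a script might contain.

```r
# Hypothetical input with one missing temperature reading.
raw <- data.frame(ID = 1:4, TEMP = c(21.5, NA, 19.0, 22.5))

# Replace missing values with the mean of the available readings.
raw$TEMP[is.na(raw$TEMP)] <- mean(raw$TEMP, na.rm = TRUE)
print(raw)
```

Mean imputation is only one option. Dropping incomplete rows with na.omit(raw) is an equally simple alternative when losing rows is acceptable.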

Data Science Engines train on your data to produce predictions. While transforming your data, you can input training data as well as prediction data, based on the following conditions:

If you have separate data for training and prediction, add a data source for training as well as one for prediction.
If you want training and prediction on the same data, add only one data source.
If you already have a trained model in your script, you need not add training data.
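The second condition (training and prediction on the same data) might look like the sketch below. To keep it runnable outside the product, a temporary CSV file stands in for the step data. Inside a step script, that path would be a data placeholder such as '<% Sales.data %>', where Sales is a hypothetical step name. The columns (ADS, REVENUE), the new column name (ExpectedRevenue) and the use of lm() are also assumptions.

```r
# Stand-in for the single data source (hypothetical columns).
data_path <- tempfile(fileext = ".csv")
write.csv(data.frame(ADS = c(10, 20, 30, 40, 50),
                     REVENUE = c(25, 45, 65, 85, 105)),
          data_path, row.names = FALSE)

#<%TRAINING.SECTION%>
trainingDataset <- read.csv(data_path)
myModel <- lm(REVENUE ~ ADS, data = trainingDataset)

#<%PREDICTION.SECTION%>
# The same data source is read again and scored with the model just trained.
predictionDataset <- read.csv(data_path)
predictionDataset$ExpectedRevenue <- predict(myModel, predictionDataset)
write.csv(predictionDataset, file = data_path, row.names = FALSE)
```

Both sections read the same file, which mirrors adding only one data source to the step. Passing row.names = FALSE is a choice made in this sketch so that no extra index column is written.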

Figure 17: Data Science Engine Step

...

Properties

Property	Values	Comments
Data Source Engine	R Job	Here you can select the Data Science Engine you want to use.
Script	Sample Script	Here you can see the Data Science Engine script you have created.
Edit	Type Yourself	Click the Edit button to create a Data Science Engine script or edit an already created one.

When you click the Edit button, the script editor box opens. Here you can view the fields in your script and write R script for relevant fields. You can also verify your script to check if it is error-free.

Guidelines for writing R Script

The script needs to have sections for Training and Prediction. Each section starts with a placeholder line that begins with # and wraps the section name in <% %>, so that Intellicus can parse the script and understand the modularization. For example, #<% TRAINING.SECTION %>
The first line of both the Training and the Prediction script should read the CSV, and the last line of the Prediction script should write it. The argument passed when reading should be <% Stepname.data %>. For example, read.csv('<% Train.data %>')
Data from the previous step should be referred to as StepName.data. For example, if you created the step as Train in the transformation area, the input must be Train.data.
The model you create must be named myModel. This name is mandatory because it is used when communicating with the Data Science engines.
Training happens only if a training script is provided. Otherwise, it is assumed that a trained model is used.
If a trained model is used, it is mandatory to provide a prediction script.
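To try a script outside the product, the data placeholders can be swapped for local file paths before running it. The helper below is a hypothetical convenience for such a dry run and is not part of the product. The function name fill_placeholders and the case-insensitive matching are assumptions, the latter made only because this page writes both Train.data and Train.Data.

```r
# Replace each <% Name %> placeholder in a script with a local file path.
# 'paths' is a named character vector, e.g. c(Train.data = "/tmp/train.csv").
fill_placeholders <- function(script, paths) {
  for (name in names(paths)) {
    # Escape the dot so it matches literally, and allow optional spaces inside <% %>.
    escaped <- gsub(".", "\\.", name, fixed = TRUE)
    pattern <- paste0("<%\\s*", escaped, "\\s*%>")
    script <- gsub(pattern, paths[[name]], script, ignore.case = TRUE)
  }
  script
}

line <- "trainingDataset = read.csv('<% Train.data %>')"
cat(fill_placeholders(line, c(Train.data = "/tmp/train.csv")), "\n")
```

The section marker lines start with #, so R treats them as comments and they can stay in the script during a local run.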

Once you have added a script, you can click the Verify button to check if it is appropriately written. Click OK. You can further click Save or Save As to save your query object to use it in reporting.

An example script for your reference is given in the Example Script section of this page.

Where the step is placed among the other transformation steps depends on the data flow. The Data Science Engine step is especially useful when the data being processed will be augmented with new columns or variables, as in market basket analysis or cluster analysis.

Step Input

This step can take two inputs:

Training Data: Data used to train machine learning models.
Prediction Data: Data used for predictions based on the trained model.

If you are performing both training and prediction on the same dataset, you only need one data source.

If a pre-trained model is already available, you do not need to provide training data again.

Properties of Data Science Engine Step

Property	Comments
Data Source Engine	Choose the Data Science Engine to use (e.g., R).
Script	The script that defines the training and prediction logic. Click Edit to create or modify the script.
Training Section	The section in the script for training the model (using training data).
Prediction Section	The section in the script for making predictions using the trained model.

Script Writing Guidelines

The script must include two sections: Training and Prediction.
Training: This section reads training data, applies algorithms (e.g., random forest), and generates a model.
Prediction: This section reads prediction data, applies the trained model, and writes the prediction results.

Important Script Rules

Placeholders: Sections should be modularized with placeholder lines that start with # and wrap the section name in <% %>, such as #<% TRAINING.SECTION %> for training and #<% PREDICTION.SECTION %> for prediction.
Data Reference: Use the format <% Stepname.data %> to refer to data from previous steps (e.g., <% Train.data %>).
Model Naming: The model must be named myModel, as this is the default reference name used by Kyvos Reporting.
Training: Training occurs only if a training script is provided. If no training script is provided, Kyvos assumes a pre-trained model is used.
Prediction: If a trained model is used, a prediction script is mandatory.
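When a pre-trained model is used, the script carries only a prediction section. The sketch below assumes the model was saved earlier with saveRDS() and is loaded back under the mandatory name myModel. The model file location, the built-in cars dataset and the column name ExpectedDist are illustrative assumptions, and a temporary CSV stands in for '<% Predict.data %>' so the snippet runs on its own.

```r
# Setup outside the step script: train once and persist the model (hypothetical location).
model_path <- tempfile(fileext = ".rds")
saveRDS(lm(dist ~ speed, data = cars), model_path)

# Stand-in for the prediction data source.
predict_path <- tempfile(fileext = ".csv")
write.csv(data.frame(speed = c(10, 20)), predict_path, row.names = FALSE)

#<%PREDICTION.SECTION%>
myModel <- readRDS(model_path)               # load the trained model
predictionDataset <- read.csv(predict_path)  # first line of the section reads the CSV
predictionDataset$ExpectedDist <- predict(myModel, predictionDataset)
write.csv(predictionDataset, file = predict_path, row.names = FALSE)
```

No training section appears, which matches the rule that training is skipped when no training script is provided.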

Example Script

Code Block

#<%TRAINING.SECTION%>
trainingDataset = read.csv('<%Train.Data%>')
library(randomForest)
myModel = randomForest(x = trainingDataset[1:15], y = trainingDataset$TEMP, ntree = 500)
#<%PREDICTION.SECTION%>
predictionDataset = read.csv('<%Predict.Data%>')
y_pred = predict(myModel, data.frame(predictionDataset[1:15]))
predictionDataset$ExpectedTemp <- y_pred
write.csv(predictionDataset, file = '<%Predict.Data%>')

Verification and Saving the Script

Verify: After writing your script, click the Verify button to check for any errors.
Save: Once the script is verified and error-free, click OK and save the Query Object for use in reporting.

Version	Old Version 1	New Version Current
Changes made by	sunny.gupta	sunny.gupta
Saved on	Nov 12, 2024	Dec 25, 2024

