The Data Science Engine Step integrates machine learning and predictive modeling into the Kyvos Reporting Query Object. This step allows you to perform data transformations using Data Science engines, which can enhance your data by adding variables, cleansing data, or making predictions.

Overview

The Data Science Engine step requires a Data Science connection to one of the supported Data Science Engines, such as R or Python. The step sends your data to the engine, which can then train machine learning models, predict outcomes, and transform data using advanced algorithms.

The Data Science Engine step can be placed before or after other transformation steps, depending on the data flow. It is especially useful when the data being processed is to be augmented with new columns or variables, as in market basket analysis or cluster analysis.

Step Input

This step can take two inputs:

Training Data: Data used to train machine learning models.
Prediction Data: Data used for predictions based on the trained model.

If you are performing both training and prediction on the same dataset, you only need one data source.

If a pre-trained model is already available, you do not need to provide training data again.

Properties of Data Science Engine Step

Property	Comments
Data Source Engine	Choose the Data Science Engine to use (e.g., R).
Script	The script that defines the training and prediction logic. Click Edit to create or modify the script.
Training Section	The section in the script for training the model (using training data).
Prediction Section	The section in the script for making predictions using the trained model.

Script Writing Guidelines

The script is organized into two sections, Training and Prediction:
Training: This section reads training data, applies algorithms (e.g., random forest), and generates a model.
Prediction: This section reads prediction data, applies the trained model, and writes the prediction results.

Important Script Rules

Placeholders: Mark the start of each section with its placeholder: <% TRAINING.SECTION %> for training and <% PREDICTION.SECTION %> for prediction. In the example script below, each placeholder sits on its own comment line (prefixed with #).
Data Reference: Use the format <% Stepname.data %> to refer to data from previous steps (e.g., <% Train.data %>).
Model Naming: The model must be named myModel, as this is the default reference name used by Kyvos Reporting.
Training: Training runs only if a training script is provided. If none is provided, Kyvos Reporting assumes that a pre-trained model is being used.
Prediction: When a pre-trained model is used, the prediction script is mandatory.
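The rules above can be illustrated with a small checker, written here in Python. This is only a sketch of how a script might be split at its section markers, not how Kyvos Reporting processes scripts internally. The function name split_sections is hypothetical, the marker spelling follows the example script in this article, and tolerating spaces inside the delimiters is an assumption.

```python
import re

# Marker spellings follow the example script; allowing optional spaces
# inside the delimiters is an assumption made for this sketch.
TRAINING_MARKER = re.compile(
    r"^#[ \t]*<%[ \t]*TRAINING\.SECTION[ \t]*%>[ \t]*$", re.MULTILINE
)
PREDICTION_MARKER = re.compile(
    r"^#[ \t]*<%[ \t]*PREDICTION\.SECTION[ \t]*%>[ \t]*$", re.MULTILINE
)


def split_sections(script: str) -> dict:
    """Hypothetical helper: split a script into training and prediction code."""
    train = TRAINING_MARKER.search(script)
    predict = PREDICTION_MARKER.search(script)

    # A prediction section is always needed to produce output.
    if predict is None:
        raise ValueError("prediction section marker is missing")
    prediction_code = script[predict.end():].strip()
    if not prediction_code:
        raise ValueError("prediction section is empty")

    # Training code is optional: without it, a pre-trained model is assumed.
    training_code = ""
    if train is not None and train.start() < predict.start():
        training_code = script[train.end():predict.start()].strip()

    return {
        "training": training_code,
        "prediction": prediction_code,
        "uses_pretrained_model": training_code == "",
    }
```

Calling split_sections on a script that has only a prediction section returns an empty training entry and sets uses_pretrained_model to True, which mirrors the rule that training is skipped when no training script is supplied.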

Example Script

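#<%TRAINING.SECTION%>
# Load the training data supplied by the step named Train
trainingDataset <- read.csv('<%Train.Data%>')
library(randomForest)
# Fit a random forest on columns 1 to 15 with TEMP as the target; the model must be named myModel
myModel <- randomForest(x = trainingDataset[1:15], y = trainingDataset$TEMP, ntree = 500)
#<%PREDICTION.SECTION%>
# Load the prediction data supplied by the step named Predict
predictionDataset <- read.csv('<%Predict.Data%>')
# Apply the trained model to the same 15 predictor columns
y_pred <- predict(myModel, data.frame(predictionDataset[1:15]))
# Add the predictions as a new column and write the result back
predictionDataset$ExpectedTemp <- y_pred
write.csv(predictionDataset, file = '<%Predict.Data%>')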
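For readers working with a Python engine, the same train-then-predict flow might look roughly like the sketch below. It is not a Kyvos-supplied script. Plain file paths stand in for the data placeholders so the code can run on its own, a one-feature least-squares fit is swapped in for the random forest to avoid extra packages, and the HUMIDITY column used in the usage note is hypothetical. Only the names myModel, TEMP, and ExpectedTemp come from the R example.

```python
import csv


def train(training_path, feature, target):
    """Training step: fit target = intercept + slope * feature by least squares."""
    with open(training_path, newline="") as f:
        rows = list(csv.DictReader(f))
    xs = [float(r[feature]) for r in rows]
    ys = [float(r[target]) for r in rows]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"intercept": mean_y - slope * mean_x, "slope": slope, "feature": feature}


def predict(myModel, prediction_path, output_column):
    """Prediction step: add the predicted column and write back to the same file."""
    with open(prediction_path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = list(reader.fieldnames)
        rows = list(reader)
    for r in rows:
        x = float(r[myModel["feature"]])
        r[output_column] = myModel["intercept"] + myModel["slope"] * x
    # Overwrite the prediction file, as the R example does with write.csv
    with open(prediction_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames + [output_column])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

A call such as myModel = train("train.csv", "HUMIDITY", "TEMP") followed by predict(myModel, "predict.csv", "ExpectedTemp") reproduces the shape of the R script: the model is built once, applied to the prediction rows, and the enriched rows replace the original prediction file.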

Verification and Saving the Script

Verify: After writing your script, click the Verify button to check for any errors.
Save: Once the script is verified and error-free, click OK and save the Query Object for use in reporting.

Kyvos 2024.11.x
