Waylay IO is a low-code platform that offers the tools developers and data scientists need to experiment with data and create innovative solutions on top of its automation technology stack. With Waylay, it is possible to deploy machine learning models from all the state-of-the-art machine learning frameworks, such as sklearn, TensorFlow, PyTorch or XGBoost, and perform inference on live or historical data to guide automation workflows.
Deploying machine learning models - using our APIs or Python SDK - is fairly straightforward, but it still requires some familiarity with machine learning concepts and with coding tools such as Python and Jupyter notebooks.
Nevertheless, when data scientists work on a particular use case such as anomaly detection or predictive modeling, they first face other challenges:
- Which type of ML algorithm should be used for this problem?
- Which ML platform fits the problem best?
- Is the quality of the data good enough to solve this problem?
In this blog post, we look at how we at Waylay can enable data scientists to create a machine learning model without any programming experience. We will be looking at no-code ML tools and see how easily they fit into the Waylay automation process. No-code platforms are applications that enable non-technical users to build applications, or in this case ML models, by dragging and dropping pieces of software or data onto a canvas. These ML/AI platforms let users without any prior coding experience, or even machine learning experience, build a machine learning model starting from a dataset.
With BigML, one example of this kind of AI platform, you can create a machine learning model from scratch without needing to know much about coding, using either its dashboard (no-code) or its Python SDK (low-code). Through this application, it is possible to experiment with a dataset, try out different ML algorithms, and fine-tune hundreds of hyperparameters.
In this article we show how to create an anomaly detection model in just a few steps, without writing any code. We then deploy the model using the Waylay Python SDK.
The process of creating a model using BigML is the following:
- A source (or data source) is created from a database, a csv file, Google Drive, …
- A dataset is created from the source, enabling feature selection, sampling, filters and much more
- A model is created from the dataset
- The model can be used to perform online predictions, batch predictions, or be deployed to Waylay
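The same four steps can also be scripted through BigML's Python bindings, the low-code path mentioned earlier. A minimal sketch, assuming the `bigml` package is installed and your BigML credentials are exported as environment variables (the import is kept inside the function so the sketch stays loadable without the package):

```python
def build_anomaly_detector(csv_path, top_n=150):
    # Lazy import so this sketch can be loaded without bigml installed.
    from bigml.api import BigML

    api = BigML()  # reads BIGML_USERNAME / BIGML_API_KEY from the environment
    source = api.create_source(csv_path)
    api.ok(source)  # block until the resource is ready
    dataset = api.create_dataset(source)
    api.ok(dataset)
    anomaly = api.create_anomaly(dataset, {"top_n": top_n})
    api.ok(anomaly)
    return anomaly
```

Each `create_*` call maps one-to-one onto the dashboard steps above; `api.ok()` simply waits for BigML to finish building the resource before the next step uses it.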
The data used in this example is a csv file containing 1650 samples of temperature and light-ambience data, similar to the data used in this notebook tutorial: https://github.com/waylayio/demo-general/blob/master/byoml/tutorial.ipynb
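If you want to follow along without the original file, a stand-in csv with the same shape (1650 rows of temperature and light-ambience readings) can be generated with the standard library; the column names match the tutorial, but the values below are purely illustrative:

```python
import csv
import random

random.seed(42)
with open("sensor_data.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["temperature", "light-ambience"])
    for _ in range(1650):
        # Plausible indoor sensor readings; illustrative values only.
        writer.writerow([round(random.gauss(21.0, 1.5), 2),
                         round(random.gauss(400.0, 80.0), 1)])
```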
Documentation on creating the csv file can be found here: https://docs-io.waylay.io/#/features/etl/. In the future, direct data export from Queries and through the Python SDK will also be supported.
The first step is creating a Source in the BigML dashboard by selecting the csv file.
When you click the created Source, you can include and exclude the features to use in your dataset. We will only use the temperature and light-ambience features, so we deselect the others. It is also possible to sample the Source, but in this use case we use 100% of the samples.
The dataset is now created and can be visualized in the dashboard.
The dataset can be visualized both as a summary of the features and as a scatterplot. When viewing the scatterplot (see Figure 4), additional information is available, such as the Pearson correlation coefficient and Spearman's rank correlation coefficient, which measure the correlation between the two features. We notice that Spearman's coefficient is higher than the Pearson coefficient; this is because Pearson only captures a linear relationship between the features, while Spearman captures any monotonic relationship.
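The gap between the two coefficients is easy to reproduce on any pair of columns, since Spearman's rho is just the Pearson correlation computed on the ranks of the data. A quick numpy check on a monotonic but non-linear relationship (no ties, so the double-argsort ranking is valid):

```python
import numpy as np

def pearson(x, y):
    return np.corrcoef(x, y)[0, 1]

def spearman(x, y):
    # Spearman's rho = Pearson correlation of the rank-transformed data
    # (double argsort yields the rank of each element; valid without ties).
    rank = lambda a: np.argsort(np.argsort(a))
    return pearson(rank(x), rank(y))

# exp(x) is strictly increasing, so Spearman is exactly 1,
# while Pearson is lower because the relationship is not linear.
x = np.linspace(1, 10, 50)
y = np.exp(x)
print(pearson(x, y), spearman(x, y))
```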
Some additional options are available before creating the model, e.g. adding new features, joining datasets or applying filters, but we will keep our model simple and use the dataset as it is.
Click the button to create an anomaly detector and set the number of top anomalies to 150, around 10% of our current dataset.
The model can be visualized by clicking it in the dashboard (Unsupervised ⇒ Anomaly), and an anomaly score can be calculated when a single data point is given as input (or multiple points, through a batch anomaly score). We are most interested in downloading the model. Since we will use the Waylay Python SDK to upload the model to BYOML, we choose to download it using Python. The code and the anomaly name can be found in BigML's dashboard. Please note that an API key is needed: set the environment variables BIGML_USERNAME and BIGML_API_KEY before executing the following code.
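A sketch of that download step, using the `bigml` Python bindings: the anomaly ID below is a placeholder to be replaced with the one from your dashboard, and the import is kept lazy so the sketch can be read without the package installed.

```python
import os

def download_anomaly(anomaly_id):
    # Lazy import so this sketch is loadable without bigml installed.
    from bigml.api import BigML
    from bigml.anomaly import Anomaly

    # Credentials come from the environment variables mentioned above.
    api = BigML(os.environ["BIGML_USERNAME"], os.environ["BIGML_API_KEY"])
    # Fetches the detector and builds a local model that scores offline.
    return Anomaly(anomaly_id, api=api)

# local_anomaly = download_anomaly("anomaly/<your-anomaly-id>")
```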
The following figure shows what the model sees as an anomaly, based on the complete training set.
To upload the model using the Waylay Python SDK, we first wrap BigML's model in a wrapper class that holds the model and exposes a predict() method, which is called at inference time. We then pickle this wrapped model using the dill library and create a requirements.txt file containing the bigml library.
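A minimal version of that wrapper could look as follows. The class and feature names are illustrative; the `anomaly_score` call is BigML's local scoring method, which expects a dict of named features per sample, and the `pickle` fallback exists only so the sketch runs where dill is not installed:

```python
try:
    import dill as serializer  # the article uses dill for pickling
except ImportError:
    import pickle as serializer  # fallback only so this sketch runs anywhere

class AnomalyWrapper:
    """Wraps a local BigML Anomaly so BYOML can call predict() on it."""

    def __init__(self, model, feature_names):
        self.model = model
        self.feature_names = feature_names

    def predict(self, data):
        # BYOML passes array-like rows; BigML's anomaly_score expects
        # a {feature-name: value} dict per sample.
        return [
            self.model.anomaly_score(dict(zip(self.feature_names, row)))
            for row in data
        ]

# Serialize the wrapped model for upload, e.g.:
# wrapped = AnomalyWrapper(local_anomaly, ["temperature", "light-ambience"])
# with open("model.pkl", "wb") as f:
#     serializer.dump(wrapped, f)
```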
The folder containing both the requirements file and the pickled model is then uploaded to our BYOML server as follows:
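The expected folder layout can be assembled like this; the upload call at the end is a hypothetical sketch of the Waylay Python SDK usage (the exact function name and signature should be checked against the Waylay documentation), so it is left commented out:

```python
import os

MODEL_DIR = "bigml-anomaly"
os.makedirs(MODEL_DIR, exist_ok=True)

# The BYOML runtime must be able to import bigml to unpickle the wrapper.
with open(os.path.join(MODEL_DIR, "requirements.txt"), "w") as f:
    f.write("bigml\n")

# The pickled wrapper produced earlier goes here as model.pkl;
# an empty placeholder is written just to show the layout.
with open(os.path.join(MODEL_DIR, "model.pkl"), "wb") as f:
    f.write(b"")

# Hypothetical upload via the waylay Python SDK (illustrative names):
# from waylay import WaylayClient
# waylay = WaylayClient.from_profile()
# waylay.byoml.model.upload("bigml-anomaly", MODEL_DIR, framework="custom")
```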
We test a sample prediction:
NOTE: The input of the prediction is a numpy array, not a dictionary.
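In other words, the deployed model takes rows of raw feature values, with the columns in training order; a small illustration (the predict call shown is a hypothetical SDK invocation, so it stays commented):

```python
import numpy as np

# One row per sample, columns in training order [temperature, light-ambience];
# a dict of named fields such as {"temperature": 21.5} would NOT be accepted.
sample = np.array([[21.5, 310.0],
                   [19.0, 120.0]])

# Hypothetical call via the Waylay Python SDK (illustrative names):
# scores = waylay.byoml.model.predict("bigml-anomaly", sample)
print(sample.shape)
```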
A detailed example can be found in the notebook:
This deployed model can then be used to make online predictions. For more information, check the video below:
For more information about BigML, please check their documentation:
For more information on Waylay, please check our documentation: