Waylay provides a flexible orchestration platform for the IoT, with a strong focus on real-time data acquisition, orchestration and rule-based actuation. Complementary to real-time data processing, access to historical data for offline analytics plays an important role in an overall IoT solution. We explained this in our previous blog post, “The hidden powers of IoT analytics. Part 1 – the data export”, where we also presented our ETL-export service, which provides time-series and metadata exports on a daily, weekly or monthly basis.
Once you have the data export, you first need to load it into a professional data pipeline and then perform the actual offline analysis. You can either rely on your internal resources for these two steps, or work with our data science team.
The Waylay Apache Beam ETL-BI pipeline
One of the core competencies of the Waylay platform is its ability to ingest multi-source, multi-format data across vendors and technologies. Because Waylay provides lightweight integration with so many sources, the Waylay time-series data and metadata exports have a very generic structure. The first stages of a data integration pipeline should therefore provide insight into the underlying data models and map the data into more concrete data assets for your analytical use cases.
Our reference ETL pipeline uses Apache Beam, an open source big data processing framework, to load Waylay data into an analytical database (Google BigQuery), handling responsibilities such as:
- Anonymisation of sensitive data (such as resource IDs or names)
- Normalisation of the representation of numbers, quantities, timestamps etc.
- Statistics on the metadata properties of the IoT resources
- Detection of the ‘metric classes’ that underlie the time-series data, and classification of the resources accordingly
- Documentation of the input and target data models of your ETL pipeline
- Creation of views and partitions to handle huge datasets
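To make a few of these responsibilities more tangible, here is a minimal Python sketch of the anonymisation, normalisation and metric-class steps. It is not the code of our reference pipeline: the field names (`resource`, `metric`, `timestamp`, `value`), the epoch-millisecond timestamp format, the `SECRET_KEY` and all function names are assumptions made for illustration. In a Beam pipeline, functions like these would typically be applied per record in a transform step.

```python
import hashlib
import hmac
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical key; in practice it would live outside the code base.
SECRET_KEY = b"replace-me"


def anonymise(resource_id: str) -> str:
    """Replace a resource id with a keyed hash that stays stable across exports."""
    digest = hmac.new(SECRET_KEY, resource_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def normalise(record: dict) -> dict:
    """Map one raw export row (all strings, assumed layout) to a typed row."""
    millis = int(record["timestamp"])  # assumed: epoch milliseconds
    return {
        "resource": anonymise(record["resource"]),
        "metric": record["metric"],
        "timestamp": datetime.fromtimestamp(millis / 1000, tz=timezone.utc).isoformat(),
        "value": float(record["value"]),
    }


def metric_classes(rows) -> dict:
    """Group resources by the exact set of metrics they report."""
    metrics_by_resource = defaultdict(set)
    for row in rows:
        metrics_by_resource[row["resource"]].add(row["metric"])
    classes = defaultdict(list)
    for resource, metrics in metrics_by_resource.items():
        classes[frozenset(metrics)].append(resource)
    return dict(classes)
```

Grouping by the exact set of reported metrics is only the simplest possible notion of a ‘metric class’; a real dataset usually needs a more tolerant grouping, for example one that ignores rarely reported metrics.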
If you’d like to learn more about how our pipeline works, our technical documentation site provides a concrete example: we take open data that we use for our customer, the City of Ghent, and walk through the entire process so you can see how we prepare it for offline analytics use cases.
What happens next?
Once your data is in the analytical data store, it is ready for multiple use cases:
- Business analysts can run analytical reports with BI reporting tools such as Tableau or Power BI
- Data scientists can use their R or Python tools to explore the data sets and provide ad hoc visualisations
- Data can be used to train and test machine learning models (e.g. using TensorFlow)
The parameters of trained models can be fed back into the Waylay rules engine, integrating prediction and anomaly detection into your Waylay tasks. This way you close the loop between offline and real-time and truly benefit from the combined powers of BI/AI and automation.
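As a rough illustration of this loop, the sketch below fits a very simple anomaly model offline (a mean and standard deviation), serialises its parameters as JSON, and evaluates a new measurement against them. It deliberately does not use any Waylay API: the function names, the z-score approach and the threshold of 3 are assumptions chosen to keep the example small, and a real model would usually be more sophisticated.

```python
import json
import statistics


def fit_parameters(values):
    """Offline step: learn the parameters of a simple z-score model."""
    return {"mean": statistics.fmean(values), "stdev": statistics.stdev(values)}


def is_anomaly(value, params, max_z=3.0):
    """Real-time step: flag a value that lies too far from the learned mean."""
    if params["stdev"] == 0:
        # Degenerate history: any deviation from the mean counts as anomalous.
        return value != params["mean"]
    z_score = abs(value - params["mean"]) / params["stdev"]
    return z_score > max_z


# Historical temperature readings taken from the analytical store (made-up values).
history = [20.1, 19.8, 20.4, 20.0, 19.7]

# The exported JSON string is what would be handed over to the real-time side.
exported = json.dumps(fit_parameters(history))
params = json.loads(exported)

print(is_anomaly(20.2, params))
print(is_anomaly(35.0, params))
```

The point of the JSON round trip is that only a handful of numbers, not the training data or the training code, needs to travel from the offline environment to the real-time rules.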
Our professional services team is ready to assist you in running sophisticated analytics on massive volumes of IoT data, so you can gain the insights needed to make better, more accurate decisions in your IoT applications and machine learning use cases.