Items utilized in this project

Hardware component:

  • Webcam, Logitech® HD Pro

Software apps and online services:


Often, companies choose to guard their premises using a video camera. The video stream is then usually uploaded to the cloud for processing. With high-quality cameras and high frame rates, the cost of sending all this data over the internet adds up quickly. Because of this, a customer asked us to develop a proof of concept: a system where a first processing step, namely a rough classification, happens locally on an edge device, so that the amount of data sent over the internet is greatly reduced.

To make this happen, there are a few things to take into consideration. Here's what we need:

  • A video camera of some sort that can deliver frames periodically.
  • A system that can track which frames have already been processed and then group these frames into batches.
  • A machine learning model that can process a batch of frames, distinguishing interesting frames from frames where nothing happens.
  • A way to store the interesting images and possibly notify a security guard when something unusual happens.

Now, as this project was part of my summer internship at Waylay, I of course pointed a camera at the couches in our office and used it to spy on my colleagues. My goal: get a message every time somebody sits down on the couch, so no break goes unnoticed.

The machine learning model

As the goal of this project is to detect changes within a video stream, we can use the fact that the frames are sequential in time to split the background from the foreground. Then, if things are changing in this calculated foreground, we classify this frame as interesting. This was done by following this example:

How to Use Background Subtraction Methods using OpenCV

Here's an example. First, this is a picture of the couches when nobody's sitting there.

Empty couch

Here is the black-and-white difference image (after some processing) of when a fellow intern was sitting there (or is he taking a nap?).

Person detected in couch

To obtain a score, the number of white pixels is divided by the total number of pixels. Then, a threshold has to be chosen: what fraction of the pixels should be white before we consider an image to be interesting? This is a parameter of the model that was adjusted along the way to fit our needs. For example, when missing an interesting frame is a bigger problem than raising a false alarm, the threshold can be decreased a bit.
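The scoring step can be sketched in a few lines. This is a minimal illustration and not the code used in the project: it assumes the difference image is a binary mask given as rows of pixel values (0 for background, 255 for foreground), and the names `change_score`, `is_interesting` and the default threshold of 0.05 are hypothetical. With an OpenCV mask, the white pixels could be counted with `cv2.countNonZero` instead.

```python
from typing import Sequence

WHITE = 255  # assumed value of a foreground pixel in the binary mask


def change_score(mask: Sequence[Sequence[int]]) -> float:
    """Return the fraction of white (foreground) pixels in a binary mask."""
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0  # an empty mask contains no change
    white = sum(1 for row in mask for pixel in row if pixel == WHITE)
    return white / total


def is_interesting(mask: Sequence[Sequence[int]], threshold: float = 0.05) -> bool:
    """Flag a frame when its score reaches the threshold (hypothetical default)."""
    return change_score(mask) >= threshold


# Tiny 4x4 mask with three foreground pixels.
mask = [
    [0, 0, 0, 0],
    [0, 255, 255, 0],
    [0, 255, 0, 0],
    [0, 0, 0, 0],
]
print(change_score(mask), is_interesting(mask))
```

Lowering `threshold` makes the check more sensitive, which trades more false alarms for fewer missed frames.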

The architecture of the system 

It might seem that once you have such a machine learning model, the work is done. But you still need a way to upload all the images, keep track of the ones that have been classified, divide them into batches and take action when an interesting frame is detected. This is where I came in. First, I made a prototype on the Waylay system to get to know it a little better, and then I converted it to work on the edge version of the Waylay system, called TinyAutomator, to do it all locally.

In the picture below you can see the complete template of the task that does all of this. It looks like a lot is going on, but I'll walk you through it!

The template consists of a few important parts that I will discuss: first, there's the selection of the frames that will be processed. Then, there's the collection and classification of these frames. Lastly, there's the question of what to do once all frames have been assigned a score. Broadly speaking, these three parts correspond to the three rows of the template shown above.

A few steps have to be taken before this task can do its thing. The frames are collected from the camera, which I did over an SSH connection. In an end product the camera would be connected via cable to minimize the internet bandwidth needed, but the view from my desk wasn't that well suited for spying on colleagues. Then, these frames are uploaded to Waylay storage using the Python SDK, and the paths to the frames are added to a resource to keep track of them. Note that these are usually operations that go over the internet, but as I am using TinyAutomator on a laptop, it all happens locally, sparing bandwidth.


When selecting which frames to add to the next batch, there are two things we need to know: what is the latest frame to have been recorded, and what is the latest frame to have been classified? This is what the first two blocks find out. The paths to the frames are stored as a time series in a resource, and a derived resource contains these same frames with the score calculated by the model added. The latest values of these time series are fetched in the first two blocks and then passed to the third block, which calculates the time slot from which we will take the frames. In general, it checks whether too much time has passed since the last prediction and adjusts the time slot accordingly. If the gap is within reasonable bounds, it processes the batch of frames following the one last processed. This way, if the system was down for a little while, the frames that were collected during this period will still be processed.
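The time slot logic could look roughly like the sketch below. It is an illustration under assumptions, not the actual block from the template: timestamps are taken to be epoch milliseconds, the names `TimeSlot`, `next_time_slot`, `batch_ms` and `max_lag_ms` are hypothetical, and the choice to skip ahead to the most recent batch when the system has fallen too far behind is a guess at what "adjusting" means, since the text above does not spell it out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TimeSlot:
    start_ms: int
    end_ms: int


def next_time_slot(
    last_recorded_ms: int,
    last_classified_ms: int,
    batch_ms: int = 10_000,    # hypothetical batch length
    max_lag_ms: int = 60_000,  # hypothetical "reasonable bounds"
) -> Optional[TimeSlot]:
    """Pick the time slot of the next batch of frames to classify."""
    lag = last_recorded_ms - last_classified_ms
    if lag <= 0:
        return None  # nothing new has been recorded
    if lag > max_lag_ms:
        # Assumption: too far behind, so jump to the most recent batch.
        start = last_recorded_ms - batch_ms
    else:
        # Within bounds: continue right after the last classified frame.
        start = last_classified_ms
    end = min(start + batch_ms, last_recorded_ms)
    return TimeSlot(start, end)


print(next_time_slot(100_000, 70_000))   # short outage: catch up batch by batch
print(next_time_slot(200_000, 100_000))  # long outage: skip ahead
```

Calling the function repeatedly after a short outage walks through the backlog one batch at a time, which matches the catch-up behaviour described above.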

Collection and classification 

Now that we have a time slot to collect the images from, we make another request to the resource that contains the paths of the images. Then, we convert these paths into usable URLs, which are passed to the model for classification. After these steps we have the URLs of the frames, each with a score added by the model. As referring to frames by URL is not very clean, we then change our data to contain the path again, next to the score.
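As a rough sketch of this round trip from paths to URLs and back, the snippet below uses only the Python standard library. The base URL, the helper names and the `score_fn` callback are all hypothetical stand-ins: the real storage URLs and the call to the model are not described in this article.

```python
from typing import Callable, Dict, List, Union
from urllib.parse import quote, unquote

# Hypothetical base URL of the local storage service.
BASE_URL = "http://localhost:8080/storage/"

ScoredFrame = Dict[str, Union[str, float]]


def path_to_url(path: str) -> str:
    """Turn a stored frame path into a URL the model can fetch."""
    return BASE_URL + quote(path)


def url_to_path(url: str) -> str:
    """Recover the original frame path from its URL."""
    if not url.startswith(BASE_URL):
        raise ValueError(f"unexpected URL: {url}")
    return unquote(url[len(BASE_URL):])


def classify_batch(
    paths: List[str], score_fn: Callable[[str], float]
) -> List[ScoredFrame]:
    """Score every frame by URL, then report the results by path."""
    urls = [path_to_url(p) for p in paths]
    scored = [{"url": u, "score": score_fn(u)} for u in urls]
    return [{"path": url_to_path(s["url"]), "score": s["score"]} for s in scored]


# Stand-in for the model: pretend only frame "a" contains movement.
def fake_score(url: str) -> float:
    return 0.5 if url.endswith("a.jpg") else 0.1


print(classify_batch(["frames/a.jpg", "frames/b 2.jpg"], fake_score))
```

Keeping the path as the identifier means the stored scores line up with the entries in the original resource, regardless of how the URLs are built.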


Now we have a score per frame that represents how much change it contains. First, we store this score in a derived resource. This can then be used for determining the time slot, as explained before. Then, we want to take action on the frames that were considered interesting enough, so we check the score of every frame against a threshold. The length of the resulting list is checked and, if it isn't empty, a message is sent containing the names of the frames that were found. Other options would be to collect the interesting frames and send them on for further investigation, or to save only the frames classified as interesting. This depends on the goal of the project and can easily be modified.
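The thresholding and notification step might be sketched as follows. The function names, the message wording and the threshold value are hypothetical, and actually sending the message is left out because it depends on the notification channel that is configured.

```python
from typing import Dict, List, Optional, Union

ScoredFrame = Dict[str, Union[str, float]]


def interesting_paths(scored: List[ScoredFrame], threshold: float) -> List[str]:
    """Keep the paths of the frames whose score reaches the threshold."""
    return [str(f["path"]) for f in scored if float(f["score"]) >= threshold]


def build_alert(paths: List[str]) -> Optional[str]:
    """Build a message listing the interesting frames, or None if there are none."""
    if not paths:
        return None  # empty list: no message is sent
    return f"Movement detected in {len(paths)} frame(s): " + ", ".join(paths)


scored = [
    {"path": "frames/a.jpg", "score": 0.50},
    {"path": "frames/b.jpg", "score": 0.01},
    {"path": "frames/c.jpg", "score": 0.20},
]
alert = build_alert(interesting_paths(scored, threshold=0.05))
if alert is not None:
    print(alert)
```

Swapping `build_alert` for a function that stores or forwards the selected frames would cover the alternative actions mentioned above.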


The system described here takes the individual frames from a video camera and does some early classification locally using Waylay's TinyAutomator. This can greatly lighten the load of images to be sent over the internet. The way it is implemented, it is easy to change which model is used or what actions are taken, without having to change the complete system.

Republished from by the permission of the author.