Event-based vision data and processing methods

Author
Sanam Shakya
Senior Embedded and Algorithm Engineer
LogicTronix Technologies

1. Event-Based Vision | Neuromorphic Vision

Neuromorphic vision, or event-based vision, is an actively researched and application-oriented domain for vision-based systems. In this article we discuss different approaches and methods for processing event data for a targeted application.

2. Event-Based Vision – Data

Event data captured by an event camera is generated by contrast changes in the field of view of the event sensor. An event is generated at a sensor position when the light intensity at that pixel changes from bright to dark or from dark to bright.

In comparison to a frame-based image sensor, which generates an image frame for the whole sensor resolution, an event sensor only generates data for the specific pixels where the intensity changes.

For each activated pixel in the event sensor, an event e(x, y, p, t) is generated.

Here x, y – sensor pixel coordinates, p – polarity of the event signal, t – timestamp
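As a minimal illustration (our own sketch; the field names and types are not tied to any specific SDK), such an event can be represented as a plain struct in C++:

#include <cstdint>

// Minimal representation of a single event (illustrative only).
struct Event {
    uint16_t x, y;   // sensor pixel coordinates
    int8_t   p;      // polarity: positive or negative contrast change
    uint64_t t;      // timestamp (e.g., microseconds)
};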

Here is a pictorial comparison of an event data stream against a frame-based camera : 

Image courtesy : Fast Event-based Corner Detection, Elias Mueggler [Ref]

Event data is emitted from the event sensor as an event stream, and the data rate of the stream depends on the motion of objects in the FoV of the sensor. In low-event conditions the data rate will be below 1 Kev/s (a thousand events per second), while in high-event conditions it may go up to 10 Mev/s (ten million events per second).

In our tests we are using the Prophesee Metavision Starter Kit, which uses the IMX636 event sensor. The kit consists of a KRIA KV260 board with an ARM processor (PS) and FPGA fabric called Programmable Logic (PL). The IMX636 sensor is connected to the KRIA KV260 board through a MIPI interface.

Event data is captured using MIPI IPs, event data handler IPs, and DMA IPs running in the FPGA fabric, while the application is developed with the Metavision SDK running on the PS. Here is the block diagram representing the hardware and software architecture:

Fig : Hardware and Software architecture for Event Data Capture

3. Data Analysis or Processing Pipelines

Event data processing can be divided into the following steps : 

3.1. Pre-Processing : Event data pre-processing or event frame generation

3.2. Event Data or Event Frame Processing

3.3. Output Aggregation and Visualization

3.1. Pre-Processing

Event data is a stream of per-pixel activations, so depending on the downstream processing style – asynchronous per-event or frame-based – pre-processing needs to be performed on the event data stream.

For asynchronous event processing, computation is done on the arrival of each event. In real time, however, a high event rate of around 1 Mev/s will lead to full CPU load. To decrease the CPU load, one can apply temporal and spatial filtering to reduce the high event data rate.

For frame-based processing, event data is transformed into frames by accumulating events over a periodic time interval and applying frame generation algorithms such as a pixel-wise histogram of accumulated events or a time surface / Surface of Active Events (SAE). Histogram-frame and time-surface generation are available in the Metavision OpenEB SDK.
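As a minimal sketch of histogram-frame generation (our own illustration, not the Metavision implementation; it reuses the Event struct sketched above):

#include <cstdint>
#include <vector>

// Accumulate one time window of events into a per-pixel count frame.
// width x height is the sensor resolution (1280 x 720 for the IMX636).
std::vector<uint32_t> make_histo_frame(const std::vector<Event>& events,
                                       int width, int height) {
    std::vector<uint32_t> frame(width * height, 0);
    for (const Event& e : events)
        frame[e.y * width + e.x] += 1;   // count of events at each pixel
    return frame;
}

A time surface is built the same way, except that each pixel stores the latest event timestamp instead of a count.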

3.2 Event Data or Event Frame Processing

Event data processing can be done in two ways. One is processing on the arrival of each event, which is asynchronous and driven by event generation. The other is frame-based processing, where event data is accumulated over a time period and converted into frames, which can then be processed using existing vision algorithms. Among the various frame generation algorithms, the most common are time-surface frames and histograms of events. Both approaches have their own pros and cons.

In the case of asynchronous event data processing, high-precision temporal processing can be performed on individual events, but as the data rate increases, so does the CPU load. For high data rates, some of the event processing needs to be offloaded to accelerators like a GPU or FPGA.

In the case of frame-based processing, already available vision algorithms can be reused on event frames, but since events are accumulated over time periods, the temporal precision is limited by the minimum accumulation time period.

3.3 Output Aggregation and Visualization

After event data processing, the output needs to be aggregated and visualization tasks need to be performed. These tasks depend upon the application requirements.


For the pre-processing step (3.1), we have developed an OpenCL-based accelerated sampling method, which is explained in the following section:

OpenCL-based sampling

Overview : 

Event data collected over an accumulation time is sent to various processing pipelines, which operate on individual events. In this section, redundant data in the collected time frame is reduced using OpenCL kernels.

Here, processing is done on the arrival of each event, which leads to high CPU usage as the event data rate grows beyond 1 Mev/s. Data capture and visualization tasks are also running in the CPU-based user application. This leads to high latency in the event capture and processing pipeline, even when these tasks run in separate threads. To decrease the CPU load, the sampling task is transferred to accelerator hardware (GPU, FPGA) using OpenCL kernels. The sampling pre-processing then runs on the accelerator and passes the reduced event data stream to the user application.

Algorithms : 

Event data accumulated over a time period contains repeated events at the same coordinates. The sampling algorithm removes this repetition by counting events per coordinate, storing the counts in a histogram array.

Kernel 1 : Histogram-counting kernel

Note : 

A basic histogram counting algorithm running sequentially on the CPU works as follows. For an array data[] containing DATA_SIZE integers ranging from 0-255, the histogram is built by looping over the data and incrementing the bin indexed by each value:

for (int i = 0; i < DATA_SIZE; ++i) {
    bin[data[i]]++;
}

Here : the bin array must have 256 entries to store counts for values ranging from 0-255.

So in the case of 2D data with width x and height y, the bin size = x * y.

Kernel Input : Input data array containing values ranging from 0-255

Kernel Output : Histogram of the input data, i.e., the count of each value in the input array
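A minimal OpenCL C sketch of such a counting kernel (our own illustration, not the project kernel); the atomic increment is required because many work-items can update the same bin concurrently:

// Illustrative OpenCL C kernel: one work-item per input element.
__kernel void histo_count(__global const uchar* data,
                          const uint data_size,
                          __global volatile uint* bins)  // 256 bins, zero-initialized
{
    uint i = get_global_id(0);
    if (i < data_size)
        atomic_inc(&bins[data[i]]);   // concurrent-safe bin increment
}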

Issue : Not able to create histogram data for all 1280×720 = 921,600 coordinate points in the OpenCL kernel

Solution : Map the event coordinates to a smaller index space using a hash function

Kernel 2 : Unique coordinates based on hashing

  • Maps the data accumulated in the event buffer to a hash-mapped array of size H.

In our case we used the following hash function to map an (x, y) coordinate : 

(x * 1619 + y * 31) % 8192;

  • For each event, the hash value is calculated and used as an index to keep track of repetition 
  • If the hash slot is new, store the event data in the sampled array; if the hash index is already filled, skip the event

Input : Event coordinates array

Output : Sampled Event coordinates array
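A minimal OpenCL C sketch of the sampling kernel (our own illustration using the hash function above; the actual kernel is in the linked repository):

__kernel void hash_sample(__global const ushort* xs,
                          __global const ushort* ys,
                          const uint n_events,
                          __global volatile uint* seen,       // 8192 flags, zero-initialized
                          __global ushort* out_xs,
                          __global ushort* out_ys,
                          __global volatile uint* out_count)  // single counter, zero-initialized
{
    uint i = get_global_id(0);
    if (i >= n_events) return;
    uint h = ((uint)xs[i] * 1619u + (uint)ys[i] * 31u) % 8192u;
    // Keep only the first event landing in each hash slot; hash collisions
    // may drop distinct coordinates, which is acceptable for down-sampling.
    if (atomic_cmpxchg(&seen[h], 0u, 1u) == 0u) {
        uint idx = atomic_inc(out_count);   // append to the sampled array
        out_xs[idx] = xs[i];
        out_ys[idx] = ys[i];
    }
}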

Source Code Link : https://github.com/LogicTronixInc/Event-Camera-Clustering-and-Optical-Flow-Estimation/tree/master/event-cam-pre-processing-opencl/event-cam-sampling

Testing :  

Event data without Sampling

Here the input data array size = 16384

Event data with Sampling : 

After sampling, the array size = 4586, corresponding to the unique coordinates in the input array

So sampling decreases the sample count while preserving the features of the input event stream.

References : 

  1. EvDownsampling : A Robust Method For Downsampling Event Camera Data – Anindya Ghosh, Thomas Nowotny, and James Knight
  2. Event-based Spatio-temporal Down-sampling – Anindya Ghosh, Thomas Nowotny, and James Knight

Paper Summary : The papers use an integrate-and-fire algorithm, conceptually similar to the event sensor's own neuromorphic signaling, where a down-sampled event is triggered at point (x, y) once the accumulated events exceed a threshold.

“Each down-sampled pixel has a separate accumulator for each event polarity and when these reach a threshold, an event of the correct polarity is emitted and the counter is reset to zero.”
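A rough sketch of that accumulator logic (our own reading of the paper, not its reference code; it reuses the illustrative Event struct from earlier):

#include <cstdint>
#include <vector>

// Integrate-and-fire down-sampling: one accumulator per down-sampled pixel
// and polarity; when a counter reaches the threshold, emit one event of
// that polarity and reset the counter to zero.
struct IFDownsampler {
    int scale, ds_width, threshold;
    std::vector<int> acc_pos, acc_neg;

    IFDownsampler(int w, int h, int s, int thr)
        : scale(s), ds_width(w / s), threshold(thr),
          acc_pos((w / s) * (h / s), 0), acc_neg((w / s) * (h / s), 0) {}

    // Returns true and fills 'out' when a down-sampled event fires.
    bool feed(const Event& e, Event& out) {
        int idx = (e.y / scale) * ds_width + (e.x / scale);
        std::vector<int>& acc = (e.p > 0) ? acc_pos : acc_neg;
        if (++acc[idx] >= threshold) {
            acc[idx] = 0;
            out = Event{(uint16_t)(e.x / scale), (uint16_t)(e.y / scale), e.p, e.t};
            return true;
        }
        return false;
    }
};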

4. Object Tracking and Flow Estimation for Event Data

Tracking and flow estimation are common problems in computer vision, used to track multiple objects and estimate their positions in the field of view of the camera. One application of such a system is multiple object tracking (MOT) in Advanced Driver Assistance Systems (ADAS) and autonomous vehicles.

For object tracking and flow estimation, a detector first needs to find features in the input data that can be tracked and monitored.

In the case of event data, creating a robust detector is challenging due to the sparse and dynamic nature of the data. To build a robust detector, we used clustering techniques and tracked objects through their cluster centroids. We tested already available clustering algorithms such as DBSCAN, OPTICS, and K-means on accumulated event data. We also tested asynchronous incremental clustering, as described in [Reference :  LINK], to create event clusters.

Here are the various Clustering Algorithms for Event data.

Event Data Clustering Algorithms

K-Means clustering
Overview:
  • Clustering feature : mean or centroid
  • Distance-based clustering

Algorithm:
Initialization:
  • Start with K random centroids, a cluster array, and cluster IDs
Update:
  • Adding to a cluster – for each input point, calculate the distance to every centroid, find the minimum, and append the point to the cluster with minimum distance; then update that cluster's centroid (a minimal sketch follows this list)
Testing:
  • Only tested with CSV example data
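A minimal sketch of one K-means iteration on 2D event coordinates (our own illustration):

#include <vector>

struct Pt { float x, y; };

// One K-means iteration: assign every point to its nearest centroid
// (squared Euclidean distance), then move each centroid to the mean
// of its assigned points.
void kmeans_step(const std::vector<Pt>& pts, std::vector<Pt>& centroids) {
    std::vector<Pt>  sum(centroids.size(), Pt{0.f, 0.f});
    std::vector<int> cnt(centroids.size(), 0);
    for (const Pt& p : pts) {
        int best = 0;
        float best_d = 1e30f;
        for (size_t k = 0; k < centroids.size(); ++k) {
            float dx = p.x - centroids[k].x, dy = p.y - centroids[k].y;
            float d = dx * dx + dy * dy;
            if (d < best_d) { best_d = d; best = (int)k; }
        }
        sum[best].x += p.x; sum[best].y += p.y; ++cnt[best];
    }
    for (size_t k = 0; k < centroids.size(); ++k)
        if (cnt[k] > 0)
            centroids[k] = Pt{sum[k].x / cnt[k], sum[k].y / cnt[k]};
}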
DBSCAN clustering
Overview
  • Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
  • Clustering is controlled by the following parameters :
  • epsilon (radius around a point)
  • minimum number of points inside the epsilon boundary to form a core point
  • new points are added to a cluster if they satisfy the above, otherwise they are treated as noise (a minimal core-point sketch follows this list)
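A minimal sketch of the core-point test at the heart of DBSCAN (our own illustration, reusing the Pt struct from the K-means sketch; real implementations replace this O(n²) scan with a spatial index):

#include <vector>

// Indices of all points within 'eps' of pts[i]; pts[i] is a core point
// if this neighbourhood contains at least 'min_pts' members.
std::vector<int> region_query(const std::vector<Pt>& pts, int i, float eps) {
    std::vector<int> nbrs;
    for (int j = 0; j < (int)pts.size(); ++j) {
        float dx = pts[j].x - pts[i].x, dy = pts[j].y - pts[i].y;
        if (dx * dx + dy * dy <= eps * eps) nbrs.push_back(j);
    }
    return nbrs;
}

bool is_core_point(const std::vector<Pt>& pts, int i, float eps, int min_pts) {
    return (int)region_query(pts, i, eps).size() >= min_pts;
}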
OPTICS Clustering
Overview:
  • Ordering Points To Identify the Clustering Structure
  • First stores the data in an ordered spatial structure such as a KD-tree or R-tree
  • Clustering features : epsilon, minimum points, and core components
Algorithms:
  • Insert the data into a tree data structure (boost::rtree)
  • Initialize epsilon and minimum points and iterate to find core points
  • Compute reachability distances and sort
  • Create a reachability graph (optional, for debugging); a minimal neighbourhood-query sketch follows this list
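A minimal sketch of the R-tree neighbourhood query used in that pipeline (our own illustration with boost::geometry; the actual implementation is in the linked repository):

#include <boost/geometry.hpp>
#include <boost/geometry/index/rtree.hpp>
#include <iterator>
#include <vector>

namespace bg  = boost::geometry;
namespace bgi = boost::geometry::index;

using BPoint = bg::model::point<float, 2, bg::cs::cartesian>;
using RTree  = bgi::rtree<BPoint, bgi::quadratic<16>>;  // e.g. RTree tree(pts.begin(), pts.end());

// Query an epsilon box around a point; the result feeds the core-point
// test (an exact radius check can be applied afterwards).
std::vector<BPoint> epsilon_neighbours(const RTree& tree, const BPoint& p, float eps) {
    bg::model::box<BPoint> box(
        BPoint(bg::get<0>(p) - eps, bg::get<1>(p) - eps),
        BPoint(bg::get<0>(p) + eps, bg::get<1>(p) + eps));
    std::vector<BPoint> out;
    tree.query(bgi::intersects(box), std::back_inserter(out));
    return out;
}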
Testing :
  • Input : event data CSV file, captured with the Metavision SDK event capture app
  • Output :
  • Color-coded cluster image, reachability plot
  • Extracted cluster features for tracking : cluster centroids, variance, and cluster sizes

Test 1 : Vehicle raw data csv file

Test 2: Camera raw data with the target in a sitting position, with hand in motion

Consecutive cluster feature capture:

//Cluster ID, Cluster Size, Centroid X, Centroid Y, Variance X, Variance Y
0,43,1016.58,433.047,68.4294,338.044
1,14,943.429,335.857,16.5306,60.1224
2,563,683.648,300.583,1726.87,6276.77
3,4,442,492.75,4.5,3.6875
4,17,762.824,330.588,52.9689,96.0069
5,253,845.281,203.668,240.21,1508.21
6,11,844.909,240.727,39.719,23.2893
7,783,1152.44,502.625,1128.66,11874.3
8,4,1145,452.5,6.5,14.25
9,165,810.855,66.7394,247.227,906.278
10,2,422,458,1,0
11,123,1034.28,297.659,75.0293,694.387
12,111,919.676,300.36,156.273,1004.21
.....
0,769,846.888,151.999,6943.72,3221.43
1,101,1018.52,301.079,31.3583,653.32
2,58,761.603,67.0172,116.86,69.7756
3,48,836.188,259.354,19.2773,133.604
4,179,380.341,409.514,546.091,138.876
5,26,853.269,404.385,7.50444,116.006
6,179,826.151,330.888,2918.83,404.289
7,57,855.737,464.088,13.3518,308.22

Test 3: Event camera data with moving target

Reference (event based) : https://github.com/ManWithABike/OPTICS-Clustering

Source code link : https://github.com/LogicTronixInc/Event-Camera-Clustering-and-Optical-Flow-Estimation/tree/master/event-cam-clustering/optics-clustering


Point cloud clustering
Overview:

The timestamp of each event is used as a third coordinate to convert 2D event data into 3D point data. The 3D points can then be fed directly into the Point Cloud Library (PCL) for visualization and point-cloud-based algorithms.

For clustering, the event-based point cloud data is stored in a KD-tree, a binary-tree-based data structure that stores 3D spatial data by splitting it with multi-dimensional planes/points at each tree node. The DBSCAN algorithm is then used to cluster the points in the KD-tree.

Using graph based clustering : KDTREE with DBSCAN

KDTree – a binary-tree-based data structure for spatial data, where each node splits the space into two half-spaces. The range search feature of the KD-tree can then be used to cluster the data.

For clustering, the density-based algorithm DBSCAN is used (a minimal neighbourhood-query sketch follows).
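A minimal sketch of the neighbourhood query that DBSCAN needs on the point cloud (our own illustration using PCL's KdTree; the full clustering loop is in the linked repository):

#include <pcl/kdtree/kdtree_flann.h>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <vector>

// Events become 3D points (x, y, scaled timestamp); the KdTree answers
// the epsilon-neighbourhood queries that drive DBSCAN.
std::vector<int> neighbours(const pcl::PointCloud<pcl::PointXYZ>::Ptr& cloud,
                            const pcl::PointXYZ& query, float eps) {
    pcl::KdTreeFLANN<pcl::PointXYZ> tree;
    tree.setInputCloud(cloud);       // in practice, built once per cloud
    std::vector<int> idx;
    std::vector<float> sq_dist;
    tree.radiusSearch(query, eps, idx, sq_dist);
    return idx;                      // core point if idx.size() >= min_pts
}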

Source Code Link : https://github.com/LogicTronixInc/Event-Camera-Clustering-and-Optical-Flow-Estimation/tree/master/event-cam-clustering/point-cloud-clustering

Test 1:

Input : Event camera data in a sitting position – a PCD file generated from the event data (*.xyz file) for 10 msec of captured data

Test 2 :

Input : Event camera data with motion

Output : Point cloud cluster image using the PCL library, and an output cluster CSV file, with the tolerance (epsilon) value changed to 20

For extracting features from the point cloud clusters, Principal Component Analysis (PCA) was used. Here are the PCA features computed for the point cloud clusters.

Point cloud feature extraction using Principal Component Analysis (PCA)


Incremental Clustering

Based on the paper : “Asynchronous event-based clustering and tracking for intrusion monitoring in UAS”, J.P. Rodríguez-Gómez, A. Gómez Eguíluz, J.R. Martínez-de Dios and A. Ollero

Overview

This is an asynchronous event data clustering algorithm, which processes events as they are triggered and captured. It uses simple nearest-neighbour clustering based on the Manhattan distance from cluster centroids, with thresholding within a radius.

Algorithm
  • Store event data in an input buffer; on each event, update the cluster arrays
  • Age out old events by deleting (pop_front) entries from the cluster array, based on the time of the first event in the time buffer queue
  • Compute the distance to each cluster centroid and assign the event to the cluster whose centroid distance is less than the threshold
  • Add the event point to that cluster
  • If a point is assigned to more than one cluster, merge the clusters
  • Delete a cluster when it becomes empty
  • Update events to clusters based on the assignment queue (a minimal sketch of the distance/assignment step follows this list)
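A minimal sketch of the distance computation and assignment step (our own illustration; the full implementation, including merging and time-based deletion, is in the linked repository):

#include <cmath>
#include <vector>

struct Cluster { float cx, cy; int n; };

// Per-event update: assign the event to the nearest centroid by Manhattan
// distance if within 'radius', otherwise seed a new cluster. The centroid
// is maintained as a running mean.
void update_clusters(std::vector<Cluster>& clusters, float ex, float ey, float radius) {
    int best = -1;
    float best_d = radius;
    for (size_t k = 0; k < clusters.size(); ++k) {
        float d = std::fabs(ex - clusters[k].cx) + std::fabs(ey - clusters[k].cy);
        if (d <= best_d) { best_d = d; best = (int)k; }
    }
    if (best < 0) {
        clusters.push_back(Cluster{ex, ey, 1});   // seed a new cluster
    } else {
        Cluster& c = clusters[best];
        c.n += 1;                                 // running-mean centroid update
        c.cx += (ex - c.cx) / c.n;
        c.cy += (ey - c.cy) / c.n;
    }
}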

Fig : Block Diagram for Algorithm

Testing :
Clustering config : radius threshold, with threshold = 15

- Timing estimates for the algorithm 

- 7-15 msec for 320 events (corresponding to 1 msec of data) with 33 clusters 

- Test log : 

count events callback : Current time in milliseconds in before event data updatecall is: 1731391267422

count events callback : Current time in milliseconds in after event data updatecall is: 1731391267423

Event cluster min N : 5

cluster queue size : 36

There were 46 events in this callback

count events callback : Current time in milliseconds in before event data updatecall is: 1731391267425

count events callback : Current time in milliseconds in after event data updatecall is: 1731391267438

Event cluster min N : 5

cluster queue size : 34

There were 320 events in this callback

count events callback : Current time in milliseconds in before event data updatecall is: 1731391267439

count events callback : Current time in milliseconds in after event data updatecall is: 1731391267453

Event cluster min N : 5

cluster queue size : 34

Source Code Link : https://github.com/LogicTronixInc/Event-Camera-Clustering-and-Optical-Flow-Estimation/tree/master/event-cam-clustering/async-incremental-clustering

Test Outputs : 

On raw traffic event data : 

Here, the red data corresponds to the current events and the trailing colored clusters correspond to the incremental event data clusters.

On raw pedestrian event data : 

Here also, the red data corresponds to the current events and the trailing colored clusters correspond to the incremental event data clusters.

References : 

  • Asynchronous event-based clustering and tracking for intrusion monitoring in UAS – J.P. Rodríguez-Gómez, A. Gómez Eguíluz, J.R. Martínez-de Dios and A. Ollero
Event camera resources :
  • https://github.com/uzh-rpg/event-based_vision_resources
  • https://github.com/vlislab2022/Awesome-Events-Deep-Learning

OpenCL-Based Clustering Kernels
Overview 

Asynchronous incremental clustering is CPU intensive, and at high event data rates it leads to high latency. To decrease the CPU load, the clustering task is divided and modified to run on hardware accelerators (GPUs) using OpenCL kernels.

The event clustering task is performed by passing the event data through the following OpenCL kernels:

  • Kernel 1: Distance Calculation and Cluster Assignment
    • Input : N events, Cluster Centroids
    • Output : Assignment array containing cluster IDs.
  • Kernel 2: Cluster Update
    • Input : Assignment array and N event data
    • Output : Updated cluster array
  • Kernel 3: Cluster reduction and getting the centroid
    • Input : Cluster array
    • Output : Cluster Centroids array
Algorithms 
  • Get a fixed batch of N event samples periodically
  • Perform distance calculation and cluster assignment using Kernel 1
  • Update the cluster array depending on the assignment output (a minimal sketch of Kernel 1 follows this list)
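A minimal OpenCL C sketch of Kernel 1 (our own illustration; the project kernels are in the linked repository):

// One work-item per event: compute the Manhattan distance to every
// centroid and write the nearest cluster ID, or -1 if no centroid
// lies within 'radius'.
__kernel void assign_events(__global const float2* events,
                            __global const float2* centroids,
                            const int n_centroids,
                            const float radius,
                            __global int* assignment)
{
    int i = get_global_id(0);
    float best_d = radius;
    int best_k = -1;
    for (int k = 0; k < n_centroids; ++k) {
        float d = fabs(events[i].x - centroids[k].x)
                + fabs(events[i].y - centroids[k].y);
        if (d <= best_d) { best_d = d; best_k = k; }
    }
    assignment[i] = best_k;
}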

Fig : Block Diagram for Algorithm

Source Code Link : https://github.com/LogicTronixInc/Event-Camera-Clustering-and-Optical-Flow-Estimation/tree/master/event-cam-clustering/async-incremental-clustering

References : 

  1. Asynchronous event-based clustering and tracking for intrusion monitoring in UAS : Link : https://zenodo.org/records/5082144

Paper summary : 

  • Algorithm discussed (ROS based): asynchronous event-based feature tracking using eFAST feature detection on a time surface

Code : https://github.com/uzh-rpg/rpg_corner_events/tree/master

  2. E. Mueggler, C. Bartolozzi, D. Scaramuzza: Fast Event-based Corner Detection. British Machine Vision Conference (BMVC), London, 2017. Link : https://www.research-collection.ethz.ch/bitstream/handle/20.500.11850/277131/RAL2018-camera-ready.pdf

Paper Summary : 

  • Features are based on the eFAST algorithm 
  • Performs local edge finding by searching for corners among the pixels in the neighbourhood of the current event pixel
  • Uses corner points as features – “Along each circle, the algorithm searches for segments of contiguous pixels (arcs) that are higher (i.e., they have a more recent timestamp) than all other pixels on the circle.”
  • Paper link : https://rpg.ifi.uzh.ch/docs/BMVC17_Mueggler.pdf

Graph-based tracking algorithm implementation : https://github.com/grvcPerception/grvc_ef_tracker

  3. Event-based clustering :

Paper Summary : 

  • Not tree based but array based
  • Incremental clustering, with old clusters deleted based on time
  • Event-based attention priority map
  • In this paper the camera is mounted on a drone, so to detect moving objects while neglecting the camera's own motion, points around moving objects are detected and clustered; for this, thresholding is done based on the points with the maximum motion events.
  • Link : https://github.com/grvcPerception/grvc_e_clustering/tree/master

Multiple Cluster Tracking and Optical Flow Estimation from Centroids of Event Clusters
Overview

Centroids of event clusters, after down-sampling pre-processing, are used as features for tracking and for generating optical flow. Tracking and optical flow are then computed using a Luenberger estimator.

Algorithm

The algorithm can be divided into two parts:

  1. Clustering 
  • Event data is sampled to reduce repeated events within each periodic time frame
  • Incremental clustering is performed on the sampled data
  • Centroids are calculated for the clusters

  2. Tracking and flow estimation
  • A Luenberger estimator with a constant-velocity model estimates flow from the updated cluster centroids
  • Tracks of the cluster centroids are updated
  • Centroid tracks and the estimated flow are visualized

Test 1 :  On traffic monitoring raw file

Test 2 : On pedestrian test raw file

Source Code Link : https://github.com/LogicTronixInc/Event-Camera-Clustering-and-Optical-Flow-Estimation/tree/master/event-cam-tracking/event-cam-tracking-with-estimator

References  :

  1. Prophesee Sparse Optical Flow Algorithm :- https://docs.prophesee.ai/stable/algorithms/optical_flow.html?highlight=optical%20flow#sparseopticalflowalgorithm

The algorithm breaks down into :

  • Finding connected-component labels as features for tracking (a form of clustering)
  • A Luenberger estimator, as applied in robot motion estimation : 
  • Predict the position using a constant-velocity model 
  • Correct with a position measurement using the “L” gain matrix:
    • Works like a trust factor
    • If L is large: “I trust the measurements more than my predictions”
    • If L is small: “I trust my predictions more than noisy measurements”

Here’s a real-world analogy: 

Imagine you’re trying to catch a frisbee. 

Your brain: 

  1. Predicts where the frisbee will be (prediction) 
  2. Looks at where it actually is (measurement) 
  3. Constantly adjusts your prediction (estimation) 
  4. Gets better at predicting with practice (tuning)

The key equations in simple terms:

New Estimate = Previous Estimate + Time Step × (Expected Change + L × (Measurement - Expected Measurement))
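In code, the same update for a 1D constant-velocity model (position measured, velocity inferred) is only a few lines. This is our own sketch, with gains l_pos and l_vel standing in for the L matrix:

// Luenberger observer for a 1D constant-velocity model:
// state = (position, velocity); only position is measured.
struct Observer1D {
    float pos = 0.f, vel = 0.f;   // estimated state
    float l_pos, l_vel;           // observer gains (the "L" matrix)

    Observer1D(float lp, float lv) : l_pos(lp), l_vel(lv) {}

    void update(float measured_pos, float dt) {
        float innovation = measured_pos - pos;   // measurement - expected measurement
        pos += dt * (vel + l_pos * innovation);  // expected change + correction
        vel += dt * (l_vel * innovation);        // velocity inferred from position error
    }
};

For 2D centroid tracking, one such observer per axis (or a four-state x, y, vx, vy version) can be run for each cluster track.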

Advantages of using a Luenberger estimator : 

  • Works continuously and in real-time 
  • Handles noisy measurements well 
  • Gets more accurate over time 
  • Can estimate things we can’t directly measure

This makes it very useful for: 

  • Robot navigation 
  • Drone control 
  • Vehicle tracking 
  • Any system where you need to know both position and velocity but can only measure position

The core idea is that it combines what we expect to happen (based on physics) with what we actually measure, weighted by how much we trust each source of information.


FAST Corner detection and tracking : 
Overview

Here, FAST corner features on the event time surface are used to track objects and estimate optical flow.

Algorithm : 

Here are the steps used in this algorithm : 

Corner detection algorithm flow : 

  • Build the event time surface, or Surface of Active Events (SAE), as described in the paper
  • Apply FAST corner detection on the accumulated event time surface
  • Filter the corners to remove near-duplicate detections

Tracking algorithm for detected corners across consecutive time surfaces (a minimal sketch follows this list) : 

  • Assign IDs based on the proximity of corners in the previous event time-surface frames
  • Keep track of the last detected frames
  • Keep track of the last missed frames
  • Estimate velocity with damping
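A minimal sketch of the SAE update and per-frame corner detection (our own illustration using OpenCV's stock FAST on a normalized time-surface image; the eFAST variant in the referenced paper operates directly on the SAE timestamps):

#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>
#include <vector>

// 'sae' holds the latest event timestamp per pixel (CV_64F). Each event
// updates one pixel; corner detection then runs once per accumulation
// period on the normalized time surface, not per event.
void update_sae(cv::Mat& sae, const Event& e) {
    sae.at<double>(e.y, e.x) = (double)e.t;
}

std::vector<cv::KeyPoint> detect_corners(const cv::Mat& sae) {
    cv::Mat ts8u;
    cv::normalize(sae, ts8u, 0, 255, cv::NORM_MINMAX, CV_8U);
    std::vector<cv::KeyPoint> corners;
    cv::FAST(ts8u, corners, /*threshold=*/40, /*nonmaxSuppression=*/true);
    return corners;
}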

Test 1 : On pedestrian test raw file

Source Code Link : 

References : 


Also check our Object Tracking ML Solution on an Event-Based Camera – built in collaboration with Prophesee and AMD-Xilinx.