Time-to-Contact Map by Joint Estimation of Up-to-Scale Inverse Depth and Global Motion using a Single Event Camera

Event cameras asynchronously report brightness changes with a temporal resolution on the order of microseconds, which makes them inherently suitable for problems that involve rapid motion perception. In this paper, we address the problem of time-to-contact (TTC) estimation using a single event camera. This problem is typically addressed by estimating a single global TTC measure, which explicitly assumes that the surface/obstacle is planar and fronto-parallel. We relax this assumption by proposing an incremental event-based method that jointly estimates the (up-to-scale) inverse depth and global motion using a single event camera. The proposed method is reliable and fast while asynchronously maintaining a TTC map (TTCM), which provides per-pixel TTC estimates. As a side product, the proposed method can also estimate per-event optical flow. We achieve state-of-the-art performance on TTC estimation in terms of accuracy and runtime per event while achieving competitive performance on optical flow estimation.


Introduction
Event cameras differ from standard frame-based cameras, which capture visual data at a fixed rate and independently of the observed scene. Instead, event cameras respond asynchronously to pixel-wise brightness changes by generating events [6,25]. Event cameras are thus data-driven sensors that offer several advantages, including high temporal resolution on the order of microseconds, low latency, low power consumption, and high dynamic range. These properties make event cameras suitable candidates to address vision-based problems that involve (high-speed) motion, e.g., optical flow estimation [4,27,51], ego-motion estimation [15,21,22,34], motion segmentation [33,44], and obstacle avoidance [10,12]. Due to the distinct visual sensing paradigm, however, new methods are necessary to fully exploit the potential of event cameras [13,23].
Figure 1. Runtime vs. accuracy comparison for TTC estimation methods (PF [36], CT [43], SOFAS [45], ECMD [28], E-RAFT [17], and the proposed method; CPU and GPU runtimes), plotting runtime per event (microsec) against AREE (%). Average results on the Ventral Landing benchmark [28].
Event cameras have been used to address the problem of fast TTC estimation, which comes up often in the vision-based obstacle avoidance [10,12,29,39] and ventral landing [28,36] literature. The TTC is the time that would elapse before a camera reaches an obstacle/surface, assuming the current relative motion between them remains constant [10]. Previous methods that use a single event camera have focused on estimating a single global TTC measure, which assumes that the surface is planar and fronto-parallel. To overcome this limitation, other methods use additional sensing, e.g., depth frames, to build a dense TTCM [20,47].
We instead propose to extend the Dispersion Minimization (DMin) framework [34] to estimate the TTC for each incoming event using a single event camera. The proposed method jointly estimates the relative global motion and per-event (up-to-scale) inverse depth. We can then asynchronously maintain a semi-dense TTCM, which provides per-pixel TTC estimates, or compute a global TTC measure with greater accuracy by averaging over the TTC estimates. Since there is at least one scaling degree of freedom (DOF), we also propose an effective strategy to mitigate event collapse [40,41]. The proposed method is also computationally fast, reaching ∼1 microsecond processing time per event on a standard laptop. Fig. 1 compares the runtime vs. accuracy for TTC estimation methods, whereby the proposed method achieves state-of-the-art performance. We also estimate the per-event optical flow as a side product and achieve competitive performance compared to state-of-the-art optical flow methods that use events.
Main contributions:
1. First event-based method that explicitly estimates the TTC for each event and maintains a semi-dense TTCM using a single event camera.
2. DMin framework [34] extension to jointly handle local and global estimates, i.e., inverse depth and global motion, respectively.
3. Effective approach that mitigates event collapse [40,41] for incremental event-based estimation.

Time-to-Contact
Consider a freely moving camera with angular and linear velocities $\omega(t) = (\omega_x(t), \omega_y(t), \omega_z(t))^T$ and $\nu(t) = (\nu_x(t), \nu_y(t), \nu_z(t))^T$, respectively, that is observing a point $\alpha$ with 3D coordinates $\alpha(t) = (X(t), Y(t), Z(t))^T$ relative to the camera, as shown in Fig. 2. $Z(t)$ is also referred to as the depth of point $\alpha$ relative to the camera. The instantaneous TTC between the moving camera and point $\alpha$ is thus given by:
$$\tau(t) = -\frac{Z(t)}{dZ(t)/dt} = \frac{Z(t)}{\nu_z(t)}. \qquad (1)$$
The minus sign disappears because we define the linear velocity w.r.t. the camera's frame of reference, not w.r.t. the point's frame of reference, i.e., $\nu_z(t) = -dZ(t)/dt$. Based on Eq. (1), the exact values of depth and relative approaching motion do not need to be estimated, only the ratio between them.
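The scale invariance noted above can be checked numerically. The following is a minimal sketch, assuming the TTC is the ratio of depth to approach speed as described; the function name and the choice of returning infinity when the camera is not approaching are illustrative assumptions, not part of the proposed method.

```python
def time_to_contact(depth: float, approach_speed: float) -> float:
    """Instantaneous TTC, tau = Z / nu_z.

    `approach_speed` is nu_z = -dZ/dt, i.e., positive when the camera
    moves towards the point.
    """
    if approach_speed <= 0.0:
        # Assumption: no contact occurs if the camera is not approaching.
        return float("inf")
    return depth / approach_speed


# Metric values: 10 m away, closing at 2 m/s.
tau_metric = time_to_contact(10.0, 2.0)

# Same scene with depth and velocity known only up to a common scale k.
k = 0.1
tau_scaled = time_to_contact(k * 10.0, k * 2.0)
```

Both calls yield the same TTC because the unknown scale cancels in the ratio, which is why up-to-scale estimates are sufficient.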

Related Work
We review recent related works on the following topics: TTC, global motion and optical flow estimation. We refer to [13] for a detailed survey.
Time-to-Contact Estimation. The first work on event-based TTC using a single event camera relied solely on the estimation of visual motion flows [10], whereby the motion flows were computed by fitting a local plane to the time surface [4]. Other works that followed were geared towards two main use cases, namely obstacle avoidance [12,29,39] and ventral landing [28,36,43]. Event-based obstacle avoidance methods are built to be fast reacting and, although they come from either bio-inspired [29,39] or mathematically grounded principles [12], they typically rely on empirically validated heuristics to speed up computations. Existing event-based ventral landing approaches only compute a single TTC estimate, which assumes that the surface is planar and fronto-parallel. To overcome this assumption, other works fuse events with additional sensory information, e.g., depth [47]. The proposed approach uses a single event camera while being mathematically grounded and computationally fast.
Global Motion Estimation. Also referred to as ego-motion estimation [13], global motion estimation refers to estimating the parameters that explain the triggered events according to some global motion model. These methods can be broadly characterized by whether they rely on key-frame registration [8,9,14,21,22,38] or perform the estimation without relying on any key-frame [15,34]. The former methods are reminiscent of the frame-based paradigm and also include methods based on artificial neural networks (ANN) [16,52], whereby an intermediate frame-based representation is needed for the estimation. The latter methods tend towards a more event-based processing paradigm and include methods based on spiking neural networks (SNN) [18,37], whereby events are either processed on an event-by-event basis [34] or in batches [15]. Although both approaches have advantages and disadvantages, methods that rely on key-frame registration typically require an intermediate frame-based representation, which is still an open problem in the event-based community. Similarly to [22], the proposed method jointly estimates up-to-scale inverse depth and global motion. By building on the DMin framework [34], which also allows processing events on an event-by-event basis, our method does not rely on key-frame registration or background inverse depth regularization to improve convergence.
Optical Flow Estimation. Several model-based methods have been previously proposed, which can be further divided into: frame-based [2,7,26,27], batch-based [42,49], and event-based [1,3,4]. By selecting the most relevant events, i.e., typically the most recent, frame-based methods build frames from which the optical flow is computed using techniques from standard image-based optical flow, e.g., Lucas-Kanade [5]. Since each event does not carry much information on its own, batch-based methods aggregate the most recent events by forming batches but perform the computations directly on the events. Event-based methods follow the most event-driven paradigm by performing event-by-event processing, typically being the computationally fastest. However, event-based methods tend to suffer more from the aperture problem since all the computations are performed locally, and thus frame-based and model-based methods currently achieve better accuracy. In terms of accuracy, learning-based ANN methods [11,35,46,52] generally achieve state-of-the-art performance. Besides the need to convert events into frames for more efficient processing, these methods are known to be very data hungry, sensitive to the training data [48], and consume large amounts of energy [27]. Another line of research in learning-based methods has been to use SNNs [19,24], which combine the event-based processing and learning paradigms and thus do not require an intermediate frame-based representation. However, it is not trivial to train SNNs, and the empirical validation is still not on par with ANN methods. Although it is not the primary objective of this work, the proposed method can provide per-event flow estimates while still being competitive in terms of accuracy w.r.t. state-of-the-art methods.

Method
In this section, we describe the proposed incremental event-based method for TTCM estimation. We first briefly review the working principle of event cameras and the DMin framework [34], on which we build the proposed method. Refer to the supplementary material for the full mathematical derivations and additional details.

Event Cameras and Dispersion Minimization
Event cameras output a stream of asynchronous temporal contrast events $\{e_i\}$, $i \in \mathbb{N}$. Each event $e_i$ represents a spatio-temporal asynchronous brightness change, being defined as a tuple $e_i := (\mathbf{x}_i, t_i, p_i)$, where $\mathbf{x}_i = (x_i, y_i)$ are the pixel coordinates, $t_i$ is the timestamp at which the event was generated, and $p_i \in \{-1, +1\}$ is its polarity. An event $e_i$ is generated when the change in log-brightness $\bar{I}_{x,y}(t) := \log I_{x,y}(t)$ is above a threshold (Eq. (2)), where $\Delta t_i$ is the time since the last event at the same pixel.
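The event generation principle described above can be sketched as follows. This is a simplified, idealized model: the function name, the argument layout, and the convention that the polarity is the sign of the log-brightness change are assumptions made for illustration, and real sensors add noise and refractory effects that are ignored here.

```python
import math
from typing import Optional, Tuple

Event = Tuple[int, int, float, int]  # (x, y, timestamp, polarity)


def maybe_emit_event(x: int, y: int, t: float, brightness: float,
                     last_log_brightness: float,
                     threshold: float) -> Optional[Event]:
    """Emit an event if the log-brightness change at a pixel reaches the threshold.

    `last_log_brightness` is the log-brightness stored when the pixel last
    fired; `threshold` is a hypothetical contrast threshold.
    """
    change = math.log(brightness) - last_log_brightness
    if abs(change) < threshold:
        return None  # change too small: the pixel stays silent
    polarity = 1 if change > 0 else -1
    return (x, y, t, polarity)
```

For example, a pixel whose brightness doubles relative to its last stored value fires a positive event for any threshold below log 2 ≈ 0.69.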
The DMin framework [34] estimates the parameters $\theta$ of a transformation model $M$ from the stream of events $E = \{e_i\}_{i=1}^{N_e}$ by minimizing a dispersion measure of the transformed events $f_i = M(e_i; t_{\text{ref}}, \theta)$. We consider the Potential measure with a Gaussian kernel $\mathcal{N}(x; \mu, \Sigma)$ (Eq. (3)), where $I$ is the identity matrix. A key distinction of the DMin framework is that it allows the model parameters $\theta$ to be estimated incrementally on an event-by-event basis, whereby $\theta$ can be solved iteratively by linearizing the transformation model $M$ (Eq. (4)), where $B_i$ is the model-dependent linearization matrix. The parameters $\theta^*$ that minimize Eq. (3) are thus given by Eq. (5).

Adapted Dispersion Minimization
While the DMin method [34] provides a general framework for global incremental event-based model estimation, it can also be adapted to jointly estimate global and local measures, i.e., global angular and linear velocities and local inverse depth. However, the DMin framework may encounter an estimation issue known as event collapse when the global model has at least one scaling DOF, as noted in [34,40,41]. Event collapse occurs when the events are transformed into a single point, which minimizes the events' dispersion or maximizes the image contrast while the parameters' estimates diverge. So far, to the best of our knowledge, the literature on mitigation has focused on how to constrain the optimization loss to discourage divergent estimates by analyzing the effects of the scaling transformations on the event-based data. Several mitigation strategies have thus been proposed at the event [34,40] and parameter [41] levels by adding terms to regularize the objective measure. While these strategies generally prevent event collapse, they increase the complexity of the optimization framework and introduce additional parameters to tune.
We instead observe that event collapse fundamentally stems from the event transformation to a common time reference $t_{\text{ref}}$, e.g., given by Eq. (4), by identifying two problems: 1) the constant velocity assumption may not hold depending on the time difference $\Delta t_{i,\text{ref}}$, and 2) there is no built-in constraint on the magnitude of the model parameters, e.g., such that the difference between transformed events $f_i - f_j$ explicitly penalizes divergent estimates; although, according to Eq. (4), $f_i$ and $f_j$ individually diverge if the model parameters also diverge, their difference $f_i - f_j$ is not guaranteed to diverge and thus to penalize divergent estimates. The first problem is typically addressed by heuristically making the time difference as short as possible, and its effects are of limited significance in practice. The second problem, however, is intrinsically linked to batch-based processing since the alignment of the events in a batch needs to be measured in some common time reference [15,32].
However, for incremental event-based processing, the event transformation and, consequently, the dispersion measure can be modified without loss of generality such that event collapse is prevented by implicitly addressing the two problems identified. Fig. 3 depicts the idea, whereby we only transform the event neighbors to the current event's timestamp instead of transforming all the events, including the current event, to a common time reference. Formally, instead of transforming the events according to Eq. (4), only the neighboring events $e_j$ of the current event $e_i$ are transformed according to
$$\mathbf{x}'_j = \mathbf{x}_j + C_{i,j}\theta, \quad C_{i,j} = \Delta t_{i,j} B_j, \quad e_j \in \text{neigh}(e_i), \qquad (6)$$
where 'neigh' is shorthand for neighborhood, and the dependency of $\mathbf{x}'_j$ on $\theta$ was omitted for brevity. From the resultant residual $r_{i,j} = \mathbf{x}_i - \mathbf{x}'_j = \Delta\mathbf{x}_{i,j} - C_{i,j}\theta$, we see that the proposed modification to the DMin framework addresses both identified problems. First, the constant velocity assumption holds better since the time difference satisfies $\Delta t_{i,j} \le \Delta t_{i,\text{ref}}$. Second, if the model parameters $\theta$ diverge, then the residual $r_{i,j}$ also diverges, which effectively penalizes divergent estimates. We highlight that the proposed adaptation only works for incremental event-based processing: it is not suitable for batch-based processing since the proposed transformation only works locally.
The Potential measure is modified by computing the difference between the current event's coordinates $\mathbf{x}_i$ and the neighboring events' transformed coordinates $\mathbf{x}'_j$ (Eq. (7)). By minimizing Eq. (7) and linearizing the residual according to Taylor's formula $r_{i,j}(\theta + \Delta\theta) \approx r_{i,j}(\theta) + J_{i,j}(\theta)\Delta\theta$, the optimized model parameters $\theta^*$ are iteratively updated (Eq. (8)), where $w_{i,j} = \mathcal{N}(\mathbf{x}'_j; \mathbf{x}_i, I)$, $J_{i,j} = \partial r_{i,j}/\partial\theta$ and $e_j \in \text{neigh}(e_i)$.
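To make the local transformation and the resulting update concrete, the following sketch uses a deliberately simplified 2-DOF model in which $\theta$ is a constant image-plane velocity (i.e., the linearization matrix is assumed to be the identity). It applies a generic weighted Gauss-Newton step with Gaussian weights on the residual; it is an illustration under these assumptions, not a reproduction of Eq. (8), and all names are hypothetical.

```python
import math
from typing import List, Tuple

Vec2 = Tuple[float, float]
Event = Tuple[float, float, float]  # (x, y, timestamp)


def residual(current: Event, neighbor: Event, theta: Vec2) -> Vec2:
    """r_ij = x_i - x'_j, with x'_j = x_j + (t_i - t_j) * theta (B = identity)."""
    dt = current[2] - neighbor[2]
    return (current[0] - neighbor[0] - dt * theta[0],
            current[1] - neighbor[1] - dt * theta[1])


def gauss_newton_step(current: Event, neighbors: List[Event],
                      theta: Vec2) -> Vec2:
    """One weighted Gauss-Newton update of theta using the neighbors of `current`."""
    h = 0.0        # J^T W J equals h * identity, because J = -dt * identity
    gx = gy = 0.0  # components of J^T W r
    for nb in neighbors:
        dt = current[2] - nb[2]
        rx, ry = residual(current, nb, theta)
        # Gaussian weight with identity covariance (constant factor dropped).
        w = math.exp(-0.5 * (rx * rx + ry * ry))
        h += w * dt * dt
        gx += w * (-dt) * rx
        gy += w * (-dt) * ry
    if h == 0.0:
        return theta  # no usable neighbors: keep the previous estimate
    return (theta[0] - gx / h, theta[1] - gy / h)


# Events triggered by an edge moving at 100 px/s along x.
current = (10.0, 5.0, 0.010)
neighbors = [(9.8, 5.0, 0.008), (9.5, 5.0, 0.005), (9.0, 5.0, 0.000)]
theta = gauss_newton_step(current, neighbors, (0.0, 0.0))
```

In this toy example a single step recovers the velocity of the edge. Note also that the residual grows with the magnitude of `theta`, so a divergent estimate is penalized rather than rewarded, which is the intuition behind the collapse mitigation.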

Inverse Depth and Global Motion Model
We consider a calibrated event camera that can move freely and whose global motion is parameterized by the 3D angular and linear velocities, as defined in Sec. 1.1, $\theta = (\nu^T, \omega^T)^T$. For each event $e_i$, we estimate its inverse depth $\rho_i := 1/Z_i$, based on the well-known expression for the apparent velocity on the image plane:
$$\dot{\mathbf{x}}_i = B_i\theta = \rho_i V_i \nu + \Omega_i \omega, \qquad (9)$$
where the matrices $V_i$ and $\Omega_i$ make explicit the dependency on the camera intrinsic parameters, namely the horizontal and vertical focal lengths $f_x$ and $f_y$, respectively, and the horizontal and vertical focal center coordinates $c_x$ and $c_y$, respectively. In this paper, we assume that the focal center coordinates represent the focus of expansion (FOE). Fig. 4 shows the typical per-event estimation of optical flow and depth using the proposed method.
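For reference, the apparent velocity of a static point seen by a moving pinhole camera can be computed as sketched below. The sign conventions are an assumption (a static point moves with velocity $-\nu - \omega \times \alpha$ in the camera frame, so that $\nu_z > 0$ means approaching, consistent with Sec. 1.1); the exact matrix entries used by the proposed method may be arranged differently, and the function name is hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def apparent_velocity(x: float, y: float, rho: float, nu: Vec3, omega: Vec3,
                      fx: float, fy: float, cx: float, cy: float
                      ) -> Tuple[float, float]:
    """Image-plane velocity (pixels/s) at pixel (x, y) with inverse depth rho."""
    # Normalized image coordinates.
    xn = (x - cx) / fx
    yn = (y - cy) / fy
    # Translational part, scaled by the inverse depth.
    un_t = rho * (-nu[0] + xn * nu[2])
    vn_t = rho * (-nu[1] + yn * nu[2])
    # Rotational part, independent of depth.
    un_r = xn * yn * omega[0] - (1.0 + xn * xn) * omega[1] + yn * omega[2]
    vn_r = (1.0 + yn * yn) * omega[0] - xn * yn * omega[1] - xn * omega[2]
    # Back to pixel units.
    return (fx * (un_t + un_r), fy * (vn_t + vn_r))


# Pure forward motion: the flow is radial and vanishes at the focal center.
flow = apparent_velocity(110.0, 100.0, 0.5, (0.0, 0.0, 1.0), (0.0, 0.0, 0.0),
                         200.0, 200.0, 100.0, 100.0)
```

Under pure forward translation the flow points away from the focal center and is zero exactly at it, which matches the FOE assumption stated above.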

Time-to-Contact Map
For each event $e_i$, we estimate its inverse depth $\rho_i$ and update the global motion parameters $\theta = (\nu^T, \omega^T)^T$. The model parameters $\gamma_i$ are formed by stacking the motion parameters and inverse depth, $\gamma_i = (\theta^T, \rho_i)^T$. We impose a smoothness constraint on Eq. (6), so that neighboring inverse depth estimates are assumed to be equal to $\rho_i$:
$$\mathbf{x}'_j = \mathbf{x}_j + \Delta t_{i,j} B_{i,j}\theta, \qquad (10)$$
where $B_{i,j} = (\rho_i V_j, \ \Omega_j)$. The iterative update $\Delta\gamma_i$ is given by Eq. (8), where $J_{i,j} = -\Delta t_{i,j}\,(B_{i,j}, \ V_j\nu)$. We maintain the TTCM by computing the TTC for each event $e_i$ based on Eq. (1): $\tau_i = 1/(\rho_i \nu_z)$. We can also estimate the global TTC by averaging over the values maintained in the TTCM. Fig. 5 shows the typical per-event estimation of optical flow and TTCM using the proposed method.
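The bookkeeping of the TTCM can be sketched as a sparse per-pixel map that is overwritten as events arrive. The class and method names are hypothetical, and skipping updates whose closing rate $\rho_i \nu_z$ is not positive is an assumption made here to keep the example well defined.

```python
from typing import Dict, Tuple

Pixel = Tuple[int, int]


class TTCMap:
    """Semi-dense map holding the latest TTC estimate for each active pixel."""

    def __init__(self) -> None:
        self.ttc: Dict[Pixel, float] = {}

    def update(self, pixel: Pixel, rho: float, nu_z: float) -> None:
        """Store tau_i = 1 / (rho_i * nu_z) for the pixel of the current event."""
        closing_rate = rho * nu_z
        if closing_rate <= 0.0:
            return  # assumption: ignore events that do not indicate approach
        self.ttc[pixel] = 1.0 / closing_rate

    def global_ttc(self) -> float:
        """Global TTC as the average over all maintained per-pixel values."""
        if not self.ttc:
            return float("inf")
        return sum(self.ttc.values()) / len(self.ttc)


ttcm = TTCMap()
ttcm.update((1, 1), rho=0.5, nu_z=1.0)   # tau = 2.0
ttcm.update((2, 1), rho=0.25, nu_z=1.0)  # tau = 4.0
```

A plain average is used here, as in the text; a robust statistic such as the median would be an alternative design choice when outliers are expected.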

Initialization
The motion parameters $\theta$ are global measures which are estimated by aggregating events, while each inverse depth value $\rho_i$ corresponds to the estimate of a single event's inverse depth. The initialization of the motion parameters $\theta$ can thus be set almost arbitrarily, e.g., typically to 0, and it is only performed once at the beginning. However, a more careful initialization procedure must be considered for the inverse depth since it needs to be performed for each event. Based on recent advances in event-based global time decay [31], we initialize the inverse depth with a weighted average of the previous neighboring events' inverse depth estimates. Each neighboring event's weight $w_i(t)$ is given by Eq. (11), where $a(t)$ is the global event activity. If there are no previous neighboring events' inverse depth estimates, which should only occur at the start of the estimation, the initial inverse depth estimate is set to 1.
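The initialization procedure can be sketched as follows. The weight used here is a plain exponential time decay with a fixed, hypothetical `decay_rate`; it is only a stand-in for the activity-dependent weight of Eq. (11), which is not reproduced. The fallback value of 1 follows the text.

```python
import math
from typing import Iterable, Tuple


def initial_inverse_depth(neighbors: Iterable[Tuple[float, float]],
                          t_now: float, decay_rate: float) -> float:
    """Weighted average of neighboring inverse depth estimates.

    `neighbors` holds (rho_j, t_j) pairs of previous neighboring events.
    """
    num = 0.0
    den = 0.0
    for rho_j, t_j in neighbors:
        # Assumed weight: older estimates contribute exponentially less.
        w = math.exp(-decay_rate * (t_now - t_j))
        num += w * rho_j
        den += w
    if den == 0.0:
        return 1.0  # no previous neighboring estimates available
    return num / den


# A recent estimate dominates an old one.
rho_init = initial_inverse_depth([(0.2, 1.0), (0.4, 0.0)], t_now=1.0,
                                 decay_rate=10.0)
```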

Practical Considerations
As mentioned in Sec. 1.1, only the ratio between the depth and relative motion is required to compute the TTC. Since the depth and linear velocity estimates are obtained up to a scale factor due to the monocular ambiguity, we constrain the linear velocities $\nu$ to have at most unit norm, i.e., $\|\nu\|_2 \le 1$, which is useful to improve the method's computational stability by bounding the allowed estimates' values. This is not related to event collapse; rather, it stems from the ratio given by Eq. (1), whereby we can introduce an arbitrary non-zero multiplicative scalar to the numerator and denominator and have the same TTC estimate.
Since the events are generated by 3D points in the field of view (FOV) of the camera and should have positive depth values, we constrain the depth values to be strictly positive by introducing a parameterization variable $\lambda \in \mathbb{R}$ such that $\rho(\lambda) = e^{\lambda} > 0$.
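The effect of the exponential parameterization can be illustrated with a single update step. The sketch below uses a plain gradient step for simplicity (an assumption; the proposed method uses the iterative update of Eq. (8)), and the function name and step size are hypothetical.

```python
import math
from typing import Tuple


def update_inverse_depth(lam: float, grad_rho: float,
                         step: float) -> Tuple[float, float]:
    """Update lambda and return (new lambda, new rho = exp(new lambda))."""
    rho = math.exp(lam)
    # Chain rule: d(rho)/d(lambda) = exp(lambda) = rho.
    grad_lam = grad_rho * rho
    lam_new = lam - step * grad_lam
    return lam_new, math.exp(lam_new)


# A large step that would push an unconstrained rho below zero
# (1.0 - 1.0 * 5.0 = -4.0) still yields a strictly positive inverse depth.
lam_new, rho_new = update_inverse_depth(lam=0.0, grad_rho=5.0, step=1.0)
```

Because the update acts on $\lambda$, no clipping or projection is needed to keep the inverse depth positive.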
The iterative update given by Eq. (8) can be quite noisy since it is only computed in a small neighborhood $\text{neigh}(e_i) = \{e_k : \|\mathbf{x}_k - \mathbf{x}_i\|_\infty \le s\}$. We thus introduce two prior parameters, $l_\theta$ and $l_\rho$, for the global motion parameters and the local inverse depth estimates, respectively. In the resultant iterative update of the model parameters, $L = \text{diag}(l_\theta, \ldots, l_\theta, l_\rho)$ and $\gamma_{\text{prev}}$ are the parameters' estimates from the previous event $e_{i-1}$. We also weigh the contribution of each event $e_j$ according to the corresponding weight $w_j(t)$, given by Eq. (11), and discard any event whose weight is below a threshold $w_{\text{thresh}}$ [31].
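The neighborhood selection can be sketched as a filter on the Chebyshev (infinity-norm) distance combined with the weight threshold. The tuple layout and function name are hypothetical, and the weights are assumed to have been computed beforehand.

```python
from typing import List, Tuple

WeightedEvent = Tuple[int, int, float]  # (x, y, weight)


def select_neighbors(center: Tuple[int, int],
                     candidates: List[WeightedEvent],
                     s: int, w_thresh: float) -> List[WeightedEvent]:
    """Keep events within an infinity-norm radius s whose weight is high enough."""
    cx, cy = center
    return [
        (x, y, w)
        for (x, y, w) in candidates
        if max(abs(x - cx), abs(y - cy)) <= s and w >= w_thresh
    ]


candidates = [(7, 7, 0.9), (8, 5, 0.9), (5, 4, 0.05), (3, 6, 0.5)]
kept = select_neighbors((5, 5), candidates, s=2, w_thresh=0.1)
```

With `s = 2` the neighborhood is a 5 × 5 pixel window, so the cost of each update grows with the square of the neighboring size.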

Experimental Evaluation
We evaluate the proposed method on TTC estimation and, given that the apparent velocity on the image plane can be estimated according to Eq. (9) and due to the lack of event datasets dedicated to TTC estimation, on optical flow estimation. The optical flow benchmark also provides a common ground to compare the proposed method with other state-of-the-art methods that estimate optical flow. Tab. 1 presents the hyper-parameters that were used across the experiments. Since the proposed method computes per-event estimates, we only evaluate on the respective pixel locations.

Datasets and Metrics
VL Dataset [28]. It consists of 7 real event sequences observing planar prints of landing surfaces and 1 real event sequence observing the 3D print of a landing surface. Each sequence has a duration of 15 sec, totaling 120 sec. The events were recorded by a Prophesee event camera with 1280 × 720 resolution, and the ground truth (GT) depth measurements were recorded with an Intel RealSense camera. However, only a global GT depth measurement is provided per timestamp.
To comply with the evaluation reported in [28], we assess the proposed method at certain timestamps that correspond to event batches of 0.5 sec. The comparison metrics are the divergence REE (%) and runtime per event (microsec). The divergence $\mu$ is the inverse of the TTC, i.e., $\mu = 1/\tau$, and the REE is computed between the estimate $\hat{\mu}$ and the GT $\mu_{\text{gt}}$. The global motion prior $l_\theta$ was set to 1000.
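As an illustration of the metric, the sketch below computes the divergence from a TTC value and a relative error in percent. The normalization by the magnitude of the GT divergence is an assumption about the form of the REE (the exact definition follows [28]), and the function names are hypothetical.

```python
def divergence(ttc: float) -> float:
    """Divergence mu = 1 / tau."""
    return 1.0 / ttc


def relative_error_percent(estimate: float, ground_truth: float) -> float:
    """Relative error (%) of an estimate w.r.t. the GT (assumed REE form)."""
    return 100.0 * abs(estimate - ground_truth) / abs(ground_truth)


# Estimated TTC of 2.0 s against a GT TTC of 2.5 s.
ree = relative_error_percent(divergence(2.0), divergence(2.5))
```

In this example the divergence estimate is 0.5 against a GT of 0.4, i.e., a relative error of 25%.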
MVSEC Dataset [50]. It consists of several real indoor and outdoor sequences, providing events, standard grayscale frames, IMU data, camera poses, and scene depth. The events were recorded by a DAVIS [6] with 346 × 260 resolution. The evaluated sequences span approximately 265 sec. The optical flow GT is also provided [51], generated from the scene depth and camera velocity. We generate GT TTCMs by applying Eq. (1) given the GT depth maps and camera velocity.
To assess optical flow accuracy, we use the following metrics: average endpoint error (AEE) (in pixel/frame, as is conventional in the literature [42,51]), outliers (Out) as the percentage of pixels with AEE greater than 3, average relative endpoint error (AREE) (%) [27], and average angular error (AAE) (°). To assess TTCM accuracy, we use the AREE (%) between the divergence estimate $\hat{\mu}_i$ and the corresponding GT $\mu_{\text{gt},i}$. The global motion prior $l_\theta$ was set to 100.
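The endpoint-error metrics can be sketched as follows, assuming the standard definition of the endpoint error as the Euclidean distance between estimated and GT flow vectors and an outlier threshold of 3 pixels applied per pixel. The function name is hypothetical, and the inputs are assumed to be non-empty and aligned.

```python
import math
from typing import List, Tuple

Flow = Tuple[float, float]


def flow_metrics(estimated: List[Flow], ground_truth: List[Flow],
                 outlier_px: float = 3.0) -> Tuple[float, float]:
    """Return (AEE, percentage of outliers) over paired flow vectors."""
    errors = [
        math.hypot(u - u_gt, v - v_gt)
        for (u, v), (u_gt, v_gt) in zip(estimated, ground_truth)
    ]
    aee = sum(errors) / len(errors)
    outliers = 100.0 * sum(e > outlier_px for e in errors) / len(errors)
    return aee, outliers


aee, out = flow_metrics([(3.0, 4.0), (1.0, 1.0)], [(0.0, 0.0), (1.0, 1.0)])
```

Since the proposed method produces per-event estimates, such a computation would only be applied at the pixel locations where events occurred.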

Other Global Motion Models
Based on the general 6-DOF global motion model described in Sec. 3.3, we can consider other more constrained motion models depending on the application, as follows.
Translation. This model is parameterized by the 3D linear velocities $\nu$. Thus, $B_i = \rho_i V_i$, with $V_i$ given by Eq. (9).
Driving. This model is parameterized by the most significant DOFs when driving a car, namely the angular velocity around the camera y-axis $\omega_y$ and the linear velocity in the z-axis $\nu_z$ (see Fig. 2). Hence, $B_i$ retains only the columns associated with $\omega_y$ and $\nu_z$. When using this motion model, we impose $\|\nu_z\|_2 = 1$ to ensure that the depth is properly estimated.
Scaling. This model is parameterized by the linear velocity in the z-axis $\nu_z$. Hence, $B_i$ retains only the column associated with $\nu_z$. It is only considered since the global motion model used in [28] is the 1-DOF scaling, which assumes that the scene is planar and does not estimate the depth.

Results
Time-to-Contact. Tab. 2 reports the results on global divergence estimation on the VL benchmark [28]. The proposed method using the Translation model with neighboring size s = 2 achieves the best accuracy, outperforming ECMD [28] by 36.77% on average. The proposed method using the Scaling and full 6-DOF models also outperforms ECMD [28] in terms of average accuracy with neighboring size s = 2 and s = 3, respectively. The results indicate that considering a smaller neighborhood benefits global models with fewer parameters; conversely, a larger neighborhood benefits models with more parameters. Even though the motion for all sequences is dominated by just 1 scaling DOF, the results suggest that considering motion models with additional DOFs, e.g., Translation and 6-DOF, is beneficial. The extra DOFs may explain other small motions, whereas these small motions would just be considered noise for the Scaling model.
In terms of runtime per event, the proposed method outperforms ECMD [28] by between 95.36% and 98.47%, achieving real-time processing for all the sequences in the VL dataset [28], being capable of processing between 420k and 1.28M events per second.The results indicate that the runtime increases with the neighboring size s and the number of parameters of the global motion model.
Tab. 3 reports the results on divergence estimation on the MVSEC benchmark [50]. The Translation model achieves the best performance overall, indicating that the corresponding DOFs are sufficient to explain the perceived motions in the sequences evaluated while minimizing the optimization complexity. As mentioned in Sec. 4.2, the Driving model is tuned to the outdoor day1 sequence, on which its performance is on par with the Translation model. However, it performs poorly on the indoor flying3 sequence since it cannot handle the more complex types of motion that are present. Being the most general model, the 6-DOF motion model achieves similar performance for all the sequences. Its performance is worse than the Translation model due to the increased optimization complexity, while the additional DOFs do not contribute to improving the accuracy. The proposed method achieves real-time processing in the MVSEC dataset [50] since, on average, the sequences exhibit a maximum of around 400k events per second.
Optical Flow. Since it is difficult to directly compare the results reported above, we evaluate the proposed method on optical flow estimation and compare the results with other methods. Tab. 4 reports the results on optical flow estimation on the MVSEC benchmark [50] in terms of AEE and Out, as is commonly found in the literature. The proposed method achieves state-of-the-art performance over the EB methods, on par performance with the FB method, and competitive performance overall.
Tab. 5 reports the results on optical flow estimation on the MVSEC benchmark [50] in terms of AREE and AAE. The proposed method achieves state-of-the-art performance on the outdoor day1 sequence using the Driving model and on par performance on the indoor flying3 sequence using the Translation model. The results in Tab. 5 indicate that the proposed method is comparatively more accurate in estimating the direction of the flow, i.e., compared with the other methods, the proposed method achieves lower values of AAE overall. This suggests that further accuracy gains can be achieved by improving the method's per-event inverse depth estimation, since inverse depth estimates mainly contribute to the flow magnitude.
Qualitative Results. Fig. 6 presents qualitative results on sequences of the MVSEC dataset [50]. The estimated TTCM and optical flow resemble the GT ones.

Limitations
Similarly to other event-based methods [15,22,35,52], the proposed method also inherits the brightness assumption from the DMin framework [34].It can thus provide wrongful estimates for events that are not caused by motion, e.g., due to flickering lights.While this limitation is somewhat mitigated when estimating global quantities, it can struggle to reliably estimate the inverse depth of events that are not caused by motion.Introducing probabilistic uncertainties to the estimates could alleviate this issue while also improving the inverse depth's initialization procedure.
Also related, although we explicitly impose a local smoothness constraint to the inverse depth estimates, given by Eq. ( 10), the proposed method can still provide inverse depth estimates that differ significantly from the neighboring inverse depth estimates.This typically occurs for events that are generated by noise.Carefully tuning the event camera biases and filtering out outliers could improve the overall inverse depth estimation.Having a back-end inverse depth regularizer [22] could also help to mitigate this issue.
The proposed method can only estimate one global motion. Thus, it cannot adequately handle more than one motion simultaneously, e.g., due to cars moving [50]. Considering a multi-scale approach [1,42] or explicitly modeling more than one possible motion occurring simultaneously [33,44] are possible avenues for future research.

Conclusion
We have proposed a novel method that estimates the TTCM using a single event camera.The proposed method builds on the DMin framework to incrementally estimate local and global quantities, i.e., inverse depth, and global motion, respectively.We have also proposed an approach that effectively prevents event collapse for incremental event-based estimation without introducing regularizers or additional hyper-parameters.The proposed method also achieves state-of-the-art performance in TTC estimation in terms of accuracy and computational runtime while achieving competitive performance in optical flow estimation.Broadly, the proposed work further builds on the increasing amount of evidence that event cameras are especially suited to address visual motion-based problems; in particular, it further shows that incremental event-based processing can provide a flexible and general methodology to consider when using event cameras, which avoids issues introduced when converting events to other representations, e.g., batches and frames.

A. Adapted Dispersion Minimization
In this section, we describe the steps to obtain the optimized model parameters, given by Eq. (8), including the inverse depth parameterization discussed in Sec. 3.6, i.e., $\lambda \in \mathbb{R}$ such that $\rho(\lambda) = e^{\lambda} > 0$. To optimize Eq. (7), we differentiate it w.r.t. the global motion parameters and the parameterized inverse depth (Eq. (16)). By linearizing the residual according to Taylor's formula $r_{i,j}(\gamma_i + \Delta\gamma_i) \approx r_{i,j}(\gamma_i) + J_{i,j}\Delta\gamma_i$ and setting Eq. (16) to 0, we obtain the (parameterized) update given by Eq. (8). Lastly, the derivative of the residual w.r.t. the (parameterized) model parameters is expressed in terms of $B_{i,j}$ and $V_j$, which are given by Eq. (9).

B. Additional Results
We provide additional results regarding the robustness of the proposed method to camera resolution resizing and event sampling. We adopt a simple strategy that resembles an integrate-and-fire model, which depends on a single parameter $r$ that controls both the camera resolution reduction and the threshold that effectively fires an event to be processed. This strategy improves the method's runtime by essentially working as an event filter, which can be useful when using cameras with a large resolution and/or in scenarios with limited computational power, e.g., embedded systems. Fig. 7 illustrates the strategy's main idea. For each event $e_i$, we divide its image coordinates by $r$, i.e., $\tilde{\mathbf{x}}_i = \mathbf{x}_i / r$, and increment by 1 an integrator variable corresponding to the resized coordinates $\tilde{\mathbf{x}}_i$. Once the integrator variable crosses the firing threshold $r^2$, the corresponding event is processed. This strategy ensures that the original spatial event distribution is largely preserved when the events' coordinates are resized, while the number of events processed is reduced by a factor of $r^2$, which effectively reduces the method's actual runtime by a factor of $\approx r^2$. The actual runtime improvement is achieved by skipping certain events, since the runtime per event remains approximately the same. Also, $r = 1$ corresponds to considering the full original resolution.
Fig. 8 plots the average global divergence estimation accuracy on the VL dataset [28] as a function of the resolution reduction multiplier $r$. In absolute terms, on average, the performance worsens with the increase of the reduction multiplier $r$ since fewer events are processed and thus less detail is considered. However, the drop in performance only becomes noticeable for $r = 4$. These results suggest that the proposed method can achieve at least a 9× speedup in actual runtime without a significant drop in accuracy, thus demonstrating its robustness. Tab. 6 presents a detailed breakdown for all the sequences on the VL dataset [28].
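The sampling strategy described above can be sketched as a generator that filters an event stream. Integer division of the coordinates and resetting the integrator after firing are assumptions made for this illustration, and the function name and tuple layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[int, int, float, int]  # (x, y, timestamp, polarity)


def integrate_and_fire(events: Iterable[Event], r: int) -> Iterator[Event]:
    """Downsample coordinates by r and pass on one event per r*r arrivals."""
    counters: Dict[Tuple[int, int], int] = defaultdict(int)
    threshold = r * r
    for x, y, t, p in events:
        cell = (x // r, y // r)  # resized coordinates
        counters[cell] += 1
        if counters[cell] >= threshold:
            counters[cell] = 0  # assumption: the integrator resets after firing
            yield (cell[0], cell[1], t, p)


# Eight events falling in the same 2 x 2 block of the original resolution.
pixels = [(0, 0), (1, 0), (0, 1), (1, 1)] * 2
events = [(x, y, 0.001 * i, 1) for i, (x, y) in enumerate(pixels)]
fired = list(integrate_and_fire(events, r=2))
```

With `r = 2`, the eight input events produce two processed events, i.e., a reduction by a factor of $r^2 = 4$; with `r = 1`, every event is passed through unchanged.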

Figure 3. Incremental event collapse mitigation. Instead of transforming all the events to some time reference (left), we locally transform the events to the current event's timestamp (right).

Figure 4. Typical per-event estimation of optical flow and depth.

Figure 5. Typical per-event estimation of optical flow and TTCM.

Table 1. Hyper-parameters used across the experiments.

Table 5. Optical flow estimation. Quantitative results on the MVSEC dataset [50], in terms of AREE (%) and AAE (°). Lower is better.