EventAid: Benchmarking Event-aided Image/Video Enhancement Algorithms with Real-captured Hybrid Dataset (2312.08220v1)
Abstract: Event cameras are an emerging imaging technology that offers advantages over conventional frame-based sensors in dynamic range and sensing speed. By complementing the rich texture and color perception of traditional image frames, a hybrid system of event and frame-based cameras enables high-performance imaging. With the assistance of event cameras, image/video enhancement methods can break the limits of traditional frame-based cameras in exposure time, resolution, dynamic range, and frame rate. This paper focuses on five event-aided image and video enhancement tasks (i.e., event-based video reconstruction, event-aided high frame rate video reconstruction, image deblurring, image super-resolution, and high dynamic range image reconstruction), and provides an analysis of the effects of different event properties, a real-captured benchmark dataset with ground-truth labels, a unified benchmark of state-of-the-art methods, and an evaluation of two mainstream event simulators. Specifically, this paper collects EventAid, a real-captured evaluation dataset for the five event-aided image/video enhancement tasks, using an "Event-RGB" multi-camera hybrid system designed for scene diversity and spatiotemporal synchronization. We further perform quantitative and visual comparisons of state-of-the-art algorithms, provide a controlled experiment to analyze the performance limit of event-aided image deblurring methods, and discuss open problems to inspire future research.
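For readers unfamiliar with the sensing model the benchmark builds on, the sketch below illustrates the idealized contrast-threshold event model commonly used by event simulators: a pixel fires an event (x, y, t, p) whenever its log intensity has changed by at least a threshold C since the pixel's last event. This is a minimal illustrative sketch, not the paper's implementation; the function name, threshold value, and linear time interpolation are assumptions.

```python
import numpy as np

def simulate_events(frames, timestamps, C=0.2, eps=1e-6):
    """Idealized contrast-threshold event model (illustrative sketch).

    frames:     (N, H, W) grayscale video, float in [0, 1]
    timestamps: (N,) frame times in seconds
    C:          contrast threshold in log-intensity units (hypothetical value)
    Returns a list of events (x, y, t, polarity).
    """
    log_ref = np.log(frames[0] + eps)  # per-pixel log intensity at last event
    events = []
    for k in range(1, len(frames)):
        log_cur = np.log(frames[k] + eps)
        diff = log_cur - log_ref
        # Pixels whose log intensity moved by at least one threshold fire events.
        ys, xs = np.nonzero(np.abs(diff) >= C)
        for y, x in zip(ys, xs):
            n = int(abs(diff[y, x]) // C)        # number of thresholds crossed
            pol = 1 if diff[y, x] > 0 else -1
            for i in range(1, n + 1):
                # Spread the n events linearly across the frame interval
                # (an assumption; real sensors timestamp asynchronously).
                t = timestamps[k - 1] + (timestamps[k] - timestamps[k - 1]) * i / (n + 1)
                events.append((x, y, t, pol))
            log_ref[y, x] += pol * n * C         # move reference toward current level
    return events
```

This idealized model ignores sensor noise, refractory periods, and bandwidth limits, which is precisely the kind of sim-to-real gap the paper's evaluation of event simulators probes.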