CNN-LSTM implementation methodology on SoC FPGA for human action recognition based on video
Date
2024
Rights
© 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Published in
27th Euromicro Conference on Digital System Design (DSD), Paris, 2024, pp. 202-209
Publisher
Institute of Electrical and Electronics Engineers Inc.
Link to publication
Keywords
AMD-Xilinx
CNN-LSTM
Deep learning
HAR
SoC FPGA
UCF101
Vitis-AI DPU
ZCU102
Zynq UltraScale+ MPSoC
Abstract
The growing use of AI-driven video applications such as surveillance or healthcare monitoring underscores the need for embedded solutions capable of accurately categorizing human actions in real-time video. A methodology is proposed for implementing a customized CNN-LSTM architecture on AMD-Xilinx SoC FPGA devices for human action categorization from video data. In this approach, CNN operations are accelerated by the Vitis-AI DPU within the FPGA, offering the flexibility to support a range of CNN architectures without requiring individual hardware description language development. This adaptability is crucial given the varying performance of CNN models across datasets. LSTM operations are executed on the SoC processors, overcoming the limited support that DPU IP cores provide for such networks, while maintaining the flexibility to assess different configurations. Additionally, a pipeline strategy is proposed to enable parallel execution of the CNN and LSTM components, optimizing resource utilization and minimizing idle time. To demonstrate the validity of the proposed implementation methodology, experiments were conducted on the ZCU102 development board, equipped with a Zynq UltraScale+ MPSoC, using the VGG16 CNN model along with the exploration of different LSTM configurations. The results demonstrate remarkable computational performance, achieving frame rates of up to 44.34 FPS for videos recorded at a resolution of 320×240 pixels, surpassing real-time requirements. Additionally, the proposed implementation maintains high accuracy, exemplified by a single bidirectional LSTM layer achieving a competitive accuracy of 73.33% on the UCF101 dataset.
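As a rough illustration of the split described in the abstract (CNN inference on the Vitis-AI DPU, LSTM on the SoC's ARM processors, with the two stages overlapped through a pipeline), the sketch below combines the public Vitis AI Python runtime (xir/vart) for the DPU stage with a stand-in PyTorch bidirectional LSTM for the CPU stage. The model file name, 16-frame sequence length, layer sizes, and dummy input frames are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch only: model file name, sequence length, LSTM sizes,
# and the dummy frames are assumptions, not details from the paper.
import queue
import threading

import numpy as np
import torch
import vart
import xir


def make_dpu_runner(xmodel_path):
    # Deserialize the compiled model and hand its DPU subgraph to a runner.
    graph = xir.Graph.deserialize(xmodel_path)
    dpu_subgraphs = [
        s for s in graph.get_root_subgraph().toposort_child_subgraph()
        if s.has_attr("device") and s.get_attr("device").upper() == "DPU"
    ]
    return vart.Runner.create_runner(dpu_subgraphs[0], "run")


def cnn_stage(runner, frames, feature_q):
    # Producer: per-frame feature extraction on the DPU.
    in_t = runner.get_input_tensors()[0]
    out_t = runner.get_output_tensors()[0]
    for frame in frames:
        in_buf = np.asarray(frame, dtype=np.float32).reshape(tuple(in_t.dims))
        out_buf = np.empty(tuple(out_t.dims), dtype=np.float32)
        job = runner.execute_async([in_buf], [out_buf])
        runner.wait(job)
        feature_q.put(out_buf.reshape(-1))   # flattened CNN feature vector
    feature_q.put(None)                      # end-of-stream sentinel


def lstm_stage(lstm, head, feature_q, seq_len=16):
    # Consumer: sequence classification on the ARM cores; the bounded queue
    # lets this stage overlap with the DPU work (the pipelining idea above).
    feats = []
    while (f := feature_q.get()) is not None:
        feats.append(torch.from_numpy(f))
        if len(feats) == seq_len:
            seq = torch.stack(feats).unsqueeze(0)        # (1, T, feat_dim)
            out, _ = lstm(seq)
            print("action id:", head(out[:, -1]).argmax(dim=1).item())
            feats.clear()


runner = make_dpu_runner("vgg16_ucf101.xmodel")          # hypothetical file
feat_dim = int(np.prod(runner.get_output_tensors()[0].dims[1:]))
lstm = torch.nn.LSTM(feat_dim, 256, batch_first=True, bidirectional=True)
head = torch.nn.Linear(2 * 256, 101)                     # UCF101: 101 classes
in_dims = tuple(runner.get_input_tensors()[0].dims)      # e.g. (1, 224, 224, 3)
frames = (np.random.rand(*in_dims).astype(np.float32) for _ in range(16))
q = queue.Queue(maxsize=32)
producer = threading.Thread(target=cnn_stage, args=(runner, frames, q))
producer.start()
lstm_stage(lstm, head, q)
producer.join()
```

The bounded queue is what realizes the pipelining the abstract mentions: while the CPU consumes the features of one clip, the DPU can already be extracting features for the next, keeping both compute resources busy.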
Collections
- D50 Congresos
- D50 Proyectos de Investigación