Show simple item record

dc.contributor.author    Suárez Plata, Daniel Nicolás
dc.contributor.author    Fernández Solórzano, Víctor Manuel
dc.contributor.author    Posadas Cobo, Héctor
dc.contributor.other     Universidad de Cantabria    es_ES
dc.date.accessioned      2025-05-05T07:13:00Z
dc.date.available        2025-05-05T07:13:00Z
dc.date.issued           2024
dc.identifier.isbn       979-8-3503-8038-5
dc.identifier.other      PID2020-116417RB-C43    es_ES
dc.identifier.uri        https://hdl.handle.net/10902/36330
dc.description.abstract    The growing use of AI-driven video applications like surveillance or healthcare monitoring underscores the need for embedded solutions capable of accurately categorizing human actions in real-time videos. A methodology is proposed for implementing a customized CNN-LSTM architecture on AMD-Xilinx SoC FPGA devices for human action categorization from video data. In this approach, CNN operations are accelerated by the Vitis-AI DPU within the FPGA, offering flexibility to support a range of CNN architectures without requiring individual hardware description language development. This adaptability is crucial given the varying performance of CNN models across datasets. LSTM operations are executed on the SoC processors, overcoming limitations in the support provided by DPU IP cores for such networks, while maintaining flexibility to assess different configurations. Additionally, a pipeline strategy is proposed to enable parallel execution of both CNN and LSTM components, optimizing resource utilization and minimizing idle times. To demonstrate the validity of the proposed implementation methodology, experiments were conducted on the ZCU102 development board, equipped with a Zynq UltraScale+ MPSoC, and involved the use of the VGG16 CNN model along with the exploration of different LSTM configurations. The results demonstrate remarkable computational performance, achieving frame rates of up to 44.34 FPS for videos recorded at a resolution of 320×240 pixels, surpassing real-time requirements. Additionally, the proposed implementation maintains high accuracy levels, exemplified by the single bidirectional LSTM layer achieving a competitive accuracy of 73.33% on the UCF101 dataset.    es_ES
dc.description.sponsorship    This work has been supported by Project PID2020-116417RB-C43, funded by Spanish MCIN/AEI/10.13039/501100011033, and by Project No. 101007273 ECSEL DAIS, funded by EU H2020 and by Spanish PCI2021-121988.    es_ES
dc.format.extent    8 p.    es_ES
dc.language.iso     eng    es_ES
dc.publisher        Institute of Electrical and Electronics Engineers Inc.    es_ES
dc.rights           © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.    es_ES
dc.source           27th Euromicro Conference on Digital System Design (DSD), Paris, 2024, 202-209    es_ES
dc.subject.other    AMD-Xilinx    es_ES
dc.subject.other    CNN-LSTM    es_ES
dc.subject.other    Deep learning    es_ES
dc.subject.other    HAR    es_ES
dc.subject.other    SoC FPGA    es_ES
dc.subject.other    UCF101    es_ES
dc.subject.other    Vitis-AI DPU    es_ES
dc.subject.other    ZCU102    es_ES
dc.subject.other    Zynq UltraScale+ MPSoC    es_ES
dc.title    CNN-LSTM implementation methodology on SoC FPGA for human action recognition based on video    es_ES
dc.type     info:eu-repo/semantics/conferenceObject    es_ES
dc.relation.publisherVersion    https://doi.org/10.1109/DSD64264.2024.00035    es_ES
dc.rights.accessRights    openAccess    es_ES
dc.relation.projectID    info:eu-repo/grantAgreement/EC/H2020/101007273/EU/Distributed Artificial Intelligent Systems/DAIS/    es_ES
dc.relation.projectID    info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-116417RB-C43/ES/TECNOLOGIAS PARA INTELIGENCIA ARTIFICIAL RECONFIGURABLE APLICADAS A LA E-SALUD Y LA GANADERIA/    es_ES
dc.identifier.DOI    10.1109/DSD64264.2024.00035
dc.type.version    acceptedVersion    es_ES
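
The abstract above describes a pipeline strategy that overlaps CNN feature extraction (offloaded to the Vitis-AI DPU) with LSTM classification (run on the SoC's ARM processors). The following minimal Python sketch illustrates the general producer-consumer pattern such a design implies; it is not the authors' code, and the functions run_cnn_on_dpu, run_lstm_on_cpu, and the window length seq_len=16 are hypothetical stand-ins rather than the Vitis-AI API.

import queue
import threading

# Hypothetical stand-ins for the two pipeline stages: CNN feature
# extraction (on the DPU) and LSTM classification (on the CPU cores).
# A real implementation would invoke the Vitis-AI runtime here.
def run_cnn_on_dpu(frame):
    return [0.0] * 4096          # placeholder per-frame feature vector

def run_lstm_on_cpu(feature_seq):
    return "action_label"        # placeholder action classification

feature_q = queue.Queue(maxsize=8)   # bounded queue keeps both stages busy

def cnn_stage(frames):
    for frame in frames:
        feature_q.put(run_cnn_on_dpu(frame))   # producer: one feature per frame
    feature_q.put(None)                        # sentinel: end of video

def lstm_stage(seq_len=16):
    window = []
    while (feat := feature_q.get()) is not None:
        window.append(feat)
        if len(window) == seq_len:             # classify each full window
            print(run_lstm_on_cpu(window))
            window.clear()

frames = [object() for _ in range(32)]         # dummy stand-in video frames
producer = threading.Thread(target=cnn_stage, args=(frames,))
producer.start()
lstm_stage()                                   # consumer runs concurrently
producer.join()

The bounded queue mirrors the abstract's goal of minimizing idle times: the CNN stage can run ahead of the LSTM by a few frames without unbounded buffering, so both compute resources stay occupied in parallel.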


Files in this item


This item appears in the following Collection(s)
