Author(s): Shijun Pan; Keisuke Yoshida; Takashi Kojima; Yutaro Hashimoto
Linked Author(s): Takashi Kojima, Keisuke Yoshida, Shijun Pan
Keywords: River Space Utilisation Analysis; 4K Camera; LLaVA; Multi-Modal Model; YOLOv8
Abstract: The Japanese Census of Rivers and Waterfront Areas, conducted every five years through nationwide surveys, is critical to the development and administration of river projects. However, owing to considerable human-resource constraints, these surveys are often limited to seven days per year, and annual conditions are estimated from these limited data. This temporal constraint makes it difficult to reliably estimate usage patterns across weekdays, holidays, and different times of day, and it also impedes the quantitative evaluation of the effects of river maintenance work over time. This study proposes an automated approach to detecting human movement on riverbanks using images from a 4K camera mounted at the Asahi River diversion weir in Okayama Prefecture. The results show that combining the YOLO (You Only Look Once) object detection model with the LLaVA (Large Language-and-Vision Assistant) multimodal model makes it possible to continuously monitor not only human presence and location but also specific activities such as walking, running, and skateboarding. This automated approach offers a potential means of conducting more comprehensive studies of river space usage.
Year: 2025
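The two-stage idea in the abstract (an object detector proposes person boxes, then a multimodal model labels each person's activity) can be sketched as below. This is a minimal, hypothetical illustration and not the authors' implementation: the `Detection` record, the 0.5 confidence threshold, the prompt wording, the activity label set, and the `ask_vlm` callable (a stand-in for a LLaVA query on an image crop) are all assumptions. Real YOLOv8 and LLaVA calls are replaced by plain data and a stub so the example runs without either model.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


# Hypothetical detection record; in practice these fields would be read from
# a YOLOv8 result (class name, confidence score, and an xyxy box).
@dataclass(frozen=True)
class Detection:
    label: str
    confidence: float
    box: Box


# Assumed label set for the second stage; not taken from the study.
ACTIVITIES = ("walking", "running", "skateboarding", "other")


def build_prompt() -> str:
    """Hypothetical question sent to the multimodal model with each person crop."""
    return ("Which activity is the person in this image doing? "
            "Answer with one word from: " + ", ".join(ACTIVITIES) + ".")


def normalise_answer(answer: str) -> str:
    """Map a free-text model answer onto the fixed activity set."""
    text = answer.strip().lower()
    for activity in ACTIVITIES:
        if activity in text:
            return activity
    return "other"


def tally_activities(
    detections: Iterable[Detection],
    ask_vlm: Callable[[Box, str], str],
    min_conf: float = 0.5,
) -> Counter:
    """Stage 1: keep confident 'person' boxes. Stage 2: ask the model about each."""
    counts: Counter = Counter()
    prompt = build_prompt()
    for det in detections:
        if det.label != "person" or det.confidence < min_conf:
            continue  # skip non-person classes and low-confidence boxes
        counts[normalise_answer(ask_vlm(det.box, prompt))] += 1
    return counts


# Usage with made-up detections and a stub in place of a real LLaVA call.
dets = [
    Detection("person", 0.91, (10, 20, 60, 180)),
    Detection("person", 0.88, (200, 30, 250, 190)),
    Detection("bicycle", 0.95, (300, 40, 400, 160)),
    Detection("person", 0.30, (500, 50, 540, 150)),
]
fake_answers = {(10, 20, 60, 180): "Walking.",
                (200, 30, 250, 190): "The person is skateboarding"}
counts = tally_activities(dets, lambda box, prompt: fake_answers[box])
print(dict(counts))  # {'walking': 1, 'skateboarding': 1}
```

Keeping detection and activity labelling as separate stages means the detector only has to localise people, while the open-ended activity question is delegated to the multimodal model; normalising its free-text answers onto a fixed label set makes the counts aggregable over time.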