A Preliminary Result of Food Object Detection using Swin Transformer
ACM International Conference Proceeding Series
An inappropriate diet is one of the main causes of poor health. However, it is difficult to sustain quantitative diet assessment in everyday living environments. Food object detection is a key method for solving this problem, yet few studies apply recent object detection techniques to it. In addition, the high-performance food object detection models currently in use have a special architecture that chains two deep learning models (food localization and food classification) in series to achieve high accuracy. The disadvantage of this architecture is that it is difficult to predict how well such a model scales. In this study, we built an end-to-end food object detection model using the Swin Transformer, one of the latest backbone models. Experiments on the UECFOOD datasets compared its performance with that of other food object detection studies. The model obtained a mAP (mean Average Precision) of 0.522 on the UECFOOD-100 dataset and a mAP of 0.52 on the UECFOOD-256 dataset. These findings show that the proposed model, which uses only end-to-end object detection, performs better than previous studies that combine food localization and food classification.
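Results above are reported as mAP. As a hedged illustration only (the evaluation protocol, including the IoU threshold and interpolation scheme, is not stated in this record), the following minimal Python sketch computes per-class average precision with all-point interpolation and averages it over classes. It assumes each detection has already been labeled a true or false positive by matching it to a ground-truth box (for example at IoU >= 0.5); the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# (confidence score, is_true_positive) for one detection
Detection = Tuple[float, bool]


def average_precision(scored: List[Detection], num_gt: int) -> float:
    """AP for one class using all-point interpolation.

    Assumes TP/FP labels were assigned upstream by matching detections
    to ground-truth boxes (e.g. IoU >= 0.5); this is not the paper's code.
    """
    if num_gt == 0:
        return 0.0
    # Rank detections from most to least confident.
    ranked = sorted(scored, key=lambda s: s[0], reverse=True)
    tp = fp = 0
    precisions: List[float] = []
    recalls: List[float] = []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Interpolate: precision at each rank becomes the max precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class: Dict[str, Tuple[List[Detection], int]]) -> float:
    """Mean of per-class APs; per_class maps class name -> (detections, num_gt)."""
    aps = [average_precision(dets, n) for dets, n in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    # Hypothetical toy data, not taken from UECFOOD.
    toy = {
        "rice": ([(0.9, True), (0.8, False), (0.7, True)], 2),
        "ramen": ([(0.6, True)], 1),
    }
    print(mean_average_precision(toy))
```

With the toy data, "rice" gets AP = 5/6 and "ramen" gets AP = 1.0, so the mAP is their mean. Benchmark toolkits such as the COCO evaluator instead sample recall at fixed points and may average over several IoU thresholds, so numbers from this sketch are not directly comparable to the reported 0.522 and 0.52.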
Ministry of Science, ICT and Future Planning
Deep learning, Food Object Detection, Food recognition, Vision Transformer
Applied Data Science; Electrical Engineering
Daeil Jung, Simon Shim, Charles Choo, Doosung Hwang, Yunmook Nah, and Sejong Oh. "A Preliminary Result of Food Object Detection using Swin Transformer" ACM International Conference Proceeding Series (2022): 183-187. https://doi.org/10.1145/3543712.3543731