Detection of Aerobics Action Based on Convolutional Neural Network

Hindawi Computational Intelligence and Neuroscience, Volume 2022, Article ID 1857406, 10 pages. https://doi.org/10.1155/2022/1857406

Research Article

Siyu Zhang, Sangmyung University Seoul, 20 Hongjimun 2-gil, Jongno-gu, Seoul 03016, Republic of Korea. Correspondence should be addressed to Siyu Zhang; zhangsiyu@cumt.edu.cn

Received 4 November 2021; Revised 13 December 2021; Accepted 16 December 2021; Published 5 January 2022. Academic Editor: Bai Yuan Ding.

Copyright © 2022 Siyu Zhang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

To further improve the accuracy of aerobics action detection, a detection method based on improved multiscale features is proposed. Starting from faster R-CNN and targeting its known weaknesses, the method uses a feature pyramid network (FPN) to extract aerobics action image features, so that low-level information in the images is extracted and converted into high-resolution, deep-level semantic information. A target detector is then constructed from the anchors extracted in this way to detect aerobics actions. The results show that the loss function of the neural network is reduced to 0.2 by the proposed method and that its accuracy reaches 96.5%, higher than that of the compared methods, which demonstrates the feasibility of this study.

1. Related Work

Object detection, an important branch of current computer vision research, detects objects in images or videos. It integrates multiple technologies such as artificial intelligence and image recognition, so it is widely used in national defense, the military, and other fields [1-5]. Traditional object detection mainly targets simple action scenes, but scenes in complex environments have become the focus of current work. As a new approach to object detection in complex scenes, the deep neural network can realize feature transformation with its powerful feature extraction ability, so object tracking is better achieved.

In previous research, He et al. applied a deep neural network to detect small brown rice planthopper targets; the results show that small targets can be detected accurately, with an accuracy of 98.46% [6]. Zheng et al. proposed a multiscale feature fusion method for problems in target detection such as occlusion: a feature channel is constructed from directional gradients, and the features obtained from these channels are used as the input of the deep neural network that detects the target [3]. Yan et al. also proposed using a neural network to simplify small-target detection, but their work mainly redesigned the backbone of the deep neural network [7]. Feng et al. proposed a multiscale feature extraction method to track targets in optical remote sensing images; the results show that the method detects targets quickly [8]. Building on this related work, this study takes aerobics action detection as its research object, proposes an aerobics action detection method based on multiscale features, and verifies the method.

2. Faster R-CNN Model

The faster R-CNN algorithm proposed by Shaoqing Ren is known for its efficient detection, and other scholars have proposed improved algorithms based on it. The implementation process of the faster R-CNN algorithm is shown in Figure 1 [9, 10].

As can be seen from Figure 1, first a convolutional neural network extracts the features of the tested image. Then a region proposal network (RPN) processes the feature maps and identifies multiple target candidate regions. Finally, the classification-regression network screens the features within each candidate region and outputs the judgment value.

[Figure 1: Faster R-CNN structure (conv layers, feature maps, Region Proposal Network, proposals, ROI pooling, classifier).]
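To make the three-step flow just described concrete, the following minimal sketch outlines the two-stage pipeline in plain Python. The names backbone, region_proposal_network, and detection_head are hypothetical placeholders for the trained sub-networks, not code from the paper.

```python
# A minimal sketch of the two-stage faster R-CNN flow described above.
# All names here (backbone, region_proposal_network, detection_head) are
# hypothetical placeholders standing in for the trained sub-networks.

def detect(image, backbone, region_proposal_network, detection_head):
    # Step 1: a convolutional neural network extracts feature maps.
    feature_maps = backbone(image)
    # Step 2: the RPN scans the feature maps and proposes candidate regions.
    proposals = region_proposal_network(feature_maps)
    # Step 3: the classification-regression head scores each candidate
    # region and refines its bounding box.
    return [detection_head(feature_maps, roi) for roi in proposals]
```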
2.1. ResNet Deep Feature Network

The ResNet-101 network is an important member of the ResNet deep feature network series. It has many layers and can extract deeper features of the target, expressing the target effectively without slowing down network training, testing, and other processes. The residual block of the ResNet-101 network [11] is shown in Figure 2.

[Figure 2: A residual block of the ResNet-101 (256-d input; 1 × 1, 64, ReLU; 3 × 3, 64, ReLU; 1 × 1, 256; shortcut addition; ReLU).]

In the residual block shown in Figure 2, the 1 × 1 convolutional layers adjust the number of channels of the feature map, and the 3 × 3 convolutional layer extracts feature information. Deep ResNet-101 networks are built by stacking more residual blocks. The advantage of the ResNet convolutional architecture is that it can greatly increase the number of convolutional layers of the neural network, extracting the deep semantic information in the image and ultimately expressing the target to be detected accurately.
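The residual block of Figure 2 can be sketched directly in NumPy. This is a simplified illustration, not the paper's implementation: the weights are assumed to be given, biases and batch normalization are omitted, and the 3 × 3 convolution is written naively for clarity.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def conv1x1(x, w):
    # x: (H, W, C_in), w: (C_in, C_out). A 1x1 convolution is a per-pixel
    # linear map that only changes the channel count.
    return x @ w

def conv3x3(x, w):
    # x: (H, W, C), w: (3, 3, C, C_out). Naive 3x3 convolution with zero
    # padding, written for clarity rather than speed.
    h, wd, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, wd, w.shape[-1]))
    for i in range(h):
        for j in range(wd):
            patch = xp[i:i + 3, j:j + 3, :]            # (3, 3, C)
            out[i, j] = np.tensordot(patch, w, axes=3)  # contract 3 axes
    return out

def bottleneck_block(x, w_reduce, w_conv, w_expand):
    # The residual block of Figure 2: 1x1,64 -> ReLU -> 3x3,64 -> ReLU
    # -> 1x1,256, with the input added back before the final activation.
    y = relu(conv1x1(x, w_reduce))   # 256 -> 64 channels
    y = relu(conv3x3(y, w_conv))     # 3x3 feature extraction at 64 channels
    y = conv1x1(y, w_expand)         # 64 -> 256 channels
    return relu(y + x)               # identity shortcut

x = np.random.rand(8, 8, 256)
out = bottleneck_block(x, np.random.rand(256, 64),
                       np.random.rand(3, 3, 64, 64), np.random.rand(64, 256))
print(out.shape)  # (8, 8, 256)
```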
The ReLU function is used as the activation function by ZF-Net, replacing the traditional sigmoid function and further strengthening the performance of deep learning neural networks [12-14]. The sigmoid function maps continuous input values into the interval 0-1; however, if the input value is very small or very large, the derivative of the sigmoid function tends to 0, so the gradient vanishes during backpropagation and the network becomes difficult to train. The ReLU function avoids this defect, and its mathematical expression is [15]

f(x) = max(0, x). (1)

2.2. Candidate Region Generation Networks

The candidate region generation network comprehensively analyzes the color, texture, edge, and other information in the image and selects candidate areas for the target to be tested. This process is in effect a rough detection pass, which reduces the pressure on the subsequent classification network. The candidate region generation network used by the faster R-CNN algorithm is a convolutional neural network, shown in Figure 3.

[Figure 3: The candidate region generation network (sliding window over the characteristic diagram; 256-dimensional eigenvector; 9 datum rectangles; 18 scores; 36 correction parameters).]

Analyzing Figure 3: first, a sliding window traverses the feature map, mapping the features at each position into a 256-dimensional feature vector; then each eigenvector is fed into two fully connected layers, which output 2 × 9 = 18 scores and 4 × 9 = 36 correction parameters, respectively. Each sliding window position contains 9 benchmark rectangular boxes; the 36 correction parameters are used to correct these benchmark boxes, yielding 9 candidate regions. The 18 scores characterize the scoring results of the candidate regions: each region receives two scores, representing the probabilities that the candidate region does and does not contain a target.

A benchmark rectangular box is corrected with the four correction parameters t_x, t_y, t_w, t_h to obtain a candidate region, and the correction formulas are

x = w_a t_x + x_a, (2)
y = h_a t_y + y_a, (3)
w = w_a exp(t_w), (4)
h = h_a exp(t_h). (5)

In the formulas, x, y, w, h and x_a, y_a, w_a, h_a represent the center abscissa, center ordinate, width, and height of the candidate region and of the benchmark rectangular box, respectively.

The multitask loss function used in the faster R-CNN algorithm is [16]

L(p, u, t^u, v) = L_cls(p, u) + λ[u ≥ 1] L_box(t^u, v). (6)

The classification loss function is

L_cls(p, u) = −log p_u. (7)

The bounding-box regression loss function is

L_box(t^u, v) = Σ_{i ∈ {x,y,w,h}} smooth_L1(t_i^u − v_i), (8)

where smooth_L1(x) = 0.5x² if |x| < 1, and |x| − 0.5 otherwise.

From these formulas, the purpose of faster R-CNN training is to minimize the loss functions L_cls and L_box. A target mask is also set to meet application requirements. The regression labels are computed from the target label box as

t*_x = (x* − x_a) / w_a,  t*_y = (y* − y_a) / h_a,  t*_w = log(w* / w_a),  t*_h = log(h* / h_a). (9)

In the formula, x*, y*, w*, h* represent the center abscissa, center ordinate, width, and height of the target label box.
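A small NumPy sketch of the correction step may help: it decodes one benchmark box with equations (2)-(5), and the label encoding of equation (9) is the inverse mapping. The numeric example is hypothetical.

```python
import numpy as np

def decode(anchor, t):
    # Equations (2)-(5): correct a benchmark box (x_a, y_a, w_a, h_a)
    # with the predicted parameters (t_x, t_y, t_w, t_h).
    x_a, y_a, w_a, h_a = anchor
    t_x, t_y, t_w, t_h = t
    return (w_a * t_x + x_a,        # (2) corrected center abscissa
            h_a * t_y + y_a,        # (3) corrected center ordinate
            w_a * np.exp(t_w),      # (4) corrected width
            h_a * np.exp(t_h))      # (5) corrected height

def encode(anchor, target):
    # Equation (9): the inverse mapping, producing the regression labels
    # t*_x, t*_y, t*_w, t*_h from a target label box.
    x_a, y_a, w_a, h_a = anchor
    x, y, w, h = target
    return ((x - x_a) / w_a, (y - y_a) / h_a,
            np.log(w / w_a), np.log(h / h_a))

anchor = (100.0, 100.0, 16.0, 32.0)            # hypothetical benchmark box
box = decode(anchor, (0.5, -0.25, 0.1, 0.0))   # -> (108.0, 92.0, 17.68..., 32.0)
print(box, encode(anchor, box))                # encode recovers the parameters
```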
2.3. Principle of Spatial Pyramid Pooling

Spatial pyramid pooling (SPP) can map local features and fuse spaces of different dimensions; its advantage is the ability to generate fixed-size feature vectors, which improves the adaptability of the convolutional neural network structure. The working principle of SPP is shown in Figure 4.

[Figure 4: Principle of spatial pyramid pooling (input image; characteristic diagram; feature fusion; fixed-length feature expression).]

As Figure 4 shows, SPP contains pooling layers at multiple scales; it can be applied to convolutional features of any size and finally outputs eigenvectors of fixed dimension.

3. Method Improvement

The faster R-CNN algorithm leaves room for improvement when handling multiscale problems, so this paper introduces the feature pyramid network (FPN) into the faster R-CNN framework. The improved algorithmic framework is shown in Figure 5.

[Figure 5: Basic framework of the improved faster R-CNN (input picture; ResNet extracts the deep feature map; characteristic diagrams built at the different FPN pyramid levels; improved RPN extracts regional suggestion boxes/ROIs; ROI pooling; target detector; output test results).]

As analyzed with Figure 5, features of the original image are first extracted with ResNet. Several layer groups in ResNet output feature maps of the same size, and comparative analysis found that the feature map of Conv1 occupies too much memory; therefore, only the outputs of Conv2~Conv5 after nonlinear activation are used as the reference feature maps of each stage [17].

The positive and negative labels of the anchors are set based on the intersection-over-union (IOU) between each anchor and the actual target position to train the RPN. The modified RPN slides a network head over all levels of the FPN to determine the regions that may contain the target to be tested, and the improved RPN maintains a high level of parameter sharing. The improved low-level feature maps of the FPN feature pyramid have high resolution, carry deep semantic information, and therefore support accurate retrieval of multiscale and small targets.

3.1. FPN Multiscale Features

The FPN has an advantage in multiscale target detection; introducing it into faster R-CNN can further improve the model's adaptability to multiscale and small-target detection while maintaining detection efficiency. The FPN accepts pictures of any size, is configured inside the CNN convolutional architecture, and extracts deep feature maps more effectively. According to the requirements of each convolutional layer of the CNN, feature maps of the corresponding proportional sizes are output, thereby establishing the feature-map pyramid. The FPN can thus output feature maps of different scales and also integrate feature maps from different layers. This is realized as follows: the FPN first sorts the feature maps of all layers in the CNN, then magnifies the length and width of each deeper feature map to 2 times the original and adds it to the corresponding shallower feature map, thus fusing feature maps at different levels. After this operation, the shallow feature map contains both deep semantic information and high resolution, which improves detection accuracy for multiscale targets.
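The top-down fusion step just described (magnify the deeper map by 2 and add it to the shallower one) can be sketched as follows. Nearest-neighbour upsampling and an already-matched channel count are simplifying assumptions; in the full network a 1 × 1 convolution would align the channels first.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour upsampling: magnify length and width by 2,
    # as described for the deeper feature map before fusion.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fpn_fuse(shallow, deep):
    # Top-down fusion: the 2x-enlarged deep map is added to the shallower
    # map so the result carries deep semantics at high resolution.
    # Assumes both maps already share the same channel count.
    return shallow + upsample2x(deep)

# Toy pyramid: C2 has twice the spatial resolution of C3.
c3 = np.random.rand(8, 8, 256)
c2 = np.random.rand(16, 16, 256)
p2 = fpn_fuse(c2, c3)
print(p2.shape)  # (16, 16, 256)
```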
3.2. The Multiscale Multiperson Aerobics Action Target Detection Algorithm of CNN

After the above optimization, the final multiscale RPN can accurately extract multiscale ROIs as well as multiscale human targets.

3.2.1. ROIs Extracted from the Multiscale RPN

Using the FPN at the RPN stage makes it possible to extract multiscale candidate-region ROIs. In the different layers of the FPN, ROIs are extracted around sliding anchor points, and the score and regression position of each candidate region are determined. Although the improved RPN stage contains multiple layers of feature maps, there is no need to extract ROIs from each layer independently; instead, the anchors acquired at the different layers are pooled into one set, from which the higher-scoring regions are selected as ROIs. The loss function used in this process follows [18]:

L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*). (10)

In the formula, i is the index of an anchor in a training mini-batch; L is the total loss function of the RPN stage; p_i is the predicted probability that anchor i is a target; t_i is the RPN-stage prediction and t_i* the real bounding-box position of the target; p_i* discriminates positive from negative anchors; N_cls is the mini-batch size; N_reg is the number of anchors; λ is the balance parameter; and L_reg is the regression loss function, which uses the smooth L1 loss.

Considering that the feature maps at the different FPN layers have different scales, this subsection uses only a single-scale anchor per layer, which improves the multiscale target detection effect.

Using the IOU as the classification basis, anchors fall into two categories, positive and negative. However, if an image contains only small-scale targets and their number is small, the ratio of negative to positive anchors produced by the RPN becomes too large: the extracted background semantic information is then too rich and impairs feature extraction for the foreground target, so the resulting detector identifies foreground targets poorly. To avoid this problem, the ratio of positive to negative anchors generated in the RPN stage must be limited reasonably, preventing an excessive ratio from interfering with the target detection effect.

This paper studies the aerobics action object detection problem. Combined with the morphological characteristics of aerobics actions, the anchor aspect ratios are set to 1:2, 1:1, 2:1, and 1:3, which maintains both the detection efficiency and the detection effect of the neural network for aerobics action targets.
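The RPN loss of equation (10) is easy to state in NumPy. A binary cross-entropy form of L_cls is assumed here, since equation (7) reduces to this for the two-class RPN; the arrays in the example are hypothetical.

```python
import numpy as np

def smooth_l1(x):
    # Equation (8): 0.5 x^2 inside |x| < 1, |x| - 0.5 outside.
    x = np.abs(x)
    return np.where(x < 1.0, 0.5 * x ** 2, x - 0.5)

def rpn_loss(p, p_star, t, t_star, lam=1.0):
    # Equation (10): classification term averaged over the mini-batch plus
    # the regression term counted only for positive anchors (p_star == 1).
    # p: predicted target probability per anchor; p_star: 0/1 labels;
    # t, t_star: (N, 4) predicted and ground-truth box parameters;
    # lam: the balance parameter lambda.
    eps = 1e-7
    l_cls = -(p_star * np.log(p + eps)
              + (1 - p_star) * np.log(1 - p + eps)).sum() / len(p)
    l_reg = (p_star[:, None] * smooth_l1(t - t_star)).sum() / len(t)
    return l_cls + lam * l_reg

p = np.array([0.9, 0.2, 0.7])          # hypothetical anchor scores
p_star = np.array([1.0, 0.0, 1.0])     # positive / negative labels
t, t_star = np.random.randn(3, 4), np.random.randn(3, 4)
print(rpn_loss(p, p_star, t, t_star))
```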
3.2.2. Multiscale Aerobics Action Target Detector

The aerobics action target detection network uses the FPN feature pyramid to extract the characteristics of the aerobics action target. According to the scale of each ROI output by the RPN, the ROI is matched to the corresponding layer of the feature pyramid, from which the target features are extracted. Deep feature maps have already been produced at the FPN stage, so it is only necessary to extract fixed-size target feature maps with ROI pooling and then feed them into the fast R-CNN target detector. The target detector has two fully connected layers in front, which assess the confidence of the target while performing regression analysis of the target region. The loss function of this stage is [19, 20]

L_2(p, u, t^u, v) = L_cls2(p, u) + δ[u ≥ 1] L_loc2(t^u, v). (11)

In the formula, L_2 is the total loss function of the second-stage target detection; L_cls2 is the classification loss; L_loc2 is the smooth L1 regression loss function; [u ≥ 1] is the indicator function; u is the true category of the target; p is the confidence of the predicted target; t^u and v are the predicted bounding box for category u and the position of the real bounding box, respectively; and δ is the equilibrium parameter.

Previous studies have examined whether target detectors at the fast R-CNN stage should share parameters across different FPN layers, confirming that the differences between layers are small. This paper therefore shares the weights between the different layers of the feature pyramid hierarchy, which effectively improves target detection efficiency.
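The routing of each ROI to a pyramid level plus the fixed-size pooling can be sketched as below. The level-assignment rule is the heuristic from the original FPN paper, taken here as an assumption since the text only states that ROIs are matched to layers by scale; the pooling is a plain max-pool into a 7 × 7 grid.

```python
import numpy as np

def roi_to_level(w, h, k0=4, canonical=224):
    # FPN-paper heuristic (an assumption here): larger ROIs map to deeper
    # pyramid levels, clamped to P2..P5.
    k = k0 + np.log2(np.sqrt(w * h) / canonical)
    return int(np.clip(np.floor(k), 2, 5))

def roi_pool(feature_map, box, out_size=7):
    # Fixed-size ROI pooling: crop the box (x0, y0, x1, y1) from the
    # feature map and max-pool it into an out_size x out_size grid.
    x0, y0, x1, y1 = box
    crop = feature_map[y0:y1, x0:x1, :]
    h, w, c = crop.shape
    ys = np.linspace(0, h, out_size + 1).astype(int)
    xs = np.linspace(0, w, out_size + 1).astype(int)
    out = np.zeros((out_size, out_size, c))
    for i in range(out_size):
        for j in range(out_size):
            cell = crop[ys[i]:max(ys[i + 1], ys[i] + 1),
                        xs[j]:max(xs[j + 1], xs[j] + 1), :]
            out[i, j] = cell.max(axis=(0, 1))   # max-pool each grid cell
    return out

fmap = np.random.rand(64, 64, 256)
print(roi_to_level(56, 56))                   # small ROI -> level P2
print(roi_pool(fmap, (8, 8, 40, 40)).shape)   # (7, 7, 256)
```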
4. Experimental Analysis

4.1. Experimental Environment and Dataset

This experiment uses the Ubuntu 16.04 version of the Linux system; the server is configured with an Intel Xeon E5-2678 v3 @ 2.50 GHz CPU, an NVIDIA GeForce GTX 1080 Ti GPU, and 32 GB of memory. The deep learning framework is TensorFlow-GPU 1.10 with cuDNN 6.0 and CUDA 8.0, running under Anaconda3 with Python 3.5. To meet the needs of aerobics movement target detection, the aerobics action data for this experiment were mainly acquired through collection.

4.2. Training Results and Analysis

This experiment selects the faster R-CNN model and uses the feature maps of the various FPN layers to detect human targets, in order to achieve the multiscale target detection effect. To this end, the ratio of positive to negative anchors in the RPN is limited to prevent an excessive ratio from interfering with human target detection. Combined with the morphological characteristics of the human body, the anchor aspect ratios were adjusted and a 3:1 proportional scale was added, which identifies human targets more accurately. With this improvement, and working with the three standard datasets INRIA, PETS 2009, and Caltech, the aerobics action target can be detected effectively.

Taking the above datasets as an example, the TensorBoard tool is used to display the training process of the neural network, as follows.

First, the visualization results of the feature maps of each FPN layer are shown in Figure 6. Figure 6(a) shows the feature-map output of layers 2-5 of the feature extraction network, with C2-C5 from bottom to top; Figure 6(b) shows the feature-map output of layers 2-5 obtained after introducing the FPN, with P2-P5 from bottom to top. In comparison, although the semantic information contained in the C5 and P5 feature maps is basically similar, there is a large gap between the semantic information contained in the C2 and P2 feature maps.

[Figure 6: Comparison of the feature maps at each level of the FPN network. (a) Original feature maps (C2-C5). (b) FPN feature maps (P2-P5).]

Raw images were input into the RPN, which detected the candidate-region ROIs and scored each of them. After the introduction of the FPN pyramid architecture, feature maps at four scales are output at the RPN stage, corresponding to four layers, and the number of ROIs output from each layer is shown in Figure 7.

[Figure 7: The number of ROIs from the multilevel feature maps P2~P5. (a) P2. (b) P3. (c) P4. (d) P5.]

Figure 7 shows the numbers of ROIs in the P2~P5 layers of the FPN; the number of ROIs in the lower-level feature maps is significantly greater than in the higher-level feature maps. The cause is that the Caltech dataset used for training contains a large number of small-scale targets and significantly fewer large-scale targets, which interferes with human target detection.

The variation in the values of the loss functions in the improved RPN stage is shown in Figure 8.

[Figure 8: Loss function values in the RPN stage.]

As Figure 8 shows, the RPN loss fluctuates as the number of training iterations increases, but the overall trend is downward, and the total loss of the RPN stage decreases from 0.12 to 0.02. The trained RPN can therefore accurately extract human-target candidate regions. The higher-rated ROIs enter the second-stage fast R-CNN target detection framework, which outputs two loss values, namely the bounding-box regression loss and the classification loss, with trends shown in Figure 9.

[Figure 9: Loss function values in the second-stage detection. (a) Classification losses in the second stage. (b) Regression losses in the second stage. (c) Total losses in the second stage.]

As Figure 9 shows, all three curves fluctuate as the number of training iterations increases, but the overall trend is downward, and the total loss of the fast R-CNN phase decreases from 0.9 to 0.2. The target detection network's ability to detect human targets thus gradually fits the datasets used for training.

The trend of the overall loss function values of the entire neural network, drawn from the trial data, is shown in Figure 10.

[Figure 10: Loss function value of the total neural network.]

As Figure 10 shows, as iterative training proceeds, the overall loss of the whole neural network decreases from 1.7 to 0.8, which indicates that the trained neural network can detect aerobics movement targets more accurately.

To avoid overfitting during training, this experiment limited the weight attenuation; that is, the update weights of the neural network were kept negatively correlated with the number of training iterations. The resulting trend of the update weights is shown in Figure 11.

[Figure 11: Neural network weight attenuation.]

As Figure 11 shows, with increasing training iterations the update weight decreases from 0.627 to 0.581.

4.3. Comparison of Experimental Results

The method is compared against the baseline faster R-CNN. Average precision (AP) is used as the algorithm performance evaluation index, the AP value being equal to the area under the precision-recall (PR) curve. From the application perspective, combining the precision-recall pairs obtained under different threshold conditions yields the comprehensive AP index of target detection accuracy, which evaluates the detection accuracy of an algorithm objectively and comprehensively. The AP values of the baseline faster R-CNN, CNN, R-CNN, and Mask R-CNN and of the method in this paper are summarized in Figure 12.

[Figure 12: Detection accuracy of the baseline faster R-CNN, the comparison methods, and the proposed method.]

According to Figure 12, the AP value of the baseline faster R-CNN is 90.7% and that of the proposed algorithm is 96.5%, while the other comparison methods reach 91.1% and 88.6%. By comparison, the method of this paper, which builds on faster R-CNN, extracts object features with the FPN, and applies the optimizations above, achieves a generally higher detection accuracy and significantly outperforms the baseline faster R-CNN model.
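For reference, the AP index used above is simply the area under the PR curve, which can be computed from sampled operating points; the points in this sketch are hypothetical, not the paper's measurements.

```python
import numpy as np

def average_precision(precision, recall):
    # AP as the area under the precision-recall curve, integrated with the
    # trapezoidal rule over recall. precision/recall hold operating points
    # sampled at different score thresholds.
    order = np.argsort(recall)
    return np.trapz(np.asarray(precision)[order],
                    np.asarray(recall)[order])

# Hypothetical operating points for illustration only.
precision = [1.00, 0.98, 0.95, 0.90, 0.80]
recall = [0.10, 0.40, 0.60, 0.80, 0.95]
print(average_precision(precision, recall))  # area under the PR curve
```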
5. Conclusion

In summary, it is difficult for ordinary detectors to meet the needs of multiscale, small-target application scenarios in aerobics action target detection. This paper therefore proposes a multiscale, multitarget aerobics action detection algorithm based on the CNN architecture. After introducing the two-stage target detection framework of the faster R-CNN model, the paper systematically analyzes the process of extracting features with the ResNet deep backbone network and the process of multiscale target feature extraction with the FPN. The FPN feature pyramid is then fused into the two stages of the faster R-CNN framework to construct a multiscale RPN proposal network and a multiscale aerobics action target detector; setting the number of positive and negative anchors reasonably improves the algorithm's detection efficiency for aerobics action targets. Limiting the aspect ratio of the anchors scientifically and optimizing the overall neural network improve the detection effect for multiscale, multiperson aerobics movements. Finally, comparison experiments against the baseline faster R-CNN confirm that the method in this paper achieves preferable detection results. The innovation of this study is to identify aerobics movements from the perspective of target detection and recognition, so as to provide more reference paths for the auxiliary training of sports.

Data Availability

The data are available from the corresponding author upon request.

Conflicts of Interest

The author declares no conflicts of interest regarding this work.

References
[1] L. Shao, F. Nagata, H. Ochi et al., "Visual feedback control of quadrotor by object detection in movies," Artificial Life and Robotics, vol. 25, no. 3, pp. 488–494, 2020.
[2] M. Rajasekar, A. Celine Kavida, and M. Anto Bennet, "A pattern analysis based underwater video segmentation system for target object detection," Multidimensional Systems and Signal Processing, vol. 31, no. 4, pp. 1579–1602, 2020.
[3] H. Zheng, J. Chen, L. Chen, Y. Li, and Z. Yan, "Feature enhancement for multi-scale object detection," Neural Processing Letters, vol. 51, no. 1, pp. 1907–1919, 2020.
[4] B. Cui and J.-C. Créput, "A systematic algorithm for moving object detection with application in real-time surveillance," SN Computer Science, vol. 1, no. 2, pp. 164–180, 2020.
[5] S. Singh Sengar and S. Mukhopadhyay, "Moving object detection using statistical background subtraction in wavelet compressed domain," Multimedia Tools and Applications, vol. 79, no. 12, pp. 5919–5940, 2020.
[6] Y. He, Z. Zhou, L. Tian, Y. Liu, and X. Luo, "Brown rice planthopper (Nilaparvata lugens Stål) detection based on deep learning," Precision Agriculture, vol. 21, no. 6, pp. 1385–1402, 2020.
[7] Z. Yan, H. Zheng, Y. Li, and L. Chen, "Detection-oriented backbone trained from near scratch and local feature refinement for small object detection," Neural Processing Letters, vol. 53, no. 3, pp. 1921–1943, 2021.
[8] Y. Feng, L. Wang, and M. Zhang, "A multi-scale target detection method for optical remote sensing images," Multimedia Tools and Applications, vol. 78, no. 7, pp. 8751–8766, 2019.
[9] G. L. Hung, M. S. B. Sahimi, H. Samma, T. A. Almohamad, and B. Lahasan, "Faster R-CNN deep learning model for pedestrian detection from drone images," SN Computer Science, vol. 1, no. 2, pp. 17–23, 2020.
[10] H. You, L. Yu, S. Tian et al., "MC-Net: multiple max-pooling integration module and cross multi-scale deconvolution network," Knowledge-Based Systems, vol. 231, Article ID 107456, 2021.
[11] C. Yan, G. Pang, X. Bai, J. Zhou, and L. Gu, "Beyond triplet loss: person re-identification with fine-grained difference-aware pairwise loss," IEEE Transactions on Multimedia, p. 1, 2021.
[12] G. Wang, W. Li, L. Zhang et al., "Encoder-X: solving unknown coefficients automatically in polynomial fitting by using an autoencoder," IEEE Transactions on Neural Networks and Learning Systems, pp. 1–13, 2021.
[13] B. Liu, J. Luo, and H. Huang, "Toward automatic quantification of knee osteoarthritis severity using improved faster R-CNN," International Journal of Computer Assisted Radiology and Surgery, vol. 15, no. 9, pp. 457–466, 2020.
[14] Z. Xiao, L. Pei, L. Geng, Y. Sun, F. Zhang, and J. Wu, "Surface parameter measurement of braided composite preform based on faster R-CNN," Fibers and Polymers, vol. 21, no. 3, pp. 590–603, 2020.
[15] S. Tu, J. Pang, H. Liu et al., "Passion fruit detection and counting based on multiple scale faster R-CNN using RGB-D images," Precision Agriculture, vol. 21, no. 5, pp. 1072–1091, 2020.
[16] M. Quintana, S. Karaoglu, F. Alvarez, J. Menendez, and T. Gevers, "Three-D wide faces (3DWF): facial landmark detection and 3D reconstruction over a new RGB-D multi-camera dataset," Sensors, vol. 19, no. 5, p. 1103, 2019.
[17] Z. Zhong, L. Sun, and Q. Huo, "An anchor-free region proposal network for faster R-CNN-based text detection approaches," International Journal on Document Analysis and Recognition (IJDAR), vol. 22, no. 3, pp. 315–327, 2019.
[18] T. Zhou, Z. Li, and C. Zhang, "Enhance the recognition ability to occlusions and small objects with robust faster R-CNN," International Journal of Machine Learning and Cybernetics, vol. 10, no. 11, pp. 3155–3166, 2019.
[19] W. Tang, D. Zou, S. Yang, J. Shi, J. Dan, and G. Song, "A two-stage approach for automatic liver segmentation with faster R-CNN and deeplab," Neural Computing and Applications, vol. 32, no. 11, pp. 6769–6778, 2020.
[20] A. M. Wildridge, P. C. Thomson, S. C. Garcia, E. C. Jongman, and K. L. Kerrisk, "Transitioning from conventional to automatic milking: effects on the human-animal relationship," Journal of Dairy Science, vol. 103, no. 2, pp. 1608–1619, 2020.
