applied sciences

Editorial

Artificial Intelligence for Multimedia Signal Processing

Byung-Gyu Kim 1,* and Dong-San Jun 2

1 Department of IT Engineering, Sookmyung Women's University, Seoul 04310, Korea
2 Department of Computer Engineering, Dong-A University, Busan 49315, Korea; email@example.com
* Correspondence: firstname.lastname@example.org

Citation: Kim, B.-G.; Jun, D.-S. Artificial Intelligence for Multimedia Signal Processing. Appl. Sci. 2022, 12, 7358. https://doi.org/10.3390/app12157358

Received: 15 July 2022; Accepted: 21 July 2022; Published: 22 July 2022

Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

At the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a 2012 global image recognition contest, the University of Toronto SuperVision team led by Prof. Geoffrey Hinton took first and second place by a landslide, sparking an explosion of interest in deep learning. Since then, global experts and companies such as Google, Microsoft, NVIDIA, and Intel have competed to lead artificial intelligence technologies such as deep learning. They are now developing deep-learning-based technologies that can be applied across industries to solve many classification and recognition problems.

These artificial intelligence technologies are also actively applied to broadcasting and multimedia processing technologies based on recognition and classification [1–3]. A vast amount of research has been conducted in a wide variety of fields, such as content creation, transmission, and security, and attempts have been made in the past two to three years to improve the compression efficiency of image, video, speech, and other data in areas related to MPEG media processing technology [4–6]. Additionally, technologies for media creation, processing, editing, and scenario generation are very important areas of research in multimedia processing and engineering. In this issue, we present excellent papers on advanced computational intelligence algorithms and technologies for emerging multimedia processing.

2. Emerging Multimedia Signal Processing

Thirteen papers related to artificial intelligence for multimedia signal processing have been published in this Special Issue. They cover a broad range of topics concerning advanced computational intelligence algorithms and technologies for emerging multimedia signal processing.

We present the following works in relation to the computer vision field. Lee et al. propose a densely cascading image restoration network (DCRN) consisting of an input layer, a densely cascading feature extractor, a channel attention block, and an output layer [7]. The densely cascading feature extractor has three densely cascading (DC) blocks, and each DC block contains two convolutional layers. With this design, they achieved better quality measures for compressed Joint Photographic Experts Group (JPEG) images compared with existing methods. In [8], an image de-raining approach is developed using the generative capabilities of the recently introduced conditional generative adversarial networks (cGANs). This method can be very useful for recovering visual quality degraded by diverse weather conditions, recording conditions, or motion blur.

Additionally, Wu et al. suggest a framework that leverages sentimental interaction characteristics based on a graph convolutional network (GCN) [9]. They first utilize an off-the-shelf tool to recognize the objects in an image and build a graph over them. Visual features are represented as nodes, and the emotional distances between the objects act as edges.
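The dense-cascading connectivity of the DCRN design can be illustrated with a toy sketch. This is not the authors' implementation: the layer sizes are arbitrary, the "convolutions" are reduced to per-pixel (1x1) linear maps, and the channel attention block is approximated by a simple squeeze-style sigmoid gate. It only shows how each DC block receives the concatenation of the input and all previous block outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dc_block(x, w1, w2):
    """One densely cascading (DC) block: two 'convolutional' layers,
    modeled here as 1x1 convolutions (per-pixel linear maps)."""
    return relu(relu(x @ w1) @ w2)

def channel_attention(x):
    """Squeeze-style gate: global average per channel, squashed to
    (0, 1) and used to rescale the channels."""
    gate = sigmoid(x.mean(axis=0))        # one weight per channel
    return x * gate

def dcrn_features(x, blocks):
    """Densely cascading feature extractor: each DC block receives the
    concatenation of the input and every previous block's output."""
    feats = [x]
    for w1, w2 in blocks:
        out = dc_block(np.concatenate(feats, axis=1), w1, w2)
        feats.append(out)
    return channel_attention(np.concatenate(feats, axis=1))

# Toy setting: 16 "pixels", 8 channels, three DC blocks as in the DCRN design.
c = 8
x = rng.standard_normal((16, c))
blocks = []
for i in range(3):
    in_c = c * (i + 1)                    # grows as features are cascaded
    blocks.append((rng.standard_normal((in_c, c)) * 0.1,
                   rng.standard_normal((c, c)) * 0.1))

y = dcrn_features(x, blocks)
print(y.shape)  # (16, 32): input channels plus three cascaded block outputs
```

The growing input width of each block (8, 16, 24 channels here) is the point of the cascade: later blocks see every earlier feature map directly.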
Then, they employ GCNs to obtain the interaction features among the objects, which are fused with the CNN output of the whole image to predict the result. This approach is very useful for analyzing human sentiment. In [10], two lightweight neural networks with a hybrid residual and dense connection structure are suggested by Kim et al. to improve super-resolution performance. They show that the proposed methods significantly reduce both the inference time and the memory required to store parameters and intermediate feature maps, while maintaining image quality similar to that of previous methods.

Kim et al. propose an efficient scene classification algorithm for three different classes by detecting objects in the scene [11]. The authors utilize a pre-trained semantic segmentation model to extract objects from an image. After that, they construct a weighting matrix to better determine the scene class. Finally, the algorithm classifies an image into one of three scene classes (i.e., indoor, nature, city) using the designed weighting matrix. This technique can be utilized for semantic searches in multimedia databases. Lastly, an estimation method for human height is proposed by Lee et al. using color and depth information [12]. They use color images for deep learning by Mask R-CNN to detect a human body and a human head separately. If color images are not available for extracting the human body region due to a low-light environment, the human body region is instead extracted by comparison with the current frame in the depth video.

For speech, sound, and text processing, Lin et al. improve the raw-signal-input network from other research using deeper network architectures [13].
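The weighting-matrix scene classification described above can be sketched in a few lines. The object labels, the weight values, and the scoring rule below are illustrative assumptions, not the published model: each detected object contributes its row of the matrix, and the scene class with the highest accumulated score wins.

```python
import numpy as np

# Hypothetical object labels a segmentation model might emit, and an
# illustrative weighting matrix: rows = objects, columns = scene classes.
OBJECTS = ["sofa", "tree", "building", "sky", "bed", "road"]
SCENES = ["indoor", "nature", "city"]
W = np.array([
    [0.9, 0.0, 0.1],   # sofa     -> strongly indoor
    [0.0, 0.8, 0.2],   # tree     -> mostly nature
    [0.0, 0.1, 0.9],   # building -> strongly city
    [0.0, 0.6, 0.4],   # sky      -> outdoor, slightly nature
    [1.0, 0.0, 0.0],   # bed      -> indoor
    [0.1, 0.1, 0.8],   # road     -> city
])

def classify_scene(detected):
    """Sum each detected object's row of the weighting matrix and pick
    the scene class with the highest accumulated score."""
    idx = [OBJECTS.index(o) for o in detected if o in OBJECTS]
    scores = W[idx].sum(axis=0)
    return SCENES[int(np.argmax(scores))]

print(classify_scene(["building", "road", "sky"]))  # city
```

In the actual paper the matrix is designed from object statistics rather than hand-picked, but the decision rule has this shape.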
They also propose a network architecture that can combine different kinds of network feeds with different features. In the experiments, the proposed scheme achieves an accuracy of 73.55% on the open audio dataset "Dataset for Environmental Sound Classification 50" (ESC-50). A multi-scale discriminator that discriminates between real and generated speech at various sampling rates is devised by Kim et al. to stabilize GAN training [14]. In this paper, the proposed structure is compared with conventional GAN-based speech enhancement algorithms using the VoiceBank-DEMAND dataset. They show that the proposed approach makes training faster and more stable. To translate speech, a multimodal unsupervised scheme is proposed by Lee and Park [15]. They build a variational autoencoder (VAE)-based speech conversion network by decomposing the spectral features of speech into a speaker-invariant content factor and a speaker-specific style factor to estimate diverse and robust speech styles. This approach can help second language (L2) speech education. To develop a 3D avatar-based sign language learning system, Chakladar et al. suggest a system that converts input speech/text into the corresponding sign movements for Indian Sign Language (ISL) [16]. The translation module achieves a 10.50 SER (sign error rate) score in the actual test.

Two papers concern content analysis and information mining. The first one, by Krishna Kumar Thirukokaranam Chandrasekar and Steven Verstockt, presents a context-based structure mining pipeline [17]. The proposed scheme not only attempts to enrich the content, but also simultaneously splits it into shots and logical story units (LSUs). They demonstrate quantitatively that the pipeline outperforms existing state-of-the-art methods for shot boundary detection, scene detection, and re-identification tasks.
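To make "splitting content into shots" concrete, the sketch below implements the classic frame-difference baseline that structure-mining pipelines improve upon: flag a shot boundary wherever consecutive frames differ by more than a threshold. The threshold and the synthetic frames are illustrative assumptions; the pipeline in [17] uses far richer features than raw pixel differences.

```python
import numpy as np

def shot_boundaries(frames, threshold=0.3):
    """Naive shot-boundary detector: flag frame i as a cut when its mean
    absolute pixel difference from frame i-1 exceeds the threshold."""
    cuts = []
    for i in range(1, len(frames)):
        if np.mean(np.abs(frames[i] - frames[i - 1])) > threshold:
            cuts.append(i)
    return cuts

rng = np.random.default_rng(1)
# Two synthetic "shots": near-constant dark frames, then bright frames.
shot_a = [np.full((8, 8), 0.1) + rng.normal(0, 0.01, (8, 8)) for _ in range(5)]
shot_b = [np.full((8, 8), 0.9) + rng.normal(0, 0.01, (8, 8)) for _ in range(5)]
print(shot_boundaries(shot_a + shot_b))  # [5]
```

Grouping the detected shots into logical story units (LSUs) is then a clustering step over shot-level features, which is where the learned context of [17] comes in.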
The other paper outlines a framework that can learn the multimodal joint representation of pins, including text representation, image representation, and multimodal fusion [18]. In this work, the authors combine image representations and text representations in a multimodal form. It is shown that the proposed multimodal joint representation outperforms unimodal representations in different recommendation tasks.

For ECG signal processing, Tanoh and Napoletano propose a 1D convolutional neural network (CNN) that exploits a novel analysis of the correlation between the two leads of the noisy electrocardiogram (ECG) to classify heartbeats [19]. Because the approach is one-dimensional, it enables complex structures while maintaining reasonable computational complexity.

We hope that the technical papers published in this Special Issue can help researchers and readers to understand the emerging theories and technologies in the field of multimedia signal processing.

Funding: This research received no external funding.

Acknowledgments: We thank all authors who submitted excellent research work to this Special Issue. We are grateful to all reviewers who contributed evaluations of the scientific merits and quality of the manuscripts and provided countless valuable suggestions to improve their quality and the overall value for the scientific community. Our special thanks go to the editorial board of the MDPI Applied Sciences journal for the opportunity to guest edit this Special Issue, and to the Applied Sciences Editorial Office staff for the hard and precise work required to keep to a rigorous peer-review schedule and complete timely publication.

Conflicts of Interest: The authors declare no conflict of interest.

References
1. Kim, J.-H.; Hong, G.-S.; Kim, B.-G.; Dogra, D.P. deepGesture: Deep Learning-Based Gesture Recognition Scheme Using Motion Sensors. Displays 2018, 55, 38–45. [CrossRef]
2. Kim, J.-H.; Kim, B.-G.; Roy, P.P.; Jeong, D.-M. Efficient Facial Expression Recognition Algorithm Based on Hierarchical Deep Neural Network Structure. IEEE Access 2019, 7, 2907327. [CrossRef]
3. Jeong, D.; Kim, B.-G.; Dong, S.-G. Deep Joint Spatio-Temporal Network (DJSTN) for Efficient Facial Expression Recognition. Sensors 2020, 20, 1936. [CrossRef] [PubMed]
4. Lee, Y.; Jun, D.; Kim, B.-G.; Lee, H. Enhanced Single Image Super Resolution Method Using a Lightweight Multi-Scale Channel Dense Network for Small Object Detection. Sensors 2021, 21, 3351. [CrossRef] [PubMed]
5. Park, S.-J.; Kim, B.-G.; Chilamkurti, N. A Robust Facial Expression Recognition Algorithm Based on Multi-Rate Feature Fusion Scheme. Sensors 2021, 21, 6954. [CrossRef] [PubMed]
6. Choi, Y.-J.; Lee, Y.-W.; Kim, B.-G. Residual-Based Graph Convolutional Network (RGCN) for Emotion Recognition in Conversation (ERC) for Smart IoT. Big Data 2021, 9, 279–288. [CrossRef] [PubMed]
7. Lee, Y.; Park, S.-H.; Rhee, E.; Kim, B.-G.; Jun, D. Reduction of Compression Artifacts Using a Densely Cascading Image Restoration Network. Appl. Sci. 2021, 11, 7803. [CrossRef]
8. Hettiarachchi, P.; Nawaratne, R.; Alahakoon, D.; De Silva, D.; Chilamkurti, N. Rain Streak Removal for Single Images Using Conditional Generative Adversarial Networks. Appl. Sci. 2021, 11, 2214. [CrossRef]
9. Wu, L.; Zhang, H.; Deng, S.; Shi, G.; Liu, X. Discovering Sentimental Interaction via Graph Convolutional Network for Visual Sentiment Prediction. Appl. Sci. 2021, 11, 1404. [CrossRef]
10. Kim, S.; Jun, D.; Kim, B.-G.; Lee, H.; Rhee, E. Single Image Super-Resolution Method Using CNN-Based Lightweight Neural Networks. Appl. Sci. 2021, 11, 1092. [CrossRef]
11. Yeo, W.-H.; Heo, Y.-J.; Choi, Y.-J.; Kim, B.-G. Place Classification Algorithm Based on Semantic Segmented Objects. Appl. Sci. 2020, 10, 9069. [CrossRef]
12. Lee, D.-S.; Kim, J.-S.; Jeong, S.C.; Kwon, S.-K. Human Height Estimation by Color Deep Learning and Depth 3D Conversion. Appl. Sci. 2020, 10, 5531. [CrossRef]
13. Lin, Y.-K.; Su, M.-C.; Hsieh, Y.-Z. The Application and Improvement of Deep Neural Networks in Environmental Sound Recognition. Appl. Sci. 2020, 10, 5965. [CrossRef]
14. Kim, H.Y.; Yoon, J.W.; Cheon, S.J.; Kang, W.H.; Kim, N.S. A Multi-Resolution Approach to GAN-Based Speech Enhancement. Appl. Sci. 2021, 11, 721. [CrossRef]
15. Lee, Y.K.; Park, J.G. Multimodal Unsupervised Speech Translation for Recognizing and Evaluating Second Language Speech. Appl. Sci. 2021, 11, 2642. [CrossRef]
16. Das Chakladar, D.; Kumar, P.; Mandal, S.; Roy, P.P.; Iwamura, M.; Kim, B.-G. 3D Avatar Approach for Continuous Sign Movement Using Speech/Text. Appl. Sci. 2021, 11, 3439. [CrossRef]
17. Thirukokaranam Chandrasekar, K.K.; Verstockt, S. Context-Based Structure Mining Methodology for Static Object Re-Identification in Broadcast Content. Appl. Sci. 2021, 11, 7266. [CrossRef]
18. Liu, H.; Deng, S.; Wu, L.; Jian, M.; Yang, B.; Zhang, D. Recommendations for Different Tasks Based on the Uniform Multimodal Joint Representation. Appl. Sci. 2020, 10, 6170. [CrossRef]
19. Tanoh, I.-C.; Napoletano, P. A Novel 1-D CCANet for ECG Classification. Appl. Sci. 2021, 11, 2758. [CrossRef]