Photonics, Volume 9 (3) – Mar 9, 2022


- Publisher: Multidisciplinary Digital Publishing Institute
- ISSN: 2304-6732
- DOI: 10.3390/photonics9030165

Communication

Self-Supervised Deep Learning for Improved Image-Based Wave-Front Sensing

Yangjie Xu 1,2,3, Hongyang Guo 1,2,3, Zihao Wang 1,2,3, Dong He 1,2,3, Yi Tan 1,2,3 and Yongmei Huang 1,2,3,*

1 Key Laboratory of Optical Engineering, Chinese Academy of Sciences, No. 1 Guangdian Road, Chengdu 610209, China; xuyangjie17@mails.ucas.ac.cn (Y.X.); guohongyang15@mails.ucas.ac.cn (H.G.); wangzihao201@mails.ucas.ac.cn (Z.W.); hedong@ioe.ac.cn (D.H.); tanyi@ioe.ac.cn (Y.T.)
2 Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu 610209, China
3 School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
* Correspondence: huangym@ioe.ac.cn

Abstract: Phase retrieval with supervised-learning neural networks is restricted by the difficulty of obtaining labels. To address this, we propose a self-supervised physical deep-learning phase-retrieval model that incorporates a complete physical model of the image-formation process. The model has two parts: MobileNet V1, which maps the input samples to Zernike coefficients, and an optical imaging system, which produces the point spread function used to train the model. The loss function is calculated from the similarity between the input and the output, which realizes self-supervised learning. In simulation, the root mean square (RMS) of the wave-front error (WFE) between the input and the reconstruction is 0.1274 waves at D/r0 = 20; by comparison, the RMS of the WFE is 0.1069 waves when labels are used to train the model. The method retrieves wave-front errors in real time in the presence of simulated detector noise without relying on label values, is more suitable for practical applications, and is more robust than supervised learning. We believe that this technology has great potential in free-space optical communication.

Keywords: self-supervised learning; free-space optical communication; phase retrieval

Citation: Xu, Y.; Guo, H.; Wang, Z.; He, D.; Tan, Y.; Huang, Y. Self-Supervised Deep Learning for Improved Image-Based Wave-Front Sensing. Photonics 2022, 9, 165. https://doi.org/10.3390/photonics9030165

Received: 7 February 2022; Accepted: 4 March 2022; Published: 9 March 2022

1. Introduction

As shown in Figure 1, space light needs to be coupled into a single-mode fiber at the receiving end in free-space optical communication (FSOC). However, wave-front aberrations generated by atmospheric turbulence degrade the beam quality and thereby deteriorate the fiber-coupling efficiency and the quality of communication. It is therefore necessary to detect and correct the wave-front aberration. The measurement of wave-front aberration in FSOC differs from other scenarios in that the aberration changes continuously, so accurate real-time measurement is required. There are two main methods for measuring wave-front aberrations. The first uses additional wave-front sensors, such as a Hartmann sensor or an interferometer [1–3], to monitor the wave-front slope. The other uses the spot quality as an objective function and optimizes it through continuous iteration, as in an image-based sensor [4–7]. The real-time performance of this second method is poor, so its application range is very limited. Deep learning is applied to the image-based sensor in order to improve its real-time performance.
Image-based wave-front sensing uses parametric physical models and a measured point spread function (PSF) with nonlinear optimization for calculation [8–10]. Artificial neural networks are nonlinear, autonomous, and regulated information-processing systems for learning and generalized nonlinear mapping between inputs and outputs [11–13]. At the end of the twentieth century, artificial neural networks were used to measure the optical phase distortion caused by atmospheric disturbance [14–16], although the structure of the neural networks in those systems was too simple, resulting in poor generalization. Since then, significant research has optimized the neural network structure.

Figure 1. Schematic diagram of fiber coupling based on an image sensor in free-space optical communication. PM = primary mirror; SM = secondary mirror; M = mirror; BS = beam splitter; FSM = fast steering mirror; DM = deformable mirror; L = lens; and CCD = charge-coupled device.

Convolutional neural networks (CNNs) [17–19] use convolutional kernels and down-sampling to perform machine-learning tasks on images, which reduces the dimensionality of large data volumes while retaining image features. In another approach, Paine et al. used Inception V3 for phase detection [20], but the PSFs in the training set were taken at the focal point, where they carry less effective information, which led to fitting error. Nishizaki proposed a generalized wave-front sensing framework [21] to estimate a single intensity distribution and showed that proper preprocessing of a single intensity image improves the calculation and fitting accuracy.

Whereas the output of a CNN is the class label of the entire image, UNet performs pixel-level classification that outputs the category of each pixel and is often used on biomedical images [22,23]. Given that image data in this task are often scarce, Ciresan et al. trained a CNN using a sliding window to provide each pixel and its surroundings as input to predict the class label of the pixel. UNets are also widely used in the wave-front detection field [24–26], but the network structure is complex and too slow for real-time tasks.

To summarize, conventional supervised-learning neural networks have good detection accuracy, but the many labeled data samples required for training greatly increase application costs and difficulty. The essence of this approach is to map PSF image pixels to Zernike coefficients. UNet uses data augmentation to address the problem of few training images, but it cannot extract the control parameters required for phase correction, such as Zernike coefficients.

Unsupervised learning uses pretext tasks to mine its own supervised information from large-scale unlabeled data and learns representations valuable for downstream tasks. In
another approach, Wang proposed phase imaging with an untrained neural network [27] that iterates over a single damaged image, learns the prior information in the image, and restores it. However, when the important features of an image are damaged, the image is not easily repaired, and because training in advance is not possible, every image processed must be iterated many times to achieve an ideal result. Emrah Bostan proposed using an untrained neural network to restore the phase without a label [28]. It shares the same poor generalization, because offline processing does not require high real-time performance. However, high real-time performance is important when measuring wave-front aberration in free-space optical communication. Ramos [29] proposed the use of blind convolution to restore multiple targets, but the network structure is complex, the calculation time is long, and the accuracy is poor. Tobias Liaudat proposed a paradigm shift in the data-driven modeling of the instrumental response field of telescopes by adding a differentiable optical forward model into the modeling framework [30]. This approach is also unsupervised training based on optical modeling, but its aim is to simplify the building of the instrumental response model.

To solve these problems, we propose a phase-retrieval method based on self-supervised physical deep learning (PDL), which maps the input data to hidden vectors through a deep neural network and combines the optical-imaging physical model to fit the output. A retrieval model is then established according to the mapping relationship between input and output. Compared with traditional CNNs, this method does not rely on labels; it establishes the relationship between the collected light-spot samples solely through the intrinsic characteristics of the PSF. Compared with UNet, the Zernike coefficients are extracted more quickly, giving this method better real-time performance. It also avoids the technical bottleneck of traditional supervised learning, which requires many labeled samples, thereby facilitating practical applications. Compared with existing self-supervised learning networks, inference on new data requires only one step and the accuracy is higher, although the proposed method can only be applied to the detection of point objects.

2. Method

As shown in Figure 2, PDL includes three parts: network encoding, a hidden-layer vector, and network decoding.

Figure 2. Structure diagram of the self-supervised learning neural network model.

2.1. Encoding Part

The encoder maps the input samples to the hidden-layer vector. We tested ImageNet classification CNNs and chose MobileNet [31], which is well suited to mobile terminals. MobileNet consists of a 3 × 3 standard convolution followed by stacked depth-wise separable convolutions, some of which down-sample with a stride of 2. Average pooling reduces the feature map to 1 × 1, and a fully connected layer is added according to the predicted category size. The batch size is 32.
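The saving from this factorization can be made concrete by counting multiply–accumulate operations. The sketch below is our own illustration, not code from the paper: it compares a standard k × k convolution with its depth-wise separable counterpart, and for a 3 × 3 kernel the separable form costs exactly 1/n + 1/9 of the standard one, where n is the number of output channels.

```python
def conv_costs(dk: int, m: int, n: int, df: int) -> tuple[int, int]:
    """Multiply-accumulate counts on a df x df feature map for a dk x dk
    convolution with m input channels and n output channels."""
    standard = dk * dk * m * n * df * df
    # depth-wise (one dk x dk kernel per input channel) + 1 x 1 pointwise
    separable = dk * dk * m * df * df + m * n * df * df
    return standard, separable

std, sep = conv_costs(dk=3, m=32, n=64, df=56)
ratio = sep / std  # equals 1/n + 1/dk**2 = 73/576, about 0.127
```

For this (hypothetical) layer the separable form needs roughly an eighth of the arithmetic, which is why MobileNet-style encoders suit the low-latency regime targeted here.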
Since we are not dealing with a classification problem, we removed the last SoftMax layer. The core of the network is the depth-wise separable convolution, which reduces both the computational complexity and the size of the model. For real mobile application scenarios that require low latency, networks such as MobileNet are the focus of continuous research. Depth-wise separable convolution is a factorized convolution operation that can be decomposed into two smaller operations: a depth-wise convolution, which convolves each input channel separately with its own kernel, and a pointwise convolution, which combines the resulting outputs with a 1 × 1 kernel.

2.2. Decoding Part

The model of the optical imaging system is shown in Figure 3; it is the decoding part of the network. According to the imaging principle of convolution, the object plane is regarded as the weighted superposition of primitive objects, and the image plane is the coherent superposition of the diffraction images produced by the corresponding object points [32].
The relationship between the image plane and the object plane can be expressed as the following convolution:

g_i(x, y) = g_o(x_0, y_0) ∗ h(x, y),  (1)

where g_i(x, y) is the complex optical amplitude distribution of the image plane; g_o(x_0, y_0) is the complex optical amplitude of the object plane under ideal conditions; x and y are coordinates in the image plane and x_0 and y_0 in the object plane; ∗ is the convolution operator; and h is the impulse response function of the system, also called the PSF, which describes the two-dimensional distribution of light on the focal plane from a point-source imaging system. For a point target, this is equivalent to a linear space-invariant incoherent imaging system. The object- and image-plane functions satisfy a linear transformation of intensity:

I_i(x, y) = ∬ I_o(u, v) h(x − u, y − v) du dv,  (2)

where I_i is the intensity distribution at the image plane; I_o is the ideal intensity distribution at the object plane; and u and v are the coordinates of the pupil plane. For a point target, the object–image relation of the system and the PSF at the diffraction limit are expressed as

g_i(x, y) = h(x, y) = |FFT{P(u, v)}|^2,  (3)

where FFT is the fast Fourier transform operator and P is the generalized pupil function. In the presence of aberration, the pupil function on the image plane at the exit-pupil position is expressed as

P(u, v) = O(u, v) e^{2πiφ(u,v)},  (4)

where O(u, v) is the aperture function, whose value is unity inside the aperture and zero outside it, and φ(u, v) is the phase-distribution function at the pupil plane. The Zernike mode method makes use of a set of polynomials that are orthogonal over a circular region, so it is usually used as an orthogonal basis for wave-front reconstruction.
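Equations (3) and (4) can be sketched in a few lines of numpy. This is a minimal illustration of the forward (decoding) step; the grid size and unit-radius aperture are our own illustrative choices, not the paper's parameters.

```python
import numpy as np

def psf_from_phase(phase: np.ndarray, aperture: np.ndarray) -> np.ndarray:
    """h = |FFT{P}|^2 with P = O * exp(2*pi*i*phi), per Eqs. (3) and (4)."""
    pupil = aperture * np.exp(2j * np.pi * phase)          # generalized pupil P(u, v)
    field = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(pupil)))
    psf = np.abs(field) ** 2
    return psf / psf.sum()                                 # unit total energy

n = 128
v, u = np.mgrid[-1:1:n * 1j, -1:1:n * 1j]
aperture = (u ** 2 + v ** 2 <= 1.0).astype(float)          # O(u, v): 1 inside, 0 outside
psf = psf_from_phase(np.zeros((n, n)), aperture)           # zero phase: diffraction limit
```

For a zero phase screen this returns the diffraction-limited spot; feeding in a Zernike-built phase map would give the aberrated PSF that the decoder renders during training.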
The wave-front phase function φ(x, y) in the circular domain can be expanded into Zernike polynomials as follows:

φ(x, y) = Σ_{i=1}^{∞} a_i z_i(x, y),  (5)

where a_i is the coefficient of the i-th mode and z_i is the Zernike pattern corresponding to term i. The matrix distribution of the Zernike coefficients is composed of the all-order polynomials and the corresponding coefficients:

Z_n^m(r, θ) = N_n^m R_n^{|m|}(r) cos(mθ),  m ≥ 0,
Z_n^m(r, θ) = −N_n^m R_n^{|m|}(r) sin(mθ),  m < 0,  (6)

where 0 ≤ r ≤ 1 and 0 ≤ θ ≤ 2π; n is a non-negative integer; the step length of m is 2; m runs between −n and n; and R_n^{|m|} is the radial polynomial. The hidden-layer vector used in the network is the vector of Zernike coefficients. The aberration can be calculated from the Zernike coefficients and polynomials, and the PSF can then be obtained from the optical imaging system for training the model.

Figure 3. The model of the optical imaging system.
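A direct implementation of Eqs. (5) and (6) is straightforward. The sketch below is ours: it omits the normalization N_n^m and uses explicit (n, m) index pairs rather than the single-index ordering of the paper.

```python
import numpy as np
from math import factorial

def zernike(n: int, m: int, r: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Zernike mode on the unit disk, per Eq. (6); normalization N omitted."""
    am = abs(m)
    radial = np.zeros_like(r)
    for k in range((n - am) // 2 + 1):
        c = ((-1) ** k * factorial(n - k)
             / (factorial(k) * factorial((n + am) // 2 - k) * factorial((n - am) // 2 - k)))
        radial = radial + c * r ** (n - 2 * k)
    angular = np.cos(am * theta) if m >= 0 else np.sin(am * theta)
    return np.where(r <= 1.0, radial * angular, 0.0)  # zero outside the pupil

def wavefront(coeffs, modes, r, theta):
    """phi = sum_i a_i z_i, per Eq. (5), over a list of (n, m) mode indices."""
    return sum(a * zernike(n, m, r, theta) for a, (n, m) in zip(coeffs, modes))
```

Evaluating `wavefront` on the pupil grid yields the phase map φ(u, v) that enters Eq. (4).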
2.3. Loss Function

We strive to make the inputs and outputs of the network consistent by using the correlation coefficient between the input and output PSFs as the loss function. The correlation coefficient is calculated as

r = Σ_m Σ_n (A_mn − Ā)(B_mn − B̄) / √( [Σ_m Σ_n (A_mn − Ā)^2] [Σ_m Σ_n (B_mn − B̄)^2] ),  (7)

where A_mn and B_mn are the pixel values of the two PSFs, and Ā and B̄ are the mean values of A and B, respectively. Once training is complete, the model performs inference without the decoding part: the output of the network is the vector of Zernike coefficients, which can be used to correct aberrations. Therefore, there is no difference in resource or time consumption compared with supervised networks.
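Eq. (7) is the Pearson correlation of the two images; as a loss one would minimize 1 − r so that identical PSFs give zero loss. The paper trains with batches in a deep-learning framework; this scalar numpy version of ours just shows the arithmetic.

```python
import numpy as np

def correlation_loss(a: np.ndarray, b: np.ndarray) -> float:
    """1 - r, with r the Pearson correlation of Eq. (7) over all pixels."""
    da = a - a.mean()
    db = b - b.mean()
    r = (da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum())
    return float(1.0 - r)
```

Because r is invariant to gain and offset, the loss tolerates a global intensity scaling between the measured PSF and the PSF rendered by the decoder.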
3. Simulation Demonstration

3.1. Simulation Demonstration of 3–20 Orders of Zernike Polynomials

The simulation data set is generated by the Zernike pattern method. The piston error (Zernike polynomial Z_1) is ignored because it is a translation of the beam phase and does not affect the beam quality. The tip and tilt terms (Z_2 and Z_3), which can be quickly estimated with the centroid algorithm or other registration methods, are also ignored. The PSF simulation model sets the pupil size to 8.6 mm, the focal length of the lens to 150 mm, and the defocus to 8 mm. The wavelength of the optical source is 532 nm, and, to simulate the actual environment, Gaussian white noise with a mean of 0 and a variance of 0.01 is added to the PSF. With different mode coefficients, 22,000 distorted spot images are generated at D/r0 = 20; 20,000 out-of-focus PSFs are input into the self-supervised learning neural network, and PDL is applied according to the correlation coefficient between the input and output PSFs. A total of 2000 PSFs were tested, and the Zernike coefficients were inferred on the basis of the mapping relationship. Figures 4 and 5 show the test results; the PSF images are normalized. The image size is 256 × 256 and the pixel size is 5.5 µm. Moreover, Table 1 lists the root mean square errors (RMSEs) obtained from testing and inference. To show the retrieval effect of PDL more clearly, we compared the results with those of the MobileNet supervised learning method.
For coefficients 3–20, the RMSE of the original distorted wave-front is 0.8284λ. The correlation coefficient on the test set was 97%, and the RMSE after a one-time calibration is 0.0648λ. In contrast, the supervised corrected wave-front RMSE is 0.0447λ. Although the detection accuracy is slightly lower than with supervised learning, no a priori knowledge of the Zernike coefficients is required, which facilitates practical applications.

Figure 4. Comparison of the Zernike coefficients between the original distributions, supervised learning, and self-supervised learning.

Figure 5. Comparison of the PSF images with 3–20 orders of Zernike polynomials (intensity reversed): (a) original distribution, (b) supervised learning recovery spots, and (c) self-supervised learning recovery spots. (The pixel size is 5.5 µm and the image size is 256 × 256.)

Table 1. Comparison of accuracies (RMSE) of equivalent wave-fronts from the Zernike coefficients estimated in the experiments between PDL and MobileNet (3–20 orders).

Zernike Order | RMSE of Testing Set | RMSE of MobileNet | RMSE of PDL
3–20 | 0.8284λ | 0.0447λ | 0.0648λ

To test the generalization ability of the network, 3000 distorted spot images were generated with D/r0 from 5 to 30 in intervals of 5. The previously trained network was used to infer on these 3000 distorted spot images; Table 2 shows the results. The results show that the generalization ability of the network is good.

Table 2. Comparison of the accuracies (RMSE) of the equivalent wave-fronts from the Zernike coefficients estimated in the experiments between PDL and MobileNet with different D/r0 (3–20 orders).

D/r0 | RMSE of Testing Set | RMSE of MobileNet | RMSE of PDL
30 | 1.1991λ | 0.1176λ | 0.1469λ
25 | 0.9962λ | 0.0621λ | 0.0762λ
20 | 0.8606λ | 0.0462λ | 0.0634λ
15 | 0.6651λ | 0.0396λ | 0.0519λ
10 | 0.4641λ | 0.0276λ | 0.0352λ
5 | 0.2615λ | 0.0232λ | 0.0256λ
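A note on how the tabulated wave-front RMS values relate to Zernike coefficients: if the modes are taken as orthonormal over the pupil (e.g. Noll-normalized), the RMS of a wave-front equals the Euclidean norm of its coefficient vector, so the residual error after correction is the norm of the coefficient difference. The orthonormality is our assumption, and the sketch is ours, not the paper's evaluation code.

```python
import numpy as np

def residual_rms(a_true: np.ndarray, a_est: np.ndarray) -> float:
    """Residual wave-front RMS (in waves), assuming orthonormal Zernike modes."""
    return float(np.linalg.norm(np.asarray(a_true) - np.asarray(a_est)))
```

Under this convention, perfect estimation gives zero residual RMS, and the uncorrected RMS is simply the norm of the true coefficient vector.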
3.2. Simulation Demonstration of 3–64 Orders of Zernike Polynomials

The influence of higher-order wave-front aberrations on network performance is considered as follows: the 3–64 Zernike orders are used to train the network. The RMSE of the original distorted wave-front is 0.8852λ; the corrected wave-front RMSE of PDL is 0.1274λ, and the RMSE of MobileNet is 0.1069λ. Figure 6 and Tables 3 and 4 show the test results; the PSF images are normalized. The image size is 256 × 256 and the pixel size is 5.5 µm. The results show that the performance of the self-supervised network is also close to that of the supervised network in the high-order case.

Figure 6. Comparison of the PSF images with 3–64 orders of Zernike polynomials (intensity reversed): (a) original distribution, (b) self-supervised learning recovery spots, and (c) the correction spot. (The pixel size is 5.5 µm and the image size is 256 × 256.)

Table 3. Comparison of accuracies (RMSE) of equivalent wave-fronts from the Zernike coefficients estimated in the experiments between PDL and MobileNet (3–64 orders).

Zernike Order | RMSE of Testing Set | RMSE of MobileNet | RMSE of PDL
3–64 | 0.8852λ | 0.1069λ | 0.1274λ

Table 4. Comparison of the accuracies (RMSE) of the equivalent wave-fronts from the Zernike coefficients estimated in the experiments between PDL and MobileNet with different D/r0 (3–64 orders).
D/r0 RMSE of Testing Set RMSE of MobileNet RMSE of PDL 30 1.2021 0.3052 0.3487 25 1.0438 0.1887 0.2136 20 0.8784 0.1084 0.1255 15 0.676 0.0857 0.0984 10 0.4864 0.0671 0.0696 5 0.2715 0.0610 0.0611 4. Experimental Demonstration Figure 7 shows the optical platform of the phase retrieval experiment, which used a liquid crystal spatial light modulator (LCSLM) to simulate the PSF. A 532 nm laser served as a light source. A collimator expanded the beam, which was normal to the incident aperture (A), polarizer (P), and the LCSLM. The size of the aperture was limited to 8.6 mm to ensure that the incident light was evenly distributed over the center of the LCSLM. The BS changed the direction of the light beam modulated by the LCSLM. Phase distortion was simulated by loading different grayscale phase screens onto the LCSLM. Due to a small gap between the pixels, higher diffraction orders and a high-energy zero-order spot exist at the output. An X tilt was added to displace the spot from the central zero-order spot. The modulated light was imaged through the lens (L) before impinging on the CCD. The effective resolution of the CCD acquisition window was 256 256, which was compressed to 128 128 for training and testing. To ensure the detection accuracy of the neural network and the range of the CCD ﬁeld of view, we set the CCD defocus to 8 mm. To ensure clear imaging with the CCD, a suitable attenuator was positioned in front of the CCD. The model pluto-2-NIR-011LCSLM used in this experiment was produced by HoloEye and has a pixel resolution of 1920 1080, with 8 m pixels. As with the simulations, we trained with 20,000 experimental images and tested with 2000 PSFs. The overall tilt of the experiment used the Zernike coefﬁcients of order 1–20 to produce a test set correlation coefﬁcient of 95%. Figure 8 shows the calculation results obtained by inputting the collected light spot into the experimental training model. The image size is 256 256 and the pixel size is 5.5 m. 
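The simulated and experimental PSFs discussed here follow a standard Fraunhofer imaging model: the far-field intensity is the squared modulus of the Fourier transform of the aberrated pupil field. A hedged NumPy sketch of rendering such a spot (the grid size matches the 256 × 256 images above, but the aperture radius and other parameters are illustrative choices, not the paper's exact optical prescription):

```python
import numpy as np

def psf_from_phase(phase, pupil):
    """Far-field PSF of a pupil with the given phase aberration (radians).

    Fraunhofer model: PSF = |FFT{ pupil * exp(i * phase) }|^2, normalized
    to unit peak, mirroring the normalized PSF images in the figures.
    """
    field = pupil * np.exp(1j * phase)
    psf = np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2
    return psf / psf.max()

# Circular pupil on a 256 x 256 grid (aperture radius chosen arbitrarily).
n = 256
y, x = np.mgrid[-n // 2:n // 2, -n // 2:n // 2]
pupil = (x ** 2 + y ** 2 <= (n // 4) ** 2).astype(float)
psf = psf_from_phase(np.zeros((n, n)), pupil)  # unaberrated reference spot
```

In the paper's self-supervised scheme, a differentiable version of this forward model sits after the network so that predicted Zernike coefficients can be rendered into a PSF and compared against the measured one.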
The light spots are similarly shaped. A Jetson TX2 from NVIDIA was used for the inference calculations. After tensor optimization on the TX2, the single-inference time is 3.72 ms, which is adequate for real-time performance.

Figure 7. Experimental diagram of the physical deep learning wave-front sensor. C = collimator; A = aperture; P = polarizer; BS = beam splitter; LCSLM = liquid crystal spatial light modulator; L = lens; CCD = charge-coupled device.

Figure 8. Experimentally acquired PSF compared with the PDL recovery with Zernike coefficients 4–20 (intensity reversed). (a) PSF collected experimentally; (b) simulation PSF; and (c) restored PSF by simulation. (The pixel size is 5.5 µm and the image size is 256 × 256).

5. Discussion

The experimental results are as expected. The detection accuracy of the self-supervised deep learning model proposed in this paper is slightly inferior to that of the supervised deep learning model. However, the self-supervised model reduces the difficulty of sample acquisition, so it is easier to apply in practice. The supervised model is related to the method proposed in reference [22]; we made some adjustments to the method according to the new application scenario. Due to the change in atmospheric turbulence intensity, the results of the supervised model differ from those in reference [22].

Compared to other self-supervised models based on the optical imaging system mentioned in the Introduction, we proposed a new network model. Additionally, a new loss function, the correlation coefficient, was used to train the network batch-wise in practice. Batch-wise training gives our network high detection accuracy and good generalization ability.

However, training the method requires knowledge of the optical parameters, such as the optical aperture and the focal length of the lens. In most application scenarios, these parameters are known.
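The correlation-coefficient loss mentioned above rewards agreement between the measured PSF and the PSF rendered by the physical model from the predicted Zernike coefficients. A NumPy illustration of the metric only (the paper's batched, differentiable implementation inside the training framework may differ):

```python
import numpy as np

def correlation_loss(measured, rendered):
    """Self-supervised loss: 1 minus the Pearson correlation coefficient
    between the measured PSF and the physically rendered PSF.

    A sketch of the idea; a small epsilon guards against division by zero.
    """
    a = measured - measured.mean()
    b = rendered - rendered.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum()) + 1e-12
    return 1.0 - float((a * b).sum() / denom)

# Identical images correlate perfectly, so the loss is (numerically) zero.
img = np.random.rand(128, 128)
assert correlation_loss(img, img) < 1e-9
```

Because the Pearson correlation is invariant to affine intensity changes, this loss tolerates global gain and offset differences between the camera image and the simulated PSF, which is convenient when detector exposure is not calibrated against the model.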
6. Conclusions

Herein, we proposed a self-supervised model that retrieves the phase without labels. The model is generalizable, can be used with different networks, and can solve different problems. In addition, it adds no cost at inference time, since the physical model is only needed during training.
The model is suitable for practical applications of deep learning for phase detection and other optical problems. Although the detection accuracy is slightly lower than that provided by supervised learning, no a priori knowledge of the Zernike coefficients is required, which facilitates practical applications. In future work, we will validate the effectiveness of PDL in satellite-to-ground laser communication. Additionally, we will use different loss functions and train them to estimate other terms. PDL will be adapted to different wavelengths, apertures, and focal lengths, in order to directly measure spots of different wavelengths, apertures, and focal lengths without retraining or changing the models.

Author Contributions: Conceptualization, Y.X.; methodology, Y.X.; software, Y.X.; validation, Y.X., H.G. and Z.W.; formal analysis, Y.X.; investigation, Y.X.; resources, D.H.; data curation, Z.W.; writing—original draft preparation, Y.X. and H.G.; writing—review and editing, Y.X., H.G. and D.H.; visualization, Z.W.; supervision, Y.H.; project administration, Y.H.; funding acquisition, Y.T. and Y.H. All authors have read and agreed to the published version of the manuscript.

Funding: The National Key Research and Development Program of China (2017YFB11030002).

Institutional Review Board Statement: This study did not involve humans or animals.

Informed Consent Statement: Not applicable.

Data Availability Statement: Data underlying the results presented in this paper are not publicly available at this time, but may be obtained from the authors upon reasonable request.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Platt, B.C.; Shack, R. History and principles of Shack-Hartmann wavefront sensing. J. Refract. Surg. 1995, 17, 573–577.
2. Vargas, J.; González-Fernandez, L.; Quiroga, J.A.; Belenguer, T. Calibration of a Shack-Hartmann wavefront sensor as an orthographic camera. Opt.
Lett. 2010, 35, 1762–1764.
3. Gonsalves, R.A. Phase retrieval and diversity in adaptive optics. Opt. Eng. 1982, 21, 829–832.
4. Nugent, K.A. The measurement of phase through the propagation of intensity: An introduction. Contemp. Phys. 2011, 52, 55–69.
5. Misell, D.L. An examination of an iterative method for the solution of the phase problem in optics and electron optics: I. Test calculations. J. Phys. D Appl. Phys. 1973, 6, 2200–2216.
6. Fienup, J.R. Phase-retrieval algorithms for a complicated optical system. Appl. Opt. 1993, 32, 1737–1746.
7. Allen, L.J.; Oxley, M.P. Phase retrieval from series of images obtained by defocus variation. Opt. Commun. 2001, 199, 65–75.
8. Carrano, C.J.; Olivier, S.S.; Brase, J.M.; Macintosh, B.A.; An, J.R. Phase retrieval techniques for adaptive optics. Adapt. Opt. Syst. Technol. 1998, 3353, 658–667.
9. Gerchberg, R.W.; Saxton, W.O. A practical algorithm for the determination of phase from image and diffraction plane pictures. Optik 1972, 35, 237–246.
10. Yang, G.Z.; Dong, B.Z.; Gu, B.Y.; Zhuang, J.Y.; Ersoy, O.K. Gerchberg–Saxton and Yang–Gu algorithms for phase retrieval in a nonunitary transform system: A comparison. Appl. Opt. 1994, 33, 209–218.
11. Hagan, M.T.; Beale, M. Neural Network Design; China Machine Press: Beijing, China, 2002.
12. Mello, A.T.; Kanaan, A.; Guzman, D.; Guesalaga, A. Artificial neural networks for centroiding elongated spots in Shack-Hartmann wave-front sensors. Mon. Not. R. Astron. Soc. 2014, 440, 2781–2790.
13. Guo, H.J.; Xin, Q.; Hong, C.M.; Chang, X.Y. Feature-based phase retrieval wave front sensing approach using machine learning. Opt. Express 2018, 26.
14. Fienup, J.R.; Marron, J.C.; Schulz, T.J.; Seldin, J.H. Hubble Space Telescope characterized by using phase-retrieval algorithms. Appl. Opt. 1993, 32, 1747–1767.
15. Roddier, C.; Roddier, F. Wave-front reconstruction from defocused images and the testing of ground-based optical telescopes. J. Opt. Soc. Am. A 1993, 10, 2277–2287.
16. Redding, D.; Dumont, P.; Yu, J. Hubble Space Telescope prescription retrieval. Appl. Opt. 1993, 32, 1728–1736.
17. Shin, H.C.; Roth, H.R.; Gao, M.; Lu, L.; Xu, Z.; Nogues, I.; Yao, J.; Mollura, D.; Summers, R.M. Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans. Med. Imaging 2016, 35, 1285–1298.
18. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
19. Gu, J.; Wang, Z.; Kuen, J.; Ma, L.; Shahroudy, A.; Shuai, B.; Liu, T.; Wang, X.; Wang, G.; Cai, J.; et al. Recent advances in convolutional neural networks. Pattern Recognit. 2018, 77, 354–377.
20. Paine, S.W.; Fienup, J.R. Machine learning for improved image-based wavefront sensing. Opt. Lett. 2018, 43, 1235–1238.
21. Nishizaki, Y.; Valdivia, M.; Horisaki, R. Deep learning wave front sensing. Opt. Express 2019, 27, 240–251.
22. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 640–651.
23. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. arXiv 2015, arXiv:1505.04597.
24. Swanson, R.; Lamb, M.; Correia, C.; Sivanandam, S.; Kutulakos, K. Wave-front reconstruction and prediction with convolutional neural networks. Adapt. Opt. Syst. VI 2018, 10703, 107031F.
25. Dubose, T.B.; Gardner, D.F.; Watnik, A.T. Intensity-enhanced deep network wave-front reconstruction in Shack-Hartmann sensors. Opt. Lett. 2020, 45, 1699–1702.
26. Hu, L.J.; Hu, S.W.; Gong, W.; Si, K. Deep learning assisted Shack-Hartmann wave-front sensor for direct wave-front detection. Opt. Lett. 2020, 45, 3741–3744.
27. Fei, W.; Bian, Y.; Wang, H.; Lyu, M.; Pedrini, G.; Osten, W.; Barbastathis, G.; Situ, G. Phase imaging with an untrained neural network. Light Sci. Appl. 2020, 9, 77.
28. Bostan, E.; Heckel, R.; Chen, M.; Kellman, M.; Waller, L. Deep phase decoder: Self-calibrating phase microscopy with an untrained deep neural network. Optica 2020, 7, 559.
29. Ramos, A.A.; Olspert, N. Learning to do multiframe wave front sensing unsupervised: Applications to blind deconvolution. Astron. Astrophys. 2021, 646, A100.
30. Liaudat, T.; Starck, J.; Kilbinger, M.; Frugier, P. Rethinking the modeling of the instrumental response of telescopes with a differentiable optical model. arXiv 2021, arXiv:2111.12541.
31. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
32. Wen, Z. Photon Foundation; Zhejiang University Press: Hangzhou, China, 2000.

Photonics – Multidisciplinary Digital Publishing Institute

**Published:** Mar 9, 2022

**Keywords:** self-supervised learning; free-space optical communication; phase retrieval
