## Abstract

Free space optical communications utilizing orbital angular momentum beams have recently emerged as a new technique for communications with potential for increased channel capacity. Turbulence due to changes in the index of refraction emanating from temperature, humidity, and air flow patterns, however, add nonlinear effects to the received patterns, thus making the demultiplexing task more difficult. Deep learning techniques have been previously been applied to solve the demultiplexing problem as an image classification task. Here we make use of a newly developed theory suggesting a link between image turbulence and photon transport through the continuity equation to describe a method that utilizes a “shallow” learning method instead. The decoding technique is tested and compared against previous approaches using deep convolutional neural networks. Results show that the new method can obtain similar classification accuracies (bit error ratio) at a small fraction (1/90) of the computational cost, thus enabling higher bit rates.

© 2018 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

To accommodate the ever increasing rates at which data are generated and transmitted, the development of new information encoding schemes remains an active area of research in optical communications. A typical strategy is to encode information on as many independent data channels as possible thereby allowing both fast transmission and easy separation at the receiver. In optical communications, information can be encoded on different wavelengths [1, 2], states of polarization [3], or even spatially distributed channels [4]. Recently it has been suggested that different states of orbital angular momentum (OAM) could be used to encode information on different orthogonal, spatial modes, each one generated by altering the helical phasefront of a propagating beam [5]. OAM is a property of a coherent light beam that arises from the azimuthal components of linear momentum acting at the radius of the beam with a dependency of exp(*imθ*). The parameter, *m* ∈ ℤ, is the topological charge or mode number and indicates that there is a theoretically infinite number of modes possible. To achieve even higher bandwidths, one can combine multiple approaches. In fact, one recent study combined wavelength, polarization, and OAM encoding schemes to demonstrate > 100 Tera-bit/s data rates [6].

Despite these successes, challenges remain, particularly in free-space implementations where atmospheric turbulence can significantly degrade performance. In the case of OAM multiplexing, turbulence is known to corrupt the received spatial patterns, thereby causing “cross-talk” among the received modes [7]. Possible turbulence mitigation schemes include adaptive optics (analog) and image processing (digital) approaches. In the former, one sends a probe beam to sense the distortion and then correct for its influence using a wavefront corrector [8] while in the latter, a classification algorithm is trained to recognize the turbulence-corrupted modes [9–12].

In this work, we also consider the atmospheric perturbations to the OAM modes and how they might be mitigated using image processing. Our proposed method models the process by which an electro-magnetic (EM) field is altered by fluctuations in refractive index of the atmosphere and then uses those models to perform classification. Specifically, the model minimizes the work required to transform a “clean” mode into a corrupted mode and it has previously been demonstrated to more accurately describe the physics of image propagation in a turbulent medium than phenomenological models such as “optical flow” [13]. The proposed decoding scheme utilizes this model on subsequently acquired, corrupted imagery. By removing part of the burden of “learning” from the classifier, but rather supplying the classifier with a reduced set of possibilities from a physical model, we significantly reduce the computational burden associated with de-multiplexing the received bit patterns.

The main contribution of this work is therefore to introduce optimal transport as a modeling technique for distinguishing among turbulence-corrupted modes, and then demonstrate improved performance over a previously used machine learning approach. In what follows, we review OAM, the associated communications scheme (see section 2), and then briefly discuss transport-based modeling in section 3. We then show how these models can be used to design simple pattern recognition algorithms for classifying vortex patterns. Experimental results using real and simulated data are shown and compared to previous state of the art.

## 2. Background and prior art

#### Orbital angular momentum

A monochromatic, electromagnetic wave propagating in the *z* direction will be modeled as the vector $\mathbf{E}\left(\overrightarrow{x},z\right)=\mathbf{U}\left(\overrightarrow{x},z\right){e}^{i\omega t}$ where *ω* = 2*πc*/*λ* = *k*_{0}*c* is the temporal frequency of oscillation, *c* is the speed of light, *λ*_{0} is the wavelength, and *k*_{0} the wavenumber, all defined in vacuum.

If the medium through which the field is propagating is homogeneous (or nearly so) the amplitude of the electric field is known to be governed by the Helmholtz Eq.

where $U\left(\overrightarrow{x},z\right)$ is a complex, scalar component of the full amplitude vector, and encodes both a magnitude and phase at every location in space. The modified wavenumber $k={k}_{0}n\left(\overrightarrow{x},z\right)$ will vary according to the index of refraction of the medium, $n\left(\overrightarrow{x},z\right)$, as the wavelength becomes altered. The coordinates $\overrightarrow{x}$ will be used to define the plane transverse to the direction of propagation, and can be used to denote Cartesian, polar, or cylindrical components depending on the type of solution to (1) being sought.With the standard paraxial assumption (second derivatives in *z* vanish) Eq. (1) admits a number of different solutions that carry OAM, among them: Laguerre-Gauss, Bessel-Gauss, Ince-Gauss, Hermite-Gauss, Mathieu-Gauss, beams. For example, making a circular symmetric assumption on the solution and using circular cylindrical coordinates, $\overrightarrow{x}\equiv \left(r,\theta \right)$, produces Bessel beams, an OAM carrying beam with spiral phase distribution exp(*imθ*). Bessel beams are described as:

*J*is the order

_{m}*m*Bessel function of the first kind,

*k*is the radial frequency defined by $k=\sqrt{{k}_{z}^{2}+{k}_{r}^{2}}=2\pi /\lambda $ and

_{r}*C*is a constant.

_{b}*m*defines the OAM mode number of the beam, which can be any integer, and is physically manifested in beam as

*m*2

*π*wraps in phase space. Implementing a system with such Bessel beam formulation would be impossible due to the infinite amount of energy required. Gaussian-tapered Bessel beams (BGB) can approximate ideal Bessel beam for a finite distance (pseudo diffraction-free beam) instead. The BGB can be modeled as:

*C*is a constant,

_{bg}*ζ*(

*z*) = tan

^{−1}(

*z*/

*z*) is the Gouy phase, $w\left(z\right)={w}_{0}\sqrt{1+\left(z/{z}_{R}\right)}$ is the beam radius,

_{R}*w*

_{0}is the beam waist,

*R*(

*z*) = [1 + (

*z*/

_{R}*z*)

^{2}] is the radius of curvature, and ${z}_{R}=\pi {w}_{0}^{2}/\lambda $ is the Rayleigh range. It is important to note that the OAM phase profile, expressed as exp(

*imθ*), is what allows BGB to be orthogonal when the OAM modes are different, i.e.,

#### OAM in communications

To implement a communication scheme based on OAM modes, one requires both a means of generating multiplexed modes as well as a procedure for demultiplexing the modes at the receiver. Succinctly, the emitter sends a spatial pattern corresponding to a bit sequence through the open air channel and the task at the receiver end is to determine which pattern (bit sequence) was sent based on the received intensity. At the receiver, each OAM mode appears as an annular ring of different diameter; adding (multiplexing) modes results in more complicated spatial patterns. Not surprisingly, these spatial patterns are altered by atmospheric disturbances (e.g., turbulence) and the received signal becomes more difficult to demultiplex (for example, see Fig. 1). In particular, the presence of turbulence can cause energy from one mode to be placed in another, thereby destroying the orthogonality property thus making methods that depend on it less effective.

Several approaches for OAM communications are summarized in Willner *et al.* [7]. Methods based on conjugate mode sorting [14], which rely on the orthogonality of OAM modes, are highly desirable given they yield simple, efficient, and highly optimal decoding schemes. These methods can range from more simple serial application of multiple volume holograms with individual channel detectors [15] to more complicated holograms which can measure the incoming beam against several possible OAM modes at once [16]. Other popular techniques include: counting spiral fringes [17], optical transformations [18], measuring the Doppler effect [19], dove prism interferometers [20]. In the presence of atmospheric turbulence, however, these become difficult to implement given the effects of turbulence on the orthogonality of the modes. Attempts to mitigate the effect of turbulence have been made, including using adaptive optics to correct the phase of the distorted OAM beams [8] and using DSP based MIMO channel equalizer [21]. Finally, instead of trying to reverse the effect of turbulence, machine learning based approaches [9, 11] have been investigated to directly identify the multiplexed beam patterns. To achieve this, OAM carrying beams were recorded by a CCD camera, and the classification system is trained on the set of sample images. Using a shallow neural network, Krenn et al. were able to discriminate between 16 OAM modes following transmission through 3km of turbulent city atmosphere [9]. This work was expanded by Knutson et al., who developed a deep neural network with 3 hidden layers to identify OAM beams under low SNR conditions [10]. Moreover, Knutson et al. also used a 16 layer convolutional neural network to simulataneously classify 110 different OAM states with an accuracy of 74%. This result was confirmed in [12], where Li et al. showed that convolutional neural networks (CNN) can outperform both K-nearest neighbors and artificial neural network classfiers. Furthermore, Doster et al. demonstrated that a CNN is able to demultiplex combinatorially multiplexed OAM modes for high levels of turbulence [11].

## 3. Beam propagation and image formation model

The demultiplexing strategy described below is based on a recently proposed model for electromagnetic wave propagation in atmospheric turbulence by Nichols et al. [13]. Nichols et al. [13] formulated a link between the wave propagation under turbulence and the theory of optimal transport. Below we review the transport based image formation model proposed in [13], and describe how it can be used for deriving a suitable approach for demultiplexing OAM patterns.

Write the complex amplitude in terms of a magnitude and phase, $U\left(\overrightarrow{x},z\right)={\rho}^{1/2}\left(\overrightarrow{x},z\right){e}^{-i\left({k}_{0}z+\varphi \left(\overrightarrow{x},z\right)/2\right)}$ [22, 23], and let the refractive index of the turbulent medium be described by ${n}^{2}\left(\overrightarrow{x},z\right)=1+\eta \left(\overrightarrow{x},z\right)$. We therefore have a plane wave traveling through a medium with an average refractive index of unity (air), but subject to turbulence-induced perturbations governed by $\eta \left(\overrightarrow{x},z\right)\ll 1$. The quantity $\rho \left(\overrightarrow{x},z\right)$ denotes the associated image intensity and $\varphi \left(\overrightarrow{x},z\right)$ captures the phase fluctuations induced by the turbulent atmosphere as well as the BGB phase. Note, this is the same basic description as for the BGB, only now traveling through a spatially inhomogeneous medium where both magnitude and phase are varying. The plane transverse to the direction of propagation is now given in Cartesian coordinates by $\overrightarrow{x}\equiv \left({x}_{1},{x}_{2}\right)$, while *z* is again the direction of propagation (see Fig. 2).

In [13] it was shown that, based on this description, we can view image intensity (or photon density) as ‘mass’ that is being moved in the transverse plane with velocity $v\left(\overrightarrow{x},z\right)={\nabla}_{X}\varphi \left(\overrightarrow{x},z\right)$ according to

The reason for casting the problem in this form is that it allows us to leverage both mathematical and algorithmic advances in the field of “optimal transport” to accurately estimate solutions to Eqs. (2a) and (2b) given only a pair of clean/corrupted images. These estimated solutions can then be used to accurately classify BGB modes given only a corrupted intensity measurement (i.e., a corrupted image) as will be shown in section 4.

To formulate the optimal transport problem, we first note that given continuity of image intensity Eq. (2a) we can express the effects of turbulence on the traveling photon density $\rho \left(\overrightarrow{x},z\right)$ as ${\rho}_{0}\left(\overrightarrow{x}\right)={D}_{f}\left(\overrightarrow{x}\right){\rho}_{1}\left(f\left(\overrightarrow{x}\right)\right)$, with $f\left(\overrightarrow{x}\right)={\displaystyle {\int}_{0}^{Z}v\left(\overrightarrow{x},z\right)dz}$, $f\left(\overrightarrow{x}\right)=\overrightarrow{x}-u\left(\overrightarrow{x},Z\right)$,and ${D}_{f}\left(\overrightarrow{x}\right)$ denoting the determinant of the Jacobian of the (transverse) spatial transformation $f\left(\overrightarrow{x}\right)$.

Understanding that photon density (image intensity) propagates in turbulence in a manner compatible with the continuity Eq. (2a) is interesting, but without further information is not sufficient for recovering neither $v\left(\overrightarrow{x},z\right)$ nor $f\left(\overrightarrow{x}\right)$. Thus in [13], we proposed to incorporate a least action principle into the modeling approach in order to uniquely identify *v* and *f*. The proposed action is related to the system energy. In this case it makes sense to consider the system energy

Thus we have that minimizing Eq. (3) yields an approximate solution to an electro-magnetic field propagating through a turbulent atmosphere. Importantly, this action minimization can also be treated as an optimal transport problem [24]

In summary, the action minimization expressed in Eq. (3), or alternatively in Eq. (4) provides us with a means to approximate the displacements $u\left(\overrightarrow{x}\right)=\overrightarrow{x}-f\left(\overrightarrow{x}\right)$ that are responsible for the spatial distortions depicted graphically in Fig. 1 and Fig. 2. Importantly, the estimated model is a solution to the physics governing the relationship between clean and corrupted bit patterns. Rather than asking a deep CNN to try and learn this relationship, and classify accordingly, we instead supply this relationship (in the form of the estimated model) to a shallow CNN. By removing the burden of learning the spatial deformation caused by turbulence from CNN, we can achieve the same classification performance as for the deep CNN with only a shallow (fast) classifier.

## 4. Algorithm design for demultiplexing OAM beams

Let ${\rho}_{m}\left(\overrightarrow{x}\right)$ represent an ‘uncorrupted’ pattern encoding a particular bit sequence. For example, to represent a 5-bit sequence, 5 OAM modes are used and multiplexed, creating 32 distinct patterns where *m* will range from *m* = 1,…, 32. The analysis shown above can be used to construct a model for observed OAM patterns at the receiver end of an optical communication link. Let $f\left(\overrightarrow{x}\right)$ represent the intensity transport map due to the presence of turbulence in the channel. The received pattern for OAM model *m* can be modeled as ${\tilde{\rho}}_{m}\left(\overrightarrow{x}\right)={D}_{f}\left(\overrightarrow{x}\right){\rho}_{m}\left(f\left(\overrightarrow{x}\right)\right)$. We can then consider the set of possible maps $f\in \mathbb{F}$ being generated by the turbulent channel as a ‘confound’ (e.g. translation, scaling, etc.). We thus consider all sets of observed, degraded, OAM patterns ${\tilde{\rho}}_{m}$ for mode *m* as the set

Our objective is to make use of this model in designing a pattern recognition system for recognizing vortex patterns. To that end we employ the formalism of the cumulative distribution transform [26] and Radon cumulative distribution transform for 2D images [27]. More specifically, we utilize the ‘linear separability’ theorem accompanying these transforms to design pattern recognition systems that are dramatically simpler than state of the art deep learning solutions, but maintain similar accuracy (see Fig. 3). The idea is to utilize these transforms, more specifically the Radon cumulative distribution transform (R-CDT), as a pre-processing step to a ‘shallow’ (one layer) CNN classifier in comparison to the recent trend utilizing the deep architecture [28]. We begin by describing the one dimensional (1d) and link it to the image formation model described in section 3. We then extend the analysis to 2 dimensions (2d) followed by the description of the entire algorithm for OAM beam classification.

#### 1d analysis: The cumulative distribution transform (CDT)

We first consider the 1d case where $\overrightarrow{x}=x$, and the model for an observed pattern is simply $\tilde{\rho}\left(x\right)$. The goal of the pattern recognition system we describe is to determine which one of the 32 classes ${\mathbb{P}}_{m}$ the pattern *ρ* belongs to: $\tilde{\rho}\in {\mathbb{P}}_{1}$, or $\tilde{\rho}\in {\mathbb{P}}_{2},\dots ,{\mathbb{P}}_{32}$. Noting that the received patterns satisfy $\tilde{\rho}\left(x\right)\ge 0$, for a fixed channel, we can normalize it such that ${\int}_{\mathrm{\Omega}}\tilde{\rho}\left(x\right)dx=1$ (where Ω is the extent of the field of view of the observed pattern). The cumulative distribution transform (CDT) [26] of $\tilde{\rho}\left(x\right)$ is given by:

*I*

_{0}is a chosen ‘reference’ pattern (also normalized so that ${\int}_{\mathrm{\Omega}}{I}_{0}\left(x\right)dx=1$), and

As described in detail in [26], the CDT is a nonlinear and invertible operation, with the inverse being *h*^{−1}(*x*)*I*_{0}(*h*^{−1}(*x*)), and *h*^{−1}(*h*(*x*)) = *x*. Furthermore, Park et al. [26] demonstrated that the CDT facilitates certain pattern recognition problems. More precisely, if certain mathematical conditions are met, the CDT can transform non-linearly separable classes of signals into linearly separable sets (see Fig. 4 for an illustration). These conditions are reviewed below.

Let $\mathbb{F}$ be a set of 1D maps as discussed above and consider the binary detection case where only two OAM patterns are used (${\mathbb{P}}_{1}$ and ${\mathbb{P}}_{2}$). If there exists no $h\in \mathbb{F}$ for which *ρ*(*x*) = *h*′(*x*)(*ρ*_{2}(*h*)(*x*))) then the sets ${\mathbb{P}}_{1}$ and ${\mathbb{P}}_{2}$ are disjoint but not necessarily linearly separable in signal space. A main result of [26] states that the signal classes ${\mathbb{P}}_{1}$ and ${\mathbb{P}}_{2}$ are guaranteed to be linearly separable in transform space (regardless of the choice of the reference signal *I*_{0}) if $\mathbb{F}$ satisfies the following conditions,

- $h\in \mathbb{F}\iff {h}^{-1}\in \mathbb{F}$
*h*_{1}, ${h}_{2}\in \mathbb{F}\Rightarrow \gamma {h}_{1}+\left(1-\gamma \right){h}_{2}\in \mathbb{F}$, ∀*γ*∈ [0, 1]*h*_{1}, ${h}_{2}\in \mathbb{F}\Rightarrow {h}_{1}\circ {h}_{2}$, ${h}_{2}\circ {h}_{1}\in \mathbb{F}$*h*′(*ρ*∘_{i}*h*) ≠*ρ*, $\forall h\in \mathbb{F}$,_{j}*i*≠*j*

The set of translations $\mathbb{F}=\left\{f|f\left(x\right)=x+\tau ,\tau \in \mathbb{R}\right\}$, and scaling functions $\mathbb{F}=\left\{f|f\left(x\right)=ax,a\in {\mathbb{R}}^{+}\right\}$, for instance, satisfy the conditions above. In short, if one can assume the conditions above on the confound class $\mathbb{F}$ imparted by the channel on the OAM bit encoding pattern, then the CDT (in one dimension) will allow one to build a decoder out of a linear classification method by rendering the received patterns linearly separable in transform space.

#### 2d analysis: The Radon-cumulative distribution transform (R-CDT)

The OAM patterns received at the decoder end are essentially 2D images which must be correctly classified. The CDT framework was extended to 2D patterns (images normalized density functions) through the sliced-Wasserstein distance in [27], and was denoted as the R-CDT. It is shown in [27] that similar characteristics of the CDT, including the linear separation property, also hold for the R-CDT. Briefly, we use the standard Radon transform of an image $\tilde{\rho}$ (an image of a potentially corrupted pattern) which we denote by $\mathcal{R}\left(\tilde{\rho}\right)$, is defined as:

*t*is the perpendicular distance of a line from the origin and

*θ*is the angle over which the projection is taken. The main idea behind the R-CDT is then to apply the CDT over the

*t*dimension in $\Re \left(\tilde{\rho}\right)\left(t,\theta \right)$. Thus the forward R-CDT is defined as:

As with the case of the CDT presented earlier, the R-CDT is also invertible (see [27]). In addition, a similar linear separation theorem applies, with the difference being that the confound class is applied along the *t* dimension of $\mathcal{R}\left(\tilde{\rho}\right)\left(t,\theta \right)$. Under the same assumptions outlined for the CDT case, the data in R-CDT space will be linearly separable for any given reference *I*_{0}. We thus utilize the R-CDT as a pre-processing step, to enhance the ability of a ‘shallow’ CNN classifier to regress the right label (bit pattern) for a set of training OAM patterns. Fig. 5 and Fig. 6 display all 32 uncorrupted OAM patterns and their respective R-CDT. Note that each OAM mode here corresponds to a unique bit string, and thus a decoder at the detector end functions as a OAM pattern classifier.

#### Decoding vortex patterns with the R-CDT and a ‘shallow’ CNN

A Convolutional Neural Network is a supervised learning method for finding a non-linear mapping between input and output pairs $\left(\tilde{\rho},\mathit{\ell}\right)$, where *ℓ* denotes the true class for the input $\tilde{\rho}$. The mapping can be represented with a composition of functions:

Precisely, *a*_{(}_{i}_{)} denotes the *i ^{th}* non-linear composition function and

**y**

^{(}

^{i}^{)}denotes its output. Function

*a*

_{(}

_{i}_{)}typically consist of a linear operations such as convolution and downsampling followed by non-linear operations such as max-pooling, batch normalization, and rectified unit activations (ReLu). Network architectures can vary greatly depending on the number of unitary composition function as well as specific parameters for linear and non linear operations. Because of this versatility, CNN produces complex non-linear mapping that is suitable for classifying classes with non-linear decision boundary.

Fig. 7 illustrates the ‘shallow’ CNN network that we used to decode the vortex patterns. Our ‘shallow’ network consists of one convolution layer followed by downsampling and ReLu operation. For our 32-class classification problem, we chose the final layer *a _{L}* to be a

*Softmax*function, and the output ${\mathbf{y}}^{\left(L\right)}=[{y}_{1}^{\left(L\right)},\cdots ,{y}_{32}^{\left(L\right)}]$ can be expressed as:

From a probabilistic point of view, the output ${y}_{\mathit{\ell}}^{\left(L\right)}$ can be interpreted as posterior probability $p\left(\mathit{\ell}|\tilde{\rho}\right)$. Following the well-known Bayes classification rule, the negative log of the posterior is minimized (equivalent to maximizing the posterior probability):

The training step involves finding the parameters for subsequent layers *a*_{1}, ⋯, *a _{L}* that minimizes the loss (

*L*). At the testing step, the label is assigned based on the Bayes classification rule:

## 5. Experimental setup

The OAM carrying BGB were simulated in the lab, with 633-nm Gaussian laser and binary phase ferroelectric SLMs. The SLM was programmed with a binary phase hologram which created the desired mode multiplexing when illuminated with a Gaussian plane wave. The beam propagated in free space until the beam was imaged on the camera. Turbulence was simulated by the phase screen method described in [29] with the modified Kolmogorov turbulence model of Andrews [30]. We note that it is common to use the kolmogorov turbulence model and lab-based simulation is the only practical way to perform this experiment. However, we note that there is evidence that the model has limitation in predicting the lower layers of the atmosphere and when the atmosphere is stable; hence non-Kolmogorov models. For more detailed setup of the experiments, refer to section 4 of the paper by Doster et al. [11].

Here we used 5 OAM modes and multiplexed them to create 32 (=2^{5}) distinct patterns. In theory, the mode number *m* can be any integer and therefore infinite combinatorial choices can be made to configure a mode set. We tested three mode *sets* to see if a particular combination of modes is more robust to atmospheric turbulence effect and is easily recognized by the receiver.

Table 1 describes the mode numbers that compose each set. The sets are all approximately centered around *m* = 0, but contain different spacing between adjacent modes. Increasing the spacing between modes in the encoding set diminishes the effects of mode coupling, however, includes the higher mode numbers. OAM beams with higher mode numbers will have wider beam diameter, and therefore degraded by the turbulence field more severely. Five neighboring modes (i.e., m=0, 1, 2, 3, 4) were deliberately not chosen because the alignment and calibration of the bench-top system would have been too difficult.

Five mode numbers in each set, when multiplexed, generates 32 (2^{5}) distinctive mode patterns. The example of multiplexed patterns for set 1 is visualized in Fig. 5. Each pattern was transmitted and recorded 1000 times while being degraded with one of 1000 random phase screens that simulated the effect of turbulence. The 1000 phase screens were kept identical across the patterns. Pattens transmitted from phase screens indexed by 1 to 850 were used for the training set, and the remainder was used for the testing set.

Three levels of turbulence were tested, corresponding to the commonly used ${C}_{n}^{2}$ values of 10^{−16}, 10^{−15} and 10^{−14} respectively [31]. Note these values are also consistent with the optical communications work of Anguita *et al.* [32]. In terms of the experiment associated with this work, the ${C}_{n}^{2}$ values can be placed in terms of the ratio *D*/*r*_{0} = [5, 10, 15] where *D* is the linear dimension of the SLM and *r*_{0} is the Fried parameter. For all subsequent results, the level of turbulence will be quantified using this ratio. A few examples of OAM patterns propagating through different level of atmospheric turbulence for set 1 are shown in Fig. 8. Note that the degree of turbulence level wasn’t pushed to the system to failure. However, the same level of turbulence resulted in a failure with the traditional conjugate mode demultiplexing method shown in [11], whereas our routine still performed nearly perfectly.

For R-CDT pre-processing, the OAM beam patterns were normalized and then 90 equidistant projections of normalized patterns were used to compute the corresponding Radon transform. Examples of R-CDT transformed beam patterns are shown for mode set 1 in Fig. 6.

We trained a 32-class classification system to demultiplex 5 OAM-carrying beams. A classification system was independently trained for three different turbulence levels, three different mode sets, and for two types of input space – image space $({\mathbb{R}}^{151\times 151})$ and R-CDT $\left({\mathbb{R}}^{217\times 90}\right)$ space. Also, we utilized two different classifiers: LDA (linear discriminant analysis) and shallow CNN. The shallow CNN consists of 36 convolutional filters of size 11 × 5 with 3 × 3 strides, followed by 3×3 downsampling operation, relu operation, and batch-normalization (see Fig. 7.) CNN weights were initialized according to [33] and learned with stochastic gradient descent using Adam optimization technique [34]. In summary, total 36 (= 3 × 3 × 2 × 2) different classifiers were trained.

#### Performance measures

The performance of the different classifiers was evaluated with two criteria: i) how accurately they can identify transmitted OAM beam patterns (demultiplexing power) and ii) the computational complexity. The demultiplexing power is measured by the classification accuracy and the bit error ratio (BER). The computational complexity is measured by the complexity of the classifier and the number of floating point operations (FLOPS) during the testing phase. Specifically, we used the following performance measures:

- Classification accuracy (%):$$\text{Acc}=\frac{{\displaystyle {\sum}_{n=1}^{N}1\left({\widehat{\mathit{\ell}}}_{n}={\mathit{\ell}}_{n}\right)}}{N}\times 100,$$where
*ℓ*is the true label of the multiplexed beam_{n}*S*, ${\widehat{\mathit{\ell}}}_{n}$ is the predicted label, $1\left({\widehat{\mathit{\ell}}}_{n}={\mathit{\ell}}_{n}\right)$ is an indicator function, 1 if ${\widehat{\mathit{\ell}}}_{n}={\mathit{\ell}}_{n}$ and otherwise 0, and_{n}*N*is the number of sample multiplexed patterns considered. - bit error ratio (BER):$$\text{BER}=\frac{{\displaystyle {\sum}_{n=1}^{N}{\displaystyle {\sum}_{m=1}^{5}1\left({\widehat{p}}_{nm}={p}_{nm}\right)}}}{{\displaystyle {\sum}_{n=1}^{N}{\displaystyle {\sum}_{m=1}^{5}{\widehat{p}}_{nm}}}}.$$
*p*is the true mode indicator of_{nm}*S*, which is 1 if the multiplexed beam_{n}*S*consists of mode_{n}*m*, and otherwise 0. ${\widehat{p}}_{nm}$ is the predicted mode indicator. In this BER formulation, each OAM carrying beam represents ’1’, and therefore the on-off keying modulation is inherently assumed. - The number of trainable parameters in classifiers,
- The number of total floating point operations (FLOPs).

## 6. Results

#### LDA classification result

A linear discriminant analysis (LDA) [35] classifier was trained to discriminate received patterns in both the image space and R-CDT space. Computing LDA involves inverting a data matrix, but this is practically infeasible with a standard computer given the storage requirement and the cost of diagonalization for our dataset: 32,000 images of size 151 × 151. Therefore, the size of the input vector was reduced from 151 × 151 to 4000 using principal component analysis (PCA), and then to 31 using Fisher-LDA [35]. For computing the PCA subspace, the entire training set was used (since no label is required). For computing the Fisher-LDA subspace, however, 80% of the training set was used, and the 20% of the training set was used as a validation. The LDA classifier was trained using only this 80% of the training set. The R-CDT was computed using *θ* = [0, 2,·⋯, 180], in total 90 projections, resulting in 217×90 sized input data. Then the size of the input vector was reduced to 4000 using PCA, and then to 31 using Fisher-LDA, and the LDA classifier was trained using 80% of the training set.

Table. 2 shows the *testing* classification accuracy of the LDA classifiers for different levels of turbulence on image space and R-CDT space. The R-CDT space consistently produced the highest demultiplexing accuracy. The advantage of utilizing R-CDT became more prominent under severe turbulence. LDA couldn’t correctly classify the image set because of confounding deformation patterns introduced by the turbulence. However, the R-CDT transform decoded out the confounding factor from the images that were identifiable by LDA classifier.

In summary, applying the R-CDT makes the demultiplexing job as simple as using LDA classifier, which implies that the classification boundary within classes became much simpler and linear.

#### Shallow CNN classification result

In the second experiment, we utilized ‘shallow CNN’ to identify OAM beam patterns. Table 3 shows demultiplexing accuracy for the *testing set* in the R-CDT space using the R-CDT plus ‘shallow’ CNN model described previously. By utilizing a more powerful classification system, the multiplexed beams were identified with higher accuracy and lower BER.

The Table 4 shows the results reported previously in [11] which utilized Alexnet [36] for decoding each OAM pattern. It is worth mentioning that decoding in R-CDT space provides comparable results to utilizing the Alexnet but with about 100 times savings in computational complexity (in both FLOPs and the number of parameters, see Table 6 Table 6). The proposed shallow convolutional network network requires 9.11M FLOPs, which takes 1.98 *µ*s to compute using Nvidia’s GTX 980 (which offers 4.6 tera-FLOPS), which approximates to about 0.51MBits/sec of data transmission rate.

Note that there is marginal overhead for computing the R-CDT data. More precisely, the computational complexity associated with computing the R-CDT can be decomposed into two steps. The first is to compute the Radon transform. For a single *θ*, all pixels have to be considered, and therefore the computation complexity is *O*(*n*^{2}) for size *n*×*n* image. The second is to compute 1d-CDT, which has complexity of *O*(*n* log(*n*)). Since the R-CDT computation can be parallelized for each projection, the overhead of computing the R-CDT for 151 × 151 image would be about 23K FLOPs, marginal compared to the computational complexity of Alexnet itself.

Table 5 shows demultiplexing accuracy for the testing set in the image space utilizing the shallow CNN model. For square shaped input image, 11 × 11 size convolutional filters were used instead of 11 × 5 size filters. Hence, 1-layer CNN for image space required twice more FLOPs to compute than the 1-layer CNN for R-CDT space. For mode set 2 and 3, the shallow network for the image space was not powerful enough to identify the modes correctly. It is interesting that for the mode set 1, the shallow net performed as well as the complex Alexnet. However, as will be discussed in section 7, the shallow net in the image space is not enough to decode out the mis-alignment errors which are likely to happen in outside-of-lab experiments.

In summary, the R-CDT simplifies classification problem of OAM beams affected by atmospheric turbulence. The non-linearity introduced into the data by atmospheric turbulence effect can be decoded linearly using the R-CDT. And with the aid of rather simple classifiers such as 1-layer convolutional neural network, we were able to identify beams better or at least as accurate as current state-of-the art demultiplexing methods.

#### The best mode set for transmitting and receiving spatial patterns

The mode set 1 contains higher mode numbers with wider spacing between the adjacent modes. The mode set 3 contains smaller mode numbers with narrower spacing between the adjacent modes. The best BER per turbulence level is marked in bold font in Table 3. Increasing the spacing between modes diminishes the effect of mode coupling. When turbulence is not severe (*D*/*r*_{0} = 5, 10), we can confirm that the R-CDT + CNN classifier will perform better with mode set 1 or mode set 2. However, OAM beams with higher mode numbers will have wider beam diameter, and therefore degraded by the turbulence field more severely. Severe distortions are less likely to be decoded out by R-CDT, and therefore, for strong turbulence, the mode set 3 yields the lowest BER.

## 7. Discussion

#### The power of the R-CDT when the beams are misaligned

The OAM-carrying beam, generated in the laboratory, propagated through a pre-defined path with low beam wander from the simulated phase screen. In addition to the low beam wander case, we want to consider larger beam wander cases which introduce greater translations of the beam across the detector. In this experiment, we tested whether our classification system can robustly demultiplex off-grid OAM patterns, using 1-layer CNN in the radon-CDT space.

To mimic the effect of beam wandering, the images were randomly translated by [Δ*x*, Δ*y*] pixels, where Δ*x* and Δ*y* were independently drawn from a uniform distribution in a range [−151*α*, 151*α*] pixels, where *α* ∈ [0, 1] controls the severity of the beam wandering. The mentioned preprocessing was performed on-line, while training the CNN, which resulted in augmenting the dataset. Note that the translations in the image space convert to the vertical shifts in the R-CDT space by the Translation Property Stated in Park et al. [26]. The R-CDT under the effect of beam wandering can, therefore, be computed on-line without the additional cost of computation given that the R-CDT before the translation is already computed.

Table 7 shows the classification accuracy for the testing set in the R-CDT space and the image space. Two types of testing sets were compared: i) the original testing set without the effect of beam wandering and ii) the augmented testing set with the effect of beam wandering. For both testing cases, in the R-CDT space, the shallow CNN successfully identified the multiplexed patterns, whereas, in the image space, it failed. The R-CDT removed the wandering effect from the image and therefore reduced the task of ‘shallow’ CNN substantially, rendering CNN to classify the patterns correctly. However, in the image space, the shallow CNN could not decode the translation and the turbulence confounds at the same time. This observation matches the linear separability property of the CDT which enables the images with translation variation to become linearly separable in the CDT space.

In summary, the classifier learned in R-CDT space is robust to beam-wandering without the cost of increasing the complexity, whereas the classifier learned in the image space failed to identify the multiplexed patterns.

#### Multiplexing with low-resolution images

Here we investigated whether we can utilize low-resolution image for demultiplexing. The images were down-sampled by a factor of *α* = [1/4, 1/8, 1/16], followed by the Radon-CDT transform. Fig. 9 shows the down-sampled images (top) and Radon-CDT data (bottom). Table. 8 summarizes the size of the down-sampled data. Two classifiers were tested: a linear classifier (LDA) and a 1-layer regular neural network (NN) as before. Here we used a standard NN instead of CNN to reduce the computational complexity (FLOPS). In order to maintain a similar network complexity for each downsampling factor, the number of dense nodes in the NN was adjusted such that the network comprised approximately 500k parameters.

Table 9 shows the classification performance for LDA. For a down-sample factor of 4 and 8, classification performance was not significantly compromised by reducing the size of images. In fact, in the image space, training in reduced dimension facilitated higher linear classification accuracy than using full dimension images. Yet, again, Radon-CDT yielded consistently higher accuracy.

Table 10 shows the classification performance for NN. As similar to the previous result with 1-layer CNN, adding non-linear feature extraction step enhanced classification performance. For a down-sample factor of 4 and 8, the demultiplexing accuracy is nearly similar to full resolution images. In case of using low-resolution images that were downsampled by factor of 16, the total number of FLOPS for the dense neural network were 0.60M, which takes about 0.13*µ*s to compute using Nvidia’s GTX 980 (which offers 4.6 tera-FLOPS). This approximates to about 7.69MBits/sec of data transmission rate. We conclude that demultiplexing using low-resolution images can greatly reduce the computation cost without damaging the performance.

## 8. Summary and conclusions

In this study, we provided a new classification system that can accurately demultiplex OAM beam patterns which were distorted by atmospheric turbulence. Precisely, by combining the transport-based image transform, namely R-CDT, and a simple 1-layer CNN, our experiment outperformed existing state-of-art decoding performance for OAM signals utilizing complex CNN. The fact that the transforms are mathematically invertible ensures that no information will be lost in using Radon-CDT. We validated that the transform is a useful pre-processing step in the search for solutions.

This experiment was guided by newly developed theory suggesting a link between image turbulence and photon transport through the continuity Eq. (2a) In the presence of turbulence, the light will only approximately follow the optimal transport path. We hypothesized that certain deformations introduced by turbulence could be (nearly) linearly decoded in the R-CDT space, with the aid of non-linear feature extractor (1-layer CNN). In our experiment, the advantage of utilizing R-CDT space was clear. The R-CDT yielded a representation that was favored and could be decoded by 1-layer CNN instead of complex CNNs. This result supports our hypothesis that confounds introduced by the turbulence can be (nearly) linearly decoded out by the R-CDT.

Results showed that the new method could obtain comparable or better classification accuracies (bit error ratio) at a fraction of the computational cost, thus enabling higher bit rates. Precisely, our experiment revealed that the demultiplexing could be performed reliably under severe turbulence achieving BER < 1.0*e* − 3 for medium or strong turbulence, and ~ 1.0*e* − 3 for very severe turbulence, which is comparable to previous reported performance, with 1/100 less computational computation.

Future study will involve finding the optimal mode set and number of modes that guarantees the highest data rate with lower BER and formulating the linear classifier that incorporates the physics of the problem such that it can yield better or more or less the same performance as the shallow CNN.

## Funding

National Science Foundation (NSF) award CCF 1421502; the U.S. Naval Research Laboratory 6.1 Base Program: Enabling Capabilities of Optical Vortices and Orbital Angular Momentum for Naval S&T.

## References and links

**1. **A. Banerjee, Y. Park, F. Clarke, H. Song, S. Yang, G. Kramer, K. Kim, and B. Mukherjee, “Wavelength-division-multiplexed passive optical network (WDM-PON) technologies for broadband access: A review,” J. Opt. Netw. **4**, 737–758 (2005). [CrossRef]

**2. **D. M. Marom, D. T. Neilson, D. S. Greywall, C.-S. Pai, N. R. Basavanhally, V. A. Aksyuk, D. O. López, F. Pardo, M. E. Simon, Y. Low, P. Kolodner, and C. A. Bolle, “Wavelength-selective 1 × k switches using free-space optics and MEMS micromirrors: Theory, design, and implementation,” J. Lightwave Technol. **23**, 1620–1630 (2005). [CrossRef]

**3. **G. Xie, F. Wang, A. Dang, and H. Guo, “A novel polarization-multiplexing system for free-space optical links,” IEEE Photon. Technol. Lett. **23**, 1484–1486 (2011). [CrossRef]

**4. **D. J. Richardson, J. M. Fini, and L. E. Nelson, “Space-division multiplexing in optical fibres,” Nat. Photonics **7**, 354–362 (2013). [CrossRef]

**5. **J. Wang, J.-Y. Yang, I. M. Fazal, N. Ahmed, Y. Yan, H. Huang, Y. Ren, Y. Yue, S. Dolinar, and M. Tur *et al.*, “Terabit free-space data transmission employing orbital angular momentum multiplexing,” Nat. Photonics **6**, 488–496 (2012). [CrossRef]

**6. **H. Huang, G. Xie, Y. Yan, N. Ahmed, Y. Ren, Y. Yue, D. Rogawski, M. J. Willner, B. I. Erkmen, K. M. Birnbaum, S. J. Dolinar, M. P. J. Lavery, M. J. Padgett, and A. E. Willner, “100 Tbit/s free-space data link enabled by three-dimensional multiplexing of orbital angular momentum, polarization, and wavelength,” Opt. Lett. **39**, 197–200 (2014). [CrossRef] [PubMed]

**7. **A. E. Willer, H. Huang, Y. Yan, Y. Ren, N. Ahmed, G. Xie, C. Bao, L. Li, Y. Cao, Z. Zhao, J. Wang, M. P. J. Lavery, M. Tur, S. Ramachandran, A. F. Molisch, N. Ashrafi, and S. Ashrafi, “Optical communications using orbital angular momentum beams,” Adv. Opt. Photonics **7**, 66–106 (2015). [CrossRef]

**8. **Y. Ren, G. Xie, H. Huang, C. Bao, Y. Yan, N. Ahmed, M. P. J. Lavery, B. I. Erkmen, S. Dolinar, M. Tur, M. A. Neifeld, M. J. Padgett, R. W. Boyd, J. H. Shapiro, and A. E. Willner, “Adaptive optics compensation of multiple orbital angular momentum beams propagating through emulated atmospheric turbulence,” Opt. Lett. **39**, 2845–2848 (2014). [CrossRef] [PubMed]

**9. **M. Krenn, R. Fickler, M. Fink, J. Handsteiner, M. Malik, T. Scheidl, R. Ursin, and A. Zeilinger, “Communication with spatially modulated light through turbulent air across Vienna,” New J. Phys. **16**, 113028 (2014). [CrossRef]

**10. **E. Knutson, S. Lohani, O. Danaci, S. D. Huver, and R. T. Glasser, “Deep learning as a tool to distinguish between high orbital angular momentum optical modes,” in “Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series,”, vol. 9970 (2016), vol. 9970, p. 997013.

**11. **T. Doster and A. T. Watnik, “Machine learning approach to OAM beam demultiplexing via convolutional neural networks,” Appl. Opt. **56**, 3386–3396 (2017). [CrossRef] [PubMed]

**12. **J. Li, M. Zhang, and D. Wang, “Adaptive demodulator using machine learning for orbital angular momentum shift keying,” IEEE Photon. Technol. Lett. **25**, 1455–1458 (2017). [CrossRef]

**13. **J. M. Nichols, A. T. Watnik, T. Doster, S. Park, A. Kanaev, L. Cattell, and G. K. Rohde, “An optimal transport model for imaging in atmospheric turbulence,” arXiv preprint arXiv:1705.01050 (2017).

**14. **M. Mirhosseini, M. Malik, Z. Shi, and R. W. Boyd, “Efficient separation of the orbital angular momentum eigenstates of light,” Nat. Commun. **4**, 2781 (2013). [CrossRef] [PubMed]

**15. **I. B. Djordjevic and M. Arabaci, “LDPC-coded orbital angular momentum (OAM) modulation for free-space optical communication,” Opt. Express **18**, 24722–24728 (2010). [CrossRef] [PubMed]

**16. **G. Gibson, J. Courtial, M. J. Padgett, M. Vasnetsov, V. Pas’ko, S. M. Barnett, and S. Franke-Arnold, “Free-space information transfer using light beams carrying orbital angular momentum free-space information transfer using light beams carrying orbital angular momentum,” Opt. Express **12**, 5448–5456 (2004). [CrossRef] [PubMed]

**17. **M. Soskin, V. Gorshkov, M. Vasnetsov, J. Malos, and N. Heckenberg, “Topological charge and angular momentum of light beams carrying optical vortices,” Phys. Rev. A **56**, 4064 (1997). [CrossRef]

**18. **M. P. Lavery, G. C. Berkhout, J. Courtial, and M. J. Padgett, “Measurement of the light orbital angular momentum spectrum using an optical geometric transformation,” J. Opt-UK **13**, 064006 (2011). [CrossRef]

**19. **M. P. Lavery, F. C. Speirits, S. M. Barnett, and M. J. Padgett, “Detection of a spinning object using light’s orbital angular momentum,” Science **341**, 537–540 (2013). [CrossRef] [PubMed]

**20. **J. Leach, M. J. Padgett, S. M. Barnett, S. Franke-Arnold, and J. Courtial, “Measuring the orbital angular momentum of a single photon,” Phys. Rev. Lett. **88**, 257901 (2002). [CrossRef] [PubMed]

**21. **P. J. Winzer and G. J. Foschini, “MIMO capacities and outage probabilities in spatially multiplexed optical transport systems,” Opt. Express **19**, 16680–16696 (2011). [CrossRef] [PubMed]

**22. **E. Madelung, “Quantum theory in hydrodynamic form,” Zeit. f. Phys. **40**, 322–326 (1927). [CrossRef]

**23. **M. Tsang and D. Psaltis, “Metaphoric optical computing of fluid dynamics,” arXiv:physics/0604149 pp. 1–12 (2006).

**24. **J.-D. Benamou and Y. Brenier, “A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem,” Numerische Mathematik **84**, 375–393 (2000). [CrossRef]

**25. **S. Kolouri, S. R. Park, M. Thorpe, D. Slepcev, and G. K. Rohde, “Optimal mass transport: Signal processing and machine-learning applications,” IEEE Signal Process. Mag **34**, 43–59 (2017). [CrossRef]

**26. **S. R. Park, S. Kolouri, S. Kundu, and G. K. Rohde, “The cumulative distribution transform and linear pattern classification,” Appl. Comput. Harmon. A. (2017). [CrossRef]

**27. **S. Kolouri, S. R. Park, and G. K. Rohde, “The radon cumulative distribution transform and its application to image classification,” IEEE Trans. Image Process. **25**, 920–934 (2016). [CrossRef]

**28. **J. Ba and R. Caruana, “Do deep nets really need to be deep?” in “*Advances in Neural Information Processing Systems 27*,” Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, eds. (Curran Associates, Inc., 2014), pp. 2654–2662.

**29. **R. Lane, A. Glindemann, and J. Dainty, “Simulation of a kolmogorov phase screen,” Wave. Random Media **2**, 209–224 (1992). [CrossRef]

**30. **L. C. Andrews, “An analytical model for the refractive index power spectrum and its application to optical scintillations in the atmosphere,” J. of Mod. Opt. **39**, 1849–1853 (1992). [CrossRef]

**31. **T. Doster and A. T. Watnik, “Laguerre-Gauss and Bessel-Gauss beams propagation through turbulence: analysis of channel efficiency,” Appl. Opt. **55**, 10239–10246 (2016). [CrossRef]

**32. **J. A. Anguita, M. A. Neifeld, and B. V. Vasic, “Turbulence-induced channel crosstalk in an orbital angular momentum-multiplexed free-space optical link,” Appl. Opt. **47**, 2414–2429 (2008). [CrossRef] [PubMed]

**33. **X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in “Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics,” (2010), pp. 249–256.

**34. **D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980 (2014).

**35. **R. A. Fisher, “The use of multiple measurements in taxonomic problems,” Ann. Hum. Genet. **7**, 179–188 (1936).

**36. **A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in “*Advances in neural information processing systems*,” (NIPS2012), pp. 1097–1105.