The universal approximation theorem for complex-valued neural networks (2012.03351v2)

Published 6 Dec 2020 in math.FA, cs.LG, and stat.ML

Abstract: We generalize the classical universal approximation theorem for neural networks to the case of complex-valued neural networks. Precisely, we consider feedforward networks with a complex activation function $\sigma: \mathbb{C} \to \mathbb{C}$ in which each neuron performs the operation $\mathbb{C}^N \to \mathbb{C},\ z \mapsto \sigma(b + w^T z)$ with weights $w \in \mathbb{C}^N$ and a bias $b \in \mathbb{C}$, and with $\sigma$ applied componentwise. We completely characterize those activation functions $\sigma$ for which the associated complex networks have the universal approximation property, meaning that they can uniformly approximate any continuous function on any compact subset of $\mathbb{C}^d$ arbitrarily well. Unlike the classical case of real networks, the set of "good activation functions" which give rise to networks with the universal approximation property differs significantly depending on whether one considers deep networks or shallow networks: For deep networks with at least two hidden layers, the universal approximation property holds as long as $\sigma$ is not a polynomial, a holomorphic function, or an antiholomorphic function. Shallow networks, on the other hand, are universal if and only if the real part or the imaginary part of $\sigma$ is not a polyharmonic function.

Summary

  • The paper demonstrates that deep complex-valued neural networks universally approximate continuous functions when the activation function is neither a polynomial (in $z$ and $\overline{z}$), nor holomorphic, nor antiholomorphic.
  • It establishes distinct criteria for shallow and deep architectures, showing that the set of admissible activation functions depends on depth and differs markedly from the real-valued setting.
  • The findings provide a theoretical framework that supports practical CVNN implementations in applications such as signal processing and MRI.

Universal Approximation Theorem for Complex-Valued Neural Networks

Complex-valued neural networks (CVNNs) offer a promising direction for expanding the applicability of neural architectures, particularly in domains where the inputs are naturally complex-valued, such as signal processing and MRI. The paper "The universal approximation theorem for complex-valued neural networks" (2012.03351) generalizes the universal approximation theorem to complex-valued networks, detailing the conditions under which a complex neural network can approximate any continuous function arbitrarily well, uniformly on compact sets.
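To make the setup concrete: each neuron computes $z \mapsto \sigma(b + w^T z)$ with complex weights $w$, a complex bias $b$, and $\sigma$ applied componentwise. The following NumPy sketch (an illustration of this architecture, not code from the paper) builds a shallow CVNN of this form with a split-ReLU activation, one of the "good" activations discussed later in this summary.

```python
import numpy as np

def split_relu(z):
    """Split activation: ReLU applied separately to real and imaginary parts.
    Per this summary, it is neither holomorphic, antiholomorphic, nor a
    polynomial in (z, conj z), and it is not almost polyharmonic."""
    return np.maximum(z.real, 0.0) + 1j * np.maximum(z.imag, 0.0)

def cvnn_layer(z, W, b, sigma=split_relu):
    """One complex-valued layer: z in C^d, W in C^{m x d}, b in C^m."""
    return sigma(W @ z + b)

# Shallow network C^3 -> C: one hidden layer followed by a complex-affine readout.
rng = np.random.default_rng(0)
d, width = 3, 16
z  = rng.standard_normal(d)          + 1j * rng.standard_normal(d)
W1 = rng.standard_normal((width, d)) + 1j * rng.standard_normal((width, d))
b1 = rng.standard_normal(width)      + 1j * rng.standard_normal(width)
w2 = rng.standard_normal(width)      + 1j * rng.standard_normal(width)
b2 = complex(rng.standard_normal())

print(w2 @ cvnn_layer(z, W1, b1) + b2)
```

A deep network in the sense of the paper simply stacks two or more such hidden layers before the affine readout.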

Characterization of Activation Functions

The distinguishing feature of CVNNs, namely that inputs, weights, biases, and activation functions all live in the complex plane, introduces unique challenges and opportunities compared to their real-valued counterparts. This work rigorously characterizes which complex activation functions enable networks to achieve universal approximation, i.e., to approximate any continuous function uniformly on compact subsets of $\mathbb{C}^d$:

  • Deep Networks: For networks with at least two hidden layers, the universal approximation property holds as long as the activation function is neither a polynomial in $z$ and $\overline{z}$, nor holomorphic, nor antiholomorphic. Activation functions outside these three classes make the networks dense in the space of continuous functions on compact sets.
  • Shallow Networks: A network with a single hidden layer is universal if and only if the real part or the imaginary part of the activation function is not polyharmonic. This is a stricter constraint than in the deep case, so fewer activation functions qualify at depth one (both criteria are illustrated in the sketch below).
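To make the two criteria concrete, the short sketch below (my own illustration; the classifications restate the criteria above) lists a holomorphic activation, a polynomial in $z$ and $\overline{z}$, and a bounded non-constant activation, with the universality each permits.

```python
import numpy as np

# Illustrative classification of candidate activations under the criteria above
# (the function names are mine, not the paper's).

def sigma_sin(z):
    # Entire, hence holomorphic: excluded for deep networks. Its real and
    # imaginary parts are harmonic, hence polyharmonic, so shallow networks
    # with this activation are not universal either.
    return np.sin(z)

def sigma_poly(z):
    # A polynomial in z and conj(z): excluded for deep networks, and
    # polyharmonic, so shallow networks are not universal either.
    return z * np.conj(z) + z**2

def sigma_bounded(z):
    # Bounded and non-constant (cited in this summary as a safe choice):
    # neither holomorphic, antiholomorphic, nor a polynomial in (z, conj z),
    # and not almost polyharmonic, so shallow and deep networks using it
    # have the universal approximation property.
    return z / (1.0 + np.abs(z))
```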

Insights and Implications

The implications of these results are profound for theoretical and practical advancements in CVNNs:

  • Variation in Activation Function Properties: The characteristics required vary distinctly between shallow and deep networks, underscoring the increased flexibility of deeper architectures. This insight informs architectural decisions when developing complex-valued models for specific applications.
  • Complex vs. Real-Valued Domains: The findings highlight how the complex setting differs from the real one. Whereas real networks are universal as soon as the activation function is not a polynomial, complex networks additionally exclude holomorphic and antiholomorphic activations (and, for shallow networks, all almost polyharmonic ones), so activation design requires more care in the complex domain.
  • Applications in Complex Domains: Effectiveness in domains such as quantum computing and advanced imaging techniques could be significantly influenced by the appropriate selection and design of complex activation functions, enhancing computational efficiency and accuracy.

Future Directions

This theoretical foundation invites several avenues for further exploration:

  • Practical Implementation: Developing efficient training algorithms for CVNNs leveraging universally approximating activation functions.
  • Exploring Holomorphic Functions: Investigating borderline cases in which nearly holomorphic activations might offer approximation advantages without full universality, possibly bridging gaps for specific applications.
  • Cross-Domain Network Structures: Incorporating cross-domain architectures where complex and real-valued components interact seamlessly, enhancing model versatility and performance.

Conclusion

By extending the universal approximation theorem to complex-valued networks, this research deepens our understanding of neural approximators in multidimensional complex domains. The results offer practical guidelines for designing networks that leverage the richness of the complex plane, broadening the horizon for future innovations in neural network design and application.

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Below is a single, focused list of what remains missing, uncertain, or unexplored in the paper, stated concretely to support follow-up research. (The Wirtinger operators and the polyharmonicity/polyanalyticity notions referenced below are recalled after the list.)

  • Minimal regularity for deep-network necessity: The necessity part of the deep theorem requires continuity of the activation function. Determine natural, weaker regularity assumptions (ideally within the class $M$ of locally bounded functions whose set of discontinuities has null closure) under which the same necessary conditions (polynomial in $(z,\overline{z})$, holomorphic, or antiholomorphic) still exclude universality.
  • Non–locally-bounded activations: The theory does not cover activation functions that are not locally bounded (e.g., the principal branches of $\arctan$ and $\operatorname{arctanh}$ on $\mathbb{C}$). Develop a universality classification for CVNNs using such activations or provide explicit counterexamples.
  • Quantitative approximation theory: The results are qualitative (density). Establish approximation rates, network size (width/depth) bounds, and sample complexity for CVNNs with specific “good” activations (e.g., branch-cut functions like $\arcsin$, $\arccos$, $\operatorname{arcsinh}$) and for target function classes (e.g., Hölder/Sobolev classes).
  • Constructive approximation schemes: Provide explicit constructive procedures (network architectures and parameter choices) to achieve prescribed approximation error with CVNNs under the paper’s conditions, along with complexity guarantees.
  • Other topologies and domains: Extend the characterization to approximation in $L^p_{\mathrm{loc}}(\mathbb{C}^d)$, weighted sup norms on unbounded domains, and function spaces with additional regularity (e.g., Sobolev), including a full sufficiency/necessity theory beyond locally uniform convergence.
  • Domains of empty interior in $\mathbb{C}^d$: The paper notes that approximation on real cubes $[0,1]^n \subset \mathbb{C}^n$ (which have empty interior in $\mathbb{C}^n$) behaves differently (holomorphic activations may then be viable). Develop a complete universality characterization for such lower-dimensional real submanifolds.
  • Holomorphic activations with singularities in deep networks: Proposition 1 addresses shallow networks and holomorphic activations with isolated singularities. Provide a definitive universality/non-universality result for deep networks under holomorphic (non-entire) activations with isolated or non-isolated singularities, and clarify how admissibility constraints on weights interact with depth.
  • Full classification for discontinuous activations: The paper exhibits a discontinuous $\sigma \in M$ that equals a polynomial $p(z,\overline{z})$ almost everywhere, yet yields universality for $L \ge 2$. Develop a necessary-and-sufficient characterization of universality for discontinuous activations in $M$ (including conditions that exclude such pathological exceptions).
  • Activation functions used in practice: Systematically classify common CVNN activations (e.g., modReLU, separate real/imaginary ReLU, amplitude-phase nonlinearities) within the paper’s framework (are they in $M$, almost polyharmonic, polynomial in $(z,\overline{z})$, etc.?), and infer universality outcomes from the theorems.
  • Polyanalytic vs. polyharmonic criteria: The shallow-network characterization uses “almost polyharmonic” (via the Laplacian). Investigate whether an equivalent or sharper criterion in terms of polyanalyticity ($\partial_{\overline{z}}^m f \equiv 0$) can be established, and compare its scope to the current polyharmonic condition.
  • Universality for holomorphic targets: While holomorphic activations are not universal for all continuous targets, study whether CVNNs with holomorphic activations are universal within the class of holomorphic target functions on open subsets of $\mathbb{C}^d$, and under what network constraints.
  • Architectural generalizations: Extend the universality characterization to complex-valued architectures beyond fully connected feedforward networks (e.g., convolutional CVNNs, residual CVNNs, bounded-width deep CVNNs), paralleling known real-valued results.
  • Weight and bias constraints: Analyze how restrictions on parameter sets (e.g., real-only weights/biases, unitary/orthogonal constraints, quantization) affect universality in the complex setting for shallow and deep networks.
  • Stability to activation perturbations: Quantify robustness: if $\sigma$ is close (in $L^p_{\mathrm{loc}}$ or uniformly on compacts) to an almost polyharmonic or holomorphic activation, does universality persist or fail? Provide thresholds and counterexamples.
  • Multi-valued branches and branch-cut design: For “mostly holomorphic” activations with branch cuts (e.g., $z \cdot \mathrm{Log}(z)$, principal branches of inverse trigonometric/hyperbolic functions), characterize how branch choice and cut geometry affect universality, and what minimal topological/analytic conditions on the discontinuity set suffice.
  • Depth-sensitive boundary cases: The paper shows that more “good” activations exist for deep than for shallow networks. Investigate whether there exist borderline activations for which shallow networks fail but depth $L = 2$ suffices (beyond the pathological discontinuous case), and identify mechanisms by which additional layers overcome shallow obstructions.
  • Approximation under compositional or structural priors: Explore universality when target functions possess known structure (e.g., separability, sparsity, radial symmetry), and whether weaker conditions on $\sigma$ suffice in these restricted settings.
  • Extension beyond complex numbers: Examine whether the proof techniques (e.g., Wirtinger calculus–based arguments) extend to quaternionic or Clifford-algebra–valued networks, and formulate universality criteria in those algebras.
  • Numerical training implications: Although theoretical, assess whether the “good” activation functions admitted by the theorems are trainable in practice (e.g., gradient stability, initialization), and whether the pathological/discontinuous cases can be avoided or regularized without losing universality.
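For reference, the Wirtinger operators and the polyharmonicity/polyanalyticity notions used in the list above are the standard ones (for sufficiently smooth $f: \mathbb{C} \to \mathbb{C}$):

$$\partial_z = \tfrac{1}{2}\big(\partial_x - i\,\partial_y\big), \qquad \partial_{\overline{z}} = \tfrac{1}{2}\big(\partial_x + i\,\partial_y\big), \qquad \Delta = \partial_x^2 + \partial_y^2 = 4\,\partial_z \partial_{\overline{z}},$$

$$f \text{ holomorphic} \iff \partial_{\overline{z}} f \equiv 0, \qquad f \text{ polyanalytic of order } m \iff \partial_{\overline{z}}^{\,m} f \equiv 0, \qquad f \text{ polyharmonic of order } m \iff \Delta^{m} f \equiv 0.$$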

Practical Applications

Practical Applications of “The universal approximation theorem for complex-valued neural networks”

The paper characterizes which complex activation functions guarantee universal approximation for shallow and deep complex-valued neural networks (CVNNs). This enables principled activation design, auditing of existing models, and grounded adoption of CVNNs across domains with naturally complex-valued data.

Immediate Applications

Below are actionable use cases and workflows that can be deployed now, including sectors, tools, and feasibility notes.

  • Activation selection guidelines for CVNNs across sectors (software and AI tooling)
    • Use activations that guarantee universality:
    • For shallow CVNNs: choose any activation that is not almost polyharmonic (e.g., split activations such as σ(z) = ReLU(Re z) + i·ReLU(Im z), bounded non-constant functions like σ(z) = z/(1+|z|), or branch-cut-based functions like principal arcsin, arcsinh, or z·Log(z) with safe handling on branch cuts).
    • For deep CVNNs (L ≥ 2): avoid activations that are holomorphic, antiholomorphic, or polynomial in z and z̄ almost everywhere. Most non-smooth/non-holomorphic functions will be safe.
    • Tools/workflows that can be implemented:
    • A “CVNN Activation Registry” in PyTorch/JAX/TensorFlow with certified-universal activations and flags for non-universal ones (e.g., sin, sinh, tan, tanh, pure holomorphic functions).
    • An “Activation Audit” utility that numerically checks whether σ is holomorphic/antiholomorphic or polynomial in (z, z̄), and screens for almost polyharmonic behavior via discrete Laplacian tests (a rough numerical sketch of such checks appears after this list).
    • Assumptions/Dependencies: σ must be locally bounded, with the closure of its set of discontinuities having measure zero; universality concerns expressivity only, not guaranteed learnability or training stability.
  • MRI fingerprinting and complex medical imaging pipelines (healthcare)
    • Replace holomorphic activations (e.g., sin/sinh) with non-holomorphic/non-polyharmonic ones in CVNN architectures used for reconstruction, parameter mapping, or denoising.
    • Expected outcome: expressivity guarantees and reduced risk of hidden representational blind spots.
    • Assumptions/Dependencies: fixed-depth feedforward architectures; compatibility with clinical ML frameworks; careful handling of branch cuts (e.g., for Log, arcsin) in autodiff.
  • Complex baseband signal processing in wireless communications (energy and telecom)
    • Use deep CVNNs with safe activations for channel equalization, beamforming, OFDM symbol detection, and direction-of-arrival estimation.
    • Workflow: implement split activations or branch-cut-aware activations; verify via the audit tool; benchmark against traditional complex-linear models.
    • Assumptions/Dependencies: hardware support for complex tensors; training datasets representative of complex-valued signals; ensure activation is not holomorphic or polynomial in (z, z̄).
  • Radar, sonar, and synthetic aperture imaging (robotics and defense)
    • Deploy CVNNs for inverse problems and target recognition with guaranteed expressivity by auditing activations and replacing holomorphic ones.
    • Assumptions/Dependencies: feedforward CVNNs; consistent gradient handling around branch cut discontinuities; adherence to non-holomorphic activation policy.
  • Power grid phasor analysis and state estimation (energy)
    • Use deep CVNNs with safe activations for complex phasor regression/classification in grid monitoring and anomaly detection.
    • Assumptions/Dependencies: existing complex-valued data pipelines; performance validation; activation universality does not replace domain constraints or stability requirements.
  • Audio and speech processing in the Fourier domain (software, consumer tech)
    • Apply shallow CVNNs with bounded non-constant activations or split non-smooth activations to spectral denoising, source separation, and equalization.
    • Assumptions/Dependencies: complex spectral inputs; optimization considerations for non-smooth activations; verification via compact-domain approximation metrics.
  • Academic course materials and research tooling (education and academia)
    • Integrate Wirtinger calculus and polyharmonicity checks into ML curricula; release open-source notebooks demonstrating how activation properties affect universality.
    • Tools: sample scripts to test Laplacian iterates; examples that contrast holomorphic vs. safe activations.
    • Assumptions/Dependencies: undergraduate-level complex analysis prerequisites; reproducible environments.
  • Corrections for “Extreme Learning Machines” (ELM) with complex activations (academia and research QA)
    • Update ELM implementations to use non-holomorphic/non-polyharmonic activations to align with universal approximation guarantees, replacing incorrect choices like sin/sinh/tanh.
    • Assumptions/Dependencies: reliance on universality results under fixed-depth feedforward settings; random-weight layers must be coupled with proper output-layer training.
  • Model governance and documentation standards for complex-valued ML in regulated contexts (policy and compliance)
    • Add an “Activation Universality Checklist” to model cards: specify activation properties, justify selection against holomorphy/polyharmonicity criteria, and provide audit results.
    • Assumptions/Dependencies: regulatory emphasis on model transparency; compatibility with existing documentation standards.
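The “Activation Audit” idea above can be prototyped with plain finite differences: estimate the Wirtinger derivatives of σ on a grid to flag holomorphic or antiholomorphic behavior, and apply a discrete Laplacian a few times to screen for polyharmonic behavior. The sketch below is such a rough numerical screen (grid sizes, tolerances, and helper names are illustrative choices of mine, not part of the paper); it is a heuristic filter, not a proof about universality.

```python
import numpy as np

def wirtinger(sigma, z, h=1e-5):
    """Central-difference estimates of the Wirtinger derivatives of sigma at z."""
    dx = (sigma(z + h) - sigma(z - h)) / (2 * h)
    dy = (sigma(z + 1j * h) - sigma(z - 1j * h)) / (2 * h)
    return 0.5 * (dx - 1j * dy), 0.5 * (dx + 1j * dy)   # d/dz, d/d(conj z)

def discrete_laplacian(f, h):
    """Five-point-stencil Laplacian on the interior of a sampled grid."""
    return (f[2:, 1:-1] + f[:-2, 1:-1] + f[1:-1, 2:] + f[1:-1, :-2]
            - 4.0 * f[1:-1, 1:-1]) / h**2

def audit(sigma, n=201, lim=2.0, max_order=3, tol=1e-3):
    """Heuristic flags for (anti)holomorphic-like and polyharmonic-like behavior."""
    xs = np.linspace(-lim, lim, n)
    h = xs[1] - xs[0]
    X, Y = np.meshgrid(xs, xs, indexing="ij")
    Z = X + 1j * Y
    d_z, d_zbar = wirtinger(sigma, Z)
    report = {"holomorphic_like": bool(np.max(np.abs(d_zbar)) < tol),
              "antiholomorphic_like": bool(np.max(np.abs(d_z)) < tol),
              "polyharmonic_like": False}
    f = sigma(Z)
    for _ in range(max_order):
        f = discrete_laplacian(f, h)
        if np.max(np.abs(f)) < tol:
            report["polyharmonic_like"] = True
            break
    return report

print(audit(np.sin))                                  # holomorphic (and harmonic): flagged
print(audit(lambda z: z.real**2 + 1j * z.imag))       # polyharmonic: flagged, unsafe for shallow nets
print(audit(lambda z: np.maximum(z.real, 0)
                      + 1j * np.maximum(z.imag, 0)))  # no flags: a "good" activation
```

A clean screen is only suggestive; symbolic analysis or a proof is needed before relying on an activation in a certified setting.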

Long-Term Applications

These applications require further research, scaling, tooling, or development before widespread deployment.

  • Branch-cut–aware autodiff and training kernels for CVNNs (software infrastructure)
    • Develop robust autodiff that handles edge cases of discontinuities at measure-zero sets (e.g., principal branches of Log, arcsin, arcsinh) without gradient pathologies.
    • Dependencies: numerical stability and consistent gradient definitions around branch points; framework-level support for complex-valued differentiation.
  • Formal verification and certification of universality within ML frameworks (software and standards)
    • Create static analyzers that certify whether a given CVNN architecture and activation set is universal under the paper’s criteria; integrate with CI/CD pipelines.
    • Dependencies: implementation of symbolic/numeric tests for holomorphy/antiholomorphy and polynomial-in-(z, z̄) detection; standardized APIs.
  • Hardware accelerators for complex-valued deep learning (semiconductors and HPC)
    • Design complex arithmetic units and activation function blocks optimized for non-smooth/non-holomorphic functions, enabling efficient CVNN inference and training.
    • Dependencies: co-design with ML frameworks; energy efficiency targets; verification for edge devices in telecom and medical imaging.
  • Universality results extended to other complex architectures (academia and R&D)
    • Generalize the theorem to convolutional CVNNs, residual CVNNs, bounded-width deep CVNNs, and continuous/infinite-dimensional settings.
    • Dependencies: new mathematical proofs (e.g., adapting Stone–Weierstrass–type results to structured kernels), empirical validation.
  • Approximation rates, sample complexity, and learnability with safe activations (academia and industry research)
    • Investigate quantitative bounds (rates of approximation, required width/depth) for the recommended activations; link to optimization landscape and generalization.
    • Dependencies: cross-disciplinary work bridging approximation theory and non-convex optimization; large-scale benchmarks.
  • Domain-specific libraries and benchmark suites for complex-valued ML (industry consortia)
    • Curate datasets and tasks (MRI, OFDM, radar, power grid phasors) with standardized CVNN baselines that use universally-approximating activations; define shared metrics.
    • Dependencies: community buy-in; licensing; long-term maintenance; alignment with academic findings.
  • Policy and standards for complex-valued ML in safety-critical systems (policy)
    • Develop guidelines ensuring activation choices do not limit expressivity in medical or infrastructure applications; include formal audit requirements for activation universality.
    • Dependencies: collaboration with standards bodies; empirical evidence; impact assessments.
  • Education: advanced coursework and textbooks integrating complex analysis and deep learning (academia)
    • Produce materials that systematically teach Wirtinger calculus, polyharmonic functions, and their role in ML, with hands-on coding modules.
    • Dependencies: faculty expertise; sustained curriculum development funding; adoption across programs.
  • Robust CVNN deployments on edge devices (energy, telecom, consumer)
    • End-to-end pipelines that leverage universal activations with efficient kernels, enabling real-time inference for complex signals (e.g., beamforming on base stations, audio on smartphones).
    • Dependencies: hardware support; optimized libraries; rigorous field testing.

Notes on assumptions and feasibility across applications:

  • Universality holds for feedforward CVNNs of fixed depth with componentwise activations and locally bounded σ whose discontinuities have measure-zero closure; it guarantees approximation capacity on compact sets, not training success.
  • Activation choices like pure holomorphic/antiholomorphic functions or polynomials in (z, z̄) should be avoided for deep networks; almost polyharmonic functions should be avoided for shallow networks.
  • Practical deployment requires careful numerical handling of branch cuts and non-smooth activations, framework-level complex autodiff support, and domain-specific validation.
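As an example of the branch-cut handling mentioned in the last point, a "mostly holomorphic" activation such as $z \cdot \mathrm{Log}(z)$ needs a guard at the origin and awareness that the principal branch of $\mathrm{Log}$ is discontinuous across the negative real axis (a measure-zero set, as the theory permits). A minimal sketch, with the epsilon guard and the function name being my own choices:

```python
import numpy as np

def z_log_z(z, eps=1e-12):
    """sigma(z) = z * Log(z) using the principal branch of Log.
    Log is singular at z = 0 and discontinuous across the negative real axis;
    the leading factor z tames the singularity, and the eps shift avoids
    evaluating log(0) in floating point."""
    z = np.asarray(z, dtype=complex)
    safe = np.where(np.abs(z) < eps, eps, z)   # keep the argument away from 0
    return z * np.log(safe)

print(z_log_z(np.array([0.0, -1.0, 1.0j, 2.0 + 3.0j])))
```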
