I’ll begin this article with a premise… PHP is certainly not the ideal language when it comes to artificial intelligence. Neural networks are typically the domain of more scientific languages like Python, which offers optimized libraries for this purpose such as PyTorch and NumPy—and that’s usually the language I rely on when working in this kind of context.
However, a few months ago I was chatting with a friend over an aperitif, and as usual, we started throwing around a bunch of silly ideas about AI (I’ll spare you the wild ones we came up with). But then we focused on one specific topic: “Would it be possible to create a neural network in PHP?”
The short answer is yes—albeit with some limitations.
To understand this article, we first need to make a few preliminary remarks…
In today’s technological landscape, Artificial Intelligence (AI, or IA in Italian) is no longer a niche topic, but a strategic pillar for nearly any company aiming to extract value from its data, optimize processes, and offer truly competitive products.
In the field of HR & Workforce Management, for instance, predictive models make it possible to anticipate sudden absences, dynamically calibrate shifts, and, ultimately, save time and resources—precisely the “speed/quality” combination we discussed in this article: https://renor.it/velocita-e-qualita-nei-progetti/.
But how many of you actually know what artificial intelligence really is?
The term Artificial Intelligence refers to the set of techniques that allow a computer system to exhibit abilities normally attributed to human intelligence: reasoning, learning, decision-making, and recognizing complex patterns.
Within this broad field, Machine Learning represents the approach in which learning takes place through statistical analysis of data, without having to manually code every rule.
In recent years, the rise of Deep Learning has pushed evolution even further: very deep neural networks, composed of dozens or even hundreds of processing layers, are able to detect structures in data that traditional models fail to capture.
An artificial neural network is a mathematical model inspired by the structure of the neurons in our brain. After all, almost all human discoveries are based on observing what already exists in nature.
The neural structure of the brain allows us to solve problems where the relationship between input (what we perceive from the outside) and output (what we derive from it) is non-linear or difficult to formalize.
Let’s imagine, for example, that we want to predict the likelihood of an employee being late based on variables such as traffic, weather conditions, personal history of delays and absences, and public transportation schedules. The relationship between these factors is far too complex to be described with a few conditional statements—but a well-trained neural network can learn it by analyzing a large amount of historical data.
The reason for this ability lies in the fact that each connection between neurons is associated with a weight and a bias term; we will see shortly what these two parameters are.
During the training phase, the network adjusts these parameters to minimize a loss function that measures the discrepancy between its prediction and the actual value. This refinement process occurs through the backpropagation algorithm, which computes the gradients, and an optimization method such as stochastic gradient descent, which iteratively updates the weights.
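To make these two ingredients concrete (and to match the implementation presented later in this article), the squared-error loss for a single training example and the corresponding gradient-descent update can be written as:

$$L = \sum_{i} (\hat{y}_i - y_i)^2, \qquad \frac{\partial L}{\partial \hat{y}_i} = 2\,(\hat{y}_i - y_i), \qquad w \leftarrow w - \eta\,\frac{\partial L}{\partial w}$$

where $\hat{y}$ is the network’s prediction, $y$ the actual value, and $\eta$ the learning rate.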
At the end of the process, the network does not contain a set of rules written by the programmer, but rather a collection of numerical coefficients that encode, in a distributed manner, the knowledge extracted from the data.
Now that we have a general overview, we can understand the roles and responsibilities of weights and biases.
The weight is the numerical coefficient that modulates the intensity with which an input signal contributes to the activation of the next neuron. We can think of it like a volume knob: turning it up amplifies the contribution of that specific feature, while turning it down reduces it, even to the point of inverting its effect.
From a mathematical standpoint, the weight multiplies the input value and determines the slope of the function the network is learning; a high weight indicates that the input is strongly correlated with the output, while a weight close to zero makes it practically irrelevant.
The bias, on the other hand, acts as a translator: it is added to the product of input and weight, shifting the overall result up or down before the activation function is applied.
Therefore, if the weights—as we’ve seen—represent the slope of a line, the bias represents the y-intercept, allowing the network to model functions that do not necessarily pass through the origin.
In practice, the bias allows a neuron to activate even when all inputs are zero, introducing that flexibility which makes neural models true function approximators.
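Putting the two together, the computation carried out by a single neuron (and implemented later in the forward method) can be written compactly as:

$$z = \sum_{j} w_j x_j + b, \qquad a = \varphi(z)$$

where $x_j$ are the inputs, $w_j$ the corresponding weights, $b$ the bias, $\varphi$ the activation function, and $a$ the neuron’s output.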
During training, weights and biases are updated using the backpropagation algorithm: the gradient of the loss function indicates in which direction and by how much each parameter should be adjusted to reduce the gap between the network’s prediction and the actual value.
Iteration after iteration, the network adjusts these two types of parameters in a coordinated manner, refining both the slope and the position of its decision curves, until it captures the complexity of the phenomenon we aim to model.
In summary, weights and biases are the fundamental building blocks of the network’s adaptive intelligence: the former controls the relative importance of the inputs, while the latter provides the freedom to move within the solution space without predefined geometric constraints.
One might wonder whether it makes sense to build a neural network in a language traditionally considered for web backend development. The answer is yes—for certain well-defined scenarios.
First of all, a native implementation avoids introducing an additional runtime—typically Python—thereby simplifying the build, test, and deploy cycle when the entire application stack is already in PHP. Furthermore, for microservices that require lightweight models and inference times in the order of a few milliseconds, a self-contained solution is more than adequate.
There’s also the good old educational aspect, which should not be underestimated: writing the network line by line dismantles the “black box” aura that surrounds many deep learning frameworks, and puts developers in a position to understand, optimize, and—most importantly—debug every step of the computation. This offers a detailed overview of how a neural network works, leading to a deeper and more comprehensive understanding.
To move forward, we need to define the backbone of a “bare-metal” neural network that we can build using only the core PHP engine—without relying on C extensions or external libraries.
In practice, this means modeling, using native data structures, three fundamental elements: the weights, the biases, and the flow of activations between layers.
Each layer will be represented by a simple two-dimensional array of weights ($weights) and a one-dimensional array of biases ($biases). The transfer of activations from one layer to the next will occur through standard matrix-vector multiplication, followed by the application of an activation function (sigmoid, ReLU, or tanh, depending on the use case).
This minimalist scheme has the advantage of remaining readable and facilitating step-by-step debugging, but it imposes certain design choices: no automatic parallelization, no SIMD optimizations, and a need for extreme attention to computational complexity, since the unrestrained use of nested foreach loops in PHP can cause inference times to spike.
Nevertheless, for networks with one or two hidden layers and a number of neurons in the order of hundreds, performance remains surprisingly decent—provided that op-caching is leveraged and redundant memory allocations are avoided.
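For reference, "leveraging op-caching" here simply means making sure PHP's opcode cache is switched on for the SAPI that runs the network. A typical php.ini fragment could look like the following (the numeric values are purely indicative and should be tuned to your own setup):

; Enable the opcode cache for web requests and for CLI scripts
opcache.enable=1
opcache.enable_cli=1
; Memory reserved for compiled scripts, in megabytes
opcache.memory_consumption=128
opcache.max_accelerated_files=10000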
In essence, before diving into the actual code, it’s important to understand that in PHP, neurons are nothing more than rows in arrays, and gradients are float values updated within a loop. The simplicity of the implementation makes the arithmetic of the network easy to grasp, making each step of the learning process clearly visible.
At this point in the article, it’s appropriate to present in detail the bare-metal implementation of a two-layer feed-forward neural network, written entirely in PHP 8.1.
The following code maintains maximum transparency: every mathematical operation is explicitly expressed using simple for loops, each intermediate variable is stored so it can be inspected during debugging, and the only prerequisites are the PHP engine and opcache enabled in production.
To make the project easily reusable, I have divided the code into two separate files.
The first, NeuralNetwork.php, contains all the neural network logic, complete with classes, activation functions, forward-pass, backpropagation, and training routines.
The second, demo_xor.php, is a simple execution script that imports the class, instantiates the network, trains it on the classic XOR problem, and prints the results to the screen.
<?php

declare(strict_types=1);

/**
 * Minimal feed-forward neural network in pure PHP
 * MIT License – (c) 2025
 */

/* ---------- Activation functions ---------- */

/** Sigmoid activation */
function sigmoid(float $x): float
{
    return 1.0 / (1.0 + exp(-$x));
}

/** Derivative of the sigmoid */
function sigmoid_derivative(float $x): float
{
    $s = sigmoid($x);
    return $s * (1 - $s);
}

/** ReLU activation */
function relu(float $x): float
{
    return max(0.0, $x);
}

/** Derivative of ReLU */
function relu_derivative(float $x): float
{
    return $x > 0 ? 1.0 : 0.0;
}

/* ---------- Layer class ---------- */

final class Layer
{
    public readonly int $in;  // number of input neurons
    public readonly int $out; // number of output neurons

    /** @var float[][] weight matrix [out][in] */
    public array $W = [];

    /** @var float[] bias vector [out] */
    public array $b = [];

    /* callable */ private $activation;
    /* callable */ private $activation_d;

    private array $lastInput = [];
    private array $lastZ = [];
    private array $lastOutput = [];

    public function __construct(
        int $in,
        int $out,
        callable $act,
        callable $actDer
    ) {
        $this->in = $in;
        $this->out = $out;
        $this->activation = $act;
        $this->activation_d = $actDer;

        // Xavier/Glorot uniform initialization
        $limit = sqrt(6 / ($in + $out));
        for ($i = 0; $i < $out; $i++) {
            $this->b[$i] = 0.0;
            for ($j = 0; $j < $in; $j++) {
                $this->W[$i][$j] = (mt_rand() / mt_getrandmax()) * 2 * $limit - $limit;
            }
        }
    }

    /** Forward propagation */
    public function forward(array $input): array
    {
        $this->lastInput = $input;
        $this->lastZ = [];
        $this->lastOutput = [];

        for ($i = 0; $i < $this->out; $i++) {
            $z = $this->b[$i];
            for ($j = 0; $j < $this->in; $j++) {
                $z += $this->W[$i][$j] * $input[$j];
            }
            $this->lastZ[$i] = $z;
            $this->lastOutput[$i] = ($this->activation)($z);
        }

        return $this->lastOutput;
    }

    /** Back-propagation, returns gradient for previous layer */
    public function backward(array $gradOutput, float $lr): array
    {
        $gradInput = array_fill(0, $this->in, 0.0);

        for ($i = 0; $i < $this->out; $i++) {
            $delta = $gradOutput[$i] * ($this->activation_d)($this->lastZ[$i]);

            // Propagate gradient and update weights
            for ($j = 0; $j < $this->in; $j++) {
                $gradInput[$j] += $delta * $this->W[$i][$j];
                $this->W[$i][$j] -= $lr * $delta * $this->lastInput[$j];
            }

            // Update bias
            $this->b[$i] -= $lr * $delta;
        }

        return $gradInput;
    }
}

/* ---------- NeuralNetwork class ---------- */

final class NeuralNetwork
{
    /** @var Layer[] */
    private array $layers = [];

    /** Add a layer to the network */
    public function addLayer(Layer $layer): void
    {
        $this->layers[] = $layer;
    }

    /** Forward pass through all layers */
    public function predict(array $x): array
    {
        $out = $x;
        foreach ($this->layers as $layer) {
            $out = $layer->forward($out);
        }
        return $out;
    }

    /**
     * Train the network with SGD and mean squared error
     * @param float[][] $xTrain
     * @param float[][] $yTrain
     */
    public function train(
        array $xTrain,
        array $yTrain,
        int $epochs = 1000,
        float $lr = 0.1,
        bool $verbose = true,
        int $logStride = 100
    ): void {
        $n = count($xTrain);

        for ($e = 1; $e <= $epochs; $e++) {
            $loss = 0.0;

            for ($k = 0; $k < $n; $k++) {
                $out = $this->predict($xTrain[$k]);

                // MSE derivative: 2*(ŷ - y)
                $grad = [];
                foreach ($out as $i => $o) {
                    $diff = $o - $yTrain[$k][$i];
                    $grad[$i] = 2 * $diff;
                    $loss += $diff ** 2;
                }

                // Backward pass
                for ($l = count($this->layers) - 1; $l >= 0; $l--) {
                    $grad = $this->layers[$l]->backward($grad, $lr);
                }
            }

            if ($verbose && $e % $logStride === 0) {
                printf("Epoch %d/%d - loss: %.6f\n", $e, $epochs, $loss / $n);
            }
        }
    }
}
In this file, we find the activation functions—sigmoid and ReLU—along with their respective gradients. Keeping them at the global scope, rather than encapsulated within the class, reduces the overhead of method calls and allows them to be passed as callables directly to the layer constructor, maintaining flexibility without sacrificing performance.
The Layer class is declared as final to prevent unwanted extensions and represents the logical unit of computation. It contains the in and out integers marked as readonly, ensuring their integrity throughout the object’s entire lifecycle.
The weight matrix is initialized using the Xavier technique, which distributes values over an interval whose width is inversely proportional to the square root of the total number of input and output connections; the bias vector is simply initialized to zero.
This mathematical strategy prevents the activations from saturating during the initial epochs—a phenomenon that would otherwise compromise the learning process.
$$W_{ij} \sim \mathcal{U}\!\left(-\sqrt{\frac{6}{n_{in} + n_{out}}},\; +\sqrt{\frac{6}{n_{in} + n_{out}}}\right)$$

where $\mathcal{U}(a, b)$ denotes the continuous uniform distribution between $a$ and $b$; the square root of the term $\frac{6}{n_{in} + n_{out}}$ serves as the upper and lower bound of the sampling interval.
Alternatively, one could use a Gaussian distribution with zero mean and variance $\frac{2}{n_{in} + n_{out}}$, but the expression above, the one used in the code example, is the original uniform form proposed by Glorot and Bengio.
During forward propagation, each neuron sums the dot product and bias, then applies the chosen activation function. The intermediate results—lastInput, lastZ, and lastOutput—are stored for reuse during backpropagation, allowing the gradient computation to proceed without recalculating anything. This design is ideal for step-by-step debugging.
The backward method receives the error gradient from the next layer, combines it with the local derivative of the activation function, and updates weights and biases by subtracting a fraction proportional to the learning rate. At the same time, it returns the gradient to be propagated backward to the previous layer.
The inner loop is entirely manual—a deliberate choice that highlights the underlying mathematics and makes the code perfectly transparent, even to those who have never used specialized libraries.
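In formula form, what those loops compute for each output neuron $i$ of a layer is:

$$\delta_i = g_i\,\varphi'(z_i), \qquad g^{\text{prev}}_j = \sum_{i} \delta_i\, w_{ij}, \qquad w_{ij} \leftarrow w_{ij} - \eta\,\delta_i\,x_j, \qquad b_i \leftarrow b_i - \eta\,\delta_i$$

where $g$ is the gradient arriving from the layer that follows, $x$ is the input stored during the forward pass, $\eta$ is the learning rate, and $g^{\text{prev}}$ is the gradient returned to the previous layer (computed with the weights as they were before the update, exactly as in the code).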
The NeuralNetwork class—also declared as final—acts as the orchestrator: it maintains the array of layers and provides the predict method, which channels an input vector through each layer.
The train method implements stochastic gradient descent with mean squared error. It iterates over the training set for a specified number of epochs, computes for each example the difference between the predicted and actual output, doubles the value, and propagates the gradient backward, updating the layers in reverse order.
At each iteration, it accumulates the loss to provide a global indicator which—if the verbose flag is enabled—is printed at regular intervals, allowing real-time monitoring of the model’s convergence.
The layer constructor accepts callables, so if in the future you wish to use different activation functions, you can simply pass their references without altering the architecture.
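For example, to experiment with the ReLU pair already defined in NeuralNetwork.php, the hidden layer could be swapped like this (a purely illustrative variation, not the configuration used in the demo):

// Hidden layer with ReLU, output layer with sigmoid (illustrative only)
$net = new NeuralNetwork();
$net->addLayer(new Layer(2, 3, 'relu', 'relu_derivative'));
$net->addLayer(new Layer(3, 1, 'sigmoid', 'sigmoid_derivative'));

The demo script itself, demo_xor.php, sticks to the sigmoid in both layers, as shown below.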
<?php

declare(strict_types=1);

// Include the neural network implementation
require_once __DIR__ . '/NeuralNetwork.php';

// Training data for XOR
$xTrain = [
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1],
];

$yTrain = [
    [0],
    [1],
    [1],
    [0],
];

// Build the network: 2-3-1 with sigmoid activations
$net = new NeuralNetwork();
$net->addLayer(new Layer(2, 3, 'sigmoid', 'sigmoid_derivative'));
$net->addLayer(new Layer(3, 1, 'sigmoid', 'sigmoid_derivative'));

// Train the network
$net->train($xTrain, $yTrain, epochs: 5000, lr: 0.5, logStride: 500);

// Test predictions
foreach ($xTrain as $sample) {
    $out = $net->predict($sample)[0];
    printf("Input %s ⇒ Output %.4f\n", json_encode($sample), $out);
}
In this functional snippet, the training dataset and the corresponding ground truth for the classic XOR problem are declared: four pairs of binary inputs, each paired with its expected output. This setup allows us to test the model’s ability to learn a non-linear function.
The core logic begins with the instantiation of the NeuralNetwork object. A two-layer topology is then constructed: the first layer accepts the two input features and projects them onto three output neurons; the second layer receives those three intermediate activations and returns a single scalar value.
In both layers, the sigmoid activation function is used—chosen for its didactic simplicity and for the ease with which its gradient is computed during the backpropagation phase.
Here is the sigmoid function, along with its derivative—commonly used in neural networks for both activation and backpropagation:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

where $e$ is Euler’s number (the base of natural logarithms), and $x$ represents the real-valued input to the neuron.
This expression guarantees a continuous output between 0 and 1, with an inflection point at the origin that defines its characteristic “S” shape: for very large negative values, the function tends asymptotically toward 0, while for very large positive values, it approaches 1.
In the context of machine learning, the derivative of the sigmoid is often used during backpropagation. Its compact form is:

$$\sigma'(x) = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)$$

This latter relationship derives directly from the original definition and allows the gradient to be computed without the need for additional exponential functions, thus optimizing the weight update phase.
Continuing on, the train method initiates the actual training process, iterating over the same small dataset for five thousand epochs. With a learning rate set to 0.5 and a loss log printed every 500 iterations, the loop performs gradient descent on the mean squared error, updating the weights and biases of both layers at each observation.
At the end of training, a simple foreach loop iterates once more over the four input patterns, feeds them to the predict method, and prints the network’s numerical outputs to the screen—allowing for immediate comparison with the expected outputs and a quick evaluation of the model’s accuracy.
In a production context, this same logic could easily be encapsulated in a REST endpoint or a command-line script, but in this minimalist form it already provides a complete demonstration of how PHP can manage the entire lifecycle of a small neural network—from layer definition to final prediction.
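As a minimal sketch of that idea, here is what a hypothetical JSON prediction endpoint, say predict.php, might look like; the file name and the request format are assumptions of mine, and the network is rebuilt and retrained inline only for brevity (in a real deployment you would train once and persist the weights, for example with serialize()):

<?php
// predict.php – hypothetical JSON endpoint built on top of NeuralNetwork.php
declare(strict_types=1);

require_once __DIR__ . '/NeuralNetwork.php';

// Build and train the network exactly as in demo_xor.php (silently)
$net = new NeuralNetwork();
$net->addLayer(new Layer(2, 3, 'sigmoid', 'sigmoid_derivative'));
$net->addLayer(new Layer(3, 1, 'sigmoid', 'sigmoid_derivative'));
$net->train(
    [[0, 0], [0, 1], [1, 0], [1, 1]],
    [[0], [1], [1], [0]],
    epochs: 5000,
    lr: 0.5,
    verbose: false
);

header('Content-Type: application/json');

// Expected request body: {"features": [1, 0]}
$payload  = json_decode((string) file_get_contents('php://input'), true);
$features = $payload['features'] ?? null;

if (!is_array($features) || count($features) !== 2) {
    http_response_code(400);
    echo json_encode(['error' => 'Expected a JSON body with a "features" array of two numbers']);
    exit;
}

echo json_encode(['prediction' => $net->predict(array_map('floatval', $features))[0]]);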
The experiment demonstrates that, although PHP wasn’t designed for numerical computing, it is possible to implement a basic yet functional neural network, train it in reasonable time on small-scale problems, and deploy it in production without introducing an additional runtime.
Xavier initialization preserves signal stability from the very first epochs, the sigmoid ensures a well-defined gradient, and the fully transparent, non–black-box approach makes the model an excellent didactic tool: every weight, every bias, and every step of backpropagation is under full control.
It’s clear that this solution is not meant to compete with GPU-optimized frameworks—but when the goal is to integrate lightweight inference into an already PHP-based stack, or simply to gain a deep, hands-on understanding of how a neural network works, the presented implementation offers an elegant and accessible path.
In conclusion, the most interesting aspect of this article was not to build a new ChatGPT, but rather to foster awareness and learn the mathematical principles behind the construction of a simple neural network—line by line.
In my opinion, the true power of artificial intelligence lies in understanding the scientific principles on which it is based, rather than in the specific programming languages we use to implement it.