The Language Nightmares Are Programmed In


Okay, so here’s the thing: I’m sure I’ll get some hate for this article. So before anything else, let’s get one thing out of the way: Although it is objectively one of the strangest languages still in use, it is nevertheless my go-to when I need to build a quick-and-dirty proof-of-concept web app, because I have yet to find a language that allows for faster prototyping. Meaning, good tooling and infrastructure can – to some extent – remedy bad language design. Now let’s get into it.

*drumroll* And the 2022 award for the suckiest programming language goes to… PHP!

As far as my motivation goes, let’s just say that I recently had to deal with 1.2 million lines of legacy PHP code. Now more than ever I’m convinced that PHP is one of the strangest languages out there.

No joke. I have yet to see another (popular) language that matches PHP’s level of weirdness.

Arrays: These are not the lists you are looking for.

PHP is the only modern high-level language I’m aware of that doesn’t provide a primitive list type. There is something called “array”, which oddly enough doesn’t have anything to do with arrays in other languages. The first sentence of the respective page in the PHP manual aptly describes them as being “[…] actually an ordered map”.

To make them usable as lists, values without explicit keys are assigned incrementing integer keys starting from 0. When appending a new value, the next index is the highest integer key the array has ever had (past or present) plus 1.

$a = ["foo", "bar"];
$a[] = 0;
unset($a[2]);
$a[] = "foobar";

print_r($a);
// Array 
// (
//    [0] => foo
//    [1] => bar
//    [3] => foobar
// )

This of course isn’t – as you would be forgiven for thinking – a fundamental design flaw, but is in fact very useful. Let’s assume, for example, that you are new to PHP but have experience with one of the many programming languages that count from 1 (like Lua, MATLAB, COBOL, …).

Constructing a function in PHP that changes an array from starting with index 0 to an array starting with index 1 takes what? Maybe 6 lines of code?

function array_lua(...$elements) {
   $a = [0];
   unset($a[0]);
   array_push($a, ...$elements);
   return $a;
}

$a = array_lua("foo", "bar");

print_r($a);
// Array
// (
//    [1] => foo
//    [2] => bar
// )

Starting with PHP 8.3, there is also other cool and totally useful stuff to do with arbitrary array keys, like constructing lists with negative indices.

$a = [-100 => 0];
unset($a[-100]);
$a[] = "foo";
$a[] = "bar";

print_r($a);
// Array
// (
//    [-99] => foo
//    [-98] => bar
// )


Oh, did I mention that PHP does type juggling for array keys? Fun times…

$a = [
   1    => "a",
   "1"  => "b",
   1.5  => "c",
   true => "d",
];

print_r($a);
// Array
// (
//    [1] => d
// )

(This example actually is also from the PHP manual.)
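The casting rules behind this are easiest to trip over with numeric strings. A small illustrative sketch (not taken from the manual) of which string keys get turned into integers:

```php
<?php
$a = [];
$a["8"]  = "a"; // decimal integer string: stored under int(8)
$a["08"] = "b"; // not a canonical integer: stays string("08")
$a[8]    = "c"; // overwrites "a", because it is the same key as "8"

var_dump(array_keys($a) === [8, "08"]); // bool(true)
var_dump($a[8]);                        // string(1) "c"
```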

You are probably thinking: “That’s cool and all, but it’s not like anybody will actually do this. What does it matter if you can do crazy stuff, as long as it behaves reasonably for any meaningful application?”

Well, I have bad news for you. Because there is no way for the interpreter to know if the programmer meant to use an array as a list or as a dictionary, almost all standard array functions will assume it is a map.

My favourite example is the array_filter() function. It removes all entries that don’t match the filter function, but does not change the keys. So iterating over the array afterwards may yield some surprises.

$a = range(0,3);
$a = array_filter($a, fn($e) => $e % 2 == 0);

print_r($a);
// Array
// (
//    [0] => 0
//    [2] => 2
// )

for ($i = 0; $i < count($a); $i++) {
   echo $a[$i];
}
// Warning: Undefined array key 1 in wtf.php on line 12

The way to deal with this kind of insanity is to either use foreach() loops instead, or to use the array_values() function that extracts all values in order, discards the existing keys, and enumerates them into a new array.
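Both workarounds can be sketched as follows, reusing the filtered array from above. The variable names $seen and $list are only for illustration.

```php
<?php
$a = array_filter(range(0, 3), fn($e) => $e % 2 == 0); // keys 0 and 2 survive

// Option 1: foreach does not care about gaps in the keys.
$seen = [];
foreach ($a as $key => $value) {
    $seen[] = "$key=$value";
}
echo implode(",", $seen), "\n"; // 0=0,2=2

// Option 2: array_values() re-indexes from 0, so a classic for loop works again.
$list = array_values($a);
for ($i = 0; $i < count($list); $i++) {
    echo $list[$i];
}
echo "\n"; // 02
```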

EDIT: In a recent project I fell for this exact thing again, as I was trying to find the first value that matches a predicate.

$result = array_filter($array, $predicate)[0];

I googled “how to get first value of array in php” and the results were just painful.

TL;DR:

  • array_values($array)[0]; is slow and ugly
  • array_pop(array_reverse($array)); is just madness
  • reset($array); changes the internal array pointer, and also doesn’t work with expressions
  • current($array); only works if the array is already reset (or a new array)
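For the original problem (finding the first value that matches a predicate), a plain loop sidesteps all four options. A minimal sketch; first_match() is a hypothetical helper name, not a built-in:

```php
<?php
// Hypothetical helper: returns the first value for which $predicate is true,
// or $default if nothing matches. Keys are ignored, so gaps don't matter.
function first_match(array $array, callable $predicate, $default = null) {
    foreach ($array as $value) {
        if ($predicate($value)) {
            return $value;
        }
    }
    return $default;
}

$isEven = fn($e) => $e % 2 == 0;

var_dump(first_match([1, 3, 4, 6], $isEven)); // int(4)
var_dump(first_match([1, 3], $isEven));       // NULL
```

On PHP 7.3 and later, $array[array_key_first($array)] should also read the first value of a non-empty array without touching the internal pointer.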

Overloading: It’s a kind of magic.

Let’s turn our flux capacitors on and go back in time. It’s the 11th of October 1994. Python version 1.1 was just released (see the Python history on GitHub). Some special new methods __setattr__() and __getattr__() were introduced, that can be used to trap the access to object members. They are the single best option if you are trying to build convoluted, unreadable, unmaintainable code. The Python folks nowadays call these abominations “magic methods” or more aptly “dunder methods”.

Almost ten years later the PHP guys were probably stoned one night, thought what Python did was a sane idea, and in version 4.2 they added a new “overloading” (WTF is that name?) extension (see the PHP version history and the PHP museum version 4 download page) – the beginning of PHP’s very own “magic methods”.

As expected, magic methods turned out to be doughnuts for dinner, as they not only confuse the heck out of any static analysis tool but also get in the way of following references and other advanced features in IDEs.

class A {
   public function __get($name) {
      return "foobar";
   }
}

class B extends A {
   private $foo = "bar";
}

$b = new B();
echo $b->foo; // prints "foobar"

(Strictly speaking stuff like __construct(), __toString() and so on are also magic methods. But those are sensible, so for the sake of a funny rant, let’s ignore them. ^^)

As if this wasn’t already messy enough, PHP also supports dynamic properties. Meaning you can create new properties on the fly just by assigning to them. This makes sense if you have a weak object type system with no access modifiers for encapsulation (and even then I would argue it’s a bad idea because typos can introduce very subtle bugs). The thing is though: PHP has a rather powerful (Java-inspired) class setup.

To remedy this inconsistency, PHP 8.2 deprecated dynamic properties, but because for some reason the whole world and their dog relies on this feature, they kept it for stdClass objects and sub-classes, and also added a new attribute #[AllowDynamicProperties] that enables them for the attributed class.

class A {}
class B extends stdClass {}
#[AllowDynamicProperties]
class C {}

$a = new A();
$a->foo = "bar"; // deprecation warning

$b = new B();
$b->foo = "bar"; // no warning

$c = new C();
$c->foo = "bar"; // no warning

Language constructs: In space no one can hear you calling.

Let’s play a guessing game: Which of the following lines causes an error?

$a = [1,2,3,4,5];

array_walk($a, function($e) { print($e); });
array_walk($a, fn($e) => print($e));
array_walk($a, "printf");
array_walk($a, "print");

Was your answer line 6? No? I’m not surprised. The reason is: print() – unlike printf() – is not a built-in function. It looks like a function, it feels like a function, it smells like a function, it mostly behaves like a function. Nevertheless it isn’t. It is what’s called a “language construct”.

var_dump(is_callable("printf"));  // bool(true)
var_dump(is_callable("print_r")); // bool(true)
var_dump(is_callable("print"));   // bool(false)
var_dump(is_callable("echo"));    // bool(false)

The difference is that “language constructs” are directly handled in the interpreter. The respective functions don’t actually exist as PHP functions.

There is actually a fair amount of these: echo(), print(), include(), exit(), die(), require(), isset(), list(), empty() to name a few.
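If a construct is needed where a callable is expected, wrapping it in a closure works, as the first two array_walk() calls above already show. A small sketch with empty(); the sample data in $rows is made up:

```php
<?php
$rows = ["foo", "", "0", "bar", null];

var_dump(is_callable("empty")); // bool(false)

// empty() is a construct, so it cannot be passed by name;
// an arrow function can wrap it instead.
$nonEmpty = array_values(array_filter($rows, fn($e) => !empty($e)));

print_r($nonEmpty);
// Array
// (
//    [0] => foo
//    [1] => bar
// )
```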

Lambda expressions: Silence of the scope.

Despite all the hate that I give PHP, I actually think the scoping concepts are quite solid.

By default you can only write variables that are defined in your current (local) scope. So let’s say you are inside a function and you want to access a global variable: You can’t. This makes sure that you can’t accidentally (by means of a name collision or just a typo) change global state and potentially create bugs that are basically impossible to track down. If you are really sure that you want to touch a global variable, you can declare inside your function which one you intend to use.

$v = "foo";

function f() {
   $v = "bar";
}
function g() {
   global $v;
   $v = "bar";
}

f();
echo $v; // foo

g();
echo $v; // bar

Interestingly, there is no way of doing this with non-local variables in nested functions. Not sure why, to be honest. Python – which is otherwise fairly similar in this regard – has another keyword nonlocal to do exactly that.

While PHP doesn’t have that for normal nested functions, you can however do a similar thing using anonymous functions.

function p() {
   $v = "foo";
   
   $f = (function() {
      $v = "bar";
   });
   $g = (function() use (&$v) {
      $v = "bar";
   });

   $f();
   echo $v; // "foo"

   $g();
   echo $v; // "bar"
}

p();

Non-local variables captured with the use keyword then behave like call-by-value parameters. In order to modify non-local variables in an anonymous function they need to be declared as references (Note the (&$v) in line 7).

This is actually awesome if you are into functional programming as it makes them referentially transparent (ignoring I/O, call-by-reference and superglobals) by default. And if they are not, they are at least declared as such.
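The by-value part has one more consequence worth seeing in code: the value is copied when the closure is created, not when it is called. A minimal sketch with illustrative variable names:

```php
<?php
$v = "foo";

$byValue = function() use ($v) { return $v; };  // copies $v right now
$byRef   = function() use (&$v) { return $v; }; // keeps a reference to $v

$v = "bar";

var_dump($byValue()); // string(3) "foo"
var_dump($byRef());   // string(3) "bar"
```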

(Fun fact: I’ve been calling anonymous functions “functions” all along while they’re actually objects of type Closure, which use calling magic to emulate how functions behave – see magic methods. This is probably also the technical reason why anonymous functions don’t see the parent scope by default.)


Everything was peaceful. Then, the Fire Nation attacked.

Version 7.4 came along and brought arrow functions – syntactic sugar for anonymous functions. Except they aren’t at all, as they now do auto-capturing of the parent scope – just read-only, but still.

$v = "foobar";

(function() { print($v); })(); // warning: undefined variable
(fn() => print($v))();         // foobar
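The “read-only” part can be made concrete with a short sketch: the arrow function works on a copy of the outer variable, so modifying it inside has no effect outside. The names $counter and $increment are illustrative.

```php
<?php
$counter = 0;

// The arrow function auto-captures $counter by value.
$increment = fn() => ++$counter;

var_dump($increment()); // int(1)
var_dump($increment()); // int(1) again, every call starts from the captured copy
var_dump($counter);     // int(0), the outer variable is untouched
```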

At a talk at phpday 2019 for PHP’s 25th anniversary, Rasmus Lerdorf (the inventor of PHP) explained why he decided to use such an interesting approach to scoping and jokingly said “The first time a bug appears because of auto-capturing in arrow-functions I will kick Nikita [Popov, one of the developers who proposed arrow functions]”. (See the PHP RFC: Arrow Functions 2.0. Youtube, Timestamp 19m03s; I would recommend watching the entire talk, it’s really interesting.)

EDIT: Apparently the Youtube video was removed, but I found another upload of the same video (timestamp 19m22s).

Superglobal: The mother of all scopes.

So now that we have established that the scoping concept of PHP is actually fine, let’s look at something that again just completely destroys everything.

PHP has something called “superglobals”. Yes, that’s the actual name (see the PHP Manual). These are global variables that don’t need to be declared in a function scope. Are you f*cking kidding me? Why do you even bother with these elaborate scoping rules that make everything explicit if you then have variables that are just available everywhere?

Now to be fair, there is only a handful of these superglobals ($_REQUEST, $_SERVER, $_SESSION, …, also $GLOBALS ironically) that are already populated when the script is started and have very specific purposes. You also cannot add more of them. But you can modify them, and I have seen PHP code in production that actually abuses superglobals to handle global state without declaring it. In that particular instance, $_ENV was used since that application didn’t need environment variables.

$_ENV = "foo";

function f() {
   $_ENV = "bar";
}
f();

print($_ENV); // bar

(Interestingly, constants also have superglobal scope. But since they can not be changed, and also don’t use the $ sign and thus are not susceptible to confusions with local variables, this is not that big of a deal.)

Undefined behaviour: PHP is the new C.

PHP was originally designed to be a templating engine for C. This is evident even today, since a lot of built-ins are essentially just wrappers around libc functions.

In order to allow for potential platform-specific optimisations at build time, the C language specification actually leaves a lot of decisions up to the compiler. Strictly speaking this is called “implementation-defined behaviour”, the tamer cousin of the infamous “undefined behaviour”. An example would be primitive integer types not having a fixed size. The int type for example is only guaranteed to be at least 16 bits in size. float and double are also not pinned down, but at least usually follow the IEEE 754 standard.

PHP, being so closely related to C, of course also does the same thing. Which is pretty bad as is, but additionally in PHP float and double are the same thing. That’s because… ehm… honestly no idea. Surely not to make the type names make sense.

var_dump((double) 42); // float(42)

Another interesting quirk is that integer overflow, unlike in every other common programming language, doesn’t throw an error or wrap around to the minimal value, but actually changes the type to float instead. (Actually, now that I think about it: JavaScript behaves similarly when a number value goes over the safe-integer limit. But then again, JavaScript doesn’t have integer types at all, so I think it’s fair to say: That doesn’t count. 😛)

var_dump(PHP_INT_MAX);     // int(9223372036854775807)
var_dump(PHP_INT_MAX + 1); // float(9.223372036854776E+18)
var_dump(PHP_INT_MIN);     // int(-9223372036854775808)
var_dump(PHP_INT_MIN - 1); // float(-9.223372036854776E+18)
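Because the result silently becomes a float, one way to notice an overflow is to check the type after the operation. A minimal sketch; checked_add() is a hypothetical helper, not part of PHP:

```php
<?php
// Hypothetical helper: adds two ints and throws if PHP promoted the result to float.
function checked_add(int $a, int $b): int {
    $sum = $a + $b;
    if (!is_int($sum)) {
        throw new OverflowException("integer overflow in $a + $b");
    }
    return $sum;
}

var_dump(checked_add(1, 2)); // int(3)

try {
    checked_add(PHP_INT_MAX, 1);
} catch (OverflowException $e) {
    echo "caught: ", $e->getMessage(), "\n";
}
```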

Mixed types: Have ambiguity – will travel.

This one even bothered me back when I built my first big(-ish) PHP application.

PHP, like some other dynamically typed languages, supports mixed types (or type unions in PHP 8; this means a function can have more than one return type) and type-juggling (types can change on the fly depending on the context).

That’s already bad enough, but manageable. PHP though, for some bizarre reason, uses mixed types in the standard functions. Which causes a lot of problems for inexperienced developers.

My favourite example is the strpos($haystack, $needle) function. It returns the index of the $needle in $haystack, or false if it is not found. But because integers can be type-juggled into booleans (and vice versa), and because 0 is value-equivalent to false, you are actually forced to do a type-safe comparison (===). (Hot take: You should only very, very rarely need to use type-safe comparisons, since they imply that you don’t actually know what type of data you are dealing with.)

function starts_with($haystack, $needle) {
   return strpos($haystack, $needle) == 0;
}

var_dump(starts_with("foobar", "baz")); // bool(true)
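For contrast, here is a sketch of the same check with a type-safe comparison. The name starts_with_strict() is made up for this example; PHP 8 also ships a built-in str_starts_with() for exactly this purpose.

```php
<?php
// Strict comparison: only a match at index 0 counts; false (not found) does not.
function starts_with_strict(string $haystack, string $needle): bool {
    return strpos($haystack, $needle) === 0;
}

var_dump(starts_with_strict("foobar", "foo")); // bool(true)
var_dump(starts_with_strict("foobar", "baz")); // bool(false)
var_dump(starts_with_strict("foobar", "bar")); // bool(false)
```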

PHP considered harmful.

I have some more points, but those are honestly too small for a whole heading. So, tell you what: Let’s just do 10 more (minor) annoyances, and call it a day. ^^

1. In PHP < 8 equality checks between the number 0 and any string that can’t be interpreted as a number are true.

var_dump(0 == "Hello World");
// bool(true) in PHP 7.4
// bool(false) in PHP 8

2. Variables, array keys, properties and most constants are case-sensitive, while keywords, language constructs, functions, methods and some constants are case-insensitive.

cLaSs bar {
	FUNction foo() {}
}

$bar = nEW BAR();
$bar->FOO(); // this is fine
$BAR->foo(); // undefined variable $BAR

(I feel the urge to wash my hands after writing that…)

3. That’s even more weird for constants in PHP < 8, since when using the define() function you can actually choose whether you want the constant to be case-sensitive or not.

define("FOO", 42);

var_dump(defined("FOO")); // bool(true)
var_dump(defined("Foo")); // bool(false)

define("BAR", 42, true);

var_dump(defined("BAR")); // bool(true)
var_dump(defined("Bar")); // bool(true)

4. There are two ways of declaring constants in PHP: Using the const keyword, and using the define() function. The difference is that the first one does its magic at “compile” time, while the latter executes at runtime. In practice, when using define() your constant values can depend on information that is only available when the program is already running.

function f() {
   define("FOO", 42);
}
function g() {
   const BAR = 42; // syntax error
}

(Fun fact: Constants in PHP could not contain arrays until PHP 5.6 for const and PHP 7.0 for define().)

5. This one is a bit historic but still a mindf*ck. Up until PHP 5.4, there used to be a config directive called register_globals. When turned on, it basically put all keys of most superglobals – meaning all form data values, query parameters, cookies, and so on – into global scope as variables.

If you are not sure why this is a horrible idea, take a look at the following code.

<?php
if ($password == "password") {
   $authenticated = true;
}
if (isset($authenticated) && $authenticated) {
   echo "authenticated";
}

The $authenticated global can be set via a query parameter (e.g. ?authenticated=1) and thus bypass the password check.

6. You can actually construct an infinite recursion in PHP just by enabling a hidden flag in the count() function.

$a = [&$a];
echo count($a, 1);

The second parameter of count() enables recursive counting (COUNT_RECURSIVE), and since $a is self-recursive, PHP has to detect the cycle and bail out with a “Recursion detected” warning.

(I know, this is actually a feature. But the fact that something as simple as getting the number of elements in an array can cause an endless recursion is just wild.)

7. PHP 8 removed the each() function, which could be used to interact with an array on a low level. It was removed precisely because it allows low level (aka implementation specific) access. Ironically though, other functions (like current(), next(), …) that allow even more in-depth array interaction are not even deprecated.

The following two lines do the exact same thing, but the first one no longer works.

while(list($k, $v) = each($array)) {}

for($v = current($array), $k = key($array); $k !== null; $v = next($array), $k = key($array)) {}


Here is a snippet that emulates each() for PHP 8.

function each(&$array) {
   $key = key($array);
   if ($key === null) {
      return false;
   }
   $value = current($array);
   next($array);
   return [
      0 => $key,
      1 => $value,
      "key" => $key,
      "value" => $value,
   ];
}

8. For some reason, array_map() and array_filter() use a different order of parameters.

$a = [1,2,3,4,5];
$a = array_map(fn($e) => $e * $e, $a);        // callback, array
$a = array_filter($a, fn($e) => $e % 2 == 0); // array, callback
array_walk($a, "printf");

9. If you try to add a new element to an array that has an element with index PHP_INT_MAX, you will get the most unhelpful error message ever.

$a = [PHP_INT_MAX => "foo"];
$a[] = "bar";
// Error: Cannot add element to the array as the next element is already occupied

10. PHP is basically a templating engine. A solid one at that. And for whatever reason there are still pointless projects like Twig out there, which are less powerful than PHP while being a lot slower.

Conclusion

That’s it. This article is long enough anyway. ^^

I hope you had fun reading it, or at the very least learned something.

Judging from the current pattern, my next blog post will be released in 2025. You are welcome to leave topic suggestions in the comments down below. ^^

Stay tuned,

Sigma
