What’s Up With Scoping?


Disclaimer: This is a rant. If you are not interested in my ramblings, there’s probably something more fun to read.

*sigh* Okay, so I have this thing with programming languages: Inconsistencies and stupid language design, they… how shall I put this? They bother me. One thing in particular that gets my gears turning is idiotic scoping rules. My friends are already kind of fed up with my nitpicking. So instead, I decided to vent my frustrations here.

Please buckle your seat belts and keep your arms inside the vehicle at all times. Let’s go!

Java: Effectively Final

The first one on the list is non-local variables in Java. This one upsets me especially, since I really like the design of the language.

So what are non-local variables in Java? Basically, whenever a local variable of a method is referenced from a local or anonymous class declared inside that method, that variable is non-local for the inner class. The same applies to lambda expressions, which capture variables in much the same way as anonymous classes (even though they are compiled differently).

import java.util.Arrays;
import java.util.List;

public class Test {
   public static void main(String[] args) {
      int f = 1;
      f = 2; // the reassignment: f is no longer effectively final
      List<Integer> list = Arrays.asList(0, 1, 2, 42, 69);
      list.stream().map(i -> i * f).forEach(System.out::println); // compile error
   }
}

The reference to f in the code above is non-local for the lambda expression. Such captured variables have to be final or “effectively final”. The latter means that the variable is never reassigned after its initialization. So the above example does not compile because of the reassignment f = 2;. This sucks in so many ways.

First of all, the rationale behind that limitation is fairly obscure: Local variables are usually stored on the stack. This is a problem for first-class functions, since they can outlive the stack frame they were created in. Java’s solution is to copy the values of captured variables onto the heap, and the inner class then references the copy instead. If the original could still change afterwards, the two would silently get out of sync, hence the restriction.

Secondly, this is especially unpleasant for primitives. Since objects are always referenced, you can mutate the object state without changing the reference itself. That’s, however, not the case with primitives. You can’t even use primitive wrappers since they are immutable. When you want to mutate a primitive, you need some mutable holder, for example your own wrapper class (Optionals don’t work either since they too are immutable). And even if you don’t want to update a primitive value, you might still have to do stuff like int _f = f; to create an effectively final copy of a running index or similar.
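
For what it’s worth, a dedicated wrapper class is not the only way to get a mutable holder. The following is a minimal sketch, not a recommendation: it uses a one-element array and java.util.concurrent.atomic.AtomicInteger from the standard library. The class and method names (CaptureWorkarounds, sumWithArray, sumWithAtomic) are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of the original example.
public class CaptureWorkarounds {

    // The reference `sum` is effectively final; only the array content changes.
    static int sumWithArray(List<Integer> values) {
        int[] sum = {0};
        values.forEach(v -> sum[0] += v);
        return sum[0];
    }

    // Same idea, with AtomicInteger as a ready-made mutable holder.
    static int sumWithAtomic(List<Integer> values) {
        AtomicInteger sum = new AtomicInteger();
        values.forEach(v -> sum.addAndGet(v));
        return sum.get();
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(0, 1, 2, 42, 69);
        System.out.println(sumWithArray(values));  // prints 114
        System.out.println(sumWithAtomic(values)); // prints 114
    }
}
```

Both variants compile because the captured variable itself is never reassigned. Whether that is any prettier than a wrapper class is another question.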

JavaScript: Hoisting

Oh, JavaScript is always fun.

Let’s play a game: What do you think is the final value of i?

var i = 1;
(function(){
   i = 2;
})();
(function(){
   i = 3;
   var i;
})();

The correct answer is 2. Did you get it? No? I’m not surprised. I mean: Who in their right mind would think of something like this?

When a variable is used, the JS interpreter first checks the local scope. If there is no match, the non-local (i.e. parent) scopes are checked. JavaScript also has something called “hoisting”: var declarations are processed before any code of the function runs, as if they had been written at the top. So when the interpreter sees line 7, it silently treats the local declaration of i as if it were at the top of the function. Therefore, the i in line 6 is actually local and the assignment does not affect the global i.

As a side note: This only applies to variables declared using the var keyword. The newer let and const keywords behave differently: using such a variable before its declaration throws a ReferenceError.
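
To make the difference concrete, here is a small sketch of the second function from the game, with let instead of var. The function name assignBeforeLet and the try/catch wrapper are only there so the error can be observed instead of aborting the script.

```javascript
var i = 1;

// Hypothetical helper: same shape as the second function in the game,
// but with `let` instead of `var`.
function assignBeforeLet() {
  try {
    i = 3; // refers to the block-local `i`, which is not initialized yet
    let i;
  } catch (err) {
    return err.name;
  }
  return "no error";
}

console.log(assignBeforeLet()); // ReferenceError
console.log(i);                 // 1
```

The outer i is still untouched, but at least the mistake is reported instead of silently doing something else.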

JavaScript: Undeclared Globals

This next thing is just insane: When you assign to a variable that is not declared in a function or any of its parent scopes, JavaScript assumes that it is global. But here is the catch: That’s also the case if the name does not exist in the global scope. The assignment simply creates it.

(function(){
   foobar = 42;
})();
console.log(foobar);

After the function call, the name foobar is a global variable, which in JS means: It’s a property of the global object (window in a browser). Just: Why?!
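
In fairness, strict mode turns this behavior off: there, an assignment to an undeclared name throws instead of creating a global. A minimal sketch, where the function name assignUndeclared and the try/catch are just scaffolding to make the error visible:

```javascript
// Hypothetical helper: tries the same assignment as above, but in strict mode.
function assignUndeclared() {
  "use strict";
  try {
    foobar = 42; // no declaration anywhere, so strict mode refuses
  } catch (err) {
    return err.name;
  }
  return "no error";
}

console.log(assignUndeclared()); // ReferenceError
console.log(typeof foobar);      // undefined
```

That you have to opt in to sanity is, of course, its own kind of statement.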

C/C++: Exports Without Headers

C/C++ is actually very reasonable with regard to scoping. But one detail that is very unintuitive these days is that exports are completely independent of header files. Let me explain:

So, in most modern languages, there exists a concept of import, require, include, or similar, to add library code into the file or project, right? C/C++ also has a preprocessor directive #include that behaves similarly. Except it doesn’t. The addition of the library code is done by the linker. All symbols that are exported by the library can be used without any includes at all. All the header files do is provide function prototypes, declarations for external variables, useful macros for the library, and so on. But you could in theory do all this stuff yourself. The following is a completely valid C program:

int puts(const char*);
int main() {
        puts("Hello World!");
        return 0;
}

One interesting aspect of this is that in C/C++ functions are exported by default (they have external linkage). If you don’t want to export a function (because it should be private to the file, or you just don’t want to pollute the global namespace), you have to explicitly declare that using the static keyword.

I really want to give C/C++ a pass on this because that’s just how ABIs and ELF work. Still, this doesn’t feel very modern to me.

Python: Assignments to Globals

Although I’m not a huge Python fan, I can’t deny that it has a very logical scoping mechanism: You can read everything from non-local or global scope but if you want to write to it, you need to declare that.

i = 42
def test():
   print(i)
test() # prints 42

def test():
   global i
   i = 43
test()
print(i) # prints 43

They kinda ripped off PHP, but that’s okay since it’s a very good and logical approach for a language that doesn’t have variable declarations – at least in my opinion.

However, that means that, similar to what we saw with var in JS, the interpreter can only determine whether a variable is local or not after the function has been completely analyzed. But in contrast to JavaScript, the execution actually fails here, and it’s not nearly as obvious why:

i = 42
def test():
   print(i)
   # a lot of code
   i = 43
   # a lot of code
test()

Now line 3 will cause an UnboundLocalError even though the actual problem, the assignment that makes i local for the entire function, is at a completely different location.
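
Here is a condensed version of that failure that can be run as is. The function name read_then_assign and the try/except are only there to make the error visible without a traceback.

```python
i = 42

def read_then_assign():
    # The assignment further down makes `i` local for the whole function,
    # so the read fails before the assignment is ever reached.
    try:
        print(i)
    except UnboundLocalError as err:
        return type(err).__name__
    i = 43
    return i

print(read_then_assign())  # UnboundLocalError
print(i)                   # 42
```

Putting global i at the top of the function makes both the read and the write refer to the module-level variable again.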

PHP: Super-Globals

Similar to Python, in PHP you have to declare whenever you are using a global or non-local variable: That’s great.

But because forcing the declaration of non-local variables was too good of an idea, PHP also has a concept called super-globals – I swear, I’m not making this up. These are globals that don’t have to be declared in a function scope. I want to cry…

$g = 42;
function test() {
   global $g;
   echo $g; // regular global
   echo $_SERVER['SCRIPT_NAME']; // super global
}

It’s not just $_SERVER either. There are quite a few of these super-globals in PHP:

  • $GLOBALS
  • $_SERVER
  • $_GET
  • $_POST
  • $_FILES
  • $_COOKIE
  • $_SESSION
  • $_REQUEST
  • $_ENV

ShellScript: Write Access from Subshell Context

Now before you complain: I’m aware that this wasn’t really a design decision. It’s just how the process model works. Still, I think it’s a relevant pitfall that should be discussed.

Basically, if you open a subshell in a shell script, all parent variables are copied into its environment. And there is no way to propagate changes to those variables upward to the parent process, at least no obvious one (pipes or shared memory would work).

i=0
( i=42 )
echo $i # i is still 0

Since I’m sure someone will say that this example is completely arbitrary, and that no one would write something like this, let’s use a more practical problem: Given a list of integer equations, we want to count how many of them are correct.

i=0
cat equations.txt | while read line; do
   lhs="$(echo "$line" | cut -d= -f1 | bc)"
   rhs="$(echo "$line" | cut -d= -f2 | bc)"
   if [ "$lhs" -eq "$rhs" ]; then
      ((i++))
   fi
done
echo $i

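I know this is not the best way of writing this in the first place, but still, it proves my point: In bash, the pipe makes the while loop run in a subshell, so the increments are lost and the script prints 0. These errors can be very difficult to find, especially in the given example, since the subshell is implicit.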
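
One way around it is to feed the loop through a redirection instead of a pipe, so it runs in the current shell. The following is only a sketch under a few assumptions: the function name count_correct is made up, and it uses the shell’s own integer arithmetic instead of bc, so it only handles expressions the shell can evaluate.

```shell
# Hypothetical helper: counts the lines of the form "<expr>=<expr>" in file $1
# whose two sides evaluate to the same integer.
count_correct() {
   count=0
   # The redirection keeps the loop in the current shell, unlike a pipe.
   while IFS= read -r line; do
      lhs=$(( ${line%%=*} ))   # everything before the first "="
      rhs=$(( ${line#*=} ))    # everything after the first "="
      if [ "$lhs" -eq "$rhs" ]; then
         count=$((count + 1))
      fi
   done < "$1"
   echo "$count"
}

tmp="$(mktemp)"
printf '%s\n' '1+1=2' '2*3=7' '10-4=6' > "$tmp"
count_correct "$tmp"   # prints 2
rm -f "$tmp"
```

The counter survives the loop because no subshell is involved anymore.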

Conclusion

Most languages suck in one way or another. I’m sure I’ll find more stuff like this in the future. So my plan, for now, is to just come back later and add things to this post.

If nothing else, I hope my rant was a bit interesting at the very least. ^^

Have a nice day,
Sigma.

