HELO, Sigma here.
Trying something clickbaity this time. I’m curious if it works. 😛
That email is the absolute f*cking worst technology ever conceived is pretty well established by this point. A German friend of mine who works in IT once said:
Als wäre der Standard nicht schon problematisch genug, wird damit auch noch wie Sau umgegangen.
As if the standard wasn’t already problematic enough, it’s also being treated like a bastard. (@Stingshot)
Still, I decided to make a listicle anyway, purely because it’s fun to complain about things. ^^
I’ll start with the more obvious, practical reasons and go into more technical details later. That way this should stay somewhat interesting even for non-IT people.
For non-techies, spam is probably the most annoying thing about email. Despite what some people say, spam is not a solved problem. Not by a long shot.
Because of the sheer endless stream of junk emails, using spam filters is pretty much necessary. That has two disadvantages: False negatives are a problem because, with a lower spam noise floor, the (wrongly) approved mails look a lot more legit. And I don’t think I need to explain why false positives are annoying.
These problems are inherent to the types of spam filters we are using. There are basically two types of filters: policy-based filters and content-based filters.
Policy-based methods include things like DKIM and SPF checks (urgh, we will talk about these later), but also stuff like IP reputation. The latter is a major pain in the backside for hosting providers, but also for their customers, because dealing with blacklists is annoying and time-consuming.
Content-based filters are very diverse, ranging from fuzzy hashes and Bayes classifiers to dark magic (aka neural networks). These are very much not trivial to set up and maintain, and usually require a lot of training data. But even an abundance of examples does not guarantee a working spam filter. Even a big email provider like Gmail – which has one of the better spam filters out there – has false positives sometimes.
1. Security & Privacy
Don’t get me wrong: (Today’s) email is not the most insecure method of communication. Most of the time, decentralization is good for privacy because you can choose a provider that you trust. Also, pretty much all mail traffic has (or should have… hopefully) some sort of transport encryption nowadays.
However, there is no good solution for E2E (end-to-end) encryption. The only encryption standard for email that has got any kind of measurable traction is PGP, and it sucks too.
You might ask: Is end-to-end encryption really such a big deal? Yes! Yes, it is! There is an important concept in cyber security: Minimize trust, maximize trustworthiness. By encrypting emails in a way that only the recipient can read them, you eliminate the need to fully trust your email provider. Also, in case the recipient uses a different mail server (which they could, since email is decentralized) you don’t need to trust that either.
So what is so bad about PGP? Let me digress a bit: In my view, the biggest threat to privacy at the moment is not hackers that specifically target you, but corporations and governments that want to analyse everybody. The best way of blinding these entities is to (end-to-end) encrypt as much of our communication as possible. The more people do this, the better. From this we can derive: Encryption needs to be as accessible as possible, so even normies can use it.
That being said: PGP’s usability is abysmal. Partly because the user has to understand complex concepts like web-of-trust, basics of asymmetric cryptography, the difference between encryption and signature, … And partly because of missing tooling – there are hardly any good UIs, or mechanisms to share keys between devices.
And the thing is: E2E encryption doesn’t need to be so complicated. Signal, Matrix – heck, even Telegram, if you use secret chats – prove that encryption can be accessible to anybody if you are willing to abstract the details out of sight. Of course it’s less secure than a proper out-of-band key exchange with a web-of-trust, but hardly anyone would use that. Therefore, in this case, less secure but still encrypted is better.
2. ASCII Only
Email is ASCII only – I mean: Of course it is. Because there just aren’t any other real languages besides English.
Okay, but seriously: Email is very old. 1970s old. SMTP is old too (RFC 821). Things like ASCII-only mail content are evidence of that.
The standard RFC 1652 (8bit-MIMEtransport) does allow for characters outside of the ASCII range. There is also RFC 6532 (“Internationalized Email Headers”) that allows the use of “unencoded” UTF-8 characters in mail headers like the subject. This is, however, not widely adopted.
The problem with these solutions is that, if any mail server in the chain does not support the required extensions, the message is not going to be delivered.
The way around that is to use some sort of transformative Content-Transfer-Encoding, like Quoted-Printable or Base64, as defined by RFC 2045 (MIME). Gmail does this for non-ASCII content by default.
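To make that less abstract, here is a minimal sketch using Python’s standard email package. The addresses and texts are made up for illustration, and forcing Quoted-Printable via the cte argument is my choice here, not something a mail server requires. It builds a message with umlauts and an emoji and checks that what goes over the wire is pure ASCII anyway.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Hypothetical addresses and texts, purely for illustration.
msg = EmailMessage()  # the default policy does not emit raw UTF-8 headers
msg["From"] = "sigma@example.org"
msg["To"] = "friend@example.org"
msg["Subject"] = "Grüße aus Köln"

# Force a transformative encoding for the body instead of sending it as 8bit.
msg.set_content("Schöne Grüße! 😛\n", cte="quoted-printable")

wire = msg.as_bytes()
# Decoding as ASCII only works because every byte on the wire is 7-bit clean.
print(wire.decode("ascii"))

# The receiving side can undo both the header and the body encoding.
parsed = message_from_bytes(wire, policy=policy.default)
print(parsed["Subject"])
print(parsed.get_content())
```

The subject ends up as an RFC 2047 “encoded word” and the body as Quoted-Printable, so even a mail server without any of the 8-bit extensions should be able to relay it.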
Data overhead is somewhat related to the previous point. Let’s consider an example:
We want to send an image via email. But email doesn’t support binary data, so we encode it using Base64.
We care for privacy, so we encrypt the content using PGP. The result is binary data, so we encode it using Base64.
We send our email to the SMTP server, and since we care about security, we use opportunistic TLS (= STARTTLS). The result is binary data too, but at least TLS wraps the whole connection, so this step needs no further Base64 pass.
Base64 inflates the content by about 36 % (including line feeds), so after the second encode, our image has 1.36² ≈ 1.85 times its original size. That’s just insane.
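If you want to check the inflation factor of a single Base64 pass yourself, here is a small sketch. It assumes the usual MIME layout of 76 characters per line followed by CRLF; the helper name mime_base64_size and the 300 kB random payload are made up for this example.

```python
import base64
import os


def mime_base64_size(n_bytes: int, line_len: int = 76) -> int:
    """Bytes needed for n_bytes of data after MIME-style Base64 encoding."""
    encoded = 4 * ((n_bytes + 2) // 3)            # 3 raw bytes become 4 characters
    lines = (encoded + line_len - 1) // line_len  # number of (partial) lines
    return encoded + 2 * lines                    # every line ends with CRLF


payload = os.urandom(300_000)  # stand-in for an image
# encodebytes() wraps at 76 characters but uses "\n"; swap in CRLF as on the wire.
wire = base64.encodebytes(payload).replace(b"\n", b"\r\n")
ratio = len(wire) / len(payload)

print(len(wire), mime_base64_size(len(payload)))
print(f"one pass inflates the data by a factor of {ratio:.3f}")
```

With CRLF line endings the factor comes out slightly above 1.36 per pass, and every additional Base64 layer multiplies the size by that factor again.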
4. Size Limitations
The data overhead wouldn’t be that big of a deal if the maximum size of an email weren’t limited. The current SMTP standard from 2008 (RFC 5321) says that at least 64 KiB of content (including headers) needs to be accepted.
There are even more size restrictions, the most notable one being “line length”. The original SMTP standard (RFC 821) from 1982 defines 998 characters as the minimum that needs to be accepted. However, RFC 5322 specifies that no line should exceed 78 characters.
This last point is particularly annoying because some mail servers think it’s their responsibility to insert newlines into the body. It’s hardly a problem for plain text emails, but counter-intuitively for HTML content this is really bad, since the newlines might end up in the middle of words (which will be rendered as a white-space) or inside of tags.
It should be noted that these limitations are the bare minimum that any mail server needs to accept. The SMTP “SIZE” extension (RFC 1870) can be implemented to tell the client what the actual limits are. But as with all extensions: The weakest link defines the chain. So if your mail relay does not allow you to send your holiday photos, you just can’t send them, even if your mail provider and the recipient’s mail provider support it.
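For the curious, this is roughly how a client could read the advertised limit from a server’s EHLO response. It’s only a sketch: the function name and the sample response are invented, and a real client would get the lines from an actual SMTP session (Python’s smtplib, for example, collects them in the esmtp_features dictionary after calling ehlo()).

```python
from typing import Iterable, Optional


def advertised_size_limit(ehlo_lines: Iterable[str]) -> Optional[int]:
    """Return the SIZE value from an EHLO response, or None if there is none."""
    for line in ehlo_lines:
        # Each line looks like "250-KEYWORD [params]" ("250 " on the last line).
        keyword, _, param = line[4:].strip().partition(" ")
        if keyword.upper() == "SIZE" and param.strip().isdigit():
            # Note: RFC 1870 uses 0 to say "no fixed maximum".
            return int(param)
    return None


# Hypothetical EHLO response of a mail server.
EHLO_RESPONSE = [
    "250-mail.example.org",
    "250-SIZE 35882577",
    "250-8BITMIME",
    "250 STARTTLS",
]

limit = advertised_size_limit(EHLO_RESPONSE)
print(limit)
```

Keep in mind that this only tells you the limit of the server you are talking to, not of any relay further down the chain.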
5. Address Validation
Validating email addresses is notoriously hard. Here are just a few valid email addresses (according to RFC 5322):
- "@"@[foo bar]
Using the obsolete syntax, email addresses could potentially even contain ASCII control characters – wild.
(Funnily enough, foo..firstname.lastname@example.org is not a valid email address.)
While I think the grammar for addresses is regular as long as you ignore nested comments (don’t quote me on that), it’s very impractical to construct a regex for email addresses. This means that most websites that want to do email address validation will use a simplified regex, like the following, which might reject valid addresses or accept invalid ones.
[a-zA-Z0-9-_.]+@[a-zA-Z0-9-]+([.][a-zA-Z0-9-]+)+
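Here is a quick sketch of how this simplified regex gets things wrong in both directions. The test addresses are made up, and I escaped the hyphens inside the character classes to keep them unambiguous; otherwise the pattern is the one from above.

```python
import re

# The simplified pattern from above, with the hyphens escaped.
SIMPLE_ADDRESS = re.compile(r"[a-zA-Z0-9\-_.]+@[a-zA-Z0-9\-]+([.][a-zA-Z0-9\-]+)+")


def looks_valid(address: str) -> bool:
    """True if the whole string matches the simplified pattern."""
    return SIMPLE_ADDRESS.fullmatch(address) is not None


# Hypothetical addresses for illustration.
for candidate in (
    "john.doe@example.org",   # valid, accepted
    "user+tag@example.org",   # valid, but "+" is not in the character class
    '"@"@example.org',        # valid quoted local part, rejected
    "foo..bar@example.org",   # invalid (two dots in a row), but accepted
):
    print(f"{candidate!r}: {looks_valid(candidate)}")
```

So the pattern rejects perfectly fine plus-addresses while happily accepting consecutive dots.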
6. Arbitrary Sender
Email is meant as a digital replacement of conventional snail mail. Security was more of an afterthought.
With that in mind: The FROM header of an email is completely arbitrary. I could, without any problems, send a message using the sender address [insert-current-president]@whitehouse.gov.
This is completely bonkers. It even has a name: email spoofing.
There have been many ideas and standards on how to fix this. The easiest way is to check whether the server that the email is coming from is actually listed in the MX record of the sender domain. This, of course, breaks down if the incoming and outgoing mail servers are not the same, or if the mail servers are distributed and have multiple location-specific addresses.
The next idea is to use SPF, which is a TXT record that contains a list of servers that are allowed (or disallowed) to use that domain for sending emails. This approach fixes distributed mail environments, but ruins classical mailing lists and mail forwarding, since the forwarded email now comes from a server that is not authorized to send mail for the sender domain. (BTW: In case any of my readers are into mail stuff, can someone please explain to me why the f*ck “softfail” exists? I just don’t get it.)
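To give an idea of what an SPF check does, here is a heavily simplified sketch. It only understands the ip4, ip6 and all mechanisms; everything that needs DNS lookups (a, mx, include, redirect, …) is skipped, so don’t mistake it for a complete evaluator. The function name, the record and the IP addresses are made up.

```python
import ipaddress

QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_spf(record: str, client_ip: str) -> str:
    """Evaluate a tiny subset of SPF (ip4, ip6, all) for a connecting IP."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"
    ip = ipaddress.ip_address(client_ip)
    for term in terms[1:]:
        qualifier = "+"  # a mechanism without qualifier means "pass"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name == "all":
            return QUALIFIERS[qualifier]
        if name in ("ip4", "ip6") and ip in ipaddress.ip_network(value, strict=False):
            return QUALIFIERS[qualifier]
        # Everything else would need DNS lookups and is ignored in this sketch.
    return "neutral"  # nothing matched


# Hypothetical TXT record of a sender domain.
RECORD = "v=spf1 ip4:192.0.2.0/24 ~all"
print(check_spf(RECORD, "192.0.2.10"))    # a listed server
print(check_spf(RECORD, "198.51.100.7"))  # e.g. a forwarding server
```

The second call shows the forwarding problem: the forwarder’s IP is not on the list, so the check no longer passes, even though the mail is legit.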
DKIM seeks to fix these issues by signing emails (including headers). The public key for the signature is put into a TXT record. This solves the mail forwarding problem, but doesn’t work for mailing lists either, since they usually inject content, which invalidates the signature.
DMARC uses either SPF or DKIM or both, but also adds alignment checks (of the FROM header). For DKIM this entails that at least one signature domain matches the sender domain. Still doesn’t address mailing lists.
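The alignment check itself is simple enough to sketch. DMARC knows a strict mode (domains must be identical) and a relaxed mode (domains must share the same organizational domain). Big assumption in this sketch: I approximate the organizational domain by taking the last two labels, which is wrong for suffixes like co.uk; real implementations use the Public Suffix List. The function names are made up.

```python
def org_domain(domain: str) -> str:
    """Crude organizational domain: the last two labels.

    ASSUMPTION for illustration only; real implementations consult the
    Public Suffix List, so this breaks for suffixes like "co.uk".
    """
    return ".".join(domain.lower().rstrip(".").split(".")[-2:])


def is_aligned(header_from_domain: str, auth_domain: str, strict: bool = False) -> bool:
    """Compare the FROM header domain with an SPF or DKIM authenticated domain."""
    a = header_from_domain.lower().rstrip(".")
    b = auth_domain.lower().rstrip(".")
    if strict:
        return a == b
    return org_domain(a) == org_domain(b)


# Hypothetical domains: FROM header domain vs. DKIM signature domain.
print(is_aligned("example.org", "mail.example.org"))               # relaxed
print(is_aligned("example.org", "mail.example.org", strict=True))  # strict
print(is_aligned("example.org", "lists.example.net"))              # mailing list
```

The last call is the mailing list case again: the list signs with its own domain, which has nothing to do with the FROM header.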
A relatively new addition to the ABC-family of workarounds for arbitrary FROM headers is ARC, which finally addresses mailing lists. Its solution is to just trust the intermediate server… Wait, what? Who thought that’s a good idea?
7. Multiple Different Senders
There is more than one way of specifying who the sender of an email is. That’s because… ehm… I… honestly have no idea.
Envelope FROM (also called “RFC5321.MAIL FROM” because it concerns SMTP) is part of the mail envelope object – hence the name – and is mainly used for Non-Delivery Reports (bounce messages), although some mail servers use the header FROM instead. Notably, the envelope FROM is used in SPF. For a recipient, this address is usually available through the “Return-Path” mail header.
Header FROM (aka “RFC5322.FROM”, because it is part of the Internet Message Format) is a mail header. Email clients will only display the FROM header as sender. Also interesting: DMARC uses this address for alignment validation.
Reply-To is also a mail header that should be used by mail user agents for replies. This one actually kinda makes sense. Possible applications are mailing lists, for example.
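To see all three “senders” side by side, here is a small sketch that pulls them out of a received message with Python’s standard email package. The message and all addresses in it are made up, and it assumes the receiving server recorded the envelope FROM in the Return-Path header.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical message as it might arrive from a mailing list.
RAW = """\
Return-Path: <bounces@lists.example.org>
From: Alice <alice@example.com>
Reply-To: discussion@lists.example.org
Subject: Hello

Body text
"""


def sender_fields(raw: str) -> dict:
    """Extract the three different sender addresses of a message."""
    msg = message_from_string(raw)
    return {
        "envelope_from": parseaddr(msg["Return-Path"])[1],  # used by SPF, bounces
        "header_from": parseaddr(msg["From"])[1],           # shown to the user
        "reply_to": parseaddr(msg["Reply-To"])[1],          # target for replies
    }


fields = sender_fields(RAW)
for name, address in fields.items():
    print(f"{name}: {address}")
```

Three fields, three different addresses, and all of them are perfectly legal in one and the same message.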
8. SMTP Status Codes
“SMTP suffers some scars from history, most notably the unfortunate damage to the reply code extension mechanism by uncontrolled use.” – RFC 3463
The mere fact that the “Enhanced Mail System Status Codes” standard contains that sentence verbatim speaks volumes.
The status code system in SMTP is actually quite reasonable. Similar to HTTP, the basic status codes have the form of a three-digit number. The first digit is the status class (yay or nay), the second is the reply subject/category, and the third one encodes details. Originally (RFC 821) there were only 21 reply codes. But most of them are protocol-specific (like 354 – “start mail input”), which leaves only about a dozen codes that are useful for delivery status notifications, and each of those got overloaded with several different error conditions.
To fix this problem, enhanced status codes were introduced. Now, a dot splits the three semantic positions, and the subject and detail position might use up to three digits.
It’s probably just me, but I think the moment your status codes look similar to IPv4 addresses, you should maybe reconsider your life choices. It’s especially crazy when you realize that you may only use enhanced status codes if you also provide a basic status code.
554 5.3.4 Message too big for system
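Taking such a reply line apart could look like the following sketch. The names SmtpReply and parse_reply are made up, and the pattern only covers single-line replies that carry an enhanced status code.

```python
import re
from typing import NamedTuple, Optional


class SmtpReply(NamedTuple):
    basic: int          # classic three-digit reply code
    status_class: int   # enhanced code, first position (2, 4 or 5)
    subject: int        # enhanced code, second position
    detail: int         # enhanced code, third position
    text: str           # human-readable part


# basic code, separator, class.subject.detail, then the text
_REPLY = re.compile(r"(\d{3})[ -]([245])\.(\d{1,3})\.(\d{1,3})[ \t]+(.*)")


def parse_reply(line: str) -> Optional[SmtpReply]:
    """Parse a reply line with an enhanced status code, else return None."""
    match = _REPLY.fullmatch(line.strip())
    if match is None:
        return None
    basic, status_class, subject, detail, text = match.groups()
    return SmtpReply(int(basic), int(status_class), int(subject), int(detail), text)


reply = parse_reply("554 5.3.4 Message too big for system")
print(reply)
print(parse_reply("250 OK"))  # no enhanced status code present
```

Note how the 5 shows up twice: once as the first digit of the basic code and once as the class of the enhanced code.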
I already touched on Content-Transfer-Encoding in an earlier point. Basically, using MIME (RFC 2045) you can specify what type of data is present and how the message body is encoded. It is important that the Content-Transfer-Encoding always fits the transmitted data, otherwise the target mail server, or even the spam filter, might reject the email.
The default encoding is “7bit”. So if you are sending mail containing UTF-8 characters, but they get rejected by the spam filter, this might be the reason. I ran into this exact problem once when I wrote a newsletter system.
“8bit” is also an identity encoding, but it specifies that the characters in the body are allowed to use the full 8 bits – which is necessary for non-ASCII characters. I’m not exactly sure why this is not the default yet.
“binary” is interesting. It too is an identity encoding, but because the SMTP standard just doesn’t allow for unencoded binary data, it can not be used in practice. The RFC even says – and I quote: “Thus there are no circumstances in which the ‘binary’ Content-Transfer-Encoding is actually valid in Internet mail.”
“base64” and “quoted-printable” can both be used to transform the content into a character range that SMTP can deal with. But, as already discussed, they have overhead with regards to the encoded data.
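A rough way to find out which identity encoding a body would be allowed to claim is sketched below. It follows my reading of the RFC 2045 definitions (no NUL bytes, CR and LF only as CRLF pairs, lines of at most 998 octets); the function name classify_body is made up. Anything that comes out as “binary” needs Base64 or Quoted-Printable before it can go through SMTP.

```python
def classify_body(body: bytes) -> str:
    """Return the least demanding identity encoding that fits the body."""
    lines = body.split(b"\r\n")
    # After splitting on CRLF, any leftover CR or LF is a bare one.
    bare_newline = any(b"\r" in line or b"\n" in line for line in lines)
    too_long = any(len(line) > 998 for line in lines)
    if b"\x00" in body or bare_newline or too_long:
        return "binary"
    return "7bit" if body.isascii() else "8bit"


print(classify_body(b"Hello world\r\n"))               # plain ASCII text
print(classify_body("Grüße\r\n".encode("utf-8")))      # UTF-8 text
print(classify_body(b"\x89PNG\r\n\x1a\n\x00\x00"))     # start of a PNG file
```

So if your body comes out as “8bit” but the message header says “7bit” (or nothing at all), you have found a likely reason for the rejection.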
Email is one of those topics that I try to avoid as much as possible. Every time I need to work with it on a low level, I’m amazed how something this broken can work at all.
Still, despite its ginormous legacy, email is (probably) one of the most used digital communication media out there today. And while there have been good ideas for replacements (Stuff like Google/Apache Wave comes to mind. I really liked it – a pity.), none have really taken off. So, I guess it’s fair to say, we will be stuck with email for the foreseeable future.