Traffic Is Expensive - Shorter JSON Keys Really Do Help
Traffic costs a lot more than most people think.
This site was originally something I made just to play with friends. I never designed it around the assumption that large numbers of people would use it. The data-transfer structure was also one where costs would rise pretty steeply as the number of users increased, but since I expected maybe three or four players at a time, ten at most, it felt like a perfectly reasonable tradeoff.
And honestly, that was how I thought it would be used. A few games with friends, and the server quietly sitting idle the rest of the time. Just a hobby project.
But that expectation was wrong almost immediately.
Since the server was idle outside the times my friends were playing, I posted the game link on a few sites just in case anyone else wanted to try it. A few days later, concurrent users passed 1,000. Unfortunately, that happened on a day when I was out in meetings all day. When I first got an alert in the morning that concurrency had exceeded 1,000, I assumed it was just a brief spike. I thought it would probably flare up once and disappear.
But by the afternoon, the number had doubled.
That was when the server started slowing down, and traffic costs began to surge along with it. At the time, it was still running on infrastructure I had been using casually for a hobby. I had to move to larger servers in a hurry. The server instance cost alone exceeded what I normally spend on hobbies in a month, and traffic cost even more. It was hard to sustain for a site with no ads at all. Unless I made it much more efficient, it felt impossible to keep operating.
Now the system is deployed on ECS and split into several microservices. Each service has schedule-based autoscaling. But even that turns out to be trickier than it sounds in practice. Traffic on my site does not rise in a nice smooth curve. It spikes suddenly. If you rely only on autoscaling, there are times when you are already too late. In the end, you still have to keep a fairly generous buffer running.
I changed the architecture, upgraded the servers, and added autoscaling, but the problem did not end there. The volume of game-data traffic generated every day by tens of thousands of users was much larger than I expected. And that traffic was far more expensive than I expected.
I used to work on systems that processed real-time financial data. I handled data streams arriving through WebSocket or FIX at the level of tens or hundreds of microseconds. Back then, I was on the client side, receiving and processing ultra-high-speed streams. This time it was the exact opposite: I was the one sending the data, from the server side.
In that previous work, the data format was optimized very aggressively. Key names were made as short as possible, and one-letter keys were common. Sometimes even t and T were used as different keys with different meanings. At the time, I honestly wondered why anyone would go that far. It was harder to read, and from the perspective of someone seeing it for the first time, it just seemed confusing.
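To make that concrete, here is a sketch of what such a payload might look like. The specific keys and their meanings below are my own invented example, not the actual format from that system; the point is only that JSON keys are case-sensitive, so t and T really are two independent fields.

```python
import json

# Hypothetical tick message with one-letter keys (invented for illustration).
# Because JSON keys are case-sensitive, "t" and "T" are distinct fields:
# here, say, "t" is an event timestamp and "T" a trade size.
tick = {"s": "AAPL", "p": 189.55, "t": 1700000000123, "T": 200}

# Compact separators avoid the spaces json.dumps inserts by default.
encoded = json.dumps(tick, separators=(",", ":"))
decoded = json.loads(encoded)

assert "t" in decoded and "T" in decoded  # both keys survive the round trip
assert decoded == tick
```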
But once I became the person actually paying the traffic bill, I understood the reason very quickly.
If you ask communities or LLMs about this kind of thing, the answer is usually "premature optimization." And in many cases that is true. Adding complexity before you have even confirmed a bottleneck is usually not a good choice. But shortening JSON key names felt different for the kind of service I run. The risk is very low, and the effect is pretty clear.
In development, people often say things like "it's only 5%." It is easy to dismiss a 5% improvement as meaningless. But if you think about cutting manufacturing cost by 5%, that is not a small number at all. Operating a service is the same. And for costs like traffic, which accumulate every single day, it matters even more.
And when I actually tried it, the result was bigger than I expected. Just shortening key names reduced traffic volume by more than 20%. Even I did not expect it to be that noticeable.
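The effect is easy to measure for yourself. This is a rough illustration with made-up fields, not my actual game protocol, and the exact savings will depend entirely on how large your values are relative to your keys:

```python
import json

# Hypothetical per-frame player-state message with descriptive keys.
verbose = {
    "playerId": 42,
    "positionX": 10.5,
    "positionY": -3.25,
    "velocityX": 0.0,
    "velocityY": 1.5,
    "timestamp": 1700000000123,
}

# The same message with shortened keys (this mapping is invented for the example).
short = {"i": 42, "x": 10.5, "y": -3.25, "vx": 0.0, "vy": 1.5, "t": 1700000000123}

v = len(json.dumps(verbose, separators=(",", ":")).encode())
s = len(json.dumps(short, separators=(",", ":")).encode())
print(f"verbose: {v} bytes, short: {s} bytes, saved: {100 * (v - s) / v:.0f}%")
```

For small, frequent messages like game state, the keys make up a large share of every payload, which is why the savings compound so quickly at scale.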
Of course, there is no free lunch. It can be less intuitive for humans to read, and there were definitely moments when debugging became more confusing. It is true that t is less friendly than timestamp. But when you see the operating cost go down in a visible way, that inconvenience feels entirely acceptable, at least in my situation.
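One way to soften that debugging pain is to keep a single table mapping short wire keys back to readable names, and expand messages only when logging or inspecting them. A small sketch (the key mapping here is invented, not my real protocol):

```python
import json

# Hypothetical mapping from short wire keys to readable names, used only for logs.
KEY_NAMES = {"i": "playerId", "x": "positionX", "y": "positionY", "t": "timestamp"}

def expand_keys(msg: dict) -> dict:
    """Return a copy of a wire message with short keys replaced by readable names.

    Unknown keys pass through unchanged, so partial mappings are safe.
    """
    return {KEY_NAMES.get(k, k): v for k, v in msg.items()}

wire = json.loads('{"i":7,"x":1.0,"y":2.0,"t":1700000000123}')
print(expand_keys(wire))  # readable for debugging; the wire format stays short
```

The wire stays cheap, and the one place a human actually reads the data gets the long names back.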
What is interesting is that the optimization I used to do and the optimization I do now are almost complete opposites.
Back then, I would accept more cost and inefficiency to save 10 microseconds. I would disable hyper-threading and use only half the cores, or add padding around data structures to align cache lines. That was because speed was the most important problem.
Now I think in the opposite direction. Ideally, I would love for players to receive each other's state instantly while playing together. But in real operation, I end up experimenting with questions like, "How much delay can I introduce before users actually start to feel uncomfortable?" The fastest possible answer is not always the right one, and neither is the prettiest structure. In some situations, network cost matters more than human readability.
In the end, there is no silver bullet. There are only methods that fit the situation. In the past, it made sense to spend more money to save 10 microseconds. Now it makes sense to sacrifice a bit of readability to save a few bytes. That, to me, is part of what makes development so interesting.