If you remember the recent missile optimizations, you’ll remember CCP Veritas as well. Here’s more …
CCP has always chosen the more challenging path in how we design games; one world-wide single shard server proves that. As we’ve grown and the number of logins has risen, we’ve gone through iteration after iteration in fighting lag. The War on Lag is never ending, because EVE Online is always growing. But we have some interesting things to show you this week. This is the first in a series of three blogs that will show you how we fight lag through hardware upgrades, software fixes, and even wrestling the internet itself…
Welcome back. Just want to take some of your time here to brag a bit about how these changes went down. If you’re not familiar with what I’m talking about, go read up on my previous blog. Today I’m going to talk a bit about how the deployment went and the impacts. By the end of reading this, you should see exactly why we’re bumping the Jita population cap up by large quantities.
General Missile Optimization
The direct missile optimizations were deployed to Tranquility on February 8th with the Incursion 1.1.2 hotfix 7 in a partially dormant state. In the days following, we tested the more dangerous aspects of the change on a select handful of systems then switched them on cluster-wide. There appear to have been no issues, so hurray for that!
As for impact, I unfortunately can’t point to any metric and say “this is because of the missile work”. We’ve got a cluster-wide CPU per user metric that’s good at telling us when we’ve done something good in general, but it’s very noisy depending on what the player population happens to be doing at the time, so it takes many weeks of data to be able to establish a reasonable trendline. In the case of this change, there simply wasn’t enough data before and after without other complicating factors to be able to say what the impact was.
I hoped I could capture load from popular missioning systems, which tend to be more consistent in load than most and also involve lots of missiles, but systems like that do not have a consistent mapping on the cluster. What I mean by that is that on any given day, the systems around any given mission hub could be grouped together on a node in many different ways, with any number of other systems thrown into the mix. The CPU usage of such nodes is going to be extremely noisy depending on which other solar systems happened to land on them that day.
So, unfortunately, I have no concrete data to show you for the bulk of the changes in this devblog. I’m fairly confident things are better though, and as we progress in profiling the server I expect we’ll see less missile involvement than we have in the past.
The inventory change took much longer to get out to Tranquility. Python’s typing model got in the way a bit…let me explain that briefly for the uninitiated.
In what we call “statically typed” programming languages, when you declare a variable – a bit of memory you’re going to use to store data – you must say exactly what type of data it is. When, in C, you say “int x;” you are telling the computer “Hey, go and get me some memory, call it ‘x’, and treat that bit of memory like it is an integer.” If you then go and try to treat x as if it were a block of text, you’re going to get errors thrown into your face well before you even run your program, since the compiler knows that x is an integer and trying to treat 5 as if it were “Hello, my name is Bob” will not end well for you.
In Python though, variables are “duck-typed”. What that means is that a variable’s type is meaningless – it’s much more important what it can do. If both integers and blocks of text have a “print” function, you can call “x.print()” regardless of what type x happens to be – its type is irrelevant to the conversation. This gives tremendous flexibility in extending data structures and swapping them out – so long as the interface remains compatible with how the variables are used, you can go crazy. If it looks like a duck, smells like a duck and quacks like a duck, it may as well be a duck, even if it happens to be a SuperCustomDuckThatIsActuallyAPlatypus.
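To make the duck idea concrete, here’s a minimal sketch. The class and function names are made up for illustration and have nothing to do with the actual EVE codebase; the point is only that the function never asks what type it was handed.

```python
class Duck:
    def quack(self):
        return "Quack"


class SuperCustomDuckThatIsActuallyAPlatypus:
    # Not related to Duck in any way; it just offers the same method.
    def quack(self):
        return "Quack (sort of)"


def make_it_quack(thing):
    # No type check here: anything with a quack() method is accepted.
    return thing.quack()


sounds = [
    make_it_quack(Duck()),
    make_it_quack(SuperCustomDuckThatIsActuallyAPlatypus()),
]

# An integer has no quack(), but nothing complains until this line runs.
try:
    make_it_quack(5)
    integer_quacked = True
except AttributeError:
    integer_quacked = False
```

Note where the failure shows up: not when the program is loaded, but only at the moment the offending call actually executes.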
So, in this change, I pulled the rug out from under all of the inventory variables in the entire EVE Online codebase and changed them from a list-derivative class to being sets. The motivation for this change is discussed in the previous devblog a bit, so I won’t go into it here except to say that if you’re interested in more details, we’re talking about switching from something like a dynamic array to a hash table. Upshot being that searching and removing things from inventories is expected to be considerably faster than before, with increasing win as inventories grow. We have some truly huge inventories – think all of the items in a solar system, or all of the items anyone has put into a specific station’s hangar over the course of the day.
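If you want to see the dynamic-array-versus-hash-table difference for yourself, a rough sketch like the following will do. It uses Python’s plain list and set as stand-ins for the real inventory classes (which aren’t shown here), and the inventory size is an arbitrary number picked for illustration, not a figure from Tranquility.

```python
import timeit

# Hypothetical inventory of item IDs. Plain list and set stand in for the
# real inventory classes; the size is arbitrary.
ITEM_COUNT = 20_000
inventory_as_list = list(range(ITEM_COUNT))
inventory_as_set = set(inventory_as_list)

# Worst case for the list: the item we want sits at the very end,
# so every lookup has to walk the whole thing.
wanted = ITEM_COUNT - 1


def time_lookup(container, repeats=200):
    """Seconds spent on `repeats` membership checks for `wanted`."""
    return timeit.timeit(lambda: wanted in container, number=repeats)


list_seconds = time_lookup(inventory_as_list)
set_seconds = time_lookup(inventory_as_set)

print(f"list lookup: {list_seconds:.6f}s")
print(f"set lookup:  {set_seconds:.6f}s")

# Removal follows the same pattern: the list has to find the item first,
# while the set hashes straight to it.
inventory_as_list.remove(wanted)
inventory_as_set.remove(wanted)
```

The exact timings depend on the machine, but the gap widens as ITEM_COUNT grows, which is the “increasing win” part.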
Many of the interfaces to set are the same as list, but a few important ones are not. Unfortunately, since types can change on the fly during the execution of a program in Python, there’s no way to know you’ve caught all of the usages without actually running everything. Running everything in EVE is a rather tall order – a full regression test for EVE Online takes a sizeable team of QA many days to do, and even then it’s likely something slips through the cracks. There’s just so much you can do in this game.
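Here’s a small sketch of the kind of thing that bites you. The helper functions are hypothetical, not taken from the EVE codebase; they just show one operation that works on both types, one that only works on a list, and one that exists on both but fails differently.

```python
def has_item(inventory, item_id):
    # Membership testing exists on both list and set.
    return item_id in inventory


def oldest_item(inventory):
    # Indexing only exists on list; sets are unordered.
    return inventory[0]


def discard_item(inventory, item_id):
    # Both types have remove(), but a missing item raises ValueError on a
    # list and KeyError on a set, so this handler stops catching it.
    try:
        inventory.remove(item_id)
        return True
    except ValueError:
        return False


def outcome(func, *args):
    """Return the result, or the name of the exception that was raised."""
    try:
        return func(*args)
    except Exception as exc:
        return type(exc).__name__


old_style = [101, 102, 103]
new_style = {101, 102, 103}

results = {
    "has_item": (outcome(has_item, old_style, 102),
                 outcome(has_item, new_style, 102)),
    "oldest_item": (outcome(oldest_item, old_style),
                    outcome(oldest_item, new_style)),
    "discard_item": (outcome(discard_item, old_style, 999),
                     outcome(discard_item, new_style, 999)),
}

for name, (with_list, with_set) in results.items():
    print(f"{name}: list -> {with_list!r}, set -> {with_set!r}")
```

None of these functions mention a type anywhere, so nothing flags the problem ahead of time. You only find out when a code path that depends on list behaviour actually runs against a set.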
Thankfully, the guys working on inventory coreification happened to be coming up on a deployment, and as such had a full regression test on order already, focused especially on inventory-related operations. After asking very nicely, they were kind enough to let me piggy-back this change on theirs so we could share the QA resources.
Good thing we did that too – there were a significant number of cases that I had missed in my couple days of testing that were only brought to light by having the change go through the full regression and also simmer on Singularity for a few weeks. The last fix for these went in on March 2nd and everything was deployed to Tranquility on March 8th with Incursion 1.3.
Similar to above, the cluster-wide metric feels like it’s better, but there really just isn’t enough data yet to say it’s a statistically significant shift. There are likely to be more changes put out before we have enough data to make such a claim. Thankfully though, there is one solar system that is consistently isolated on the cluster which does a whole mess of inventory work – Jita. And boy is it happier now:
That’s data from the week prior to deployment plotted against data from the week after, lining like-days up together. As we had hoped, the impact is huge on Jita, especially as the run goes on and the inventory structures get bigger and bigger. Naturally this load reduction has given us a whole lot more headroom in Jita, so the cap is on its way up – hopefully higher than y’all need it to be.
So, that closes the book on this round of optimizations. Took some profiling data from a rather large fight last week, which has given us a few different routes to travel down…more on those as they develop.
Look for the next blog in this series later this week, and be sure to check the stream from Fanfest for even more detail.