Hi All, 

Got an issue with my 3x2 grid.

Everything launches and runs fine, but after an hour or so I suddenly get red walls around whichever grid I'm in and can no longer travel out of it.

If I remote into the server, everything still appears to be running fine. If I reboot all grids except the one I'm in, it still doesn't come good, but if I reboot all grids plus the Redis server/map server, it works again.

The only thing I can think of is that the server is running Windows 10 Pro (what I had on hand) instead of an actual server install (downloading R2 now anyway), and some process is getting hung up somewhere. Years ago I had an ARK server on my old PC (Win10) and would leave it running for friends to keep playing while I was at work, but we found that if my PC or ARK wasn't active for a period of time it would bug out: even though it was still running, you couldn't join until it was rebooted.

So I'll be installing R2 today anyway, but is there anything else I should be looking at? I'm hoping R2 will be more stable with the grids sitting dormant for long periods waiting for people to join.

The server itself is an old Dell PowerEdge 2950 with dual Xeon 5450s and 32 GB RAM, using ATLAS Server Control. For the grids I'm running, the hardware isn't stressed at all: CPU and RAM float between 30-50% with people connected.

Any assistance and ideas will be appreciated.


I transferred everything over to the server running Windows Server 2008 R2, with a fully updated OS and drivers and a clean install of the server files, and launched the server.

Everything runs as it should, but I'm still getting the same issue.

When the server has been sitting idle for a period of time and I then join, joining works fine, but there are red walls all around and I cannot travel until I restart the cluster.

Any ideas? I am lost with this.


I've seen something similar with my servers, but not to the extent you're seeing. If I log in near an edge, I sometimes see that the adjacent servers are red, but that disappears after a few seconds. If I start the entire cluster and join, the red sometimes sticks around a little longer, but I have never had to restart the cluster, or even individual servers.

I think the issue is that your hardware is underpowered. There are three things to keep in mind with your CPUs: overall performance, per-thread performance, and cache. None of them is great compared to modern CPUs, even though you have two chips. A lot of operations don't parallelize well, especially in games. It would be great if the game servers spread evenly across all of your cores, but realistically, don't count on it. When a server boots, it pegs one core at 100%; there's some work on the side, but the bulk of it doesn't parallelize.

https://www.cpubenchmark.net/cpu.php?cpu=Intel+Xeon+E5450+%40+3.00GHz&id=1236&cpuCount=2
Your chips were released in late 2007. For overall performance you're looking at 4179 per chip, or about 7499 for the pair. Their single-thread performance was great for the time but is lacking today: they score 1266, and most modern mid-range chips are double that. For comparison, my old 6600K has an overall score of 8061 and a per-thread score of 2147. I'm currently running a Ryzen 3700X: 23840 overall and 2907 per thread.
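
To put those numbers side by side, here is a quick sketch that works out the ratios against the dual Xeon setup. The scores are the ones quoted above; the dictionary layout and the function name are just for illustration.

```python
# Benchmark scores quoted in the post: (overall, single-thread).
scores = {
    "2x Xeon E5450": (7499, 1266),
    "6600K": (8061, 2147),
    "Ryzen 3700X": (23840, 2907),
}


def ratios_vs(baseline, scores):
    """Return (overall ratio, single-thread ratio) of each CPU vs. the baseline."""
    base_overall, base_single = scores[baseline]
    return {
        name: (round(overall / base_overall, 2), round(single / base_single, 2))
        for name, (overall, single) in scores.items()
        if name != baseline
    }


for name, (overall_x, single_x) in ratios_vs("2x Xeon E5450", scores).items():
    print(f"{name}: {overall_x}x overall, {single_x}x per thread")
```

The point is that the 6600K is only about level with the pair of Xeons overall, but noticeably faster per thread, and per-thread speed is what matters for work that doesn't parallelize.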

Cache is also important (https://en.wikipedia.org/wiki/CPU_cache). If you don't know, CPUs have cache, which is like a faster form of memory. CPUs have different tiers of cache: Level 1, Level 2, and, these days, sometimes Level 3 and Level 4. L1 is the fastest and smallest; L2 is larger and slower, but still faster than main memory; L3 is slower than L2 but still faster than main memory. They all mirror chunks of main memory, so, roughly speaking, everything in L3 is in main memory, everything in L2 is in L3, and so on. Long story short, the CPU works on data from cache: if it's not in L1, it looks in L2; if it's not there, L3; and so on until it finally has to fetch it from main memory. Each step down takes more time, so the more cache the better.

We can ignore the levels and just think of it as a single cache pool, because the difference between in cache and not in cache is what really matters. Having to hit main memory is a far bigger performance hit than the gap between L1 and L2. Each of your CPUs has 12 MB of cache. If you're running a lot of active processes, you may run into a situation where their combined working set doesn't fit into cache, so the CPU has to keep swapping the contents of cache in and out for each process. This takes time and can result in cache thrashing: https://en.wikipedia.org/wiki/Thrashing_(computer_science)
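
If it helps to see the effect, here is a toy simulation of that idea. It is only a sketch under loud assumptions: one shared cache with least-recently-used eviction, processes that take turns and each sweep through their own working set, and made-up sizes counted in abstract "blocks". Real CPU caches are more complicated and don't fall off a cliff this sharply, but the shape is the same.

```python
from collections import OrderedDict


def hit_rate(num_processes, blocks_per_process, cache_blocks, rounds=20):
    """Fraction of accesses served from a shared LRU cache.

    Each round, every process runs in turn and touches every block of its
    own working set once. All sizes are hypothetical, for illustration only.
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = accesses = 0
    for _ in range(rounds):
        for pid in range(num_processes):
            for block in range(blocks_per_process):
                key = (pid, block)
                accesses += 1
                if key in cache:
                    hits += 1
                    cache.move_to_end(key)  # mark as most recently used
                else:
                    cache[key] = True
                    if len(cache) > cache_blocks:
                        cache.popitem(last=False)  # evict least recently used
    return hits / accesses


# Two processes fit in the cache together; five do not.
print(hit_rate(num_processes=2, blocks_per_process=40, cache_blocks=100))
print(hit_rate(num_processes=5, blocks_per_process=40, cache_blocks=100))
```

With two processes the combined working set (80 blocks) fits, so only the first pass misses. With five (200 blocks) each process evicts what the others need before they run again, and in this model nothing is ever found in cache.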

 

Knowing all of that, here is what I think is going on. When you log into a server, it checks the adjacent servers, and those 4 servers start processing. So you've got 5 servers trying to run at the same time and fighting for cache. Instead of getting meaningful work done, the machine is thrashing: it spends more time swapping memory in and out of cache than it does on the actual work that needs to be done.

I did some experimentation on my server. When I joined after not being on for 8 hours, I saw several of the other servers spike in CPU usage for about a second. Also, when I was setting everything up a few weeks ago, I ran into a thrashing issue (disk, not CPU) when I tried to start too many of the servers at the same time. Everything ground to a halt.

 

Next time this happens, open Resource Monitor and take a look. Are the server instances (ShooterGameServer.exe) pegged at 100%? Check the disk too: it could also be that all the servers are trying to hit the storage at the same time and thrashing on that. If you really want to dig in, read this over: https://software.intel.com/en-us/articles/intel-performance-counter-monitor/
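
One thing to watch when reading those numbers: as far as I know, the per-process CPU figure in Resource Monitor is a share of the whole machine, not of one core. Assuming that, and assuming 8 logical cores (two quad-core Xeons without hyper-threading), a process that is pegging a single core would show up at only about 12.5%. A small sketch of the conversion; the function names and the 0.9 threshold are made up for illustration:

```python
def core_equivalents(percent_of_total, logical_cores=8):
    """Convert a whole-machine CPU percentage into 'cores kept busy'.

    logical_cores=8 is an assumption for a dual quad-core box.
    """
    return percent_of_total / 100 * logical_cores


def looks_pegged(percent_of_total, logical_cores=8, threshold=0.9):
    """True if the process is using roughly one full core or more."""
    return core_equivalents(percent_of_total, logical_cores) >= threshold


# Hypothetical readings for three server instances, in percent of total CPU.
for reading in (12.5, 5.0, 25.0):
    print(reading, core_equivalents(reading), looks_pegged(reading))
```

So if several ShooterGameServer.exe instances each sit around 12% on that box, that may well be each of them maxing out a core, even though the overall CPU graph looks relaxed.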

 

If you want to experiment, run only 2 servers, let them sit for a while, then log into one and see if you still have the issue on the edge that connects to the second server. Then repeat with 3, 4, 5, and finally 6.
