Out of memory - Out of patience

“There’s been something wrong with our Go server for the past 2 years…”, my colleague says.

My mind shuts off in anticipation of what’s to come.

“We’ve had several devs investigate this issue, from our backend lead to members of other backend teams, but with no luck.”

What could this be?

“It’s a memory leak and it causes constant crashes. Our users get kicked off of the websockets each time and they can’t view any content for some time. Could you check this out?”

I accepted the challenge, committing to solving not only this leak, but all other Go service leaks.

Global State

My initial hunch was to check for global state across Go modules. To my surprise, this was a dead end. There has to be something growing across requests…
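For context, this is the kind of pattern I was hunting for. The sketch below is purely illustrative: the sessions map and both handler functions are hypothetical names, not code from the service. It shows a package-level map that outlives every request and only ever grows, next to a variant that cleans up after itself.

```go
package main

import "fmt"

// sessions is hypothetical package-level state: it outlives every request.
// (Not safe for concurrent use; a real server would guard it with a mutex.)
var sessions = map[string][]byte{}

// handleLeaky stores per-request data and never removes it,
// so the map grows by one entry per request.
func handleLeaky(id string) {
	sessions[id] = make([]byte, 1024)
}

// handleFixed stores the same data but deletes it when the request ends.
func handleFixed(id string) {
	sessions[id] = make([]byte, 1024)
	defer delete(sessions, id)
}

func main() {
	for i := 0; i < 3; i++ {
		handleLeaky(fmt.Sprintf("req-%d", i))
	}
	fmt.Println("entries after leaky handler:", len(sessions))

	for i := 3; i < 6; i++ {
		handleFixed(fmt.Sprintf("req-%d", i))
	}
	fmt.Println("entries after fixed handler:", len(sessions))
}
```

If a count like this keeps climbing while the number of live connections stays flat, global state is a likely culprit.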

What did my co-worker mention about the service that involves opening and closing resources?

Websockets. Those damn websockets.

Each websocket connection opened up a new time.Ticker which is used in part to send out heartbeats to each connection.

Sounds fine, right?

One little problem: once you no longer need a time.Ticker, you MUST call its .Stop() method. Before Go 1.23, a ticker that was never stopped was never freed.

It’d be ideal if Go rejected programs at compile time when a resource like this is created but never closed.

Memory increased at a slower rate with my tweak, but restarts still happened at least once a day.

I’m not done just yet…

Slices

Slices seem pretty harmless. You create them, index them, create sub-slices, and they work well.

Well, not always.

Behold this embarrassing code snippet:

nums := []*int{new(int), new(int)}

nums = nums[1:]

We create a slice of integer pointers. So what, nothing seems off.

We then decide to slice off the first pointer. The slice starts at index 1 now, but the backing array still keeps the pointer at index 0 alive.

One way to remedy this is to nil-out the first location:

nums := []*int{new(int), new(int)}
nums[0] = nil // clear the slot so the backing array drops its reference

nums = nums[1:]
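The same idea can be wrapped in a small helper so the nil-out step is never skipped. This is a sketch under my own naming (popFront is hypothetical); the backing variable exists only to let us peek at the slot the resliced slice can no longer see.

```go
package main

import "fmt"

// popFront drops the first element of s. It clears the slot first so the
// backing array no longer holds a pointer to the dropped value.
func popFront(s []*int) []*int {
	if len(s) == 0 {
		return s
	}
	s[0] = nil
	return s[1:]
}

func main() {
	nums := []*int{new(int), new(int)}
	backing := nums // second view of the same backing array, for the demo only

	nums = popFront(nums)

	fmt.Println(len(nums), backing[0] == nil) // prints: 1 true
}
```

For a slice that is popped from the front indefinitely, copying the survivors into a fresh slice now and then is another option, since reslicing alone never shrinks the backing array.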

The service also kept expanding a slice like the above, but with pointers still referencing older structs that were no longer needed.

Once I fixed this, the memory leaks were gone and restarts were a thing of the past.

pprof: the last hope

As an honorable mention, pprof is a tool that helps us inspect memory profiles of our Go programs. I didn’t need to use it this time but it can really come in handy when you’re not sure where to look.

We can spin up our web server locally, hit endpoints or other seams between our program and the outside world, and see how and where memory is being allocated.
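As a rough sketch of what capturing a profile looks like, the snippet below uses the standard runtime/pprof package to snapshot the heap into memory. The heapProfile helper is a hypothetical name of mine; in a real investigation you would write the bytes to a file and open it with go tool pprof.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
)

// heapProfile returns a snapshot of the heap profile in pprof's format.
func heapProfile() ([]byte, error) {
	runtime.GC() // run a collection first so the statistics are up to date

	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := heapProfile()
	if err != nil {
		fmt.Fprintln(os.Stderr, "heap profile failed:", err)
		os.Exit(1)
	}
	fmt.Printf("captured %d bytes of heap profile\n", len(data))
}
```

For a long-running web server, a blank import of net/http/pprof is usually more convenient: it registers profiling handlers under /debug/pprof/ on the default HTTP mux, so you can pull a heap profile while you exercise the endpoints.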

TLDR: Find Go memory leaks fast
  1. Check for global state across Go modules
  2. Check for immortal time.Tickers
  3. Check for slices whose backing arrays still hold pointers you no longer need
  4. When in doubt, pull out pprof & test all endpoints and seams with the outside world to investigate the heap profile