Concurrent map with a slice in golang

I’ve been trying to sort out a concurrency problem since one of the devs working in this area left a couple of months ago, but I’m not sure what the appropriate fix is.

For context, we load a customer’s data into a structure like:

[ Key ] -> { Value }

[customer-specific-hash] -> {Slice of data points/files}

Example – really badly formatted sorry:

[a60d849ad97bfb833e1096941] 
-> 
{ 
 { StartDate: '01-02-2022', EndDate: '28-02-2022', DataFrames: [1598,921578,12981,21749,192578...]},
 { StartDate: '01-03-2022', EndDate: '28-03-2022', DataFrames: [1234,1567,6781,126978...]},
}

The above is because we have hundreds of thousands of customers, and a process kicks off every night that consolidates their data by hash (really a bucket) per customer. Before processing the data frames, we go through the slice and “merge” them into one big DataFrame, with lots of legal/accounting rules around it.

This runs within goroutines to index all the datapoints as fast as possible.

So the implementation is essentially a sync.Map[string, []DataFrame], but I noticed that whilst the map operations are guarded, appending to the dataframe slices is not. There could be about 20-30 file references in that slice every night for each hash.

There is every chance that customer data has been merged incorrectly over the past 2 years, and I’ve been tasked with fixing it. Before sync.Map they used a RWMutex with a map, which again guarded the map but not the slice, and the code points to this article as a guide.

Firstly, is the idea of a Map that contains a slice the appropriate data structure?

I’ve tried to create a RWMutex-based slice handler, but wondered whether the map could instead hold a chan DataFrame to send into while indexing a customer’s files. Then, once indexing is done, the second step of consolidating it into an array could run, since the number of items (len(chanx)) would be known?
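The channel idea above could be sketched roughly as follows. This is a variation rather than exactly what is described: instead of one chan DataFrame per map entry, a single channel carries (key, frame) pairs to one goroutine that owns a plain map. The names item and collect and the trimmed-down DataFrame type are hypothetical, and the sketch assumes at least one worker.

```go
package main

import (
	"fmt"
	"sync"
)

// DataFrame is a hypothetical stand-in for the real data-point type.
type DataFrame struct{ ID int }

// item pairs a frame with the customer hash it belongs to.
type item struct {
	key   string
	frame DataFrame
}

// collect fans jobs out to `workers` goroutines (assumed >= 1) and gathers
// their results in a plain map that only the calling goroutine touches.
func collect(jobs []item, workers int) map[string][]DataFrame {
	in := make(chan item)
	out := make(chan item)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for it := range in {
				// Real indexing work would happen here.
				out <- it
			}
		}()
	}

	// Feed the jobs, then close out once every worker has finished.
	go func() {
		for _, j := range jobs {
			in <- j
		}
		close(in)
		wg.Wait()
		close(out)
	}()

	// Single owner of the map: appends need no lock.
	result := make(map[string][]DataFrame)
	for it := range out {
		result[it.key] = append(result[it.key], it.frame)
	}
	return result
}

func main() {
	jobs := []item{{"a", DataFrame{1}}, {"b", DataFrame{2}}, {"a", DataFrame{3}}}
	merged := collect(jobs, 4)
	fmt.Println(len(merged["a"]), len(merged["b"]))
}
```

Because collect only returns after the output channel is closed, every slice is complete by the time the consolidation step would start, so its length is known without inspecting the channel.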

I am coming primarily from Java, so I may have some terms confused, for which I apologise.

  • Looks like Burak Sendar has graced you with a great answer. He’s one of the pros around here. If you want to up your Go concurrency game, read go.dev/blog/codelab-share: “Do not communicate by sharing memory; instead, share memory by communicating” (the channel-based approach). I try to avoid locks whenever possible; they’re tricky to get right, and omissions like this are all too easy.

  • Thank you @erik258, that’s a great share and so succinct in its explanation 😀 I did wonder whether a channel approach would be easier.

You have two separate problems:

  1. Concurrency issues while updating the map
  2. Concurrency issues while updating an entry of the map

sync.Map will protect against 1, but not 2.

One way to deal with this problem is to have:

sync.Map[string, *DFrame]

where

type DFrame struct {
  sync.RWMutex 
  Data []DataFrame
}

Once you get an entry from the map, you should Lock or RLock it and then work with the data. This is not limited to appending to the slice: you have to RLock the struct even if you are only reading the data frames.
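For the read side, a minimal sketch might look like the following. The Snapshot method and the trimmed-down DataFrame type are hypothetical; the idea is to copy the slice while holding RLock so the merge step can work on the copy after the lock is released.

```go
package main

import (
	"fmt"
	"sync"
)

// DataFrame is a hypothetical stand-in for the real data-point type.
type DataFrame struct{ ID int }

// DFrame pairs the slice with the lock that guards it.
type DFrame struct {
	sync.RWMutex
	Data []DataFrame
}

// Snapshot copies the frames under the read lock. The copy is shallow:
// any slices inside a real DataFrame would still be shared.
func (d *DFrame) Snapshot() []DataFrame {
	d.RLock()
	defer d.RUnlock()
	out := make([]DataFrame, len(d.Data))
	copy(out, d.Data)
	return out
}

func main() {
	d := &DFrame{Data: []DataFrame{{ID: 1}, {ID: 2}}}
	snap := d.Snapshot()

	// A later append (under the write lock) does not change the snapshot.
	d.Lock()
	d.Data = append(d.Data, DataFrame{ID: 3})
	d.Unlock()

	fmt.Println(len(snap), len(d.Snapshot()))
}
```

Several readers can hold RLock at the same time, while a writer calling Lock waits until they have all released it.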

So if you are appending a new dataframe:

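// Store an empty entry if the key is absent; otherwise reuse the existing one.
df := &DFrame{}
entry, _ := m.LoadOrStore(key, df)
dfEntry := entry.(*DFrame)
// Hold the entry's write lock while mutating its slice.
dfEntry.Lock()
dfEntry.Data = append(dfEntry.Data, newDataFrame)
dfEntry.Unlock()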
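
Putting the pieces together, here is a self-contained sketch of the same pattern with many goroutines appending to one key. The Store wrapper, its Append and Len methods, the fill helper, and the DataFrame fields are all hypothetical names chosen for illustration; only sync.Map, LoadOrStore, and sync.RWMutex come from the standard library.

```go
package main

import (
	"fmt"
	"sync"
)

// DataFrame is a hypothetical stand-in for the real data-point type.
type DataFrame struct {
	StartDate, EndDate string
	Points             []int
}

// DFrame pairs the slice with the lock that guards it.
type DFrame struct {
	sync.RWMutex
	Data []DataFrame
}

// Store hides the sync.Map so callers cannot reach a slice without its lock.
type Store struct {
	m sync.Map // key: string, value: *DFrame
}

// Append adds df under key, creating the entry if it does not exist yet.
func (s *Store) Append(key string, df DataFrame) {
	entry, _ := s.m.LoadOrStore(key, &DFrame{})
	e := entry.(*DFrame)
	e.Lock()
	e.Data = append(e.Data, df)
	e.Unlock()
}

// Len reports how many frames are stored under key.
func (s *Store) Len(key string) int {
	entry, ok := s.m.Load(key)
	if !ok {
		return 0
	}
	e := entry.(*DFrame)
	e.RLock()
	defer e.RUnlock()
	return len(e.Data)
}

// fill appends n frames to key from n goroutines and waits for all of them.
func fill(s *Store, key string, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.Append(key, DataFrame{Points: []int{i}})
		}(i)
	}
	wg.Wait()
}

func main() {
	var s Store
	fill(&s, "a60d849ad97bfb833e1096941", 30)
	fmt.Println(s.Len("a60d849ad97bfb833e1096941"))
}
```

Running a program like this with go run -race is a reasonable way to check that no unguarded access to the slice remains.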
