Concurrent map with a slice in golang

Question 1

I’ve been trying to sort out a concurrency problem since one of the developers working in this area left a couple of months ago, but I’m unsure of the appropriate way to solve it.

For context, we load a customer’s data into a structure like:

[ Key ] -> { Value }

[customer-specific-hash] -> {Slice of data points/files}

Example (apologies for the rough formatting):

[a60d849ad97bfb833e1096941] 
-> 
{ 
 { StartDate: '01-02-2022', EndDate: '28-02-2022', DataFrames: [1598,921578,12981,21749,192578...]},
 { StartDate: '01-03-2022', EndDate: '28-03-2022', DataFrames: [1234,1567,6781,126978...]},
}
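In Go terms, the structure above could be modelled roughly as follows. This is only a sketch: the field names come from the example, but storing the dates as strings and the frame references as `int`, and the `index` type name, are assumptions for illustration.

```go
package main

import "fmt"

// DataFrame mirrors one element of the example slice.
// Assumption: dates are kept as strings and frame references as ints.
type DataFrame struct {
	StartDate  string
	EndDate    string
	DataFrames []int
}

// index (hypothetical name) maps a customer-specific hash to that customer's frames.
type index map[string][]DataFrame

func main() {
	// Values are taken from the example; the original lists are truncated there.
	idx := index{
		"a60d849ad97bfb833e1096941": {
			{StartDate: "01-02-2022", EndDate: "28-02-2022", DataFrames: []int{1598, 921578, 12981, 21749, 192578}},
			{StartDate: "01-03-2022", EndDate: "28-03-2022", DataFrames: []int{1234, 1567, 6781, 126978}},
		},
	}
	fmt.Println(len(idx["a60d849ad97bfb833e1096941"])) // 2
}
```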

We use this structure because we have hundreds of thousands of customers, and a process that kicks off every night consolidates their data per customer, keyed by the hash (really a bucket). Before processing the data frames, we iterate over the slice and “merge” them into one large DataFrame, subject to many legal/accounting rules.

This runs in multiple goroutines to index all the data points as quickly as possible.

So the implementation is essentially a sync.Map[string, []DataFrame] (conceptually; sync.Map itself is untyped), but I noticed that while the map operations are guarded, appending to the DataFrame slices is not. Each hash can have about 20-30 file references in that slice every night.
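To see why unguarded appends can corrupt results even though every individual sync.Map call is safe, here is a deterministic sketch that replays one bad interleaving of two goroutines by hand. The key and the int values are placeholders. The Load/append/Store sequence is not atomic, so one update is lost; and because both loaded slice headers share one backing array (spare capacity), the second append also overwrites the element the first one wrote. In a real concurrent run this is additionally a data race that `go run -race` would report.

```go
package main

import (
	"fmt"
	"sync"
)

// replayLostUpdate walks through goroutines A and B interleaving by hand.
func replayLostUpdate() (stored, aView []int) {
	var m sync.Map
	// Spare capacity means appends reuse the same backing array.
	m.Store("hash", make([]int, 0, 4))

	// A and B both load the entry before either stores.
	va, _ := m.Load("hash")
	vb, _ := m.Load("hash")

	// A appends 1 and stores its slice.
	a := append(va.([]int), 1)
	m.Store("hash", a)

	// B appends 2 to its stale header: it writes into the same slot A used,
	// then replaces A's stored slice.
	b := append(vb.([]int), 2)
	m.Store("hash", b)

	final, _ := m.Load("hash")
	return final.([]int), a
}

func main() {
	stored, aView := replayLostUpdate()
	fmt.Println(stored, aView) // [2] [2]: A's value is gone from both
}
```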

There is a real chance that, for the past two years, customer data has been merged incorrectly, and I’ve been tasked with fixing it. Before sync.Map, they used a map guarded by a sync.RWMutex, which again protected the map but not the slices, and the code points to this article as a guide.

First, is a map whose values are slices an appropriate data structure here?

I’ve tried to create an RWMutex-based slice handler, but I wondered whether the map could instead hold a chan DataFrame that goroutines send into while indexing a customer’s files. Then, once indexing is done, a second step could consolidate the channel’s contents into a slice, since the length (len(chanx)) would be known at that point?
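One caveat with a channel per map entry: `len(ch)` only reports elements currently buffered and unread, so it is not a reliable count of how many frames will eventually arrive. A common Go alternative, swapped in here rather than taken from the question, is to have the indexing goroutines send results on one channel and let a single collector own the map, so the map needs no lock at all. In this sketch `DataFrame`, `item` and `collect` are hypothetical, and each frame is reduced to an integer ID.

```go
package main

import (
	"fmt"
	"sync"
)

// DataFrame is a placeholder for the real type.
type DataFrame struct{ ID int }

// item pairs a customer hash with one indexed frame.
type item struct {
	key   string
	frame DataFrame
}

// collect starts one indexing goroutine per key; only the calling goroutine
// touches the result map, so no mutex is needed around it.
func collect(inputs map[string][]int) map[string][]DataFrame {
	results := make(chan item)
	var wg sync.WaitGroup
	for key, ids := range inputs {
		wg.Add(1)
		go func(key string, ids []int) {
			defer wg.Done()
			for _, id := range ids {
				results <- item{key: key, frame: DataFrame{ID: id}}
			}
		}(key, ids)
	}
	// Close the channel once every worker has finished sending.
	go func() {
		wg.Wait()
		close(results)
	}()
	out := make(map[string][]DataFrame)
	for it := range results {
		out[it.key] = append(out[it.key], it.frame)
	}
	return out
}

func main() {
	out := collect(map[string][]int{"a60d": {1, 2, 3}, "b71e": {4, 5}})
	fmt.Println(len(out["a60d"]), len(out["b71e"])) // 3 2
}
```

Because each key is handled by a single worker that sends in order, frames for one key arrive in the order they were indexed; frames for different keys may interleave, which does not matter here.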

I come primarily from Java, so apologies if I’ve confused some terms.

Answer

You have two separate problems:

1. Concurrency issues while updating the map
2. Concurrency issues while updating an entry of the map

sync.Map will protect against 1, but not 2.

One way to deal with this problem is to have:

sync.Map[string, *DFrame]

where

type DFrame struct {
	sync.RWMutex // guards Data; hold it for every read or write
	Data []DataFrame
}

Once you get an entry from the map, you should Lock or RLock it before working with the data. This is not limited to appending to the slice: you have to RLock the struct even when you are only reading the data frames.
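One way to make that rule hard to forget is to wrap the locking in methods on `DFrame`, so callers never touch `Data` directly. In this sketch `Append` and `Snapshot` are hypothetical method names and `DataFrame` is a placeholder. `Snapshot` returns a copy taken under the read lock, so the caller can keep using it after the lock is released; note the copy is shallow, so any slices nested inside a `DataFrame` are still shared.

```go
package main

import (
	"fmt"
	"sync"
)

// DataFrame is a placeholder for the real type.
type DataFrame struct{ ID int }

type DFrame struct {
	sync.RWMutex // guards Data
	Data         []DataFrame
}

// Append adds a frame under the write lock.
func (d *DFrame) Append(f DataFrame) {
	d.Lock()
	defer d.Unlock()
	d.Data = append(d.Data, f)
}

// Snapshot returns a shallow copy of Data taken under the read lock.
func (d *DFrame) Snapshot() []DataFrame {
	d.RLock()
	defer d.RUnlock()
	out := make([]DataFrame, len(d.Data))
	copy(out, d.Data)
	return out
}

func main() {
	var d DFrame
	d.Append(DataFrame{ID: 1})
	d.Append(DataFrame{ID: 2})
	snap := d.Snapshot()
	d.Append(DataFrame{ID: 3})
	fmt.Println(len(snap), len(d.Snapshot())) // 2 3
}
```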

So, to append a new data frame:

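// LoadOrStore returns the existing *DFrame for key, or stores df and returns it.
df := &DFrame{}
entry, _ := m.LoadOrStore(key, df)
dfEntry := entry.(*DFrame)
// Hold the entry's write lock while modifying its slice.
dfEntry.Lock()
dfEntry.Data = append(dfEntry.Data, newDataFrame)
dfEntry.Unlock()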
