Create or Initialize. How to minimize the load on database and server

I write in ruby, in the ruby on rails framework. I can’t understand the hypothetical situation.
Let’s say the get_date function returns an array of about 10 million elements. Each element contains the fields: fk, name, detail.
There is also a table that contains the same fields. It is partially filled. fk – a unique indexed field.
My task is to check if an entry exists in the table. If there is no record, I have to create one. If the record exists, I have to update the name and detail fields, because they could become irrelevant.
I need to write some code that will load the database and server as little as possible

I suggest:

get_data.each do |item|
          field = MyField.find_or_initialize_by_fk(item['fk'])
          field.update_attributes(
              {
                  :name       => item['name'],
                  :detail    => item['detail']
              }
          )
          field.save
end

or

get_data.each do |item|
          field = MyField.find_or_create_by_fk(item['fk'])
          field.update_attributes(
              {
                  :name       => item['name'],
                  :detail    => item['detail']
              }
          )
end

Which of these or the other would be better?

  • 1

    Neither. Upload the new data to a temporary table and do it in two queries in the database. It’s a database and much better at handling data than Ruby is.

    – 

  • But maybe loading the data into a temporary table would require the same amount of resources?

    – 

The first version would be better, because it only does two queries, one to find a potentially existing record, and another to update the existing or to create the new record.

The second version will do three requests to the database when no existing record was found: One trying to find the record, second to create the new records and a third to update the newly created record.

But instead, I suggest using upsert which inserts new or updates existing records in just one query:

get_data.each do |item|
  MyField.upsert(
    { fk: item['fk'], name: item['name'], detail: item['detail'] },
    on_duplicate: 'name = EXCLUDED.name, detail = EXCLUDED.detail',
    unique_by: :fk
  )
end

Note that there is also upsert_all which would allow your inserting or updating large batches of records in just one query. It depends a bit on how get_data is implemented, how to use it. But the general idea is to load the records in batches and then upsert_all each batch in one query.

Another even more performant way would certainly be doing this directly in SQL with one query and likely with using SQL’s UPSERT command again. How that would look like depends on the exact schema of the database tables in play.

Leave a Comment