Thursday, September 6, 2018

Why Custom-Delimited Data Sets are Bad and Should Be Avoided

With MySQL and other database options, you can store data to fields for tables. And sometimes, people are tempted to store a list to a field, like this: "1,2,3,4,5".

When the coder of this needs to go over the list, they retrieve it from the database, break it up by the commas using explode() or split(), depending on the language, and then iterate over the list.

There are a number of reasons why this is a problematic approach to whatever you're trying to solve.

The problem is that you need the whole list every time you want to do something with it.

Want to count how many 1's or 2's or 3's in the list? You must loop over the entire list.

Want to get only numbers when they are 4 or 5? You must loop over the entire list.

Want to delete just one value? Want to delete values greater than 5? Loop the list, loop the list, loop the list, and that's a lot of looping.

Besides performance issues, there are technical limitations.

How do you guarantee that something like "skyscraper" doesn't show up in your list of numbers? How do you escape characters like your delimiter (a comma in this case), which may appear in words, like this list: "barn", "skyscraper", "appartments, condos, etc."?

These performance and technical issues mean that few programmers actually implemented delimited lists. So, if you're going to use it in your project, it's going to be something that new people need to stress over and re-learn, when there are many more commonly accepted practices that solve this problem.

No comments:

Post a Comment