Data, Code, and Regulation

Data, Code, and Regulation
Data, Code, and Regulation

In some contexts software is regulated but data is not, or at least software comes under different regulations than data. For example, maybe you have to maintain test records for software but not for data.

Suppose as part of some project you need to search for files containing the word “apple” and you use the command line utility grep. The text “apple” is data, input to the grep program. Since grep is a widely used third party tool, it doesn’t have to be validated, and you haven’t written any code.

Next you need to search for “apple” and “Apple” and so you search on the regular expression “pple” rather than a plain string. Now is the regular expression “[aA]pple” code? It’s at least a tiny step in the direction of code.

What about more complicated regular expressions? Regular expressions are equivalent to deterministic finite automata, which sure seem like code. And that’s only regular expressions as originally defined. The term “regular expression” has come to mean more expressive patterns. Perl regular expressions can even contain arbitrary Perl code.

In practice we can agree that certain things are “code” and others are “data,” but there are gray areas where people could sincerely disagree. And someone wanting to be argumentative could stretch this gray zone to include everything. One could argue, for example, that all software is data because it’s input to a compiler or interpreter.

You might say “data is what goes into a database and code is what goes into a compiler.” That’s a reasonable rule of thumb, but databases can store code and programs can store data. Programmers routinely have long discussions about what belongs in a database and what belongs in source code. Throw regulatory considerations into the mix and there could be incentives to push more code into the database or more data into the source code.