Some designs to validate function arguments.
Vishal Katti
January 18, 2022
Functions in R (or in any other programming language) let us encapsulate lines of code that we want to run again and again. Functions are the natural outcome of the DRY¹ principle. They group together a few lines of consistent logic, making our code modular and consequently easier to manage. However, when we write functions, we need to ensure that they behave exactly as we want them to and can handle whatever we throw at them. By whatever, I mean any and all kinds of inputs.
The idea of creating unbreakable code is idealistic. I say this because creating robust functions requires additional code to handle unwanted inputs, and most useRs write functions during some one-time analysis. Hence we need to be pragmatic about how much time and effort we spend trying to make our functions robust. Maybe we need our functions to be just robust enough!
All I am saying is: if you are creating functions that will be used by you and only you, i.e. if you have absolute control over what inputs are provided to your functions, then you can forego certain checks and the functions need not be unbreakable. But if you intend to write functions that will be used by a larger audience, you need to ensure that they can handle all kinds of inputs, whether innocent or malicious.
You must be familiar with the Garbage-In-Garbage-Out philosophy of software engineering. In terms of functions, it means that given garbage or bad input, you get garbage or bad output. For a function to be robust, it must behave consistently for known and correct inputs; more importantly, it mustn’t give us garbage for bad inputs. Rather, it must provide useful output (as messages or instructions) that informs the end user about possible problems in the inputs and drives proper usage. For bad inputs, the useful output would ideally be a combination of a clean early exit and an easy-to-understand error message. So we shall try to implement Garbage-In-Useful-Info-Out by looking at some ways to build well-behaved and reliable functions.
Input values passed to a function are more popularly known as arguments or parameters. A robust function must validate its arguments before proceeding to the function logic. If this is not done, bad arguments will cause errors in the logic and display error messages that the end user may not be familiar with. The worst-case scenario is when the function doesn’t encounter any errors and silently gives bad results! Surely, we do not want this unpredictable behavior.
Consider the following function `make_date` that takes 3 numeric inputs `yyyy`, `mm` and `dd` and returns a single `Date` object.
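A minimal sketch of such a naive function, assuming its body is the same `as.Date(paste(...))` logic that appears in the error message further down. The example call is also an assumption, chosen to match the output shown here:

```r
# Naive version: no argument checks at all (assumed body).
make_date <- function(yyyy, mm, dd) {
  as.Date(paste(yyyy, mm, dd, sep = "-"), format = "%Y-%m-%d")
}

# Example call (assumed values)
my_date <- make_date(yyyy = 2022, mm = 1, dd = 31)
my_date
class(my_date)
```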
[1] "2022-01-31"
[1] "Date"
We will use `make_date` to demonstrate a couple of scenarios where this function can fail and the methods to safeguard against such scenarios.
The most basic check we should perform before running the function logic is to confirm that all the required arguments are available. Think about how your function should behave if one of the arguments, say `mm`, is missing.
Error in paste(yyyy, mm, dd, sep = "-"): argument "mm" is missing, with no default
Note that the error message shown to the user is triggered not by our function `make_date` but by the internal `paste` function. We do not have any control over the error messages raised by the functions we call internally. In this case, we know specifically that the error is due to a missing argument.
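To see where such an error originates, the condition object can be captured with `tryCatch`. This is only a sketch, assuming the naive `make_date` consists of the single `as.Date(paste(...))` line:

```r
# Naive version without checks (assumed body).
make_date <- function(yyyy, mm, dd) {
  as.Date(paste(yyyy, mm, dd, sep = "-"), format = "%Y-%m-%d")
}

# Capture the error instead of letting it halt execution.
err <- tryCatch(
  make_date(yyyy = 2022, dd = 31),
  error = function(e) e
)

conditionMessage(err) # the message text, which mentions `mm`
conditionCall(err)    # the call that raised the error
```

The captured call is the inner one that failed, not `make_date` itself, which is why the message can look unfamiliar to the end user.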
There are two ways to handle missing arguments:
If a certain required argument is missing, we can stop the execution of the function and show an informative error message about which argument is missing. Your friends here are the `missing` and `stop` functions. The `missing` function returns `TRUE` if the caller did not supply the given argument, else it returns `FALSE`. Note that an argument explicitly set to `NULL` counts as supplied, so `missing` returns `FALSE` for it. The `stop` function stops the execution and displays the custom error message we provide. Using these functions inside an `if` condition lets us check for missing arguments. Let us modify our naive function to stop early when required arguments are missing.
make_date <- function(yyyy, mm, dd) {
  # check missing arguments
  if (missing(yyyy)) stop("argument `yyyy` is required.")
  if (missing(mm)) stop("argument `mm` is required.")
  if (missing(dd)) stop("argument `dd` is required.")
  # main logic
  as.Date(paste(yyyy, mm, dd, sep = "-"), format = "%Y-%m-%d")
}
# Calling the function without `mm` argument
make_date(yyyy = 2022, dd = 31)
Error in make_date(yyyy = 2022, dd = 31): argument `mm` is required.
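As a quick check of what `missing` actually reports, here is a small sketch. The helper `is_arg_missing` is hypothetical and exists only for this illustration:

```r
# Hypothetical helper: reports whether its argument was supplied.
is_arg_missing <- function(x) missing(x)

not_supplied  <- is_arg_missing()     # TRUE: nothing was passed
null_supplied <- is_arg_missing(NULL) # FALSE: NULL counts as supplied
```

If `NULL` should also be rejected, a separate `is.null` check would be needed in addition to `missing`.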
Note that in this version of `make_date` we add three if-missing-stop statements, one for each required argument. We must do this if we want to display a specific error message for each argument. There is another way to do the same, but we will look at it later. If we want to display a single error message, we can do so by wrapping the `missing` calls inside `any`, which returns `TRUE` if any one of the arguments is missing. However, providing clear error messages becomes challenging with this method.
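The single-message variant could look like the sketch below. The name `make_date_any` and the wording of the message are assumptions made for illustration:

```r
# Sketch: one combined check for all required arguments.
make_date_any <- function(yyyy, mm, dd) {
  if (any(missing(yyyy), missing(mm), missing(dd))) {
    stop("arguments `yyyy`, `mm` and `dd` are all required.")
  }
  as.Date(paste(yyyy, mm, dd, sep = "-"), format = "%Y-%m-%d")
}

# The message cannot say which argument was left out.
msg <- tryCatch(
  make_date_any(yyyy = 2022, dd = 31),
  error = function(e) conditionMessage(e)
)
msg
```

The trade-off is visible in the message: it is shorter to write, but the user has to work out which argument is missing.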
In some cases, we may want the function to use a sensible default value for an argument and continue execution. Here, we display a warning message instead of an error message. This is useful when the argument value is either considered obvious or the argument is not the most important one and is used only in extreme customization. Providing default values to arguments makes them optional arguments. An example of a default argument value can be seen in the `paste` function we used earlier. The default value of its separator argument `sep` is a single space character.
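The effect of the default separator can be seen in a short example (the input values are arbitrary):

```r
# `sep` defaults to a single space when it is not supplied.
with_default <- paste("2022", "1", "31")            # "2022 1 31"

# Supplying `sep` overrides the default.
with_dash <- paste("2022", "1", "31", sep = "-")    # "2022-1-31"
```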
Similarly, we can provide some sensible defaults for the `make_date` function. Let’s modify the function further to provide defaults for the `mm` and `dd` arguments only.
make_date <- function(yyyy, mm = 1, dd = 1) {
  # check missing arguments
  if (missing(yyyy)) stop("argument `yyyy` is required.")
  if (missing(mm)) warning("argument `mm` is missing. Using default value mm = 1 instead")
  if (missing(dd)) warning("argument `dd` is missing. Using default value dd = 1 instead")
  # main logic
  as.Date(paste(yyyy, mm, dd, sep = "-"), format = "%Y-%m-%d")
}
# Calling the function without `mm` and `dd` arguments
make_date(yyyy = 2022) # here, only `yyyy` is the required argument.
Warning in make_date(yyyy = 2022): argument `mm` is missing. Using default value
mm = 1 instead
Warning in make_date(yyyy = 2022): argument `dd` is missing. Using default value
dd = 1 instead
[1] "2022-01-01"
There are a few concerns about using warnings instead of error messages. Some are listed in the R-bloggers article “A Warning About warning”.
We have defined `make_date` to accept 3 numeric arguments, i.e. all 3 must be numbers. What would happen if someone tried to call `make_date` with character, factor or boolean inputs?
When the character inputs are number-like, the function works because the string produced by `paste` matches the `format` argument of the `as.Date` call in the main logic of `make_date`, which is `as.Date(paste(yyyy, mm, dd, sep = "-"), format = "%Y-%m-%d")`.
When the month is given as a name such as "May", all the arguments pass the checks, but we pass `2022-May-1` to `as.Date`, which doesn’t match `format = '%Y-%m-%d'`, thus giving `NA`.
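Both cases can be reproduced with the main logic alone. This is a sketch with assumed example values, bypassing the argument checks:

```r
# Number-like strings: the pasted string "2022-5-1" matches "%Y-%m-%d".
ok_date <- as.Date(paste("2022", "5", "1", sep = "-"), format = "%Y-%m-%d")
ok_date          # "2022-05-01"

# Month name: "2022-May-1" does not match "%m", so the result is NA.
bad_date <- as.Date(paste("2022", "May", "1", sep = "-"), format = "%Y-%m-%d")
is.na(bad_date)  # TRUE
```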
How do we check if the values provided to the arguments are numbers or number-like? If the values are numbers, we let them pass. But if they are non-numeric, we must check if they can be converted to numbers, i.e. we must check if they are number-like. By number-like, I mean: will coercing the value using `as.numeric` give us a numeric value or `NA`? You guessed it right, we will pass the values through `as.numeric` and check if the output is `NA` or not.
What are the various data types in R that are not `numeric` but can look like numbers? We have the `character`, `factor` and boolean (`logical`) data types, which can behave like numbers sometimes. Let’s see a few scenarios.
Warning: NAs introduced by coercion
[1] 2022 NA
As you can see in the above example, when passed through `as.numeric`, the value “2022” gets converted to the number 2022 but the value “TwentyTwo” does not. Hence we can say “2022” is number-like but “TwentyTwo” is not.
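The coercion described above can be reproduced as follows. `suppressWarnings` is used here only to keep the "NAs introduced by coercion" warning out of the way:

```r
years <- c("2022", "TwentyTwo")

# Number-like strings convert cleanly; everything else becomes NA.
converted <- suppressWarnings(as.numeric(years))
converted         # 2022 NA
is.na(converted)  # FALSE TRUE
```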
[1] 2 1 3
[1] 1 2
[1] 1
As you can see from the above examples, `factor` values do get converted to numeric, but `as.numeric` returns their internal integer level codes rather than the values they display. So we can safely say that factors are not number-like.
I will ignore the boolean data type, hoping that useRs are bright enough not to use Booleans while creating a Date!
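A short sketch of the factor behaviour, with assumed example values:

```r
f <- factor(c("2022", "2021", "2023"))

# as.numeric() on a factor returns the level codes, not the labels.
codes <- as.numeric(f)                  # 2 1 3
single <- as.numeric(factor("5"))       # 1, not 5

# Going through as.character() first recovers the displayed values.
values <- as.numeric(as.character(f))   # 2022 2021 2023
```

The levels are sorted as "2021", "2022", "2023", which is why "2022" maps to code 2.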
From the above examples, we can conclude that `numeric` values and number-like `character` values are the only valid data types that should be allowed. Let’s modify our `make_date` function to include data type checks.
make_date <- function(yyyy, mm = 1, dd = 1) {
  # check missing arguments
  if (missing(yyyy)) stop("argument `yyyy` is required.")
  if (missing(mm)) warning("argument `mm` is missing. Using default value mm = 1 instead")
  if (missing(dd)) warning("argument `dd` is missing. Using default value dd = 1 instead")
  # Check data types
  if (!is.numeric(yyyy) & !is.character(yyyy)) {
    stop("argument `yyyy` must be numeric")
  } else if (is.character(yyyy) & is.na(as.numeric(yyyy))) {
    stop("argument `yyyy` must be numeric")
  }
  if (!is.numeric(mm) & !is.character(mm)) {
    stop("argument `mm` must be numeric")
  } else if (is.character(mm) & is.na(as.numeric(mm))) {
    stop("argument `mm` must be numeric")
  }
  if (!is.numeric(dd) & !is.character(dd)) {
    stop("argument `dd` must be numeric")
  } else if (is.character(dd) & is.na(as.numeric(dd))) {
    stop("argument `dd` must be numeric")
  }
  # main logic
  as.Date(paste(yyyy, mm, dd, sep = "-"), format = "%Y-%m-%d")
}
# Calling the function with new datatype checks
make_date(yyyy = "2022", mm = "May", dd = "1")
Warning in make_date(yyyy = "2022", mm = "May", dd = "1"): NAs introduced by
coercion
Error in make_date(yyyy = "2022", mm = "May", dd = "1"): argument `mm` must be numeric
Error in make_date(yyyy = "2022", mm = factor("5"), dd = "1"): argument `mm` must be numeric
Notice that the data type check is lengthy and similar for all 3 arguments. We can apply the DRY principle again and encapsulate that code into a small function `is_numberlike`, which will return `TRUE` or `FALSE`. Note that `is_numberlike` has no checks of its own because it is an internal function.
# This function checks if a value is a number or number-like.
is_numberlike <- function(x){
  if (!is.numeric(x) & !is.character(x)) {
    # Early Exit 1 if value is neither numeric nor character
    return(FALSE)
  } else if (is.character(x) & is.na(as.numeric(x))) {
    # Early Exit 2 if character value is not number-like.
    return(FALSE)
  }
  return(TRUE)
}
Thus our `make_date` function with data type checks will look as below.
make_date <- function(yyyy, mm = 1, dd = 1) {
  # check missing arguments
  if (missing(yyyy)) stop("argument `yyyy` is required.")
  if (missing(mm)) warning("argument `mm` is missing. Using default value mm = 1 instead")
  if (missing(dd)) warning("argument `dd` is missing. Using default value dd = 1 instead")
  # Check data types
  if (!is_numberlike(yyyy)) stop("argument `yyyy` must be numeric")
  if (!is_numberlike(mm)) stop("argument `mm` must be numeric")
  if (!is_numberlike(dd)) stop("argument `dd` must be numeric")
  # main logic
  as.Date(paste(yyyy, mm, dd, sep = "-"), format = "%Y-%m-%d")
}
# Calling the function with new datatype checks
make_date(yyyy = "TwentyTwo", mm = "5", dd = 1)
Warning in is_numberlike(yyyy): NAs introduced by coercion
Error in make_date(yyyy = "TwentyTwo", mm = "5", dd = 1): argument `yyyy` must be numeric
Error in make_date(yyyy = "2022", mm = factor("5"), dd = "1"): argument `mm` must be numeric
Warning in is_numberlike(dd): NAs introduced by coercion
Error in make_date(yyyy = 2022, mm = 5, dd = "one"): argument `dd` must be numeric
One of the most interesting features of R is vectorization! Due to this feature, our function `make_date` behaves in interesting ways. Sometimes this is desirable and sometimes it is not.
Warning in if (is.character(x) & is.na(as.numeric(x))) {: the condition has
length > 1 and only the first element will be used
[1] "2022-01-01" "2022-02-01" "2022-03-01" "2022-04-01" "2022-05-01"
[6] "2022-06-01" "2022-07-01" "2022-08-01" "2022-09-01" "2022-10-01"
[11] "2022-11-01" "2022-12-01"
Note the above warning. It appears because the `if` statement expects its condition to be a single `TRUE` or `FALSE` value. However, the output of the check `is.na(as.numeric(mm))` is a boolean vector of length 12, so `if` uses only the first element. From R 4.2.0 onwards, a condition of length greater than one is an error rather than a warning.
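One possible way to avoid handing `if` a condition longer than one is to test the length first and return early. The sketch below is a variant under that assumption, not the version used in the rest of this post. The name `is_numberlike_scalar` is hypothetical:

```r
# Sketch: a length-safe variant of the number-like check.
is_numberlike_scalar <- function(x) {
  # Anything that is not a single value is rejected up front,
  # so every condition below is guaranteed to have length 1.
  if (length(x) != 1) return(FALSE)
  if (is.numeric(x)) return(TRUE)
  if (is.character(x)) {
    # suppressWarnings() hides "NAs introduced by coercion".
    return(!is.na(suppressWarnings(as.numeric(x))))
  }
  FALSE
}

is_numberlike_scalar("2022")       # TRUE
is_numberlike_scalar("May")        # FALSE
is_numberlike_scalar(1:12)         # FALSE
is_numberlike_scalar(factor("5"))  # FALSE
```

Unlike `is_numberlike`, this variant folds the length check into the helper, so it cannot tell the caller whether the type or the length was wrong.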
The output contains 12 date values because `paste` is vectorised: it recycles the values of `yyyy` and `dd` to give us 12 dates!
[1] "Month 1" "Month 2" "Month 3" "Month 4" "Month 5" "Month 6"
[7] "Month 7" "Month 8" "Month 9" "Month 10" "Month 11" "Month 12"
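The recycling can be seen directly with `paste`, here with shorter vectors and arbitrary example values:

```r
# The length-1 string "Month" is recycled against each number.
labels <- paste("Month", 1:3)               # "Month 1" "Month 2" "Month 3"

# The same happens with the year and day in the date string.
parts <- paste(2022, 1:3, "01", sep = "-")  # "2022-1-01" "2022-2-01" "2022-3-01"
```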
What do we do if we want `make_date` to return just one date?
To ensure `make_date` gives you just one date, we must ensure that each argument has just one value and is not a vector of multiple values, i.e. `length(arg) == 1`. Let’s add a few checks for the size of the arguments and rearrange the checks.
make_date <- function(yyyy, mm = 1, dd = 1) {
  # check missing arguments
  if (missing(yyyy)) stop("argument `yyyy` is required.")
  if (missing(mm)) warning("argument `mm` is missing. Using default value mm = 1 instead")
  if (missing(dd)) warning("argument `dd` is missing. Using default value dd = 1 instead")
  # Check argument lengths
  if (length(yyyy)!=1) stop(paste0("Length of argument `yyyy` is ", length(yyyy),". Must be only 1."))
  if (length(mm)!=1) stop(paste0("Length of argument `mm` is ", length(mm),". Must be only 1."))
  if (length(dd)!=1) stop(paste0("Length of argument `dd` is ", length(dd),". Must be only 1."))
  # Check data types
  if (!is_numberlike(yyyy)) stop("argument `yyyy` must be numeric")
  if (!is_numberlike(mm)) stop("argument `mm` must be numeric")
  if (!is_numberlike(dd)) stop("argument `dd` must be numeric")
  # main logic
  as.Date(paste(yyyy, mm, dd, sep = "-"), format = "%Y-%m-%d")
}
# Calling function with new data size checks
make_date(yyyy = 2022, mm = 1:12, dd = "01")
Error in make_date(yyyy = 2022, mm = 1:12, dd = "01"): Length of argument `mm` is 12. Must be only 1.
Error in make_date(yyyy = c("2021", "2022"), mm = "1", dd = 1): Length of argument `yyyy` is 2. Must be only 1.
Error in make_date(yyyy = 2022, mm = 1, dd = c("1", "2")): Length of argument `dd` is 2. Must be only 1.
So far we have checked for missing arguments, arguments with bad data types and arguments with incorrect sizes. We’ve used the `stop` function along with `if` to check for all failure conditions and show appropriate error messages. When we use `stop`, we must specify all the failure conditions, and the number of specific error messages goes up as the number of arguments increases.
In the case of our `make_date`, if an argument is not missing, it must be a number-like value of length 1. To reduce the number of error messages, we can combine the error messages for data type and length. For example, the error message could be: argument `yyyy` must be a number-like value of length 1.
Wouldn’t it be easier if we just specified the success condition, aka the “happy path”, and showed an error for all other conditions? To do this, we can use the `stopifnot` function, which lets us specify all the happy paths. See the example below.
dummy_sum <- function(a, b, c){
  # check missing
  stopifnot(!missing(a) & !missing(b) & !missing(c))
  # check argument values
  stopifnot(!is.na(a) & is.numeric(a) & length(a)==1,
            !is.na(b) & is.numeric(b) & length(b)==1,
            !is.na(c) & is.numeric(c) & length(c)==1
  )
  sum(a, b, c)
}
dummy_sum(b = 2, c = 3) # a is missing
Error in dummy_sum(b = 2, c = 3): !missing(a) & !missing(b) & !missing(c) is not TRUE
Error in dummy_sum(a = NA_integer_, b = 2, c = 3): !is.na(a) & is.numeric(a) & length(a) == 1 is not TRUE
Error in dummy_sum(a = 1, b = "2", c = 3): !is.na(b) & is.numeric(b) & length(b) == 1 is not TRUE
Error in dummy_sum(a = 1, b = 2, c = 5:7): !is.na(c) & is.numeric(c) & length(c) == 1 are not all TRUE
Note the error messages above. They are not so user-friendly. Luckily, we can specify error messages in `stopifnot` (from R 4.0.0 onwards) by providing them as the names of the “happy path” conditions.
dummy_sum <- function(a, b, c){
  # check missing
  stopifnot("one or more required arguments missing" = !missing(a) & !missing(b) & !missing(c))
  # check argument values
  stopifnot("argument `a` must not be NA, must be a number of length 1" = !is.na(a) & is.numeric(a) & length(a)==1,
            "argument `b` must not be NA, must be a number of length 1" = !is.na(b) & is.numeric(b) & length(b)==1,
            "argument `c` must not be NA, must be a number of length 1" = !is.na(c) & is.numeric(c) & length(c)==1
  )
  sum(a, b, c)
}
dummy_sum(b = 2, c = 3) # a is missing
Error in dummy_sum(b = 2, c = 3): one or more required arguments missing
Error in dummy_sum(a = NA_integer_, b = 2, c = 3): argument `a` must not be NA, must be a number of length 1
Error in dummy_sum(a = 1, b = "2", c = 3): argument `b` must not be NA, must be a number of length 1
Error in dummy_sum(a = 1, b = 2, c = 5:7): argument `c` must not be NA, must be a number of length 1
Using `stopifnot` in our `make_date` function to combine the data type and length checks, we get…
make_date <- function(yyyy, mm = 1, dd = 1) {
  # check missing arguments
  if (missing(yyyy)) stop("argument `yyyy` is required.")
  if (missing(mm)) warning("argument `mm` is missing. Using default value mm = 1 instead")
  if (missing(dd)) warning("argument `dd` is missing. Using default value dd = 1 instead")
  # Check argument types and length
  stopifnot(
    "argument `yyyy` must be numeric with length 1" = is_numberlike(yyyy) & length(yyyy)==1,
    "argument `mm` must be numeric with length 1" = is_numberlike(mm) & length(mm)==1,
    "argument `dd` must be numeric with length 1" = is_numberlike(dd) & length(dd)==1
  )
  # main logic
  as.Date(paste(yyyy, mm, dd, sep = "-"), format = "%Y-%m-%d")
}
make_date() # no arguments provided
Error in make_date(): argument `yyyy` is required.
Warning in if (is.character(x) & is.na(as.numeric(x))) {: the condition has
length > 1 and only the first element will be used
Error in make_date(yyyy = 2022, mm = 1:12, dd = 31): argument `mm` must be numeric with length 1
Warning in is_numberlike(mm): NAs introduced by coercion
Error in make_date(yyyy = 2022, mm = "Jan", dd = 31): argument `mm` must be numeric with length 1
Warning in make_date(yyyy = 2022, dd = 31): argument `mm` is missing. Using
default value mm = 1 instead
[1] "2022-01-31"
Finally, what do we do when the arguments provided will definitely give us bad results despite passing all checks? In our case, `make_date` creates a date, but if we give it values that result in an invalid date, it will give us invalid results (remember Garbage-In-Garbage-Out?).
We get `NA` because `as.Date` returns `NA` for invalid inputs with no error messages or warnings! We can check the output and provide a generic error message.
make_date <- function(yyyy, mm = 1, dd = 1) {
  # check missing arguments
  if (missing(yyyy)) stop("argument `yyyy` is required.")
  if (missing(mm)) warning("argument `mm` is missing. Using default value mm = 1 instead")
  if (missing(dd)) warning("argument `dd` is missing. Using default value dd = 1 instead")
  # Check argument types and length
  stopifnot(
    "argument `yyyy` must be numeric with length 1" = is_numberlike(yyyy) & length(yyyy)==1,
    "argument `mm` must be numeric with length 1" = is_numberlike(mm) & length(mm)==1,
    "argument `dd` must be numeric with length 1" = is_numberlike(dd) & length(dd)==1
  )
  # main logic
  out <- as.Date(paste(yyyy, mm, dd, sep = "-"), format = "%Y-%m-%d")
  if (is.na(out)) {
    stop("Invalid values provided. Please check your inputs.")
  }
  return(out)
}
make_date(yyyy = 2022, mm = 13, dd = 1) # is there a 13th month?
Error in make_date(yyyy = 2022, mm = 13, dd = 1): Invalid values provided. Please check your inputs.
Error in make_date(yyyy = 2022, mm = 2, dd = 31): Invalid values provided. Please check your inputs.
Do you think our function `make_date` is robust enough?
Making functions robust requires some prior thought about their intended use and audience. Based on this, we can decide what checks to implement, what to skip, and whether to stop execution with error messages or to use default values with warnings. Checking for “happy paths” is simpler than checking each and every bad input and providing specific error messages. Too many different error messages for the same argument could become a source of frustration for the end user, so consider combining some checks and their error messages to be informative and precise. Robustness, like everything else, is good in moderation, and getting it “just right” takes time and dedicated effort. Happy Coding!
¹ Don’t Repeat Yourself!
@online{katti2022,
author = {Katti, Vishal},
title = {Writing {Robust} {R} {Functions}},
date = {2022-01-18},
url = {https://vishalkatti.com/posts/writing-robust-functions/},
langid = {en},
abstract = {This post demonstrates some techniques to make your R
user-defined functions unbreakable (well, almost!) by checking if
function arguments are missing, incorrect data type or just
down-right invalid values and how to return meaningful error
messages.}
}