Whilst programming, I am a Don’t Repeat Yourself (DRY) devotee. I am also frequently side-tracked by ancillary exploration: “Hmmm, what about this instead?” “What if I tried this really quick?” “Ohhhh should I check this variable too?” My point is this: exploratory data analysis is seldom linear. I often want to loop back or branch off from my pipes, preferably with minimal syntactical friction, and definitely without repeating myself. In this post I’ll show two ways to ‘loop’ and ‘branch’ with pipes.
‘Looping’ with hacksaw
For example, I enjoy neither writing nor seeing code like this:
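The snippet that originally sat here did not survive in this copy. Judging from the output printed further down, it was presumably three near-identical calls along these lines. The sort = TRUE argument and the palmerpenguins package are inferred from that output, not taken from the original code:

```r
library(dplyr)
library(palmerpenguins)  # assumed source of the `penguins` data frame

# The same data frame and the same verb, typed out three times.
# sort = TRUE is inferred from the ordering of the printed output.
penguins %>% count(species, sort = TRUE)
penguins %>% count(island, sort = TRUE)
penguins %>% count(year, sex, sort = TRUE)
```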
This irksome repetition was the primary motivation for my hacksaw package. Rather than re-type penguins %>% count(...) N times, I can write this instead:
[[1]]
# A tibble: 3 x 2
  species       n
  <fct>     <int>
1 Adelie      152
2 Gentoo      124
3 Chinstrap    68

[[2]]
# A tibble: 3 x 2
  island        n
  <fct>     <int>
1 Biscoe      168
2 Dream       124
3 Torgersen    52

[[3]]
# A tibble: 9 x 3
   year sex        n
  <int> <fct>  <int>
1  2009 male      59
2  2009 female    58
3  2008 male      57
4  2008 female    56
5  2007 male      52
6  2007 female    51
7  2007 NA         7
8  2009 NA         3
9  2008 NA         1
Now that’s DRY! Note how count_split and eval_split recycle the original data frame. hacksaw provides this ‘looping’ *_split construct for most dplyr verbs, and for me, this saves some time, keystrokes, and mild annoyance.
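To make ‘recycling’ concrete: every element of the returned list is computed from the same untouched input. A rough analogue in plain dplyr and purrr might look like the following. This is only a sketch of the idea, not hacksaw’s implementation, and the groupings list is a hypothetical helper introduced purely for illustration:

```r
library(dplyr)
library(purrr)
library(palmerpenguins)  # assumed source of the `penguins` data frame

# Hypothetical helper: one entry per desired count,
# each entry a character vector of column names
groupings <- list("species", "island", c("year", "sex"))

# Every iteration starts again from the full `penguins` data frame,
# so the input is reused rather than retyped
counts <- map(
  groupings,
  function(cols) count(penguins, across(all_of(cols)), sort = TRUE)
)

counts
```

The result is an unnamed list of three tibbles, one per grouping, which is the same shape as the output shown above.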
‘Branching’ with nakedpipe
Again, I sometimes wish to ‘branch’ off from my main pipeline rather than assign an intermediate object. The reason is threefold: (1) naming things is hard; (2) I prefer fewer objects in my environment; and (3) I want to be DRY.
Consider the following example from the excellent infer package:
Note how offshore %>% specify(response ~ college_grad, success = "no opinion") is called three times. Not a bad thing, but wouldn’t it be nice to do it only once and then branch off mid-pipe? This is possible with the nakedpipe package:
The syntax is odd to see but easy to type. I actually very much enjoy the nakedpipe experience, as it facilitates this kind of ‘branched’ thinking. [1]
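For comparison, the ‘specify once, then branch’ shape can be roughed out with nothing but magrittr’s brace blocks. To be clear, this is not nakedpipe syntax, and it swaps in the gss data bundled with infer for offshore so that it runs on its own. The variable choices, the seed, and the 200 reps are all arbitrary assumptions:

```r
library(dplyr)  # for %>%
library(infer)

set.seed(2020)  # permutations are random; the seed is arbitrary

branches <- gss %>%
  specify(college ~ sex, success = "degree") %>%
  {
    # `.` is the specified data; both branches below start from it
    list(
      observed = calculate(., stat = "diff in props", order = c("female", "male")),
      null = hypothesize(., null = "independence") %>%
        generate(reps = 200, type = "permute") %>%
        calculate(stat = "diff in props", order = c("female", "male"))
    )
  }

get_p_value(branches$null, obs_stat = branches$observed, direction = "two-sided")
```

Here specify() appears once and only one object lands in the environment, at the cost of braces and explicit dots.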
In sum, I am not sure either of these features, ‘looping’ with hacksaw and ‘branching’ with nakedpipe, rises to the level of ‘best practices’, but I appreciate how they are better syntactical reflections of both my own thought process and coding preferences. Worth trying IMHO!
[1] And I never accidentally hit Ctrl+N instead of Ctrl+M…