This year has been a crazy whirlwind for me – I moved job once, moved house 3 times, co-authored a course on Data Camp, got invited by a former colleague to assist with a workshop at rstudio::conf 2019, was accepted to present at the very same conference, and became a minor contributor to the tidyverse. In this blog post I’ll talk about reasons for contributing and then walk through the steps I took to help guide others.
RStudio Conference 2019 takes place in January 2019, and this week RStudio put out a call for contributed talks and e-posters. Though I was eager to browse previous years’ abstracts for inspiration, I couldn’t find them all in one place, and so I decided to use one of my favourite R packages, rvest, to do some web scraping to grab the content. My main aim was to find all of the abstracts for the contributed talks only from 2018.
A colleague asked for my opinion on 2 packages; loggit and futile.logger. Whilst I have used futile.logger before, I hadn’t used loggit and so used the metrics of the package to evaluate the package itself. The packagemetrics package allows us to generate a number of metrics about a package, so we can compare them. The first thing I do is call package_list_metrics() to get metrics about these two packages. I’ve changed the shape of the table, just so it’s easier to read in this blog post.
In this series of blog posts introducing tidy eval, we’ve been looking at why tidy eval is important, and terms like “quotation” and “quasiquotation”. The next step is to look at how we can write our own dplyr-style functions in R. This post will look at the following terms and functions: quosures quo() enquo() What is a quosure? Quosures are a topic which come up frequently when talking about tidy eval.
In a previous entry, I introduced the concept of tidy eval. If you’re completely new to tidy eval and haven’t read that post yet, I’d suggest you go back to it before continuing, as this post will build upon the concepts I discussed there. To recap, tidy eval refers to the ‘special’ type of evaluation used by dplyr functions. Whereas in base R, you have to refer the data frame in question if you want to returns particular rows, this is not the case with dplyr functions.
I’m going to begin this post somewhat backwards, and start with the conclusion: tidy eval is important to anyone who writes R functions and uses dplyr and/or tidyr. I’m going to load a couple of packages, and then show you exactly why. library(dplyr) library(rlang) Data wrangling with base R Here’s an example function I have written in base R. Its purpose is to take a data set, and extract values from a single column that match a specific value, with both input and output both being in data frame format.
## Warning: package 'dplyr' was built under R version 3.4.4 I previously blogged about using tidy eval with dplyr::mutate, and found that post handy to refer back to. I still haven’t got round to having an in-depth look at the principles of tidy eval, so instead I’m continuing to explore problems as and when they come up. In this post, I’ll be taking a look at using tidy eval with dplyr::filter.
I recently attended rstudio::conf, with my favourite talks being those which taught me new things that I am going to use in my day-to-day work. I attended and enjoyed Hadley Wickham’s talk, ‘Tidy eval: programming with dplyr, tidyr, and ggplot2’, although got sidetracked trying to keep up typing whilst listening. When I’m delivering training courses, this is the one thing I advise all attendees not to do - it’s so easy to miss important points whilst running code.
When I started my career in data science, I was in the common position of having familiarity with technologies like R, Python, and SQL, but much less with big data technologies. I remember feeling intimidated by big data; there were lots of different technologies named after animals or making some sort of pun I wasn’t clued up enough to understand. Flash forward 18 months and with experience, some parts of the big data landscape felt a bit more familiar.