My First Month at Ursa Computing

2021/05/13

I’ve now been working at Ursa Computing on Apache Arrow for just over a month, and thought it would be a good time to write a blog post about my experiences so far.

So when I found out I got the apprenticeship, I will admit, I shrieked with joy. It was a similar noise to the one I made when I found out that my talk had been accepted for rstudio::conf 2019. At the time I’d been interviewing for a few different things, but this was the only one where I had to cut things down in the cover letter, rather than bulk it up. My cover letter was quite the ramble about my passion for the open source community, and encouraging people to get involved in open source. The interviews and the small task I had to do were the least arduous of all the interviews and assignments for possible jobs I had to do at the time. It just felt right.

I’ve had a really great first month. It’s been really bloody challenging - but in a good way. I’ve had 12 pull requests merged so far, and that’s been all sorts of things, from adding a new job to the CI, to moving some C++ code from the R layer to the C++ layer and writing the relevant C++ unit tests, to writing R bindings that allow users to call common R functions like any() and all(), but actually harness the power of the underlying C++ code for faster execution on large datasets.

I’ve learned a lot. Before this, I hadn’t done any C++, and found it pretty daunting. I still do, but tackling the tickets where I had to work with the C++ code taught me a lot about the power of using analogy and looking at similar bits of code and previous pull requests which solve similar problems. Many times I’ve seen, in advice for people wanting to get involved in open source, the suggestion that they should read previous pull requests, but I’d never really understood the benefits until now. I’ve enjoyed getting to learn independently, but also having more structured opportunities to learn - there is a fortnightly meeting about different topics in open source maintenance, and this has been fascinating. The most interesting so far was a session on The Apache Way.

I’m also learning a lot about the difference between being a developer in the open source world compared to the commercial world. Although plenty of people in industry do strive to achieve great code quality, ultimately, if the code you write gets the job done and is sufficiently well-tested, then that’s good enough a lot of the time. Often, when writing code for a specific application or use-case, you have a well-defined idea of how people are going to use your code, and might literally be able to see every time a function you write is being called, and so there is less risk of things going wrong from edge cases. Things are very different when you’re building frameworks that other people are going to be able to use. You need to think a lot more about edge cases as people could be doing all sorts of things with your code, and investing extra time in devising an elegant solution that reduces complexity is time well spent. My perfectionist tendencies absolutely adore this, though the part of me that likes to “MacGyver” something impressive together with string and duct tape is sitting crying in a corner! It’s not a bad thing though - I feel like my R skills have improved a lot over the past month, thanks to the feedback I’ve received.

Another difference I’ve observed is in communication patterns. I’m working on a globally distributed project, and so a lot of communication is asynchronous. In addition to this, one of the key tenets of The Apache Way - a set of principles that govern how Apache Software Foundation projects are run - is open communication. This means that much of the communication about the project must be done on project mailing lists, and other public but “unofficial” channels like Zulip are used for real-time chat. This can be a bit daunting at first, but I’m getting my head around the idea. There’s still the internal Ursa Slack, which I tend to use for a lot of questions I have. I really do enjoy working with a distributed team though; I like the fact that communication is expected to be somewhat asynchronous, so whilst some conversations may happen in one go, it’s perfectly fine to start a conversation on a topic and for people to come back to it later depending on their individual time zones and what they’re up to. I find myself spending less time concerned about my Slack notifications and being more able to focus when I need to.

At this point, I do want to drop my tendency to be overwhelmingly positive, to instead be genuine. There are some things I’m finding it harder to adapt to. When I was working as a consultant or a data scientist, I always would push to be involved in the planning side of the work I was doing, really dig into what the big picture was, and aim to be working with requirements and expectations that were as specific as possible. This approach makes a lot of sense when you’re doing client work and have deadlines, but this is not the world which I am inhabiting now. Whilst there are certain items targeted for the next release, work is generally self-assigned and people work on what they’re interested in or what they think needs doing. This approach absolutely works - stuff gets done, and gets done well - but as someone who quells their anxiety by comparing the current state of things to the grand plan, it’s a huge mindset shift. I have a tonne of things I want to get better at - CI, R, Python, C++, documentation - and occasionally I find myself feeling a bit overwhelmed trying to do a bit of everything as I don’t have a huge amount of specific focus, and so feeling like I’m not moving much. Ultimately though, it’s up to me to talk to people, think about what is worth doing, and make my own plan/goals if that structure is what I need. I’m optimistic about solving this though and I anticipate that I will actually really enjoy this extra freedom once I’ve adjusted to it.

One really great thing is the people. Things got off to a good start before I’d even started, with interviews taking a conversational tone, and not feeling like the interviewers were trying to interrogate me or catch me out. When I did start, I was paired up with someone with whom I had daily catchups in the first couple of weeks, and now twice-weekly catchups. I think this was really important as a new person starting out somewhere that is entirely remote - there are a lot of incidental conversations that don’t happen and questions one doesn’t just ask in passing, and so having dedicated time to discuss things is helpful. As well as this, people seem to be really good at coaching/mentoring. When I asked questions about how/why things worked the way they do, people were very clear in encouraging me to ask these kinds of things. When I included code in a pull request that, perhaps unkindly, I might later label “f-ing stupid”, they’d gently ask me questions like “what would happen if I passed X into this function?” so I’d come to the realisation myself. Similarly, when I asked questions to which the answers are in the vignettes, instead of just pointing me there, they asked me to explain what I did understand so far, and started a conversation. It’s, honestly, an absolute delight to be working with people who are really great at a lot of things I’m still a relative newbie to, and just being able to enjoy it rather than feeling constant “imposter syndrome”. As an aside, I’m being a lot more careful about using that term now, as it ignores the impact of the context and environment, and frames things in terms of neuroses when there are things that can be done about it.

So, what next then? I have a lot to learn and a lot of skills to level up, so I’m going to set myself some goals so I have a bit more focus, as mentioned earlier. I also want to be more communicative without adding meaningless noise. For example, getting involved in code reviews a little more; yesterday someone had done a huge amount of refactoring, and I realised that even though the author was a lot more experienced as a developer than me, it was still worth expressing an opinion on whether their changes made the codebase easier to work with, even if I was just confirming that they did. I’ve also started a slightly bigger project on documentation. I think there’s a fine line here: sometimes people really do need to just RTFM, but on the other hand, many people trying to use a new codebase are busy and so skimming the docs is as much effort as it makes sense for them to invest. I want documentation that works for the people in the latter category, as that’s often been me in the past. I want all my man pages to have examples I can run. As a developer, I want complicated internal functions to be documented so I can read the code more easily. I have quite strong feelings about this, and am really excited to dig more into this topic.

This month has been fun but challenging - which, to be honest, is my favourite combination of things. I’m working on a really exciting project with some fantastic people, and I’m really glad to be here.