When I teach brand-new programmers about Python, I like to ask them to ponder a question. I show them a slide with three pictures, a hitchhiker, a mechanic, and a driver, and ask them which one best describes programming. The slide is meant to illustrate different levels of interaction with technology and how programming puts you in control. The hitchhiker represents the user: someone who relies on directions others have established to navigate technology. For example, the way the creator of a system decided to set up its interface determines how you use it, and many people who use apps have no ability to modify or create them. The mechanic symbolizes the tinkerer, someone who understands how things work and can make adjustments but doesn’t necessarily build everything from scratch. Creating a custom function from existing tools, one that other people in your organization can use, makes you more like a mechanic. The driver symbolizes the programmer, who is able to direct and shape technology to achieve specific goals. While not everyone can be, or needs to be, a mechanic who deeply understands how every part of a system works, everyone can be a driver. In other words, you don’t have to know the intricate details of how a computer or a programming language is built to use it effectively. But just as with a motor vehicle, you need to know enough about how software and hardware systems function to get into the driver’s seat. That is certainly preferable to hitchhiking all the time, using technology passively in the ways others have predetermined and scripted for you. Learning how programming works, and doing some programming of your own, gives you the ability to create, modify, and innovate with technology. It also sharpens your taste for well-designed technology and puts pressure on developers and startups to meet the demand for systems that empower users (more on that in the last chapter).
If you are a data analyst, you certainly need to be a driver when it comes to the many tools and platforms for working with data and generating insights. This is the reason for including ‘Driver Labs’ in this book.
<aside>
Driver Lab Basics
Driver Labs are designed to let the reader follow along with hands-on demos of fundamental ideas related to working with big data. Some of the labs will be too basic for an experienced data analyst but well suited to new data analysts and people without a computing background. I’ve written the Driver Labs to be conversational, with stories and narration along the way, so you may find skimming through them interesting no matter your skill level.
</aside>
The philosophy behind the Driver Labs is consistent with how I approach teaching programming, which I have done with a wide variety of graduate and undergraduate students. Its principles are listed and described here, with examples.
In general, software is not built for programmers; it is built for users. When you learn to program, be aware that you are turning the order of things on its head, and there will be consequences to pay: namely being lost, being confused, and having to figure things out on your own. Expecting this of users would be fatal for the creators of software such as phone apps or banking apps, yet it is standard practice to expect programmers to figure out what they need to get things done. Having said this, some tools have been built specifically for ‘programmers as users’, but these remain the exception rather than the norm. Nothing exemplifies this better than API access instructions. While users typically engage with a software tool through an interface, programmers interface with software through ‘connectors’ based on systems known as application programming interfaces, or ‘APIs’ for short. Many APIs are offered over the web as part of a broader idea known as ‘Web Services’, a topic discussed in Chapter 8. In an annual report called the ‘State of the API’, over 40% of developers surveyed reported that they had to dig through actual source code to figure out how to make an API work, and over 40% reported relying on colleagues to explain how APIs work, perhaps because those colleagues had already invested the time in digging through the source code. For experienced data analysts this may not be surprising, but if you are new to this you may find those numbers uncomfortably high. It is like hearing that 40% of your friends have to pop open the hood and tinker with the engine every morning before their cars will run, or that 40% have to call a mechanic every time they want to go to the grocery store. The whining in your group chats would be nonstop; you wouldn’t be able to talk about anything else!
This is one of the most intuitive things for modern learners to accept about programming. I’ll give you an example. A while ago I was working on a report and needed to convert a survey I had built in Google Forms into text for easy inclusion in the document. At the time, Google Forms did not have an export button that would let you turn a survey you built into a formatted document, and it still doesn’t at the time of writing. I’m not making a feature request; I’m just stating the obvious. Anyway, I searched online for a solution and stumbled on a Google Apps Script that did exactly this. I don’t have a lot of experience with Apps Script, although I’ve used it in the past while collaborating with some colleagues. Apps Script is Google’s counterpart to the Excel macro: it lets you do useful things within the Google ecosystem, like bulk renaming files or retrieving file names into a spreadsheet. The first link I pulled up had sample code that only required me to provide the links to the Google Form and the Google Sheet. I copied the script, pasted it into the script editor, changed the lines for the Google Form and the destination sheet, saved it, and ran it. Voilà, it worked. I still have no idea what the code did, and I probably never will. What did I learn from this task? More importantly, what have you learned from hearing me talk about it? You may have learned that I learned nothing! I got the task done and moved on, simple and short.
Prepare to constantly look up information about the right or best way to do things when you are working with data, except for the most routine tasks that you perform frequently. The previous point, for instance, describes me seeking out a way to do something and finding examples on the internet that I could adapt for my purpose. Converting a Google Form into a document is such a one-off task that few people are likely to have built up any competency in it. Also, once one person has solved the problem, it doesn’t make much sense to improve or change the code if it does the job. In other words, some specific tasks are rarely repeated by any one person, so they end up being looked up over and over again. This particular proviso has been greatly affected by generative AI and its embedding into software: AI coding assistants that ask what you want to do and generate the code, in-line code completion, and other interface enhancements based on natural language understanding. The main impacts have been on the accessibility and speed of lookup (the help is right there on the screen you are working on) and on problem solving (you can get explanations for your errors without searching the documentation or Google in a separate step). Nevertheless, coding assistants are not yet ubiquitous in data analysis, and many analysts still resort to searching Stack Overflow, GitHub, blog articles, and documentation for instructions on infrequently used commands and operations. Grow comfortable with this as part of your process. This reality can be a stumbling block for people learning to use these tools offline, in places without reliable and affordable internet access, and there are still many such places across the world. I was once in exactly that position, and I’ve written this book to be useful to someone in that position today as well.
Pay attention to the pop-outs within the chapters that speak specifically to offline readers, and to the information in the ‘What you need to start’ section of each lab. I also write a love letter to the offline reader in the final chapter, titled ‘The Future’.
When you begin looking up information about things you don’t know how to do, or things you do so rarely that you always have to look them up, you will find a jungle of information out there. This is one of those moments where the metaphor is remarkably apt. The space where information to support data analysis and programming lives is a jungle, for many reasons that will become apparent as you proceed through this book: differences in the version of the tool being explained, differences in operating systems, undisclosed dependencies, people posting code that simply doesn’t work but still remains a top search result, and so on. So you need to approach this task intentionally, the way a seasoned discount shopper sorts through a packed rack of clothes in the clearance section. This is also a critical part of the workflow you need to develop when working with data. Do you run every suggested piece of code you find online within the main workflow you are working on? What if it breaks something else in the work you have already made so much progress on, or modifies your data, creating new steps you have to figure out how to roll back? None of this is trivial to deal with, and you would be dealing with it on top of the unsolved problem that sent you looking for answers in the first place.
Related to this is knowing how to properly ask for help in online forums from the friendly and helpful strangers of the internet. Do you know how to create a reprex of your problem? Reprex stands for ‘reproducible example’, which means an example that gives someone knowledgeable all the context they need to reproduce the issue you are facing and make accurate suggestions. (An unofficial motto for the reprex, a term coined by Romain Francois, developer and blogger at tada.science, is ‘help me help you’.)
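To make the idea of a reprex concrete, here is a minimal sketch of what one might look like in Python. The scenario, the data, and the file name `reprex.py` are made up for illustration. The points to notice are that the whole problem fits in one short file, the data is typed inline instead of loaded from a file only you have, and the output states what you expected versus what you got, along with your Python version.

```python
# reprex.py (hypothetical): everything a helper needs to see the problem, in one file.
import csv
import io
import sys

# Tiny made-up sample typed inline, standing in for a larger private file.
RAW = "region,sales\nnorth,10\nsouth,5\n"

rows = list(csv.DictReader(io.StringIO(RAW)))

# The smallest bit of code that shows the surprising behaviour.
top = max(row["sales"] for row in rows)

# Say what you expected versus what you actually got.
print("Expected: 10")
print("Got:", repr(top))

# Version details often matter, so include them too.
print("Python", sys.version.split()[0])
```

Running it prints `Got: '5'` instead of 10. That is the kind of small, surprising result a helper can reproduce in seconds, and the likely answer is that the `csv` module reads every field as text, so the values are compared as strings; converting them with `int()` gives the expected 10.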
Being lazy isn’t necessarily bad—in fact, some of the best innovations come from people trying to make their work easier. But there’s a catch: not all laziness is created equal. Good laziness means finding smart ways to minimize effort while still getting great results, whether that’s automating repetitive tasks, optimizing workflows, or simply working smarter instead of harder. The irony? It actually takes tremendous effort and know-how to be efficiently lazy. It takes skill to set things up so that less effort leads to better outcomes, and this is especially true when working with data. So, the next time you’re taking a shortcut, ask yourself: is this good lazy or just plain lazy?
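As one small illustration of ‘good lazy’, here is a Python sketch that automates a chore mentioned earlier, bulk renaming files. The function name `bulk_rename`, the `survey_` prefix, and the file names are all hypothetical. The careful part is in the design: it defaults to a dry run that only reports what it would do, it skips files that already have the prefix so rerunning it is harmless, and it is tried on a throwaway folder before it ever touches real files.

```python
import tempfile
from pathlib import Path


def bulk_rename(folder, ext=".csv", prefix="survey_", dry_run=True):
    """Add `prefix` to every file ending in `ext` inside `folder` (hypothetical helper).

    With dry_run=True (the default) nothing is renamed; the planned
    (old name, new name) pairs are only returned so you can check them.
    """
    planned = []
    for path in sorted(Path(folder).glob(f"*{ext}")):
        if path.name.startswith(prefix):
            continue  # already renamed, so rerunning the function is harmless
        target = path.with_name(prefix + path.name)
        planned.append((path.name, target.name))
        if not dry_run:
            path.rename(target)
    return planned


# Try it on a throwaway folder first, never on your real files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["jan.csv", "feb.csv", "notes.txt"]:
        (Path(tmp) / name).write_text("")
    preview = bulk_rename(tmp)          # dry run: nothing changes yet
    bulk_rename(tmp, dry_run=False)     # now actually rename
    after = sorted(p.name for p in Path(tmp).iterdir())
    rerun = bulk_rename(tmp)            # nothing left to rename

print(preview)
print(after)
```

The up-front effort (a dry run, a safe rerun, a test folder) is what lets you be lazy later: the next time a batch of files arrives, one call does the job.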