I think the data internet tends to swing wildly between shiny object syndrome and “the old ways are the only ways” syndrome. I reckon you’d find this to be true if you spent approximately 10 minutes on data Bluesky. It’s just as easy to find posts venerating Excel-as-a-database as it is to find Python devs singing uv’s praises (whatever that is) and insisting that the seed startup that their bro from Y Combinator started is totally gonna destroy the evil industry incumbent, just trust me bro, it’s all going to happen.
Some of that chaos is exactly why I love being on the data internet. It’s fascinating to sit at the rowdy town square and watch data people the world over hash out how we ought to do things to get to that hallowed promised land of ⭐ insights ⭐. I love that much of this duality of “the old ways are better” and “shiny object syndrome” can come from the same people—myself absolutely included in this. People are relentlessly complicated and there’s something comforting about watching that in real time. It makes me feel less crazy, anyways.
I do have to constantly remind myself that the data internet is a really small bubble, though. There’s so much discourse and snark and dunking that we all throw around that we start to think certain things are obvious or the gospel truth. Every day on the job, I get reminded that the gospel truth is only about as true as the sky is blue. (Because... the sky isn’t blue. Not all day. Not all the time.)
Today, I bring you a short story about one of those reminders. This week: That Time I Had To Convince A Customer That Git Is Good, Actually.
Friday is not the day to have a heart attack over a customer engagement, but there I was having one anyways.
Usually, I get at least a week or so of advance notice before starting a new customer engagement. But this time, for reasons that are too complex and too private to share on the blog, I had a request in front of me to start a group with less than a full business day’s worth of notice. This was gonna require some careful footwork. Professionally, I think it’s important to prioritize my own sanity as much as I responsibly can. The heartburn of starting a customer engagement with very little prep time wasn’t something I had enough Tums in the house to deal with.
I knew pushing to start this group a week later than they wanted (after they had already run into several roadblocks and weren’t happy) meant I really needed to deliver from day one when the engagement began. No pressure, right?
Luckily, this group was populated by only a handful of developers, about half the size of my usual group. Doubly luckily, they responded well to a special pre-engagement meeting with me and met me with kindness and respect. They were willing to “start fresh” with me. We kicked off the first part of our engagement together by getting to know the dbt Cloud IDE and common workflow patterns.
For those of y’all who haven’t been in dbt’s IDE, this involves interacting with some Git guardrails that shepherd you through a correct Git workflow. dbt Cloud will force you to check out a branch before committing any changes, prompt you to commit changes when you save files within the IDE, and prompt you to open a pull request after you’ve committed your changes. It’ll even prompt you to pull changes from the remote repo if there are any. It’s pretty nice, and the guardrails are generally good enough that dbt devs don’t have to think too hard about Git.
But, if you are used to whipping up a stored procedure in Oracle, firing it off when you’re done, then getting your dataset out to a stakeholder who needs it yesterday, you might feel hamstrung by this process. One of my attendees in this engagement felt exactly that way. Her alias will be Jana.
Jana stopped me lots of times as I stepped through the dbt Cloud workflow with the group. All of her questions had to do with Git. She pointed out every time Git named something in a way that wasn’t indicative of what it was actually doing, and I could not argue with her.
“What do you mean, saving your work and committing your work are two different things?”
“Committing my work to a branch isn’t the same as making it available in prod? What the hell is prod?”
“When I open a pull request, I’m requesting to merge my changes into the main branch. Why isn’t it called a merge request??”
“You just said I need to check out a new branch, but this button says create branch. Which is it? What am I actually doing in this step?”
It would be easy to listen to Jana’s questions and assume that she wasn’t very technical. But the opposite was actually true. She was a veritable guru of SQL skill, and a font of the hallowed Business Knowledge that every damn data engineer seems to forget matters when they want to split hairs over whether or not Python package XYZ runs 0.5s faster than Python package ABC.
Jana was really technical. She humbled me constantly the entire engagement, pointing out SQL bits and bobs that I could be doing better, including things I’d literally never heard of before. I thought I was pretty good at SQL before this engagement, but I was brutally reminded that nothing replaces experience, and I did not have Jana’s 20 years of working with SQL.
Jana was a really technical SQL developer who had never worked with Git before, and she was the star performer of her team. To her, all these extra and poorly named steps to version control her work just seemed like extra steps. She told me as much in session.
It took us a while to start speaking the same language! I ended up explaining their GitHub repo to her as a kind of shared file system. We tackled each of the important pieces of Git like this:
“Prod”, or the “main” branch, is the “live” one. That’s the data and transformations that are hooked up to the BI tool that her business stakeholders depend on. Changes shouldn’t go into the main branch until you are damn sure that they are ready for primetime and error-free.
You should never edit your main branch directly—people are looking at it all the time!!
Git lets you still make changes to your data and code without editing the main branch directly.
It does this by making copies of your shared file system on a branch. A branch is an isolated copy of your shared file system (codebase) that you can make edits to without messing up the stuff that’s “live”.
You make changes to the files you need to make changes to on your branch. Committing those changes as you go creates an audit trail that git saves for you. If you ever need to go back to a previous state of your work, your commit history is going to come in clutch.
Once you’re ready to move the changes you’ve made from your isolated copy into the “shared” or “live” one, you’ll open up a pull request. Your pull request should be rigged up to run some automatic checks to make sure work is error-free, and usually you’ll ask one of your coworkers to check it for you too.
After all that checking gets done, git compares your version of your file system to the “live” one, and it’ll grab the stuff you changed and integrate it with the live one.
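If it helps to see the analogy as code, here’s a minimal sketch of the “shared file system” story above. To be loud about it: this is a toy model of the analogy, not how Git actually stores anything. The ToyRepo class, its method names, the branch name, and the orders model are all made up for illustration. It also assumes nobody else changed main while you were working, and it ignores deleted files.

```python
class ToyRepo:
    """Toy model of the shared-file-system analogy. Hypothetical, not real Git."""

    def __init__(self, files):
        # "main" is the live copy that stakeholders depend on.
        self.branches = {"main": dict(files)}
        # Each branch keeps (message, snapshot) pairs: the audit trail.
        self.history = {"main": [("initial state", dict(files))]}

    def create_branch(self, name, source="main"):
        # A branch starts life as an isolated copy of the source branch.
        self.branches[name] = dict(self.branches[source])
        self.history[name] = list(self.history[source])

    def commit(self, branch, message, changes):
        # Guardrail: never edit the live copy directly.
        if branch == "main":
            raise ValueError("never edit main directly; work on a branch")
        self.branches[branch].update(changes)
        # Save a snapshot so you can look back at earlier states later.
        self.history[branch].append((message, dict(self.branches[branch])))

    def merge(self, branch, checks_passed, approved):
        # Stand-in for the pull request: automatic checks plus a coworker review.
        if not (checks_passed and approved):
            raise ValueError("pull request is not ready to merge")
        main = self.branches["main"]
        # Compare the branch to the live copy and grab only what changed.
        changed = {
            path: text
            for path, text in self.branches[branch].items()
            if main.get(path) != text
        }
        main.update(changed)
        self.history["main"].append((f"merge {branch}", dict(main)))
        return sorted(changed)


repo = ToyRepo({"models/orders.sql": "select * from raw_orders"})
repo.create_branch("jana/filter-test-orders")
repo.commit(
    "jana/filter-test-orders",
    "filter out test orders",
    {"models/orders.sql": "select * from raw_orders where not is_test"},
)
# Until the merge below runs, the live copy on main is untouched.
merged_files = repo.merge(
    "jana/filter-test-orders", checks_passed=True, approved=True
)
```

The part that finally clicked for Jana lives in commit and merge: saving work on a branch never touches the live copy, and the live copy only changes once the checks and the review have both passed.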
Eventually, Jana got the hang of Git and started seeing its benefits. Her team had already decided to adopt dbt Cloud by the time I got to them, so it wasn’t like I was really in a sales position. But the post-sales work of getting her acquainted with and excited about a really different way of working challenged me a lot as a professional.
It also reminded me that there are a lot of ways to skin a cat in the data world. As someone on the dbt payroll, I of course think dbt is a great way to work. But I think it’s also good to be periodically reminded that technical competence comes in a lot of different flavors and rarely lines up with the hype cycle.
Jana, if you’re reading this, you’re a real one. Thanks for forcing me to really home in on why Git works instead of just accepting it as part of the workflow for no reason.
See y’all next week.