Welcome to my first published project, my Beer Project! Recently I’ve been perusing lots of online resources and working on learning and improving my skills in Python (a programming language) and related packages that pertain to data science in preparation for applying for grad school. This project was one of the ideas I thought of as a way to practice my skills and learn more: analyzing my craft-beer drinking history.
As many of my friends and family know, I’m an avid disciple of the app Untappd. I discovered the app a few years ago while taking a beer history course in college, and have been using it since. Users can log (or check in) beers they drink and assign them a rating, comments, and a wide variety of other fields of information. I try to check in almost every new beer I drink, and in 2+ years of usage I’ve logged around 500 beers so far. So when I found out users can download their history from Untappd, I thought it would be cool to analyze the data, looking at questions such as:
- What styles do I drink the most, and which have the highest and lowest ratings?
- What is (are) my all-time top beer(s)?
- Which breweries tend to get the most and least consistent ratings?
And many more questions. The project was done entirely in Python, with a focus on pandas (especially DataFrames) for the analysis and matplotlib for the visualizations.
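To give a flavor of how questions like these map onto pandas, here is a minimal sketch. It is not the code from my Notebooks: the tiny table and the column names (`beer_name`, `brewery_name`, `simple_style`, `rating_score`) are made up for illustration, and I'm using the standard deviation of ratings as one possible stand-in for "consistency."

```python
import pandas as pd

# Made-up check-ins for illustration; column names are assumptions,
# not necessarily the ones in a real Untappd export.
checkins = pd.DataFrame({
    "beer_name": ["A", "B", "C", "D", "E", "F"],
    "brewery_name": ["X", "X", "X", "Y", "Y", "Y"],
    "simple_style": ["American IPA", "American IPA", "Stout",
                     "Stout", "Pilsner", "American IPA"],
    "rating_score": [4.0, 4.5, 3.0, 4.0, 3.5, 3.5],
})

# Which styles do I drink the most, and how do they rate on average?
style_stats = (
    checkins.groupby("simple_style")["rating_score"]
    .agg(["count", "mean"])
    .sort_values("count", ascending=False)
)

# All-time top beer(s): every beer tied for the highest rating.
top_beers = checkins[checkins["rating_score"] == checkins["rating_score"].max()]

# Rating consistency per brewery: a smaller standard deviation
# means more consistent ratings (one possible definition).
brewery_spread = (
    checkins.groupby("brewery_name")["rating_score"].std().sort_values()
)

print(style_stats)
print(top_beers[["beer_name", "rating_score"]])
print(brewery_spread)
```

With real data, a brewery with only one check-in would get a missing value for its spread, so it probably makes sense to filter out breweries with just a handful of beers first.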
How to view this project
You can view my entire source code (currently in the form of two Jupyter Notebooks) on my GitHub, which can always be found as the top link on the Portfolio menu tab above. The two Notebooks are also linked directly here and under the sub-menu for this project. A Jupyter Notebook combines source code with text sections and graphs. If you don’t understand code or simply don’t care, don’t worry! It’s pretty easy to skip through and read just the paragraphs of text where I summarize my findings and look at the graphs I created.
The first Notebook is primarily some basic data munging. For the uninitiated, data munging (or wrangling) is a buzzword in data science that basically means cleaning data and turning it into a format that is easier to use and analyze down the line. For this particular project, I only changed a few items:
- deleting unwanted columns
- fixing null values or missing values in columns for brewery city and brewery state
- deleting duplicate beers (beers that I’ve checked in more than once)
- creating a new “simple style” column to group some substyles together (e.g. “Session IPAs” and “White IPAs” under the general term “American IPA”)
- creating two new time columns to help make analyzing the data easier down the line
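For the curious, the steps above could be sketched in pandas roughly like this. This is a simplified illustration, not the actual Notebook code: the sample rows, the column names, the `clean_checkins` helper, the "Unknown" placeholder for missing locations, and the choice of year and month as the two time columns are all assumptions made for the example.

```python
import pandas as pd

# Made-up rows standing in for an Untappd export; the column names
# are assumptions and may not match the real file.
raw = pd.DataFrame({
    "beer_name": ["Hazy Day", "Hazy Day", "Easy Session", "Dark Night"],
    "brewery_name": ["Brewery A", "Brewery A", "Brewery B", "Brewery C"],
    "beer_type": ["IPA - American", "IPA - American",
                  "IPA - Session", "Stout - Imperial"],
    "brewery_city": ["Denver", "Denver", None, "Portland"],
    "brewery_state": ["CO", "CO", None, "OR"],
    "rating_score": [4.0, 4.25, 3.5, 4.5],
    "created_at": ["2017-01-05 19:30:00", "2017-03-10 20:00:00",
                   "2017-03-11 18:15:00", "2017-06-02 21:45:00"],
    "comment": ["", "still good", "", "rich"],
})

# Hypothetical mapping from substyles to a simpler umbrella style.
STYLE_MAP = {
    "IPA - American": "American IPA",
    "IPA - Session": "American IPA",
    "IPA - White": "American IPA",
}


def clean_checkins(df):
    """Return a cleaned copy of the check-in table (illustrative helper)."""
    # Delete unwanted columns.
    df = df.drop(columns=["comment"])

    # Fill missing brewery locations with a placeholder.
    location_cols = ["brewery_city", "brewery_state"]
    df[location_cols] = df[location_cols].fillna("Unknown")

    # Keep only the first check-in of each beer.
    df = df.drop_duplicates(subset=["beer_name", "brewery_name"],
                            keep="first").copy()

    # Group substyles; styles missing from the map are left unchanged.
    df["simple_style"] = df["beer_type"].replace(STYLE_MAP)

    # Parse the timestamp and derive two time columns.
    df["created_at"] = pd.to_datetime(df["created_at"])
    df["year"] = df["created_at"].dt.year
    df["month"] = df["created_at"].dt.month

    return df.reset_index(drop=True)


cleaned = clean_checkins(raw)
print(cleaned[["beer_name", "simple_style", "brewery_city", "year", "month"]])
```

Keeping the first check-in of a duplicate beer is just one option; keeping the most recent one, or averaging the ratings, would be equally reasonable.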
Most people won’t be super interested in this Notebook and can skip straight to the second Notebook, which is the analysis section. I broke the second Notebook up into five sections:
- The Statistics of My Beer Ratings
- Beers Drank Over Time
- The Best and Worst Beer Styles
- Breweries and Beers: Miscellaneous Statistics
- The Future?
As the fifth section suggests, this project is ongoing, as I have lots of ideas for where to take it in the future. I wanted to get a first version up and running and posted onto my website and GitHub so I had something to show for myself, but this isn’t the final version. I mention some ideas for the future at the end of the second Notebook, but a big part is cleaning up the code (especially variables and numbers that are hardcoded) so that someday this program could be universal and analyze anyone’s history from their Untappd profile. And, of course, expanding the analyses and providing some more in-depth categories.
Finally, if anyone has comments, questions, etc. please feel free to reach out and leave a comment below or email me. I’m sure lots of my code could be improved, and it’s also very possible I’ve made a few mistakes, so please let me know! Hopefully you all like it and find it as interesting as I have.