Are you an undergraduate student thinking about working in the data space?
Wondering what uni courses are the most valuable if you want to go into data?
Well I might have some answers for you below. =)
Table of Contents
- An Overview of the Roles
- Recommended Disciplines
- Recommended Areas of Study
- Extracurricular
- Final thoughts
An Overview of the Roles
If I were to interpret data science in the context of the value it could potentially bring to a company, I would argue that it serves a combination of three functions:
- Streamline business operations and priorities
- Uncover hidden value from existing products/services
- Be the analytical backbone of certain software applications
Machine learning is a means to an end - a way to achieve the aforementioned outcomes.
Analytics can be insights-focused, BI, reporting - anything that involves ingesting the raw data collected by the company and outputting value.
Data engineering serves to build a solid foundation upon which analytics, machine learning and data science can stand to achieve their outcomes.
How can these vague descriptions be translated into something meaningful?
Well, I can only say that studying the right subjects at university is the first step to a great data career!
Recommended Disciplines
My recommended combination of disciplines is… mathematics, statistics, computer science and maybe econometrics. Double majors or major/minor combinations are fine.
This may be an unpopular opinion, but I think most data science majors do not provide a curriculum that has enough depth compared to, say, a computer science major. Whilst in university, focus on the theory and dive as deep as you can, because if you get stuck you can ask for help.
Practical skills can be learned during internships and hobby projects.
The hardest part of doing this is juggling many disciplines of thought. You would be surprised though: just about everything you study in these 3 disciplines will be intertwined in some way.
If this is your calling, you will find a lot of enjoyment piecing the puzzle together.
If you want to do research, prioritising a major in theoretical math will probably give you a stronger foundation. I won’t discuss that path here, so just do a lot of theory, as much as you can and as difficult as you can :)
This is my breakdown of what proportions your university coursework should be in each of the disciplines for each of the data roles we discussed.
Data Role | Math (%) | Stats (%) | CS (%) |
---|---|---|---|
Data science (ML) | 25 | 40 | 35 |
Analytics | 35 | 50 | 15 |
Data Engineering | 10 | 15 | 75 |
For data science, a combination of stats + CS is good. You’ll need to be able to code and have a solid foundation in math and statistics.
Analytics roles typically require less coding, and focus on generating reports/insights. This will probably involve running and analysing experiments (hence the emphasis on statistics). An understanding of causal inference would be useful as well.
Data engineering is a sub-branch of software engineering, so the majority of your coursework should be within the computer science (+ hardware) discipline.
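To make "analysing experiments" a bit more concrete, here is a minimal sketch of one common approach: a two-proportion z-test on an A/B test. This is just one way to do it, the function name `two_proportion_z_test` is my own, and the conversion numbers are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up numbers: 120/1000 conversions for control, 150/1000 for the variant.
z, p_value = two_proportion_z_test(120, 1000, 150, 1000)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

Knowing where the pooled standard error comes from, and when the normal approximation breaks down, is exactly the kind of thing a statistical inference course teaches you.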
Recommended Areas of Study
Alright, let’s talk about what subjects you should consider studying. Most of these subjects I’ve taken and found very beneficial.
Math
Calculus and linear algebra are fundamental - you should not skip this.
ODEs are good. PDEs are crazy and might be good.
Learning some analysis and eventually measure theory paves the road for theoretical statistics. Group theory and its more difficult abstract algebra cousins can be super fun but might not be too important for machine learning in industry.
Taking a course on mathematical computing would be useful, although depending on the expertise of the lecturer, your course might focus on solving physics-related problems computationally.
Statistics
In the statistics world there are many rewarding subjects to study.
The applied subjects would be linear models (make sure it covers mixed effects models and models for longitudinal data) and time series.
As far as I know, the cutting edge techniques nowadays involve deep learning but classical time series methods like ARIMA etc. are very useful to know.
Stochastic processes is also super interesting - I would recommend taking a subject in that area as well if you want.
Make sure you get comfortable with probability and definitely take a course on statistical inference - super useful theory.
Computer Science
In the computer science world, data structures & algorithms are essential.
In the data world, databases are essential. Without a doubt, the greatest day-to-day value for me has come from having a well-rounded understanding of databases.
Learn SQL, learn about different types of databases and why we need them because if you need data… you’re going to be writing SQL to get it from a database.
It pays huge dividends to know a little bit about how they work and what things you can do to optimise your queries and storage.
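As a small illustration of what I mean by optimising queries, here is a rough sketch using Python's built-in `sqlite3` module. The `orders` table, its columns, the index name and the `query_plan` helper are all made up for this example, and other databases expose their query plans differently.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's human-readable plan for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the plan description.
    return " | ".join(row[-1] for row in rows)


# Hypothetical table with 10,000 orders spread across 100 customers.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(10_000)],
)

sql = "SELECT SUM(amount) FROM orders WHERE customer_id = ?"

# Without an index, SQLite has to scan every row in the table.
plan_before = query_plan(conn, sql, (42,))

# With an index on the filter column, it can jump straight to matching rows.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql, (42,))

total = conn.execute(sql, (42,)).fetchone()[0]
print("before:", plan_before)
print("after: ", plan_after)
```

The query returns the same answer either way. What changes is how much work the database does to get it, and the index is not free: it takes up storage and slows down writes a little, which is the kind of trade-off a databases course teaches you to reason about.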
Low-level things like networks, security and operating systems are all very beneficial. In particular, I found distributed systems to be very useful.
A few years ago there was a very popular book called “Designing Data-Intensive Applications” and the first 9 chapters of the book focus on theoretical ideas in distributed computing and databases. The fact that this book was so popular should tell you how important those subjects are.
Take some software engineering courses that make you learn the 23 design patterns introduced in the book by the “Gang of Four” (or just read up on them…).
Linear programming and optimisation are also great subjects that combine CS (+ graphs) with math. I took an optimisation course, found it difficult but it was very rewarding.
Take a course that makes you code in C. Take a course that makes you code in Java. Definitely take a course that makes you code in C++. Take ML courses as well if you can.
Extra Notes
Alright. I think that’s all the subjects that I have found to be useful, either from personal experience or from talking to other professionals in the space.
Note: This is purely my own opinion. You do not have to do all of these subjects. Feel free to mix and match with your own interests.
Note 2: Of course, there are other math combinations that lean more towards pure math, like abstract algebra, complex analysis etc. Likewise, there are CS combinations that are very programming heavy, e.g. compilers, language paradigms etc.
Note 3: This subject list does not cover some other areas, like decision and information theory, asymptotics, economics or even engineering disciplines, where much of this is applied in a specific context.
Note 4: Do the degree that interests you the most, be it engineering or neuroscience. If you have an interest in anything data, I would recommend doing a combination of mathematics, statistics and computer science subjects for electives.
Extracurricular
Your university degree is going to be challenging, but I have friends who do 5 subjects in these 3 disciplines and somehow survive, so there’s no reason not to indulge in extracurriculars.
Aside from being involved in your math/stats/cs societies, the most helpful experience I’ve had was running them.
From recruiting to organising, I definitely learnt a lot about working in a team, setting expectations and managing team members. It’s very cliché, right… but until you do it… you don’t realise just how hard it is.
In a lighter semester, I would recommend doing consulting in a consulting society.
In the future you are likely to be working with people from all disciplines, and the value of learning business skills cannot be overstated.
Whilst not everyone appreciates what you can do, it is important for you to appreciate what other people can do.
And I think that starts from having an open mind about the skills needed to operate in a different paradigm.
Final thoughts
My journey has been intellectually rewarding. Whatever your goals are, I hope my advice here will help you get closer to them.
Good luck, and enjoy the roller coaster!