Where Do I Start?

As people continue to discover the realm of data science, a question I often see is, “Where do I start?” This is tricky because, just as the term “technology” is broad, so is the term “data science.” There is no single right answer to this question, and there is no set of steps that everyone will follow in the same order, but there are some things you can do to get the ball rolling. We often hear the quote attributed to Lao Tzu that “the journey of a thousand miles begins with one step.” So, how do you determine the first step of a data science career? In this post, I will share 5 tips on how you can get started on your data journey. Note that this is not a roadmap or a list of everything you could possibly do to get into this field. These are simply some things I think will help if you are just trying to figure things out and start building your portfolio.

  1. Research

    If there were ever a designated step 1, this would be it. The first thing you should do is research. Try to figure out what you’re getting yourself into and what you want to do. Many careers exist in the data world, and they are not limited to data analyst or data scientist. There are data architects, machine learning engineers, business intelligence analysts, and the list goes on. When you research the types of careers that exist, you might find descriptions that match things you are interested in doing or skills that you already have. Do you think you would like to analyze data? Do you think you would like to build data pipelines? Do you think you would prefer to build predictive models? The starting point ultimately depends on what you are interested in and/or what you are skilled in. Once you figure that out, you can determine what tools you will need to move in that direction. For example, a data engineer may need to be well versed in cloud technologies and learn how to build data warehouses, while a machine learning engineer may need to spend more time learning how to build models or scrape the web for NLP data. It’s okay if you want to experiment with different things, but the best thing you can do before jumping in is to research the possibilities so that you can try them out and see what you like.

  2. Collect Data

    Before you can do a project, you need to have data, and depending on the type of project you want to do, that data may or may not be easily accessible. There must be something that you want to investigate or understand. Keep this in mind as you collect data. Your project must have a goal, and the best way to stay true to it is to remember what you want to investigate or what your research question is. It is also important to be mindful of your data sources. If you are working on a project that is intended to have real impacts on real people, make sure you collect representative data. You can retrieve data in many different formats from an array of sources. Here are some websites where you can find datasets:

    1. Kaggle

    2. Data.world

    3. Google Cloud

    4. GitHub

    5. Data.gov

    6. World Bank Open Data

    7. WHO Open Data Repository

    8. UNICEF Dataset

    9. Bureau of Labor Statistics

    10. US Census Bureau

    11. HealthData.gov

    12. FiveThirtyEight

    13. TuTiempo.net

  3. Learn How to Program

    There are 3 languages that are very popular in the data realm: SQL, Python and R. You should get comfortable with at least 2 of these (SQL and Python, or SQL and R). However, the role you are interested in will determine how advanced your programming skills need to be. Some people don’t need to be advanced Python programmers, but they may need to be experts in SQL. This takes us back to tip #1: your research will tell you what level of programming you need to have under your belt. Regardless, learning how to program is a great way to position yourself for success in the field. There are several online courses and tracks available for beginner and intermediate programmers. Some are free and some are not. DataCamp, Udacity or even YouTube could be great resources for you.
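To give a feel for how SQL and Python fit together, here is a minimal sketch that uses only Python's built-in sqlite3 module. The table name, column names, sample rows and the function total_units_by_product are all made up for illustration; a real project would likely query an existing database instead.

```python
import sqlite3

# Hypothetical sample data: (product, units_sold). Invented for illustration.
rows = [("soap", 3), ("lotion", 5), ("soap", 4), ("serum", 2)]


def total_units_by_product(sales):
    """Load sales rows into an in-memory SQLite table and aggregate them with SQL."""
    conn = sqlite3.connect(":memory:")  # temporary database, nothing is written to disk
    try:
        conn.execute("CREATE TABLE sales (product TEXT, units INTEGER)")
        # Parameterized inserts keep the data separate from the SQL text.
        conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)
        cursor = conn.execute(
            "SELECT product, SUM(units) FROM sales "
            "GROUP BY product ORDER BY product"
        )
        return cursor.fetchall()
    finally:
        conn.close()


totals = total_units_by_product(rows)
print(totals)  # [('lotion', 5), ('serum', 2), ('soap', 7)]
```

Here SQL does the grouping and summing, while Python handles loading the data and working with the result. That split is one common way the two languages are used side by side, not the only one.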

  4. Learn How to Visualize Data

    In my line of work, I do more data engineering than analytics, but data engineering involves building pipelines that often end in a visual display of the data. Hence, even if you’re not a data analyst, it’s important to learn how to use data visualization tools. Whether you visualize data in Python or with some software like Tableau, you will need to create crisp, readable and appropriate visuals. Learning how to represent facts and figures visually can aid in your own understanding of the data. You will start to identify patterns that can inform the remainder of your investigation, or the investigation of the data scientists/analysts that you work with or for.

  5. Learn How to Interpret Data

    It’s not enough to collect some data and manipulate it or run models on it. You will need to learn how to interpret data, describe it and explain relationships between data points in common terms. In some cases, you may need to understand statistics in order to interpret the data. In other cases, you may just need to understand the industry that your project is based on, and that will give you a solid foundation to interpret the data that you have. Ultimately, a good project stems from good interpretation of the data that you have at your disposal.
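As a small illustration of where statistics can help with interpretation, the sketch below computes averages and a Pearson correlation coefficient using only the Python standard library. The data (hours studied and exam scores) and the function name pearson_r are invented for this example.

```python
from math import sqrt
from statistics import mean

# Hypothetical data, invented for illustration only.
hours_studied = [1, 2, 3, 4, 5]
exam_scores = [52, 58, 65, 70, 80]


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x, mean_y = mean(xs), mean(ys)
    # Sum of products of deviations from the means (unnormalized covariance).
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one list has no variation")
    return covariation / (spread_x * spread_y)


r = pearson_r(hours_studied, exam_scores)
print(f"Average hours studied: {mean(hours_studied)}")
print(f"Average exam score: {mean(exam_scores)}")
print(f"Correlation between hours and scores: {r:.3f}")
```

A value of r close to 1 means the two columns tend to rise together in this sample. It does not show that one causes the other, which is where knowledge of the industry or subject comes in.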

In conclusion, I would like to reiterate that there is no true roadmap for breaking into the data space. There are some essentials that you must learn and there are some things that are simply “nice to have.” Everyone’s journey will be different, and the only one who can steer the path is you. In this article, I’ve given you some tips to consider, but I’m sure there are many other things you can do to get started. It all depends on you, your interests, your goals and your skillset.
