Sunday 19 September 2021

The Data Science life cycle

As the name suggests, this post discusses how a Data Science project is planned, executed, and completed.

The details depend on the type of project, but the basic steps involved in a Data Science project are broadly the same.

In my view, the following steps are involved:

  1. Planning
  2. Data Acquisition
  3. Data Preparation
  4. Data Analysis
  5. Model Building
  6. Improving the Accuracy
  7. Report Making or Storytelling
The cycle then repeats from step 1 to step 7.




Now let's discuss these steps in detail.

1. Planning: The first step, in which we plan the project from start to end.
  • We decide how many sprints the project will be divided into.
  • All the steps listed above repeat in every sprint.
  • We also choose the tools and technologies we are going to use in the project.
  • Financial planning is also done here.
  • Generally, managers and team leads take these decisions.
2. Data Acquisition: The second step of the project.
  • Data comes in three types: structured, semi-structured, and unstructured. We choose ingestion techniques according to the type of data.
  • Once we know the type of data, we also need to know whether it arrives as a STREAM or in BATCHES. Streaming data has to be processed in real time, while batch data can be processed later, on a schedule that suits our needs.
  • Data Engineers are mainly involved here.
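To make the difference between batch and streaming data concrete, here is a minimal sketch in plain Python. The names `process_record`, `run_batch`, `run_streaming`, and `sensor_feed` are hypothetical, and the generator only stands in for a real live source such as a message queue.

```python
from typing import Iterable, Iterator, List


def process_record(record: dict) -> dict:
    """Hypothetical per-record transformation: mark the record as processed."""
    return {**record, "processed": True}


def run_batch(records: List[dict]) -> List[dict]:
    """Batch mode: the data is already stored, so process it all in one scheduled run."""
    return [process_record(r) for r in records]


def run_streaming(source: Iterable[dict]) -> Iterator[dict]:
    """Streaming mode: handle each record as soon as it arrives, one at a time."""
    for record in source:
        yield process_record(record)


def sensor_feed() -> Iterator[dict]:
    """Hypothetical stand-in for a live source; yields readings one by one."""
    for reading in (21.5, 22.0, 22.4):
        yield {"temperature": reading}


# Batch: everything is available up front.
batch_result = run_batch([{"temperature": 20.1}, {"temperature": 20.7}])

# Streaming: records are consumed lazily as the source produces them.
stream_result = list(run_streaming(sensor_feed()))
```

In a real project the batch function would typically be triggered by a scheduler, while the streaming function would run continuously against the live source.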
3. Data Preparation: The most important step of the project.
  • It is the most important step because of GARBAGE IN, GARBAGE OUT. This step determines what we are going to feed our model.
  • Whether the data is structured, semi-structured, or unstructured, and whether it is streaming or batch, we have to prepare it for the model. Don't assume that structured data is already in the correct format for the model: structured only means that the data is in tabular form, nothing more.
  • There is no clear division of who prepares the data. Generally, both Data Engineers and Data Analysts work here.
  • The Data Engineer's job here is to keep the data available for processing at all times, and the Data Analyst processes the data.
  • Data processing generally involves these main steps:
  1. Data cleansing
  2. Data type correction
  3. Handling missing values
  4. Keeping the important columns for analysis and model building
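The four processing steps above can be sketched in plain Python as follows. The rows, the column names, and the choice to fill a missing age with the mean are all assumptions made for illustration; a real project would more likely use a library such as pandas and pick a fill strategy that fits the data.

```python
# Hypothetical raw rows, as they might arrive from a CSV export.
raw_rows = [
    {"id": "1", "age": " 34 ", "city": "Pune", "notes": "n/a"},
    {"id": "2", "age": "", "city": " delhi", "notes": "call back"},
    {"id": "2", "age": "", "city": " delhi", "notes": "call back"},  # duplicate row
    {"id": "3", "age": "41", "city": "Mumbai ", "notes": ""},
]

KEEP_COLUMNS = ["id", "age", "city"]  # assumed to be the columns that matter


def clean(rows):
    """1. Data cleansing: trim whitespace, normalise city case, drop exact duplicates."""
    seen, cleaned = set(), []
    for row in rows:
        row = {key: value.strip() for key, value in row.items()}
        row["city"] = row["city"].title()
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(row)
    return cleaned


def fix_types(rows):
    """2. Data type correction: ids and ages become integers, empty ages become None."""
    return [
        {**row, "id": int(row["id"]), "age": int(row["age"]) if row["age"] else None}
        for row in rows
    ]


def fill_missing_age(rows):
    """3. Handling missing values: fill a missing age with the mean of the known ages."""
    known = [row["age"] for row in rows if row["age"] is not None]
    mean_age = sum(known) / len(known)
    return [
        {**row, "age": row["age"] if row["age"] is not None else mean_age}
        for row in rows
    ]


def select_columns(rows, columns):
    """4. Keep only the columns needed for analysis and model building."""
    return [{column: row[column] for column in columns} for row in rows]


prepared = select_columns(fill_missing_age(fix_types(clean(raw_rows))), KEEP_COLUMNS)
```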
4. Data Analysis: The second most important step of the project.
  • This is the step where we understand our data.
  • We identify the important columns (features) that can contribute to predicting the labels.
  • Here we find insights that tell the business what has happened.
  • Here we can also build visualizations that answer many of the business questions.
  • This is the step where you talk to the data and decide which models are best suited to it.
  • This can be done by either a Data Analyst or a Data Scientist.
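One simple way to see which features might help predict a label is to rank them by their correlation with it. The sketch below uses the Pearson correlation on made-up numbers; the feature names and values are hypothetical, and correlation only captures linear relationships, so treat it as a first look rather than a final answer.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical features and label, invented for this example.
features = {
    "ad_spend": [10, 20, 30, 40, 50],
    "store_temperature": [22, 19, 23, 20, 21],
}
label_sales = [25, 45, 65, 85, 105]

# Rank features by the strength (absolute value) of their correlation with the label.
ranking = sorted(
    features,
    key=lambda name: abs(pearson(features[name], label_sales)),
    reverse=True,
)
```

Here `ad_spend` moves in step with sales and ranks first, while `store_temperature` is only weakly related to it.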

JOB RESPONSIBILITIES AS DATA SCIENTIST

5. Model Building: The Data Scientist's step.
  • This is the step for which we did the previous two steps, i.e. Data Preparation and Data Analysis.
  • If the previous two steps are not accurate, this step is also going to give you the wrong answer.
  • Here we give the data to the model and then predict the result.
  • There are thousands of models, so which model should we feed our data to?
  • This depends on the kind of problem the data poses: Classification, Regression, or Time Series. We choose the Machine Learning model accordingly to predict the output.
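As a rough illustration of that decision, the sketch below guesses the problem type from the labels and lists a few well-known model families for each. The function `infer_task`, its `max_classes` threshold, and the shortlist are assumptions for this example only; in practice the choice also depends on the business question and on what the analysis step revealed.

```python
from numbers import Number

# Illustrative shortlist of common model families per problem type (not exhaustive).
CANDIDATE_MODELS = {
    "classification": ["logistic regression", "decision tree", "random forest"],
    "regression": ["linear regression", "random forest regressor"],
    "time series": ["ARIMA", "exponential smoothing"],
}


def infer_task(labels, time_ordered=False, max_classes=10):
    """Heuristic guess of the problem type.

    - Data ordered in time is treated as a time series problem.
    - Numeric labels with many distinct values suggest regression.
    - Anything else (text labels, or only a few distinct values) suggests classification.
    """
    if time_ordered:
        return "time series"
    all_numeric = all(isinstance(value, Number) for value in labels)
    if all_numeric and len(set(labels)) > max_classes:
        return "regression"
    return "classification"


spam_task = infer_task(["spam", "ham", "spam"])
price_task = infer_task([x * 1.5 for x in range(50)])
demand_task = infer_task([100, 102, 101], time_ordered=True)
shortlist = CANDIDATE_MODELS[price_task]
```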
6. Improving the Accuracy: An important step after Model Building.
  • Developing a model is not enough. After developing it, we have to check how well it performs using suitable evaluation metrics.
  • If the accuracy is satisfactory, your work is done. But it is rarely that easy.
  • You need to go back and check the data, check the parameters, and do some hyperparameter tuning.
  • After tuning the model we check the accuracy again, and if it is still not satisfactory, we repeat the tuning and check again.
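The tune-and-check loop described above can be sketched with a tiny k-nearest-neighbours classifier written in plain Python, where `k` is the hyperparameter being tuned. The dataset, the candidate values of `k`, and the use of plain accuracy as the metric are assumptions for illustration; libraries such as scikit-learn provide ready-made tools for this kind of search.

```python
from collections import Counter

# Hypothetical 1-D dataset: (feature value, class label).
# The point (2.1, "b") is deliberately noisy so that k = 1 makes a mistake.
train = [(1.0, "a"), (1.2, "a"), (1.4, "a"), (2.1, "b"),
         (3.0, "b"), (3.2, "b"), (3.4, "b")]
validation = [(1.1, "a"), (1.3, "a"), (1.9, "a"), (3.1, "b"), (3.3, "b")]


def knn_predict(x, k):
    """Predict the majority label among the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(k):
    """Fraction of validation points predicted correctly for a given k."""
    correct = sum(knn_predict(x, k) == label for x, label in validation)
    return correct / len(validation)


# Try each candidate k, check the accuracy, and keep the best one found so far.
best_k, best_score = None, -1.0
for k in (1, 3, 5):
    score = accuracy(k)
    if score > best_score:
        best_k, best_score = k, score
```

The score is measured on a separate validation set rather than on the training data, so that an improvement reflects better predictions on unseen points and not just memorisation.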
7. Report Making: Storytelling time.
  • This is the step in which you present your findings to the board of directors.
  • This is the step where you can make them feel that they know a little more about their business today.
  • The way you tell the story matters a lot.
Then we repeat all these steps for the next sprint of the project.
To be clear, one project can be divided into many sprints, and in every sprint we repeat these 7 steps.
After all the sprints are complete, we repeat the 7 steps once more over the whole project and check the accuracy; only then can we say that the Data Science project is complete.
I hope this helps you understand the Life Cycle of Data Science.
