In May 2023, over 90,000 developers responded to Stack Overflow's annual survey about how they learn and level up, which tools they're using, and which ones they want.
There are seven sections in this survey. The 2nd, 3rd, 4th and 5th sections will appear in a random order. Most questions in this survey are optional. Required questions are marked with *.
- Basic Information
- Education, Work, and Career
- Technology and Tech Culture
- Stack Overflow Usage + Community
- Artificial Intelligence
- Professional Developer Series (Optional)
- Final Questions
I chose to analyze the following columns:
- Country
- EdLevel - Education Level
- Experience - Professional coding experience in years
- CompConvertedYearly - Devs' salary per year
These columns were chosen to predict yearly salary from country, experience, and level of education. The target column is:
- CompConvertedYearly
Random forests are a way of averaging multiple deep decision trees, trained on different parts of the same training set, with the goal of reducing the variance.
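To make the averaging idea concrete, here is a minimal pure-Python sketch of bagged regression trees. It is not the model used in this project: the data is made up (it is not survey data), the helper names (`build_tree`, `build_forest`, `predict_forest`) are hypothetical, and with a single feature it reduces to plain bagging, since a full random forest also samples a random subset of features at each split.

```python
import random
from statistics import fmean


def sse(ys):
    """Sum of squared errors around the mean of ys."""
    m = fmean(ys)
    return sum((y - m) ** 2 for y in ys)


def build_tree(rows):
    """Grow a deep regression tree on (x, y) pairs with one numeric feature."""
    xs = sorted({x for x, _ in rows})
    if len(xs) == 1:  # nothing left to split on: make a leaf
        return fmean([y for _, y in rows])

    # Try every midpoint between neighbouring x values, keep the lowest-error split.
    def cost(t):
        return (sse([y for x, y in rows if x <= t])
                + sse([y for x, y in rows if x > t]))

    t = min(((a + b) / 2 for a, b in zip(xs, xs[1:])), key=cost)
    return (t,
            build_tree([r for r in rows if r[0] <= t]),
            build_tree([r for r in rows if r[0] > t]))


def predict_tree(tree, x):
    """Walk from the root to a leaf; inner nodes are (threshold, left, right)."""
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x <= t else right
    return tree


def build_forest(rows, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [build_tree([rng.choice(rows) for _ in rows]) for _ in range(n_trees)]


def predict_forest(forest, x):
    """Average the per-tree predictions; this is what reduces the variance."""
    return fmean([predict_tree(tree, x) for tree in forest])


# Made-up data: years of experience -> yearly salary with some noise.
noise = random.Random(42)
rows = [(years, 30_000 + 4_000 * years + noise.randint(-8_000, 8_000))
        for years in range(20)]

forest = build_forest(rows, n_trees=25, seed=0)
print(predict_forest(forest, 7.5))
```

A single fully grown tree reproduces its training targets exactly, noise included. Averaging many such trees, each trained on a different resample, smooths that noise out.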
The steps taken before training the model are:
- Data preprocessing.
- Cleaning the data of null values and outliers.
- Converting columns with object data type values to numeric ones where needed.
- Choosing a regression model.
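The cleaning and conversion steps above can be sketched roughly as follows, using only the standard library. Several things here are assumptions rather than the project's actual choices: the 1.5 * IQR rule for outliers, integer codes for the categorical columns, the text answers in `EXPERIENCE_TEXT`, and the hypothetical helper names `clean` and `to_years`. The sample records are made up.

```python
from statistics import quantiles

FIELDS = ("Country", "EdLevel", "Experience", "CompConvertedYearly")
# Assumed text answers for experience; adjust to whatever the raw column holds.
EXPERIENCE_TEXT = {"Less than 1 year": 0.5, "More than 50 years": 50.0}


def to_years(value):
    """Turn an experience answer into a float number of years."""
    return EXPERIENCE_TEXT[value] if value in EXPERIENCE_TEXT else float(value)


def clean(records):
    """Return cleaned, fully numeric rows from a list of dict records."""
    # 1. Drop rows with a missing value in any field we use.
    rows = [r for r in records
            if all(r.get(f) not in (None, "") for f in FIELDS)]

    # 2. Drop salary outliers using the 1.5 * IQR rule (an assumed threshold).
    salaries = [float(r["CompConvertedYearly"]) for r in rows]
    q1, _, q3 = quantiles(salaries, n=4)
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    rows = [r for r in rows if low <= float(r["CompConvertedYearly"]) <= high]

    # 3. Replace text columns with numbers: integer codes for categories.
    def codes(col):
        return {v: i for i, v in enumerate(sorted({r[col] for r in rows}))}

    country_code, ed_code = codes("Country"), codes("EdLevel")
    return [
        {
            "Country": country_code[r["Country"]],
            "EdLevel": ed_code[r["EdLevel"]],
            "Experience": to_years(r["Experience"]),
            "CompConvertedYearly": float(r["CompConvertedYearly"]),
        }
        for r in rows
    ]


# Made-up records: eight plausible salaries, one extreme outlier, one null row.
sample_salaries = [48000, 50000, 52000, 55000, 58000, 60000, 62000, 65000, 5000000]
raw = [
    {"Country": "Germany" if i % 2 else "India",
     "EdLevel": "Bachelor" if i % 3 else "Master",
     "Experience": str(i + 1),
     "CompConvertedYearly": s}
    for i, s in enumerate(sample_salaries)
]
raw[0]["Experience"] = "Less than 1 year"
raw.append({"Country": "India", "EdLevel": None,
            "Experience": "3", "CompConvertedYearly": 40000})

cleaned = clean(raw)
print(len(raw), "->", len(cleaned), "rows")
```

Integer codes are the simplest way to make a category numeric, but they impose an arbitrary order on countries. One-hot encoding is a common alternative if the chosen model is sensitive to that.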
Git LFS helps with large files, especially datasets 😃
Example👇
git lfs migrate import --include="*.csv"