Investigating Stack Overflow Survey

Baescott
4 min read · Jul 14, 2021

Part 1. Introduction

While I was looking for a good survey dataset to analyze, I discovered that the Stack Overflow Annual Developer Survey (Stack Overflow is referred to as SO below) is well suited for analysis (link below).

Some time later, I was fortunate to enroll in a course at Udacity (https://www.udacity.com), which uses this survey to explain how a data science workflow proceeds. It was a pretty interesting example for me, so I decided to write an analysis of the SO survey as my first Medium post.

Part 2. Take A Look at Data

I downloaded the 2020 data via the link above. First, we load the data from a local directory and see what it looks like.

The shape of our data

This data has 64,461 rows and 61 columns. At first glance, we can see that the columns are quite varied, so we should check each column’s data type.
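The loading step described above might look like the following sketch. I’m assuming pandas here, and the tiny inline CSV is a made-up stand-in for the real file (which has 64,461 rows and 61 columns); `load_survey` and `SAMPLE_CSV` are hypothetical names used only for illustration.

```python
import io

import pandas as pd

# Made-up stand-in for the downloaded survey file, so the sketch runs anywhere.
SAMPLE_CSV = """Respondent,MainBranch,Age,JobSat,ConvertedComp
1,I am a developer by profession,25,Very satisfied,60000
2,I am a student who is learning to code,19,,
3,I am a developer by profession,34,Slightly satisfied,85000
"""


def load_survey(source):
    """Read a survey CSV from a local path or a file-like object."""
    return pd.read_csv(source)


# With the real data this would be a local path instead (hypothetical):
# df = load_survey("path/to/survey.csv")
df = load_survey(io.StringIO(SAMPLE_CSV))

n_rows, n_cols = df.shape  # shape is a (rows, columns) tuple
print(f"{n_rows} rows, {n_cols} columns")
print(df.head())  # first few rows, to eyeball the columns
```

On the real file, `df.shape` is where the row and column counts quoted above come from.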

Each column’s data type

We can see that only 5 of the 61 columns are numeric. The numeric columns cover age, the respondent ID and salary-related values. The object-type columns are mostly about development characteristics, education and occupation.

While looking at this data, I found some columns that raised questions for me: MainBranch (whether respondents are professional developers or not), JobSat (how satisfied respondents are with their job), SOPartFreq (how frequently respondents participate in Stack Overflow) and ConvertedComp (salary converted according to the respondent’s pay cycle). These led me to the following 3 main questions.
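A check like the one above can be sketched as follows, assuming pandas. The small frame is made-up data that only borrows a few column names from the survey, so the counts it prints are not the real ones.

```python
import pandas as pd

# Made-up rows; only the column names come from the survey.
df = pd.DataFrame(
    {
        "Respondent": [1, 2, 3],
        "Age": [25.0, None, 34.0],
        "MainBranch": [
            "I am a developer by profession",
            "I code primarily as a hobby",
            "I am a developer by profession",
        ],
        "JobSat": ["Very satisfied", None, "Slightly satisfied"],
    }
)

print(df.dtypes)  # one dtype per column

# Split the columns into numeric and non-numeric ones.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
other_cols = df.select_dtypes(exclude="number").columns.tolist()

print(f"{len(numeric_cols)} numeric columns: {numeric_cols}")
print(f"{len(other_cols)} non-numeric columns: {other_cols}")
```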

  • Do professional developers feel more job satisfaction than non-professionals? (MainBranch and JobSat)
  • Do professionals participate more frequently in SO than non-professionals? (MainBranch and SOPartFreq)
  • Can I predict someone’s converted compensation from other survey data? (ConvertedComp as the target y)

So I decided to focus the analysis on answering these questions.

Part 3–1. How Is Professional Status Related to Job Satisfaction?

Before we get started, we need to see the distribution of professionals and non-professionals. As we can see, most respondents are professional developers.

Now let’s see how many respondents are satisfied with their jobs. The numbers in the table below are percentages.

We can see that the combined percentage of professional developers who answered Very satisfied or Slightly satisfied is larger than that of non-professional developers. However, the surprising part is how the replies are distributed: the two groups look almost the same!
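A percentage table like the one discussed above can be built with a row-normalized cross-tabulation. This is only a sketch on made-up answers: treating the single MainBranch answer in `PRO_LABEL` as "professional" is my assumption about how the groups are split, and the `Group` column is a hypothetical helper.

```python
import pandas as pd

# Assumption: this MainBranch answer marks a "professional" respondent.
PRO_LABEL = "I am a developer by profession"
OTHER_LABEL = (
    "I am not primarily a developer, but I write code sometimes as part of my work"
)

# Made-up answers, four per group.
df = pd.DataFrame(
    {
        "MainBranch": [PRO_LABEL] * 4 + [OTHER_LABEL] * 4,
        "JobSat": [
            "Very satisfied", "Very satisfied",
            "Slightly satisfied", "Very dissatisfied",
            "Very satisfied", "Slightly satisfied",
            "Slightly dissatisfied", "Very dissatisfied",
        ],
    }
)

# Readable group label derived from MainBranch.
df["Group"] = df["MainBranch"].eq(PRO_LABEL).map(
    {True: "Professional", False: "Non-professional"}
)

# Group sizes first, then the share of each answer within each group (rows sum to 100).
print(df["Group"].value_counts())
table = pd.crosstab(df["Group"], df["JobSat"], normalize="index") * 100
print(table.round(1))

# Combined share of the two "satisfied" answers per group.
satisfied = table[["Very satisfied", "Slightly satisfied"]].sum(axis=1)
print(satisfied.round(1))
```

Normalizing by row matters here because the two groups have very different sizes in the real data, so raw counts would not be comparable.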

Part 3–2. How Is Professional Status Related to Frequency of Participation in Stack Overflow?

Let’s see what this relationship looks like. The numbers are percentages here too.

As we can see, the answer to our 2nd question also turns out to be yes. The combined percentage of professional developers who answered Multiple times per day or Daily or almost daily is larger than that of non-professionals.
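The "combined percentage" comparison above can be wrapped in a small helper. Again this is a sketch on made-up answers; `share_of_answers`, the `Group` column and the professional/non-professional split are my own assumptions, not code from the original analysis.

```python
import pandas as pd

PRO_LABEL = "I am a developer by profession"  # assumed "professional" answer
HOBBY_LABEL = "I code primarily as a hobby"
FREQUENT = ["Multiple times per day", "Daily or almost daily"]


def share_of_answers(df, group_col, answer_col, answers):
    """Percentage of each group whose answer is in `answers`.

    Rows with a missing group or answer are dropped first.
    """
    answered = df.dropna(subset=[group_col, answer_col])
    hit = answered[answer_col].isin(answers)
    return hit.groupby(answered[group_col]).mean() * 100


# Made-up answers; the last respondent skipped the question.
df = pd.DataFrame(
    {
        "MainBranch": [PRO_LABEL] * 4 + [HOBBY_LABEL] * 4,
        "SOPartFreq": [
            "Multiple times per day", "Daily or almost daily",
            "A few times per week", "Less than once per month or monthly",
            "Daily or almost daily", "A few times per month or weekly",
            "I have never participated in Q&A on Stack Overflow", None,
        ],
    }
)
df["Group"] = df["MainBranch"].eq(PRO_LABEL).map(
    {True: "Professional", False: "Non-professional"}
)

frequent_share = share_of_answers(df, "Group", "SOPartFreq", FREQUENT)
print(frequent_share.round(1))
```

Dropping missing answers before computing the share is a design choice; keeping them in the denominator would lower both percentages.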

Part 3–3. How Could Other Survey Questions Affect Personal Salary?

Finally, we’ll do some simple modeling to predict each respondent’s salary. I built a simple linear regression model to do this. If you wonder how to do this, check the link below.

Let’s see whether our model works well. To do so, we check a metric called the R squared score, which measures how much of the variation in salary the model explains. The closer this metric is to 1, the better the model fits.

As you can see, we can say that personal salary can be predicted from the SO survey data.
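To make the R squared score concrete, here is a minimal sketch that fits an ordinary least squares line with NumPy and computes the score by hand as 1 minus the ratio of residual to total sum of squares. The single predictor (years of experience) and all numbers are made up for illustration; the model behind this post uses the survey columns and is not reproduced here.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    return 1.0 - ss_res / ss_tot


# Made-up data: years of experience vs. yearly salary.
years = np.array([1, 3, 5, 7, 10, 15], dtype=float)
salary = np.array([40000, 52000, 61000, 75000, 90000, 118000], dtype=float)

# Design matrix with an intercept column, then a least squares fit.
X = np.column_stack([np.ones_like(years), years])
coef, *_ = np.linalg.lstsq(X, salary, rcond=None)
pred = X @ coef

score = r_squared(salary, pred)
print(f"intercept={coef[0]:.1f}, slope={coef[1]:.1f}, R^2={score:.3f}")
```

A score computed on the same rows the model was fitted on tends to look optimistic, so in practice the metric is usually reported on rows held out from training.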

Part 4. Conclusion & Further Work

We took a very simple look at the Stack Overflow Annual Developer Survey 2020. From this elaborately designed survey, we could get many insights, such as the answers to the 1st and 2nd questions. Moreover, we could even predict personal salary.

For a more accurate and deeper analysis, however, we need to download at least 3 years of survey data from Stack Overflow and do some time-based analysis. There could be effects such as seasonality, annual changes in personal salary, or significant changes in the distribution of categorical data. So if I analyze this survey again later, I’ll dig much more into these points.

About the author: Baescott does data science at an insurance company.