Data Mining: Chegg Tutors

Chegg, Inc. is an American education technology company based in Santa Clara, California, that used to specialize in online textbook rentals and has moved into homework help, online tutoring, scholarships and internship matching. (Source: Wikipedia)

One of the ways I practice and keep learning is by tutoring students on Chegg. I am not as regular as I was back in school, but I take a lesson or two when the question interests me or is about something I would like to know more about myself.

Given my spotless record (zero downvotes), I get a steady stream of lesson requests across a variety of subjects, including finance, STEM and economics. To better understand which subjects have the most volume and what time of day works best, I decided to dig into the Chegg lesson request data.

Process

  1. Created a separate Gmail label for my Chegg lesson emails.
  2. Took a copy of the data from Gmail using Google Takeout.
  3. Converted the .mbox data to text format.
  4. Used R for data munging and visualisations.
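
Step 3 can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the code used for this post (the analysis itself was done in R). The function names and the CSV column names are assumptions made for the example.

```python
import csv
import mailbox
from email.utils import parsedate_to_datetime


def mbox_to_rows(mbox_path):
    """Return one (ISO timestamp, subject line) tuple per message in the mbox."""
    rows = []
    box = mailbox.mbox(mbox_path, create=False)  # do not create a missing file
    try:
        for msg in box:
            date_header = msg["Date"]
            if date_header is None:
                continue  # skip messages without a Date header
            try:
                sent = parsedate_to_datetime(date_header)
            except (TypeError, ValueError):
                continue  # skip messages whose Date header cannot be parsed
            rows.append((sent.isoformat(), str(msg["Subject"] or "")))
    finally:
        box.close()
    return rows


def write_rows(rows, out_path):
    """Write the rows to a CSV file that R (or anything else) can read."""
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["sent_at", "subject_line"])  # assumed column names
        writer.writerows(rows)
```

Calling write_rows(mbox_to_rows("chegg.mbox"), "chegg.csv") would produce a flat file for step 4; both file names here are placeholders.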

Subjects 

  • X-axis has the subjects and the Y-axis has their corresponding counts.
  • Data analytics subjects such as R programming, data analysis and data science form the bulk of requests.
  • Finance, economics, investing and microeconomics have a reasonable volume as well.
  • There are fewer lessons for advanced subjects such as computer science and stochastic modelling.

[Figure: lesson request counts by subject]
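
The per-subject counts can be sketched as a simple tally. This is an illustrative Python example rather than the R code behind the chart, and the email subject-line pattern is hypothetical: the actual format of Chegg's request emails is not shown in this post, so the regular expression would need adjusting to the real data.

```python
import re
from collections import Counter

# Hypothetical pattern: assumes subject lines end with "lesson request: <topic>".
TOPIC_PATTERN = re.compile(r"lesson request:\s*(?P<topic>.+)$", re.IGNORECASE)


def count_topics(subject_lines, pattern=TOPIC_PATTERN):
    """Tally topics found in email subject lines, most frequent first."""
    counts = Counter()
    for line in subject_lines:
        match = pattern.search(line)
        if match:
            # Normalise case and whitespace so "Finance" and "finance " merge.
            counts[match.group("topic").strip().lower()] += 1
    return counts.most_common()
```

Lines that do not match the pattern are ignored, so unrelated emails under the same label do not distort the counts.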

 

Time 

  • The Y-axis has the counts and the X-axis represents the time of day in 24-hour format (IST).
  • Volume picks up around 5 pm and stays roughly stable until the end of the day at midnight.
  • The only point of interest here is the spike in requests around 10 am.
    • This can largely be attributed to most deadlines being at midnight: assuming most students are in US time zones, 10 am IST is around midnight US Eastern time.
    • The spike represents the last-hour effort by students to get help.

[Figure: lesson request counts by hour of day (IST)]
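
The hourly breakdown can be sketched by converting each email timestamp to IST (UTC+5:30) and counting by hour. This is a minimal Python illustration, not the original R code; it assumes the timestamps are ISO 8601 strings that carry a UTC offset, and the function name is made up for the example.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# India Standard Time is a fixed offset of UTC+5:30 (no daylight saving).
IST = timezone(timedelta(hours=5, minutes=30), "IST")


def hourly_counts(iso_timestamps, tz=IST):
    """Return a 24-element list: number of emails per hour of day in `tz`."""
    counts = Counter()
    for stamp in iso_timestamps:
        sent = datetime.fromisoformat(stamp)
        if sent.tzinfo is None:
            continue  # a timestamp without an offset cannot be converted reliably
        counts[sent.astimezone(tz).hour] += 1
    return [counts.get(hour, 0) for hour in range(24)]
```

Converting before bucketing matters because the email headers may be stamped in different time zones; counting raw hours would mix them.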

This was a neat little exercise to understand Chegg’s request volume. It rests on the assumption that the emails are sent to all tutors for a given subject and that the delay between a lesson request and its email is minimal.

It is also quite possible that the mail delivery system is personalised, in which case these numbers are of little use to other tutors. Let me know if you would like the same numbers crunched for your own account.

Another real-life data case study: Photofeeler

Another real-life data case study: Google Pay