Exploring Udemy Courses Trends Using Google BigQuery

Introduction

Google BigQuery is a secure, highly available, fully managed, pay-as-you-go, serverless, multi-cloud data warehouse offered as a Platform as a Service (PaaS) by Google Cloud Platform. It helps generate useful insights from big data that support business stakeholders in effective decision-making. BigQuery provides built-in machine learning capabilities and a SQL query engine for analyzing large datasets, so we can use it to build a secure and highly available data warehouse.

Udemy is one of the most popular online learning platforms. It offers learning content in design, marketing, development, finance & accounting, IT & software, photography & video, health & wellness, office productivity, and more, in several languages. Udemy is an important source of knowledge for many students, freelancers, and working professionals, and it is widely used to learn Python and React and to prepare for AWS and Azure certifications. However, learners may be interested in courses from instructors whose job titles match their own field, courses taken by many users, or courses by certified developers (AWS certified, Salesforce certified, and so on). To address this problem, we will build a data warehouse for exploring Udemy course trends and insights using Google BigQuery.

Almost all major cloud service providers, such as Google, Amazon, and Microsoft, now offer data warehouse tools. Cloud-based data warehouse tools are highly scalable and provide disaster recovery. Using a data warehouse, we can store and analyze a large amount of data and produce useful insights with the help of data visualizations and reports. Well-designed data warehouses deliver high-quality data and improve query performance by properly defining data types, and they can be combined with data mining and artificial intelligence to support smarter decisions.

This article discusses the approach to building a data warehouse for exploring Udemy course trends and insights using Google BigQuery. It will help us with tasks such as classifying courses based on instructor job titles and finding the average rating of all the courses of an instructor.

Learning Objectives

In this article, we will learn:

  1. How to build a data warehouse using Google BigQuery
  2. How to use the Google BigQuery Sandbox
  3. How to create datasets and tables in BigQuery
  4. How to query Udemy data in the BigQuery SQL query engine

This article was published as a part of the Data Science Blogathon.

Project Description

This project aims to develop a data warehouse for exploring Udemy course trends and insights using Google BigQuery, which will help us with tasks such as classifying courses based on instructor job titles and finding the average rating of all the courses of an instructor. We will take the Udemy courses and instructor data from Kaggle and download it to our local machine. The data downloaded from Kaggle is in CSV format.

Next, we will create tables inside a dataset in the BigQuery SQL query engine from the downloaded data. After creating the tables, we will format the table schema and perform data cleaning. We can then query the imported data to generate useful insights, such as classifying courses based on instructor job titles and identifying the courses with the highest ratings and the instructors whose courses are well rated.

Currently, we have data from only one source, and we import the CSV data through batch ingestion using the Google Cloud Platform UI. We can also import data from multiple sources, such as Cloud Storage or an Azure storage account. Apart from the UI, users can import data using the CLI, the REST API, or data pipeline options such as Cloud Dataflow and Cloud Dataproc. Google BigQuery also supports file formats such as Parquet and Avro for data loading and processing. Developers can also save, share, and schedule queries in the SQL query engine.

By querying Udemy data, users can decide which courses to purchase based on course duration, course ratings, instructor job titles, course popularity, and so on. Users can save and share these queries, and they can save the query results to create dashboards using Power BI, Looker Studio, Tableau, and similar tools. Users can also extract more data from Udemy using web scraping techniques and ingest it into BigQuery to keep the data up to date and get more accurate results.

Problem Statement

In this article, we will use the Udemy Courses Data 2023 dataset from Kaggle to develop a data warehouse for exploring Udemy course trends and insights using Google BigQuery. It will help us with tasks such as classifying courses based on instructor job titles, finding the average rating of all the courses of an instructor, classifying courses based on the number of lectures, and identifying recently published and modified courses on Udemy.

As already discussed, we can extract more data from Udemy using web scraping techniques, since new courses and instructors keep being added to the platform. We will create tables inside a BigQuery dataset to import the courses and instructor data downloaded from Kaggle. After table creation, we will perform data cleaning and table schema formatting.

We can save, share, and schedule queries in the SQL query engine. Apart from this, we can also save the results of a query so that they can be used to create dashboards using Power BI, Looker Studio, Tableau, and similar tools. This project aims to develop a data warehouse from Udemy data that users can query to identify recently published and modified courses on Udemy, classify courses based on course duration and course ratings, find the average rating of all the courses of an instructor, classify courses based on the number of lectures, and so on.

Prerequisites

Below are some prerequisites for this project:

  1. Understanding of data warehouses: In this project, we will build a data warehouse to explore Udemy course trends and insights using Google BigQuery. It is therefore important to understand what a data warehouse is, why it is useful, and which data warehouse services the various cloud vendors provide.
  2. Experience with Google Cloud Platform: We will use Google BigQuery, a data warehouse service available in Google Cloud Platform. Experience with the platform makes it easier to navigate it and to understand the resource creation process, roles and access permissions, and so on.
  3. Experience with SQL queries: We will write queries in the SQL query engine to generate useful insights, such as classifying courses based on instructor job titles and identifying the courses with the highest ratings and the instructors whose courses are well rated.
  4. Familiarity with Udemy and Kaggle: Understanding what Kaggle is and how to download datasets from it, along with basic familiarity with the online learning platform Udemy, will be helpful while developing the project.
  5. Understanding of Google BigQuery: As this project uses Google BigQuery to create a data warehouse, it helps to understand BigQuery’s common data operations, concepts, and techniques.

Knowing about the Dataset

In this article, we will use the Udemy Courses Data 2023 dataset from Kaggle. The dataset can be downloaded from https://www.kaggle.com/datasets/ankushbisht005/udemy-courses-data-2023. The goal of using this dataset is to identify recently published and modified courses on Udemy, classify courses based on course duration and course ratings, find the average rating of all the courses of an instructor, classify courses based on the number of lectures, and so on.

The Udemy Courses Data 2023 dataset has two files named courses.csv and instructors.csv. The courses.csv file contains information about Udemy courses and has 11 columns and 83,105 rows. The instructors.csv file contains information about Udemy instructors and has 10 columns and 32,234 rows. The courses.csv file contains the instructors_id column, which gives the id of the instructor of the course. This column forms the relation between courses.csv and instructors.csv.
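
To make the relation between the two files concrete, here is a minimal sketch in Python that uses an in-memory SQLite database as a stand-in for BigQuery. The rows and instructor names are invented sample values, and only a few columns are included; the join on instructors_id is the part that mirrors the dataset.

```python
import sqlite3

# In-memory SQLite database used only as a local stand-in for BigQuery.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE instructors (id INTEGER PRIMARY KEY, display_name TEXT, job_title TEXT);
CREATE TABLE courses (id INTEGER PRIMARY KEY, title TEXT, rating REAL, instructors_id INTEGER);
-- Invented sample rows, not taken from the Kaggle files.
INSERT INTO instructors VALUES (1, 'Instructor A', 'Web developer'), (2, 'Instructor B', 'Data engineer');
INSERT INTO courses VALUES
  (10, 'Sample React course', 4.6, 1),
  (11, 'Sample SQL course', 4.1, 2),
  (12, 'Sample JavaScript course', 4.4, 1);
""")

# Each course row points at exactly one instructor through instructors_id.
rows = conn.execute("""
SELECT c.title, i.display_name
FROM courses AS c
JOIN instructors AS i ON i.id = c.instructors_id
ORDER BY c.id
""").fetchall()

for title, instructor in rows:
    print(title, "->", instructor)
```

One instructor can own several courses, so the relation is one-to-many from instructors to courses.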

The courses.csv file contains the unique id of the course, the course title, the course rating, the course duration, the number of lectures in the course, the URL of the course, the creation date of the course, the date on which the course was last modified, the number of reviews of the course, and the id of the course instructor. The instructors.csv file contains the unique id of the instructor, the title of the instructor, the display name of the instructor, the name of the instructor, the job title of the instructor, the instructor class, the URL of the instructor, the initials of the instructor, a 50 x 50 image of the instructor, and a 100 x 100 image of the instructor. To learn more about the dataset, visit https://www.kaggle.com/datasets/ankushbisht005/udemy-courses-data-2023.

Approach to the Project

In this project, we will use the Udemy Courses Data 2023 dataset from Kaggle to develop a data warehouse for exploring Udemy course trends and insights using Google BigQuery. It will help us with tasks such as classifying courses based on instructor job titles, finding the average rating of all the courses of an instructor, classifying courses based on the number of lectures, and identifying recently published and modified courses on Udemy.

Follow the steps below to create a data warehouse using the Udemy Courses Data 2023 dataset from Kaggle:

Step 1: Create a New Project using BigQuery Sandbox

To work with Google BigQuery, developers can either create an account on the Google Cloud Platform or use the Google BigQuery Sandbox. I will use the BigQuery Sandbox in this article to create a data warehouse. A project is used to organize all the Google Cloud resources in GCP. Using Identity and Access Management (IAM), we can specify which user is allowed to access which resources in a project.

Visit the link below to use the Google BigQuery Sandbox: https://console.cloud.google.com/bigquery

Now, follow the steps described below:

1. Click NEW PROJECT, then provide Udemy-Project as the Project Name and choose a Location on the next screen. Click CREATE.

2. Udemy-Project is successfully created. Select Udemy-Project to view the project and manage user permissions and resources inside the project.

Step 2: Download the Dataset from Kaggle and Save it on the Local Machine

Visit https://www.kaggle.com/datasets/ankushbisht005/udemy-courses-data-2023 and click Download. After unzipping the downloaded zip file, you will find two CSV files named courses.csv and instructors.csv. The courses.csv file contains information about Udemy courses and has 11 columns and 83,105 rows. The instructors.csv file contains information about Udemy instructors and has 10 columns and 32,234 rows. The instructors_id column forms the relation between courses.csv and instructors.csv.

Step 3: Create a Dataset Inside Google BigQuery

Follow the steps described below to create a dataset inside Google BigQuery:

1.     Select the name of the project -> BigQuery in the resources card -> click Create dataset.

2.     Provide Udemy_dataset as the Dataset ID, choose Region as the Location type, choose asia-south1 (Mumbai) as the Region, and enable table expiration.

3.     Click CREATE DATASET

Step 4: Create Tables in the Dataset Inside Google BigQuery

Follow the steps described below to create tables in the dataset inside Google BigQuery:

1.     Select the Udemy_dataset dataset -> Create table

2.     Choose to create the table from Upload, select the courses.csv file downloaded from Kaggle, select CSV as the file format, provide courses as the table name, choose Native table as the table type, choose Auto detect for the schema, and configure the partition and cluster settings as per your requirements. In the Advanced options, enter 1 for Header rows to skip and choose the encryption option suitable for your requirements. Click CREATE TABLE.

3.     Now, select the Udemy_dataset dataset again -> Create table. Choose to create the table from Upload, select the instructors.csv file downloaded from Kaggle, select CSV as the file format, provide instructors as the table name, choose Native table as the table type, choose Auto detect for the schema, and configure the partition and cluster settings as per your requirements. In the Advanced options, enter 1 for Header rows to skip and choose the encryption option suitable for your requirements. Click CREATE TABLE.
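
The two upload settings used above, skipping one header row and letting the schema be detected automatically, can be illustrated with a small Python sketch. This is a simplified illustration of the idea, not BigQuery's actual detection algorithm; the sample CSV text and the infer_type helper are hypothetical.

```python
import csv
import io

# Hypothetical sample standing in for the first lines of courses.csv.
SAMPLE_CSV = """id,title,rating,num_reviews
10,Sample React course,4.6,1200
11,Sample SQL course,4.1,300
"""

def infer_type(values):
    """Pick the narrowest type name that fits every value in a column."""
    for caster, name in ((int, "INTEGER"), (float, "FLOAT")):
        try:
            for value in values:
                caster(value)
            return name
        except ValueError:
            continue
    return "STRING"

reader = csv.reader(io.StringIO(SAMPLE_CSV))
header = next(reader)  # "header rows to skip = 1": the first row holds column names, not data
rows = list(reader)

# Map each column name to the type inferred from its values.
schema = {name: infer_type([row[i] for row in rows]) for i, name in enumerate(header)}
print(schema)
```

If the header row were not skipped, the column names would be treated as data and every column in this sketch would fall back to STRING, which is why the setting matters.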

"

Step 5: Verify the Table Schemas and Preview the Data

Go to the courses table and cross-verify the field name, type, and mode in the SCHEMA tab. View the row access policies of the courses table and edit the table schema if required. View the table information in the DETAILS tab and edit the details if corrections are needed. We can also preview, copy, refresh, and share the table. Similarly, go to the instructors table and cross-verify the field name, type, and mode in the SCHEMA tab. View the row access policies of the instructors table and edit the table schema if required.

To see 5000 records from the courses table, execute the query below in the SQL query engine:

SELECT * FROM `udemy-project-381211.Udemy_dataset.courses` LIMIT 5000

To see 5000 records from the instructors table, execute the query below in the SQL query engine:

SELECT * FROM `udemy-project-381211.Udemy_dataset.instructors` LIMIT 5000

A. Find the titles of all courses whose rating is greater than 4.5 and that have been rated by more than 10,000 people. Display these courses in decreasing order of course rating and creation date.

SELECT title AS course_title FROM `udemy-project-381211.Udemy_dataset.courses`
WHERE rating > 4.5 AND num_reviews > 10000
ORDER BY rating DESC, created DESC

B. Find the details of the 10 most recently created Udemy courses.

SELECT * FROM `udemy-project-381211.Udemy_dataset.courses`
ORDER BY created DESC
LIMIT 10

C. Find the details of the 10 most recently modified Udemy courses.

SELECT * FROM `udemy-project-381211.Udemy_dataset.courses`
ORDER BY last_update_date DESC
LIMIT 10

D. Find the details of the JavaScript courses whose rating is greater than 4 and that have been rated by more than 20,000 people.

SELECT * FROM `udemy-project-381211.Udemy_dataset.courses`
WHERE title LIKE '%JavaScript%' AND
rating > 4 AND num_reviews > 20000

E. Display the title, rating, and number of lectures of the Udemy React courses that have more than 50 lectures.

SELECT title AS course_title, rating AS course_rating, num_published_lectures AS course_lectures
FROM `udemy-project-381211.Udemy_dataset.courses`
WHERE title LIKE '%React%' AND
num_published_lectures > 50

F. Find the number of courses and the instructor name for instructors who have a course rated above the average rating of all courses.

SELECT COUNT(courses.id), instructors.title
FROM `Udemy_dataset.instructors` instructors
LEFT JOIN `Udemy_dataset.courses` courses
ON instructors.id = courses.instructors_id
WHERE courses.instructors_id IN
(SELECT instructors_id FROM `Udemy_dataset.courses`
WHERE rating > (SELECT AVG(rating) FROM `Udemy_dataset.courses`))
GROUP BY instructors.title
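
The nested subqueries above are easier to follow on a tiny example. The sketch below reproduces the same shape of query in Python against an in-memory SQLite database, used here only as a local stand-in for BigQuery. The three course rows and the instructor names are invented, and display_name is used for grouping as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE instructors (id INTEGER PRIMARY KEY, display_name TEXT);
CREATE TABLE courses (id INTEGER PRIMARY KEY, rating REAL, instructors_id INTEGER);
-- Invented sample rows: the average rating is (4.8 + 3.0 + 3.5) / 3, about 3.77.
INSERT INTO instructors VALUES (1, 'Instructor A'), (2, 'Instructor B');
INSERT INTO courses VALUES (10, 4.8, 1), (11, 3.0, 1), (12, 3.5, 2);
""")

result = conn.execute("""
SELECT i.display_name, COUNT(c.id) AS course_count
FROM instructors AS i
JOIN courses AS c ON c.instructors_id = i.id
WHERE c.instructors_id IN (
    -- instructors with at least one course rated above the overall average
    SELECT instructors_id FROM courses
    WHERE rating > (SELECT AVG(rating) FROM courses)
)
GROUP BY i.display_name
""").fetchall()

print(result)
```

Note that the count covers every course of a qualifying instructor, including the ones rated below the average: Instructor A is counted with two courses even though only one of them is above the average.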
"

G. Display the instructor name and title of the Udemy courses created by people whose job title is web developer and whose course ratings are greater than 4.2.

SELECT instructors.display_name, courses.title AS course_title
FROM `Udemy_dataset.instructors` instructors
LEFT JOIN `Udemy_dataset.courses` courses
ON instructors.id = courses.instructors_id
WHERE instructors.job_title LIKE '%Web developer%' AND courses.rating > 4.2

H. Display the course title, instructor name, rating, and course duration of the Udemy courses where the course duration is greater than 40 minutes, 40 hours, or 40 questions.

SELECT courses.title AS course_title,
instructors.display_name AS course_instructor, courses.rating, courses.duration
FROM `Udemy_dataset.instructors` instructors
LEFT JOIN `Udemy_dataset.courses` courses
ON instructors.id = courses.instructors_id
WHERE
CASE WHEN courses.duration LIKE '%.%'
      THEN CAST(LEFT(courses.duration, STRPOS(courses.duration,'.')-1) AS FLOAT64) > 40
    WHEN courses.duration LIKE '%total%'
      THEN CAST(LEFT(courses.duration, STRPOS(courses.duration,'t')-1) AS FLOAT64) > 40
    WHEN courses.duration LIKE '%ques%'
      THEN CAST(LEFT(courses.duration, STRPOS(courses.duration,'q')-1) AS FLOAT64) > 40
END
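
The CASE expression above extracts the number at the start of a free-text duration value before comparing it with 40. The following Python sketch shows the same idea on a few sample strings. The exact duration formats ("52.5 total hours", "45 total mins", "120 questions") are assumptions inferred from the patterns in the query, and the helper names are hypothetical.

```python
import re

def leading_number(duration):
    """Return the number at the start of a duration string, or None if there is none."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)", duration or "")
    return float(match.group(1)) if match else None

def longer_than(duration, threshold=40):
    """True when the leading number of the duration exceeds the threshold."""
    value = leading_number(duration)
    return value is not None and value > threshold

# Assumed sample values, not copied from the dataset.
samples = ["52.5 total hours", "45 total mins", "120 questions", "3 total hours", ""]
long_courses = [s for s in samples if longer_than(s)]
print(long_courses)
```

This sketch compares the full decimal value, whereas the query keeps only the part before the decimal point, so a value such as 40.5 would pass here but not in the query. Like the query, it ignores the unit, so 45 minutes and 45 hours are treated alike.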
"

I. Display the instructor name and title of the Udemy courses created by certified developers.

SELECT courses.title AS course_title, instructors.display_name AS course_instructor
FROM `Udemy_dataset.instructors` instructors
LEFT JOIN `Udemy_dataset.courses` courses
ON instructors.id = courses.instructors_id
WHERE instructors.job_title LIKE '%certified%'

J. Find all the distinct job titles of Udemy course instructors.

SELECT DISTINCT instructors.job_title
FROM `Udemy_dataset.instructors` instructors

K. Find the title, rating, and instructor of all courses whose rating is greater than 4 and that have been rated by more than 17,000 people. Display these courses in decreasing order of course rating.

SELECT courses.title AS course_title, instructors.display_name AS course_instructor, courses.rating
FROM `Udemy_dataset.instructors` instructors
LEFT JOIN `Udemy_dataset.courses` courses
ON instructors.id = courses.instructors_id
WHERE courses.rating > 4 AND courses.num_reviews > 17000
ORDER BY courses.rating DESC

L. Find the details of the 20 most recently created Azure courses on Udemy.

SELECT * FROM `udemy-project-381211.Udemy_dataset.courses`
WHERE title LIKE '%Azure%'
ORDER BY created DESC
LIMIT 20

M. Find the details of the 15 most recently created AWS courses on Udemy.

SELECT * FROM `udemy-project-381211.Udemy_dataset.courses`
WHERE title LIKE '%AWS%'
ORDER BY created DESC
LIMIT 15

N. Display all the details of the Udemy SAS courses that have between 112 and 156 lectures, in increasing order of course title.

SELECT * FROM `udemy-project-381211.Udemy_dataset.courses`
WHERE title LIKE '%SAS %' AND
num_published_lectures BETWEEN 112 AND 156
ORDER BY title

O. Display the instructor name, title, rating, and number of reviews of the top two Udemy Azure Data Factory courses, ranked by the number of course reviews and then by course rating.

SELECT courses.title AS course_title,
instructors.display_name AS course_instructor, courses.rating, courses.num_reviews
FROM `Udemy_dataset.instructors` instructors
LEFT JOIN `Udemy_dataset.courses` courses
ON instructors.id = courses.instructors_id
WHERE courses.title LIKE '%Azure Data Factory %'
ORDER BY courses.num_reviews DESC, courses.rating DESC
LIMIT 2
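
Because the ORDER BY clause lists num_reviews before rating, the review count decides the ranking and the rating only breaks ties. The short Python sketch below illustrates that ordering on three invented courses; the titles and numbers are hypothetical.

```python
# Invented sample rows, not taken from the dataset.
courses = [
    {"title": "Course X", "rating": 4.5, "num_reviews": 900},
    {"title": "Course Y", "rating": 4.7, "num_reviews": 900},
    {"title": "Course Z", "rating": 4.9, "num_reviews": 100},
]

# Equivalent of ORDER BY num_reviews DESC, rating DESC followed by LIMIT 2.
ranked = sorted(courses, key=lambda c: (c["num_reviews"], c["rating"]), reverse=True)
top_two = [c["title"] for c in ranked[:2]]
print(top_two)
```

Course Z has the highest rating but is left out because it has far fewer reviews. Swapping the two sort keys would rank by rating first instead.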

P. Display the instructor name, title, rating, and number of reviews of the top Udemy Salesforce course, ranked by the number of course reviews and then by course rating.

SELECT courses.title AS course_title, instructors.display_name AS course_instructor,
courses.rating, courses.num_reviews
FROM `Udemy_dataset.instructors` instructors
LEFT JOIN `Udemy_dataset.courses` courses
ON instructors.id = courses.instructors_id
WHERE courses.title LIKE '%Salesforce %'
ORDER BY courses.num_reviews DESC, courses.rating DESC
LIMIT 1

From the above, we know how to build a data warehouse for exploring Udemy course trends and insights using Google BigQuery. Below are some key trends and insights discovered while exploring the Udemy courses data:

1. The most popular JavaScript courses have an average rating greater than 4.6.

2. Only 34 Udemy courses are created by instructors whose job title is web developer and whose course ratings are greater than 4.2.

3. Almost 150 Udemy courses are created by AWS, Azure, GCP, or Salesforce-certified developers.

4. Ramesh Retnasamy created the most popular Azure Data Factory course on Udemy.

5. Recently created Azure and AWS courses are very popular on Udemy.

6. Udemy users prefer to enroll in SAS courses with about 100-150 lectures and good ratings.

Conclusion

In this article, we have seen how to build a data warehouse for exploring Udemy course trends and insights using Google BigQuery. A data warehouse stores and analyzes a large amount of data and produces useful insights with the help of data visualizations and reports. We have seen how to create a table in Google BigQuery by importing data downloaded from Kaggle. We also saw how to relate tables to each other to understand the data better, and how to analyze the data with queries to get meaningful insights. Below are the major takeaways from the article:

  1. We have seen how to create tables in Google BigQuery.
  2. We learned how to query data in the BigQuery SQL query engine.
  3. We identified the Udemy courses created by people whose job title is web developer and whose course ratings are greater than 4.2.
  4. We saw how many courses on Udemy are created by certified developers.
  5. We found the newly created Azure and AWS courses on Udemy.
  6. Apart from that, we explored other course trends on Udemy by querying the Udemy data in the SQL query engine.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
