Web scraping is a technique for extracting large amounts of data from websites and saving it to a local file or a database. This project scrapes question links and their problem statements from a coding website. The website has two question sections, Python and Java, and each of them contains several subsections of coding problems.
- Load the document from which you want to scrape the data.
- Parse (interpret) the document to make searching possible.
- Extract the required data from the web pages.
- Transform the data into a useful format.
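
The four steps above can be sketched in Python roughly as follows. This is a minimal illustration rather than the project's actual code: the HTML snippet, the `example.com` URLs, the `question` class name, and the function names `parse`, `extract`, and `transform` are all assumptions made for the example. The loading step uses an inline string instead of a live request so the sketch runs offline.

```python
from bs4 import BeautifulSoup

# Step 1: load the document. A real scraper would download it;
# this hypothetical inline page keeps the sketch offline.
SAMPLE_HTML = """
<html><body>
  <a class="question" href="https://example.com/q/1">Sum of two numbers</a>
  <a class="question" href="https://example.com/q/2">Reverse a string</a>
  <a href="https://example.com/about">About</a>
</body></html>
"""


def parse(html):
    """Step 2: parse the document into a searchable tree."""
    return BeautifulSoup(html, "html.parser")


def extract(soup):
    """Step 3: pull out only the anchor tags marked as questions."""
    return soup.find_all("a", class_="question")


def transform(tags):
    """Step 4: convert the tags into plain dictionaries."""
    return [
        {"link": tag["href"], "title": tag.get_text(strip=True)}
        for tag in tags
    ]


questions = transform(extract(parse(SAMPLE_HTML)))
for q in questions:
    print(q["link"], "-", q["title"])
```

Keeping each step in its own function makes it easy to swap the inline string for a downloaded page later without touching the parsing or transformation code.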
User ---> Request ---> Server ---> Response ---> HTML code
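
The request and response cycle can be tried out locally with the sketch below. To stay self-contained it swaps in Python's standard library (`http.server` and `http.client`) for both sides of the exchange; the `DemoHandler` class and the `PAGE` content are hypothetical names invented for this example. In a scraper that uses the `requests` library, the client side would typically be a call such as `requests.get(url)` instead.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page served by the demo server.
PAGE = b"<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"


class DemoHandler(BaseHTTPRequestHandler):
    """Server side: answers every GET request with the same HTML page."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


# Port 0 lets the operating system pick a free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: send the request and read the response.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
conn.request("GET", "/")
response = conn.getresponse()
status = response.status
html = response.read().decode("utf-8")
conn.close()

server.shutdown()
server.server_close()

print(status)
print(html)
```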
Beautiful Soup is a Python library used for pulling data out of HTML pages and XML files. It provides efficient techniques for searching and modifying the parsed document.
HTML code is like a tree of tags, and Beautiful Soup parses this tree so that data can be extracted from the tags.
                <html>
        <head>          <body>
    <meta> <title>     <p>  <p>
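
As a rough illustration of walking such a tree, the sketch below parses a small page with the same shape as the diagram. The page content and the variable names are assumptions made for the example; `html.parser` is Python's built-in parser, chosen here so that no extra parser package is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical page mirroring the tree above.
html = """<html>
<head><meta charset="utf-8"><title>Sample page</title></head>
<body><p>First paragraph</p><p>Second paragraph</p></body>
</html>"""

soup = BeautifulSoup(html, "html.parser")

# Walk down the tree by tag name.
title = soup.head.title.string

# Collect the text of every <p> tag under <body>.
paragraphs = [p.get_text() for p in soup.body.find_all("p")]

# Walk back up: the parent of <title> is <head>.
parent_name = soup.title.parent.name

# List only the direct child tags of <head>.
head_children = [child.name for child in soup.head.find_all(True, recursive=False)]

print(title)
print(paragraphs)
print(parent_name)
print(head_children)
```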
- requests (pip install requests)
- Beautiful Soup (pip install beautifulsoup4)
- UserAgent (pip install fake-useragent)
- xlsxwriter (pip install xlsxwriter)
- xlrd (pip install xlrd)
Below are the links to the Java and Python sections of the coding website from which the data is scraped:
Scraped_questions.xlsx is the output file. It contains two worksheets, one for the Java section and one for the Python section. Each worksheet contains the section links, the question links of the respective sections, and the problem statements of all questions.
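
A workbook with this layout could be produced with `xlsxwriter` along the lines of the sketch below. It is only an illustration under stated assumptions: the worksheet names `java` and `python`, the column headers, and the placeholder rows with `example.com` URLs are hypothetical, and the file is written to the system temporary directory rather than the project folder.

```python
import os
import tempfile

import xlsxwriter

# Hypothetical placeholder rows: (section link, question link, problem statement).
scraped = {
    "java": [
        ("https://example.com/java/section-1",
         "https://example.com/java/q1",
         "Placeholder problem statement 1"),
    ],
    "python": [
        ("https://example.com/python/section-1",
         "https://example.com/python/q1",
         "Placeholder problem statement 2"),
    ],
}

# Written to the temp directory so the sketch does not clutter the project.
output_path = os.path.join(tempfile.gettempdir(), "Scraped_questions.xlsx")

workbook = xlsxwriter.Workbook(output_path)
for sheet_name, rows in scraped.items():
    # One worksheet per section of the website.
    worksheet = workbook.add_worksheet(sheet_name)
    worksheet.write_row(0, 0, ["Section link", "Question link", "Problem statement"])
    for row_index, row in enumerate(rows, start=1):
        worksheet.write_row(row_index, 0, row)
workbook.close()  # the file is only written to disk on close

print("Saved", output_path)
```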