Mobile Phone Specs – Data Scraping with PHP #1


Today I will show you how to scrape mobile phone specs from a website and extract the data.

Website: epey.com
Data: Mobile phone models and their specs.

Preparing the Toolbox Before Writing Code

To follow this tutorial, you need the libraries and apps below. These are the tools in my toolbox.

Libraries:

For data storage:

  • MySQL (for creating a database to keep data)
  • JSON (sometimes used to store parameters for our scraper)
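As an illustration of the JSON item above, here is a minimal sketch of loading scraper parameters from a JSON string with PHP's built-in `json_decode`. The function name `loadScraperConfig` and the keys `sitemap_url`, `delay_seconds` and `user_agent` are hypothetical examples, not the author's actual configuration.

```php
<?php

/**
 * Load scraper parameters from a JSON string and fill in defaults.
 * Hypothetical keys for illustration only; adapt them to your own scraper.
 */
function loadScraperConfig(string $json): array
{
    $defaults = [
        'sitemap_url'   => 'https://www.epey.com/sitemap/urun.xml',
        'delay_seconds' => 1,              // pause between requests to stay polite
        'user_agent'    => 'MyScraper/1.0', // hypothetical User-Agent string
    ];

    // JSON_THROW_ON_ERROR turns malformed JSON into a JsonException.
    $config = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($config)) {
        throw new InvalidArgumentException('Config must be a JSON object');
    }

    // Keys present in the JSON override the defaults.
    return array_merge($defaults, $config);
}
```

Because the JSON values override the defaults, a file such as `{"delay_seconds": 3}` only needs to list what differs.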

Answering Questions to Create a Strategy

These questions are from

  1. Is blabla.com/sitemap.xml available? (no)
    https://www.epey.com/sitemap.xml was not found.
  2. Does the robots.txt file contain a URL to a sitemap file? (yes)
    https://www.epey.com/sitemap/urun.xml
  3. Is there a paginated listing page that covers all the pages with data? (yes)
    https://www.epey.com/akilli-telefonlar/
  4. Is there a security or authorization system in front of the data? (no)
    The data is public and we can access it directly.
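Checks 1 and 2 can be automated. The sketch below is an assumption, not the author's code: it scans a robots.txt body for `Sitemap:` directives using only PHP built-ins, and `findSitemapUrls` is a hypothetical helper name.

```php
<?php

/**
 * Return every URL declared with a "Sitemap:" directive in a robots.txt body.
 * The directive name is matched case-insensitively, and anything after the
 * URL (such as a trailing "# comment") is ignored because \S+ stops at spaces.
 */
function findSitemapUrls(string $robotsTxt): array
{
    $urls = [];
    // \R matches any line ending (\n, \r\n, \r).
    foreach (preg_split('/\R/', $robotsTxt) as $line) {
        if (preg_match('/^\s*sitemap\s*:\s*(\S+)/i', $line, $m)) {
            $urls[] = $m[1];
        }
    }
    return $urls;
}
```

To try it on a live site you could pass in `file_get_contents('https://www.epey.com/robots.txt')`, although, as the 403 error later in this post shows, a plain PHP request to this site may be rejected.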

With these results, we can use one of two methods:

  • Get the list of pages from the sitemap file.
  • Get the list of pages from the listing page.

Getting the list from the sitemap is the easier way, so I will use this method.
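Once the sitemap URL is known, the list of pages is simply the `<loc>` entries in the sitemap XML. Here is a minimal sketch using PHP's DOM extension. `extractSitemapLocs` is a hypothetical name, and the code assumes the file uses the standard sitemaps.org namespace (`http://www.sitemaps.org/schemas/sitemap/0.9`).

```php
<?php

const SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9';

/**
 * Return every <loc> value from a sitemap document.
 * Works for both <urlset> (a list of pages) and <sitemapindex>
 * (a list of further sitemaps), since both store addresses in <loc>.
 */
function extractSitemapLocs(string $xml): array
{
    $doc = new DOMDocument();
    // Suppress libxml warnings and report failure with an exception instead.
    if (!@$doc->loadXML($xml)) {
        throw new RuntimeException('Sitemap is not well-formed XML');
    }

    $locs = [];
    foreach ($doc->getElementsByTagNameNS(SITEMAP_NS, 'loc') as $node) {
        $locs[] = trim($node->textContent);
    }
    return $locs;
}
```

For a sitemap index, you would fetch each returned URL and run the function again on its contents to reach the actual page URLs.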

Preparation for Getting Data

First, I will create a database table to store the data. Here is the schema of my MySQL table.

Then I will create a PHP file to scrape the data.

Running this code results in:

Client error: `GET https://epey.com/` resulted in a `403 Forbidden` response:

This means the site is telling us, "you cannot access me as a bot." They are protecting themselves with Cloudflare, so we need to either bypass this security system or find a vulnerability.
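Before deciding how to proceed, it helps to confirm that the 403 really comes from Cloudflare. The sketch below is diagnostic only and does not bypass anything: it reads raw response header lines and reports the final status code and whether Cloudflare markers are present. It assumes Cloudflare identifies itself with a `Server: cloudflare` header and/or a `CF-RAY` header; `inspectResponse` is a hypothetical helper, not the author's code.

```php
<?php

/**
 * Summarise raw HTTP response header lines: final status code, whether
 * Cloudflare appears to have served the final response, and whether it was a 403.
 */
function inspectResponse(array $headerLines): array
{
    $status = null;
    $cloudflare = false;

    foreach ($headerLines as $line) {
        // Each response in a redirect chain starts with its own status line;
        // reset on every one so the result describes the final response.
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
            $cloudflare = false;
            continue;
        }
        [$name, $value] = array_pad(explode(':', $line, 2), 2, '');
        $name = strtolower(trim($name));
        $value = strtolower(trim($value));
        if ($name === 'cf-ray' || ($name === 'server' && $value === 'cloudflare')) {
            $cloudflare = true;
        }
    }

    return ['status' => $status, 'cloudflare' => $cloudflare, 'blocked' => $status === 403];
}

// Possible usage with PHP's HTTP stream wrapper; 'ignore_errors' keeps the
// body and headers of 4xx/5xx responses instead of failing:
// $ctx = stream_context_create(['http' => ['ignore_errors' => true]]);
// $html = file_get_contents('https://epey.com/', false, $ctx);
// var_dump(inspectResponse($http_response_header));
```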

Sorry to Say That

While I was trying to bypass Cloudflare, I found a vulnerability on the website that let me access all the latest data. I decided to report this information to the company.

The vulnerability was: “they forgot to hide the ns4.epey.com domain behind Cloudflare.” When I explored it, I found a directory listing vulnerability: all directories and files were publicly listed. Browsing through the folders, I found all the current data exported by their software.

The file timestamps were from that same day, so all the data was up to date.

Sorry, guys, it was too easy this time. I decided to trade this information for a “thank you”. Let's leave it at that this time. I cannot publish this vulnerability.

If you need the data, you can get it from my GitHub profile!

https://github.com/ilyasozkurt/mobilephone-brands-and-models

10 May 2021 Update:

The owner of the company gave me a gift for this information. Thanks to epey.com for this gesture!

See you on the next scraping journey!