Today I will show you how to scrape mobile phone specs from a website. We will use data scraping methods to fetch and extract the data.

Data: Mobile phone models and their specs.

Preparing the Toolbox Before Coding

I need to install these libraries and apps to follow this tutorial. They are the tools in my toolbox.


For data storage:

  • MySQL (for creating a database to keep data)
  • JSON (sometimes to keep parameters for our scraper)
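
As a rough illustration of keeping scraper parameters in JSON, here is a minimal sketch. The function name `loadScraperParams`, the parameter keys, and the example URL are all hypothetical; they are not taken from the original scraper.

```php
<?php

/**
 * Decode scraper parameters from a JSON string and fill in defaults.
 * The keys used here (base_uri, timeout) are assumptions for this sketch.
 */
function loadScraperParams(string $json): array
{
    // JSON_THROW_ON_ERROR (PHP 7.3+) raises JsonException on malformed input
    $params = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    if (!is_array($params)) {
        throw new InvalidArgumentException('Scraper parameters must be a JSON object.');
    }

    // Values from the JSON override the defaults
    $defaults = ['base_uri' => '', 'timeout' => 3.0];

    return array_merge($defaults, $params);
}

// Hypothetical parameters, for illustration only
$params = loadScraperParams('{"base_uri": "https://example.com", "timeout": 5}');
```

In practice the JSON string would probably come from a file read with file_get_contents().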

Answering Questions to Create Strategy

These questions are from

  1. Check is available? (no) is not found
  2. Check: does the robots.txt file have a URL to the sitemap.xml file? (yes)
  3. Check: is there a paginated listing page that covers all pages that have data? (yes)
  4. Check: is there a security or authorization system in the way when we try to access data? (no)
    Data is public and we can access it directly.
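
The robots.txt check in question 2 can also be done in code. This is only a sketch: the function name `findSitemapUrls` and the sample robots.txt content are made up for illustration, and it works on a string so you can fetch the file however you like.

```php
<?php

/**
 * Return every sitemap URL declared in a robots.txt body.
 */
function findSitemapUrls(string $robotsTxt): array
{
    $urls = [];

    // Split on any kind of line break
    foreach (preg_split('/\R/', $robotsTxt) as $line) {
        // The "Sitemap:" field name is matched case-insensitively;
        // the first non-space token after it is taken as the URL
        if (preg_match('/^\s*sitemap\s*:\s*(\S+)/i', $line, $match)) {
            $urls[] = $match[1];
        }
    }

    return $urls;
}

// Hypothetical robots.txt content, for illustration only
$robotsTxt = "User-agent: *\nDisallow: /admin/\nSitemap: https://example.com/sitemap.xml\n";

$sitemaps = findSitemapUrls($robotsTxt);
```

An empty array would mean the answer to question 2 is "no".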

With these results, we can use two methods:

  • First, we can get the list from the sitemap file.
  • Second, we can get the list from the listing page.

Accessing the list through the sitemap is the easy way, so I will use this method.
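
To give an idea of the sitemap method, here is a minimal sketch that pulls the page URLs out of a sitemap document with PHP's DOM extension. The function name `extractSitemapLocations` and the sample XML are hypothetical; the real sitemap of the target site is not shown here.

```php
<?php

/**
 * Collect the <loc> values from a sitemap document.
 * Returns an empty array if the XML cannot be parsed.
 */
function extractSitemapLocations(string $sitemapXml): array
{
    $document = new DOMDocument();

    // loadXML() returns false on malformed input; warnings are silenced
    // so the caller only has to deal with the empty result
    if (!@$document->loadXML($sitemapXml)) {
        return [];
    }

    $locations = [];
    foreach ($document->getElementsByTagName('loc') as $node) {
        $locations[] = trim($node->textContent);
    }

    return $locations;
}

// Hypothetical sitemap content, for illustration only
$sitemapXml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/phones/model-a</loc></url>
  <url><loc>https://example.com/phones/model-b</loc></url>
</urlset>
XML;

$pages = extractSitemapLocations($sitemapXml);
```

Each URL in the result would then be a detail page to request and parse.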

Preparation for Getting Data

First I will create a database table to save this data. Here is the schema of my MySQL table.

CREATE TABLE `mobile_phone_specs` (
  `id` int(11) NOT NULL,
  `name` varchar(255) CHARACTER SET utf8mb4 NOT NULL,
  `manufacturer` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `model` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `data` text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
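
Saving one phone into this table could look roughly like the sketch below. The function names `buildSpecRow` and `saveSpecRow`, the sample phone, and the choice to store the specs as JSON in the `data` column are assumptions for illustration. The schema as shown has no AUTO_INCREMENT on `id`, so the sketch passes the id explicitly.

```php
<?php

/**
 * Build the parameter array for one row of mobile_phone_specs.
 * Storing the specs as JSON in `data` is an assumption of this sketch.
 */
function buildSpecRow(int $id, string $name, string $manufacturer, string $model, array $specs): array
{
    return [
        ':id' => $id,
        ':name' => $name,
        ':manufacturer' => $manufacturer,
        ':model' => $model,
        ':data' => json_encode($specs, JSON_UNESCAPED_UNICODE),
    ];
}

/**
 * Insert one row through an already-connected PDO handle.
 */
function saveSpecRow(PDO $pdo, array $row): bool
{
    // A prepared statement keeps scraped text from being interpreted as SQL
    $statement = $pdo->prepare(
        'INSERT INTO `mobile_phone_specs` (`id`, `name`, `manufacturer`, `model`, `data`)
         VALUES (:id, :name, :manufacturer, :model, :data)'
    );

    return $statement->execute($row);
}

// Hypothetical phone, for illustration only
$row = buildSpecRow(1, 'Example Phone X', 'Example Corp', 'X', ['ram' => '8 GB']);
```

With a PDO connection to your own MySQL database (the DSN and credentials are up to you), the row would be stored with saveSpecRow($pdo, $row).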

Then I will create a PHP file to scrape data.


<?php

//Change error mode to report all errors
error_reporting(E_ALL);

//Turn error displaying on
ini_set('display_errors', 1);

//Include libraries which we installed with composer
require 'vendor/autoload.php';

$domain = '';

//Create a client for http requests
$client = new \GuzzleHttp\Client([
    'base_uri' => $domain, //Set a base url before making requests
    'timeout' => 3.0 //If there is no answer after x seconds stop waiting
]);

//Handle exceptions
try {

    //Get response object from client
    $response = $client->get('/');

    //Print the response body from the response object
    print $response->getBody()->getContents();

} catch (\GuzzleHttp\Exception\ConnectException $exception) {

    //Print the connection exception if the host cannot be reached.
    die('could not connect to host : ' . $exception->getMessage());

} catch (\GuzzleHttp\Exception\ClientException $exception) {

    //Print the client exception if there is a 4xx error.
    die($exception->getMessage());

}

This code outputs:

Client error: `GET` resulted in a `403 Forbidden` response:

This means the site does not allow access by bots. They are protecting themselves with Cloudflare. So we need to find a way to bypass this security system or find a vulnerability.

While I was trying to bypass Cloudflare, I found a vulnerability in the website that gave me access to all the latest data. I decided to report this to the company.

The vulnerability was: “they forgot to hide the domain behind Cloudflare.” When I explored this, I found a directory listing vulnerability as well. All directories and files were listed publicly. Browsing through the folders, I found all the actual data exported by their software.

The file timestamps were from today, so all the data was up to date.

Sorry guys, this time it was too easy. I decided to exchange this information for a “thank you”. Let it be that way this time. I cannot publish this vulnerability.

10 May 2021 Update:

The owner of the company gave me a gift for this information. Thanks to for this approach!

See you on the next scraping journey!