Web Scraping Tutorial (Less Than 5 Minutes!)

Web Scraping Tutorial Using PHP in Less Than 5 Minutes

“Being a good citizen in a world full of spiders”  – Dimitrios Kouzis
There are a few things to be aware of, so let's start this web scraping tutorial with the easiest one. Before developing a spider, check the site's robots.txt file. It tells you which paths crawlers are allowed or disallowed to fetch.
An excerpt from the robots.txt file at http://www.google.com/robots.txt:

Disallow: /search 
Allow: /search/about
Allow: /search/howsearchworks
Disallow: /sdch
Disallow: /groups
Disallow: /index.html?
Disallow: /?
Allow: /?hl=

Your crawler should skip every path listed under a Disallow rule, unless a more specific Allow rule (such as /search/about above) covers it.
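
As a rough illustration of how a crawler might apply such rules, here is a minimal sketch in PHP. The function names parseRobotsRules and isPathAllowed are hypothetical helpers written for this example, not part of any library. The sketch assumes plain path prefixes only: it ignores User-agent groups and the * and $ wildcards, and it lets the longest matching rule win, with Allow winning a tie.

```php
<?php
// Hypothetical helper: collects Allow/Disallow lines from robots.txt text.
// Simplification: User-agent groups and wildcards (* and $) are ignored.
function parseRobotsRules(string $robotsTxt): array
{
    $rules = [];
    foreach (preg_split('/\R/', $robotsTxt) as $line) {
        // Strip comments and surrounding whitespace.
        $line = trim(preg_replace('/#.*$/', '', $line));
        if (preg_match('/^(allow|disallow)\s*:\s*(\S+)$/i', $line, $m)) {
            $rules[] = [
                'allow' => strtolower($m[1]) === 'allow',
                'path'  => $m[2],
            ];
        }
    }
    return $rules;
}

// Hypothetical helper: the longest matching prefix decides the outcome.
function isPathAllowed(array $rules, string $path): bool
{
    $allowed = true;   // no matching rule means the path may be crawled
    $bestLength = -1;
    foreach ($rules as $rule) {
        $length = strlen($rule['path']);
        if (strncmp($path, $rule['path'], $length) !== 0) {
            continue;  // rule is not a prefix of this path
        }
        if ($length > $bestLength || ($length === $bestLength && $rule['allow'])) {
            $allowed = $rule['allow'];
            $bestLength = $length;
        }
    }
    return $allowed;
}

// The excerpt shown above, used as sample input.
$robotsTxt = <<<TXT
Disallow: /search
Allow: /search/about
Allow: /search/howsearchworks
Disallow: /sdch
Disallow: /groups
Disallow: /index.html?
Disallow: /?
Allow: /?hl=
TXT;

$rules = parseRobotsRules($robotsTxt);

var_dump(isPathAllowed($rules, '/search'));        // bool(false)
var_dump(isPathAllowed($rules, '/search/about'));  // bool(true)
var_dump(isPathAllowed($rules, '/maps'));          // bool(true)
```

A real spider would first download the live robots.txt file and pick the rule group that matches its own user agent. The prefix check here only shows the basic idea.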

More similar examples:
http://ebay.com/robots.txt
https://www.amazon.com/robots.txt

This tutorial uses ‘Simple HTML DOM Parser’ to extract data from the web page at http://www.example.com/.

Go to https://sourceforge.net/projects/simplehtmldom/ and download “PHP Simple HTML DOM Parser”.
Unzip the archive and copy ‘simple_html_dom.php’ to your ‘lib’ folder.
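
With the parser file in place, a first scrape might look like the sketch below. It assumes ‘simple_html_dom.php’ sits in a ‘lib’ folder next to the script. The function name extractHeadingAndLinks and the choice of elements (the first h1 and all links) are illustrative assumptions, not taken from the tutorial's own source code. file_get_html(), find(), plaintext and clear() come from the parser library.

```php
<?php
// Assumed location of the parser, following the 'lib' folder step above.
const PARSER_PATH = __DIR__ . '/lib/simple_html_dom.php';

// Hypothetical helper: returns the first <h1> text and all link targets.
function extractHeadingAndLinks(string $url): array
{
    require_once PARSER_PATH;

    // file_get_html() downloads the page and builds a DOM object.
    $html = file_get_html($url);
    if (!$html) {
        throw new RuntimeException("Could not load $url");
    }

    // find('h1', 0) returns the first match, or null if there is none.
    $h1 = $html->find('h1', 0);
    $result = [
        'heading' => $h1 ? trim($h1->plaintext) : null,
        'links'   => [],
    ];

    // find('a') returns every anchor element on the page.
    foreach ($html->find('a') as $anchor) {
        $result['links'][] = $anchor->href;
    }

    $html->clear();  // release the memory held by the DOM tree
    return $result;
}

// Run the scrape only when the parser file is actually present.
if (is_file(PARSER_PATH)) {
    print_r(extractHeadingAndLinks('http://www.example.com/'));
}
```

The selectors passed to find() are the part you would change for another site, which is where a browser inspection tool helps.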

Install the ‘Firebug’ add-on for Firefox to inspect a website's contents: https://addons.mozilla.org/en-US/firefox/addon/firebug/

The “PHP Simple HTML DOM Parser” manual is available here.

Source code: Web-Scraping
