PHP Web Scraper

HTTP Get Request via PHP/cURL To Request Web Page Source File

After setting up your PHP/MySQL environment with XAMPP, we can start writing a PHP script to retrieve a web page's source. PHP has many libraries for sending a request to a target web server and receiving the response. One of the most common ways to do this is to use PHP's cURL extension.

For now, we will create a very simple PHP/cURL class to help us request a web page from a server. After that, we can work on the returned source to scrape the information we need. We will also modify and enhance this class as we go further.

First, create a folder named "scraper" under C:\xampp\htdocs, then use Notepad++ to create a text file called httpcurl.php in the directory C:\xampp\htdocs\scraper.
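
As a rough idea of what httpcurl.php might contain at this stage, here is a minimal sketch. The class name HttpCurl, the get() method and the timeout parameter are my own assumptions for illustration, not the final class built in this series. It only uses standard cURL extension functions (curl_init, curl_setopt, curl_exec, curl_error).

```php
<?php
// httpcurl.php - a minimal sketch; class and method names are hypothetical.
class HttpCurl
{
    // Maximum number of seconds to wait for the whole request (assumed default).
    private $timeout;

    public function __construct($timeout = 30)
    {
        $this->timeout = $timeout;
    }

    // Send an HTTP GET request and return the response body as a string.
    // Throws RuntimeException if the request cannot be completed.
    public function get($url)
    {
        $ch = curl_init($url);
        if ($ch === false) {
            throw new RuntimeException('Could not initialise cURL');
        }

        // Return the page source as a string instead of printing it.
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        // Follow HTTP redirects (Location headers).
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        // Give up if the request takes longer than the timeout.
        curl_setopt($ch, CURLOPT_TIMEOUT, $this->timeout);

        $body = curl_exec($ch);
        if ($body === false) {
            throw new RuntimeException('cURL request failed: ' . curl_error($ch));
        }

        // The handle is released automatically when $ch goes out of scope.
        return $body;
    }
}
```

To try it, a second script in the same folder could include this file, create the object with `new HttpCurl()`, and echo the result of `get()` for a page you are allowed to request. Throwing an exception on failure is just one possible design; returning false and checking it in the caller would also work.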

Install Development Environment with XAMPP on Windows PC

Before we start writing PHP code, it is a good idea to install the XAMPP package from Apache Friends. XAMPP is a full-featured, free AMPP (Apache, MySQL, PHP, Perl) stack available for Windows, Linux and macOS. After installation, you can use your local PC or laptop to run web bot or spider scripts, or even test a full-featured site, such as Joomla or WordPress, before uploading it to a live server.

XAMPP is very stable, and you can run screen scraping scripts from your PC for weeks without problems, assuming your scripts are clean and have no memory leaks. You do not need a domain name or web hosting to run your PHP/MySQL programs on XAMPP. Sometimes you do not even need internet access while writing scripts.

Books on Screen Scraping with PHP

There are a few books worth reading if you are serious about learning how to write screen scrapers or webbots using PHP/cURL. Of course, you can also find lots of information on the internet, for example on Stack Overflow and GitHub.

I currently own a few books on screen scraping, and three of them use PHP/cURL. I highly recommend these three books to anyone who wants to learn screen scraping with PHP/cURL.

Webbots, Spiders and Screen Scrapers - Written by Michael Schrenk

The Beginning...

In 2010, I downloaded and installed an open source internet mall (ECMALL from China) as part of learning internet marketing. ECMALL enabled multiple users to open web stores and sell products. It also supported transactions via PayPal. It was, and still is, a popular open source internet mall in China and other countries.

The installation process was easy with the instructions given. Without much knowledge of PHP, I changed the language files and created an English version of ECMALL. However, two months after the release, not a single person had signed up as a seller! The website was still very new and had little content, so no one could find it through search engines. I had to decide whether to close down the website or find a way to attract buyers and sellers.
