PHP Email Extractor Script with cURL and Regular Expression (2)

In this post, we make a small modification to the previous PHP script so that it performs targeted email extraction.

First, revisit the source of the targeted web page. It contains repeated blocks of agent contacts, each with a name, email and phone number, for a total of 10 blocks per page.

The plan is to have the script "cut out" each block and store it in an array, then extract the name, email and phone number from each block.

As you can see, each block starts with a <div class="negotiators-wrapper"> tag and ends with </div></div>. Note that in this example the two </div> tags are separated by a carriage return and a line feed.
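
The "cut out, then extract" plan can be sketched roughly as follows. This is only an illustration, not the script from the post: the sample markup inside each block (the "name" and "phone" classes, the mailto link) and the function names extractAgentBlocks() and parseAgentBlock() are assumptions made up for the example. Only the opening <div class="negotiators-wrapper"> tag and the closing </div></div> pair come from the page described above.

```php
<?php
// Hypothetical sample of the page source. Only the wrapper tag and the
// closing </div></div> pair are taken from the post; the inner markup is assumed.
$html = <<<'HTML'
<div class="negotiators-wrapper">
<div class="negotiator">
<p class="name">Jane Doe</p>
<a href="mailto:jane@example.com">jane@example.com</a>
<p class="phone">012-345 6789</p>
</div>
</div>
<div class="negotiators-wrapper">
<div class="negotiator">
<p class="name">John Smith</p>
<a href="mailto:john@example.com">john@example.com</a>
<p class="phone">012-987 6543</p>
</div>
</div>
HTML;

// Cut out every block between the wrapper tag and the closing </div></div>.
// \s* allows the carriage return / line feed between the two closing tags,
// and the "s" modifier lets "." match across line breaks.
function extractAgentBlocks(string $html): array
{
    preg_match_all('~<div class="negotiators-wrapper">(.*?)</div>\s*</div>~s', $html, $matches);
    return $matches[1];
}

// Pull name, email and phone out of one block (null when a field is missing).
function parseAgentBlock(string $block): array
{
    $agent = ['name' => null, 'email' => null, 'phone' => null];
    if (preg_match('~<p class="name">\s*(.*?)\s*</p>~s', $block, $m)) {
        $agent['name'] = $m[1];
    }
    if (preg_match('~[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}~', $block, $m)) {
        $agent['email'] = $m[0];
    }
    if (preg_match('~<p class="phone">\s*(.*?)\s*</p>~s', $block, $m)) {
        $agent['phone'] = $m[1];
    }
    return $agent;
}

// Usage: one line of output per agent block.
foreach (extractAgentBlocks($html) as $block) {
    $agent = parseAgentBlock($block);
    printf("%s <%s> %s\n", $agent['name'], $agent['email'], $agent['phone']);
}
```

Splitting the page into blocks first keeps each name, email and phone number tied to the same agent, which a page-wide search for email addresses alone would not guarantee.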


Email Extractor Script with PHP cURL and Regular Expression (1)

In this post, I will explain how to use PHP/cURL to extract (harvest) email addresses from websites. The script uses regular expressions to match the HTML tags that hold the data we want to extract.

If we send out an email and address the person as "Dear Sir" or "Dear Madam", it will most likely end up in the spam folder. We do not want to extract email addresses only, but also the information related to each address, such as name, telephone number, company and job position. When we send email to the collected list, we want to address each contact person in as much detail as possible, for example by name, job position in the company and contact number.

Of course, please do not abuse email extraction to send unwanted spam or product/service advertising, to violate copyright law, or to waste network bandwidth. If you get into trouble, please talk to your lawyer.
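
As a rough illustration of the regular-expression side, the sketch below collects the email addresses found in an HTML string that has already been downloaded. The function name extractEmails() and the sample markup are assumptions for this example, and the pattern is a deliberately simple one that will not cover every valid address.

```php
<?php
// Return the unique email addresses found in an HTML string, lower-cased.
// The pattern is a simple approximation, not a full address validator.
function extractEmails(string $html): array
{
    preg_match_all('/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/', $html, $matches);
    $emails = array_map('strtolower', $matches[0]);
    // array_unique() keeps the original keys, so re-index the result.
    return array_values(array_unique($emails));
}

// Hypothetical page fragment, standing in for source fetched with cURL.
$sample = '<a href="mailto:Sales@Example.com">Sales@Example.com</a>'
        . '<p>Contact: support@example.org.</p>';

print_r(extractEmails($sample));
```

The same address often appears twice on a page, once in the mailto: link and once as visible text, which is why the result is de-duplicated.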


How to Install Joomla 3.x on Local Windows PC using XAMPP for Website Development

If you want to create a new website, you don't have to register a domain name and buy a web hosting plan right away. Developing locally first has several advantages:

1) You might change your plan later and find that the domain name does not suit your content.

2) You can finish developing the website and release it to the live server only after filling it with substantial content.

3) You can show customers how the website works from your PC, without an Internet connection.

Assuming you already have XAMPP (if not, you can find the installation guide here), you can easily install Joomla on your local Windows PC and develop your website before releasing it to the Internet. Here, I am using Joomla 3.1.5 as an example. You can install WordPress, Drupal and many other PHP/MySQL applications with the same steps. Remember to start XAMPP on your PC.

Go to the Joomla website to get the latest version of the Joomla CMS and select the Download button. The current version is 3.1.5 at the time of writing.


HTTP Get Request via PHP/cURL To Request Web Page Source File

After setting up your PHP/MySQL environment with XAMPP, we can start writing a PHP script to retrieve the source of a web page. PHP offers many libraries for sending a request to the targeted web server and receiving the response. One of the most common ways is to use PHP's cURL extension.

For now, we create a very simple PHP/cURL class to request web pages from a server. After that, we can work on the returned source to scrape the information we need. We will also modify and enhance this class as we go further.

First, create a folder "scraper" under C:\xampp\htdocs, then use Notepad++ to create a text file called httpcurl.php in the directory C:\xampp\htdocs\scraper.
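
To give an idea of what httpcurl.php could contain, here is a minimal sketch of a cURL wrapper. It is not the class from the full post: the class name HttpCurl, the get() method, the default user agent string and the 30-second timeout are all assumptions chosen for illustration. Typed properties require PHP 7.4 or later.

```php
<?php
// httpcurl.php
// Hypothetical minimal wrapper around PHP's cURL extension.
// Class name, method name and default values are assumptions for this sketch.
class HttpCurl
{
    private string $userAgent;
    private int $timeout;

    public function __construct(
        string $userAgent = 'Mozilla/5.0 (compatible; ScraperSketch/0.1)',
        int $timeout = 30
    ) {
        $this->userAgent = $userAgent;
        $this->timeout = $timeout;
    }

    // Send an HTTP GET request and return the response body as a string.
    public function get(string $url): string
    {
        $ch = curl_init();
        curl_setopt_array($ch, [
            CURLOPT_URL            => $url,
            CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
            CURLOPT_FOLLOWLOCATION => true,   // follow HTTP redirects
            CURLOPT_USERAGENT      => $this->userAgent,
            CURLOPT_TIMEOUT        => $this->timeout,
        ]);

        $body = curl_exec($ch);
        if ($body === false) {
            throw new RuntimeException('cURL request failed: ' . curl_error($ch));
        }
        return $body;
    }
}
```

A script saved in the same folder could then include the file with require 'httpcurl.php'; and call (new HttpCurl())->get() with the URL of the targeted page to obtain its source as a string.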


Install Development Environment with XAMPP on Windows PC

Before we start writing PHP code, it is a good idea to install the XAMPP package from Apache Friends. XAMPP is a full-featured, free AMPP stack (Apache, MySQL, PHP, Perl) that is available for Windows as well as Linux. After installation, you can use your local PC or laptop to run web bot or spider scripts, or even to test a full-featured site built with Joomla or WordPress before uploading it to a live server.

XAMPP is very stable, and you can run screen scraping scripts from your PC for weeks without problems, assuming your scripts are clean and free of memory leaks. You do not need a domain name or web hosting to run your PHP/MySQL programs on XAMPP, and sometimes you do not even need Internet access while writing a script.


My Joomla Experiences

I had my first website (an online shopping store) in 2005. It was written in ASP by a small company with three young programmers. For its time it was a complete online shopping system, with a simple CMS, a shopping cart, a checkout system and payment via PayPal. I was able to upload products and articles through the backend interface.


Books on Screen Scraping with PHP

There are a few books worth reading if you are serious about learning to write screen scrapers or webbots using PHP/cURL. Of course, you can also find plenty of information on the Internet, for example on Stack Overflow and GitHub.

I currently own a few books on screen scraping, three of which use PHP/cURL. I highly recommend these three books to anyone who wants to learn screen scraping with PHP/cURL.

Webbots, Spiders and Screen Scrapers - Written by Michael Schrenk
