Email Extractor Script with PHP cURL and Regular Expression (1)

In this post, I will explain how to use PHP/cURL to extract (harvest) email addresses from websites. The script uses regular expressions to match HTML tags for extraction.
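As a rough illustration of the regular-expression step, the sketch below pulls email addresses out of a string of HTML. The function name extractEmails(), the sample markup and the pattern itself are assumptions made for this example, not the script from the full post. The pattern is deliberately simple: it finds addresses anywhere in the markup (including inside mailto: links) and does not cover every valid address format.

```php
<?php
// Hypothetical helper for illustration only; not the original post's code.
function extractEmails(string $html): array
{
    // Simplified pattern: local part, "@", domain, dot, top-level domain.
    $pattern = '/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/';

    // preg_match_all() stores every full match in $matches[0].
    preg_match_all($pattern, $html, $matches);

    // Lower-case the addresses, drop duplicates and re-index the array.
    return array_values(array_unique(array_map('strtolower', $matches[0])));
}

// Made-up sample markup standing in for a downloaded web page.
$html = '<p>Contact <a href="mailto:Jane.Doe@example.com">Jane Doe</a> '
      . 'or sales@example.org.</p>';

$emails = extractEmails($html);
echo implode("\n", $emails), "\n";
```

In a real spider, the $html string would come from a page fetched with cURL rather than from a hard-coded sample.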

If we send out an email addressed to "Dear Sir" or "Dear Madam", it will most likely be treated as spam. We do not want to extract email addresses alone, but also the information related to them, such as name, telephone number, company and job position. When we send email to the collected list, we want to address each contact person in as much detail as possible, for example by name and by job position in the company.

Of course, please do not abuse email extraction to send unwanted spam or product/service advertising, to violate copyright law, or to waste network bandwidth. If you get into trouble, please talk to your lawyer.

Read more...

How to Install Joomla 3.x on Local Windows PC using XAMPP for Website Development

If you want to create a new website, you don't have to immediately register a domain name and buy a website hosting plan.

1) You might change your plan later and find that the domain name does not suit your content.

2) You can complete the website development and release it to the live server after filling it with substantial content.

3) You can show your customers how the website works from your PC, without an Internet connection.

Assuming you already have XAMPP (if not, you can find the installation guide here), you can easily install Joomla on your local Windows PC and develop your website before releasing it to the Internet. Here, I am using Joomla 3.1.5 as an example. You can install WordPress, Drupal and many other PHP/MySQL programs with the same steps. Remember to start XAMPP on your PC first.

Go to the Joomla website to get the latest version of the Joomla CMS and select the Download button. The current version is 3.1.5 as of this writing.

Read more...

HTTP Get Request via PHP/cURL To Request Web Page Source File

After setting up your PHP/MySQL environment with XAMPP, we can start writing a PHP script to retrieve a web page's source file. There are many libraries in PHP for sending a request to a target web server and receiving the response. One of the common ways to achieve this is to use the cURL extension in PHP.

For now, we create a very simple PHP/cURL class to help us request web pages from a server. After that, we can proceed to "operate" on the source file to scrape the information we need. We will also modify and enhance the code of this class as we go further.

First, create a folder named "scraper" under C:\xampp\htdocs, then use Notepad++ to create a text file called httpcurl.php in the directory C:\xampp\htdocs\scraper.
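To give an idea of what httpcurl.php might contain, here is a minimal sketch of a class that sends an HTTP GET request with the cURL extension and returns the page source as a string. The class name HttpCurl, the method name get(), the user-agent string and the timeout values are all assumptions made for this illustration; the class in the full post may look different.

```php
<?php
// Hypothetical sketch of httpcurl.php; names and defaults are assumptions.
class HttpCurl
{
    private $userAgent;
    private $timeout;

    public function __construct(
        string $userAgent = 'Mozilla/5.0 (compatible; ExampleBot/0.1)',
        int $timeout = 30
    ) {
        $this->userAgent = $userAgent;
        $this->timeout = $timeout;
    }

    // Send a GET request and return the response body as a string.
    public function get(string $url): string
    {
        $ch = curl_init();
        curl_setopt_array($ch, [
            CURLOPT_URL            => $url,
            CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
            CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
            CURLOPT_USERAGENT      => $this->userAgent,
            CURLOPT_CONNECTTIMEOUT => 10,    // seconds allowed for connecting
            CURLOPT_TIMEOUT        => $this->timeout, // seconds for the whole request
        ]);

        $body = curl_exec($ch);
        if ($body === false) {
            // curl_error() describes the last failure on this handle.
            throw new RuntimeException('cURL request failed: ' . curl_error($ch));
        }

        // The handle is released when $ch goes out of scope.
        return $body;
    }
}
```

With XAMPP running, another script in the same folder could include this file with require 'httpcurl.php'; and then call (new HttpCurl())->get('http://example.com/') to receive the page source as a string.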

Read more...

Install Development Environment with XAMPP on Windows PC

Before we start writing PHP code, it is a good idea to install the XAMPP package from Apache Friends. XAMPP is a full-featured, non-commercial AMPP (Apache, MySQL, PHP, Perl) stack that is available for Windows as well as Linux. After installation, you can use your local PC or laptop to run web bot or spider scripts, or even test a full-featured site built with a product such as Joomla or WordPress before uploading it to a live server.

XAMPP is very stable, and you can run screen-scraping scripts from your PC for weeks without problems, assuming your scripts are clean and free of memory leaks. You do not need a domain name or web hosting to run PHP/MySQL programs on XAMPP. Sometimes you do not even need Internet access while writing scripts.

Read more...

My Joomla Experiences

I had my first website (an online shopping store) in 2005. It was written in ASP by a small company with three young programmers. It was a complete online shopping system for its time, with a simple CMS, shopping cart, checkout system and payment via PayPal. I was able to upload products and articles through a backend interface.

Read more...

Books on Screen Scraping with PHP

There are a few books that are worth reading if you are serious about learning how to write screen scrapers or webbots using PHP/cURL. Of course, you can also find lots of information on the Internet, for example on Stack Overflow and GitHub.

Currently I have a few books on screen scraping, three of which use PHP/cURL. I highly recommend these three books to anyone who wants to learn screen scraping with PHP/cURL.

Webbots, Spiders and Screen Scrapers - Written by Michael Schrenk

Read more...

The Beginning...

In 2010, I downloaded and installed an open source internet mall (ECMALL from China) as part of learning internet marketing. ECMALL enabled multiple users to open web stores and sell products. It also supported transactions via PayPal. It was and still is a popular open source internet mall in China and other countries.

The installation process was easy with the instructions given. Without much knowledge of PHP, I changed the language files and created an English version of ECMALL. However, two months after the release, not a single person had signed up as a seller! The website was still very new and had little content, so no one was able to find it through search engines. I needed to decide whether to close down the website or find a way to attract buyers and sellers.

Read more...