JCE Text Editor - Must Have Component in Joomla

jce logo

The standard text editor in Joomla since 1.5 until 3.x is TinyMCE. It has all the basic functions you need to write content (text and images) and publish your blog. In fact, text editor is one of the most important components in Joomla. The TinyMCE editor is like a simplified word processor and with limited features.

However, I am very used to a third party text editor, JCE - Joomla Content Editor, which is a much better text editor than TinyMCE. I am using JCE in all my websites. Furthermore, JCE is FREE, so you should give it a try.

Read more...

Create MySQL Database for PHP Web Spider Extracted Emails Addresses (4)

store extracted email to database

In this final part of PHP/cURL email extractor, I will show you how to store extracted data into MySQL database. You can store email addresses and contact information collected not just from one website, but also from various websites into the same database.

You might want to store email collected based on your purpose. For example, if you have a real estate website and a internet shopping website, then information collected should be stored into two different categories (tables in MySQL database).

First, you need to activate XAMPP on your PC, both Apache and MySQL. At browser URL, go to "http://localhost/phpmyadmin/". Go to top menu bar and select "Database". To create a new database for our tutorial, enter "email_collection" and press "Create" button, as shown in the picture below.

You can download the source file for PHP cURL Email Extractor from here.

Read more...

PHP Web Spider to Crawl Web Pages Pagination and Extract Emails (3)

email web crawler

In this post, I will show you how to further modify our email extractor script to inject crawling capability and collect as many targeted email list as possible.

The trick is quite simple - we are not crawling entire website and check each and every web page for email addresses. Doing this will consume a lot of bandwidth and our time. We just need to crawl web pages with targeted email listing: so as long as we know total pages to crawl, then looping from first page to last page and job is done!

First, inspect the pagination of our targeted website. In the example we use, it has page 1, 2, 3,... and "Last" page button. Pressing this button will take us to the last page, which is page 169. Each page has 10 email addresses, so we can get almost 1690 emails from this website. The total number of pages (169 at present) can be changed in future. If we want to reuse our email extractor script with minimum manual editing, it needs to be able to auto detect total pages.

Read more...

PHP Email Extractor Script with cURL and Regular Expression (2)

targeted email extractor

In this post, we need to make a small modification to previous PHP script for targeted email extraction.

First, we need to revisit the source file of the targeted web page, we can see that there are repeated blocks of agent contacts, with name, email and phone number. Total of 10 blocks per page. 

The plan is to use the script to "cut out" each block and stores into array, then extract name, email and phone number from each block.

As you can see, each block starts with <div class="negotiators-wrapper"> tag and ends with </div></div>. Note that both </div> tags are separated by carriage return and new line feed in this example. 

Read more...

Email Extractor Script with PHP cURL and Regular Expression (1)

email spider

In this post, I will explain how to use PHP/cURL to extract / harvest email addresses from websites. The script will involve regular expression to match HTML tag for extraction.

If we send out email and address the person as "Dear Sir" or "Dear Madam", most likely the email will end up as spam. We do not want to just extract email addresses only, but also other information related to the email addresses, such as name, telephone, company, job position etc. When we send out email from the list collected, we want to be able to address the contact person as detail as possible, such as with his/her name, job position in the company, contact number etc.

Of course, please do not abuse the ability of email extraction and send out unwanted spam mails, products/services advertising, violate copyright law or disturbing network bandwidth etc. If you get into trouble, talk to your lawyer please.

Read more...

How to Install Joomla 3.x on Local Windows PC using XAMPP for Website Development

Install Joomla with XAMPP

If you want to create a new website, you don't have to immediately register a domain name and buy a website hosting plan.

1) You might change your plan later and find that the domain name not suitable to your content.

2) You can complete the website development and release to live server after fill up with substantial contents.

3) You can show to your customers from your PC how the website works without connection to Internet and etc.

Assuming you already have XAMPP (if not, you can find the installation guide here), you can easily install Joomla to your local Windows PC and develop your website before release to internet. Here, I am using Joomla 3.1.5 as example. You can actually install Wordpress, Drupal and many types of PHP/MySQL related program with the same steps. Remember to turn on XAMPP on your PC.

Go to Joomla website to get the latest version of Joomla CMS. Select Download button. The current version is 3.1.5 as of this point. 

Read more...

HTTP Get Request via PHP/cURL To Request Web Page Source File

php curl

After setting your PHP/MySQL environment with XAMPP, now we can start to create PHP script to retrieve a web page source file. There are many libraries in PHP to send request to our targeted web server and receive the response in a file format. One of the common way to achieve this is to use cURL extension in PHP. 

For now, we create a very simple PHP/cURL class to help us request web page from server. After that, we can proceed to "operate" source file to scrape information we need. Also, we need to modify and enhance the code of this class as we going further.

First, create a folder "scraper" under C:\xampp\htdocs, then create a text file using Notepad++ called httpcurl.php under directory C:\xampp\htdocs\scraper.

Read more...
Subscribe to this RSS feed