PHP 8 Legs

Webbots, Web Spiders, Web Crawlers, Screen Scrapers, Sphinx Search, Joomla


The Beginning...

Posted in Web Scraping

In 2010, I downloaded and installed an open source internet mall (ECMALL, from China) as part of learning internet marketing. ECMALL lets multiple users open web stores and sell products, and it supports transactions via PayPal. It was, and still is, a popular open source internet mall in China and other countries.

The installation was easy with the instructions given. Without much knowledge of PHP, I changed the language files and created an English version of ECMALL. However, two months after the release, not a single person had signed up as a seller! The website was still very new and had little content, so no one could find it through search engines. I had to decide whether to close the website down or find a way to attract buyers and sellers.

I remembered that when I had a physical store in a prominent shopping mall, salespeople from other new malls would approach us to open a branch at their place. With this idea in mind, I browsed other internet malls and copied the email addresses of sellers. I then sent emails inviting them to sign up as sellers, offering early birds free advertisement for a period of time. The results were encouraging: more than half of the recipients signed up as sellers.

However, manually cutting and pasting email addresses and store owners' names was no fun. I spent a lot of time collecting emails into an Excel spreadsheet and sending messages out one by one. I needed an automated way to accomplish the task. A Google search showed that there are many ways to collect email addresses, from buying an off-the-shelf email extractor or screen scraper software to writing code with libraries such as Snoopy and Simple HTML DOM. I tried many methods and eventually learned to write screen scraping/spider programs using PHP/MySQL.

Screen scraping can be a very useful skill to learn. Besides harvesting email addresses, you can extract company information from directory websites, copy product information from an existing web store and reformat it as MySQL entries for a new store, create a price comparison website or news aggregator site, log in automatically to a target website to perform tasks, and so on.

In this blog, I will share my experience and code showing how to perform screen scraping, from small to large scale.


I am a full-time internet retailer, selling physical products through my own websites and various internet marketplaces. In my free time I write PHP web bots and screen scraper scripts: for email marketing to increase web traffic, for scraping products from one website to another to minimize manual entry, for aggregating content for new websites, and so on.
I am available for hire as a freelance PHP coder for web bots, screen scrapers and data mining. I quote a fixed price for your project if the requirements are clearly outlined.
I also help customers build and host Joomla-based business/content/blogging websites and shopping carts with VirtueMart, PrestaShop, OpenCart, ECShop, ECMall etc. The first year of hosting is free.
I accept payment via PayPal. If you would like to contact me, please write to freeman [a] TQ.


  • Guest
    Mahmud Ahsan Thursday, 24 October 2013

    Ah nice to see a good blog starting for spidering using PHP. Hope to see more articles about this topic :)
