In 2010, I downloaded and installed an open source internet mall (ECMALL, from China) as part of learning internet marketing. ECMALL let multiple users open web stores and sell products, and it supported transactions via PayPal. It was, and still is, a popular open source internet mall in China and other countries.
The installation process was easy with the instructions given. With little knowledge of PHP, I changed the language files and created an English version of ECMALL. However, two months after the release, not a single person had signed up as a seller! The website was still very new and had little content, so no one could find it through search engines. I needed to decide whether to close down the website or find a way to attract buyers and sellers.
I remembered that when I had a physical store in a prominent shopping mall, salespeople from other new malls would approach us to open a branch at their place. With this idea, I browsed other internet malls and copied the email addresses of sellers. I then sent emails inviting them to sign up as sellers, offering early birds free advertisement for a period of time. The results were encouraging: more than half of the mail recipients signed up as sellers.
However, manually cutting and pasting email addresses and store owners' names was no fun. I spent a lot of time collecting emails into an Excel spreadsheet and sending the invitations out one by one. I needed an automated way to accomplish the task. A Google search showed that there are many ways to collect email addresses, from buying off-the-shelf email extractors and screen scraper software to writing program code with tools such as Snoopy and Simple HTML DOM. I tried many methods and eventually learned to write screen scraping/spider programs using PHP/MySQL.
Screen scraping can be a very useful skill to learn. Besides harvesting emails, you can extract company information from directory websites, copy product information from an existing web store and reformat it as MySQL entries for a new store in another web store, create a price comparison website or a news aggregator site, log in automatically to a targeted website to perform some tasks, and so on.
In this log, I will share my experiences and program code showing how to perform screen scraping, from small to large scale.