Friday, 17 February 2017

Things to know about web scraping

First things first: it is important to understand what web scraping means and what its purpose is. Web scraping is a software technique for extracting information and content from websites. The extracted information is then used in ways that the site owner has no direct control over. Most people use web scraping to turn their competitors' commercial advantage into their own.
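To make the idea of "extracting information from a website" concrete, here is a minimal sketch using only Python's standard library. The HTML fragment, its class names and the ProductParser helper are hypothetical and chosen for illustration; a real scraper would first download the page and would need to respect the site's terms of use.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; a real scraper would download this over HTTP.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Kettle</span><span class="price">24.99</span></li>
  <li class="product"><span class="name">Toaster</span><span class="price">31.50</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects the text of <span class="name"> and <span class="price"> elements."""

    def __init__(self):
        super().__init__()
        self.current_field = None  # which span we are currently inside, if any
        self.current_item = {}     # fields gathered for the product being read
        self.products = []         # finished products, one dict per <li>

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self.current_field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside one of the spans we care about.
        if self.current_field:
            self.current_item[self.current_field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self.current_field = None
        elif tag == "li" and self.current_item:
            self.products.append(self.current_item)
            self.current_item = {}


def scrape_products(html):
    """Return a list of {"name": ..., "price": ...} dicts found in the HTML."""
    parser = ProductParser()
    parser.feed(html)
    return parser.products


if __name__ == "__main__":
    for product in scrape_products(SAMPLE_HTML):
        print(product["name"], product["price"])
```

The same pattern applies to listings, directory entries or prices on a live page, which is why the industries mentioned in this article are such frequent targets.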

There are many scraping tools available on the Internet, but because some people feel that web scraping goes well beyond their duties, many small companies providing this type of service have appeared on the market. They can turn a challenging and complex process into an easy one. Believe it or not, web scraping has existed for nearly as long as the web itself. All you have to do is some quick research on the Internet to find a consultant who is willing to help you with this matter.

As for the industries that web scraping targets, some are hit more often than others. One good example is digital publishers and directories. They are among the easiest targets for web scrapers, because most of their intellectual property is available to a large number of people. Industries like travel and real estate are also common targets, along with ecommerce, which is an obvious one: time-limited promotions and flash sales are the reasons why web scrapers find ecommerce so tempting.

Source: http://www.amazines.com/article_detail.cfm/6196289?articleid=6196289

Saturday, 11 February 2017

Data Mining Basics

Definition and Purpose of Data Mining:

Data mining is a relatively new term that refers to the process by which predictive patterns are extracted from information.

Data is often stored in large, relational databases and the amount of information stored can be substantial. But what does this data mean? How can a company or organization figure out patterns that are critical to its performance and then take action based on these patterns? To manually wade through the information stored in a large database and then figure out what is important to your organization can be next to impossible.

This is where data mining techniques come to the rescue! Data mining software analyzes huge quantities of data and then determines predictive patterns by examining relationships.

Data Mining Techniques:

There are numerous data mining (DM) techniques and the type of data being examined strongly influences the type of data mining technique used.

Note that the nature of data mining is constantly evolving and new DM techniques are being implemented all the time.

Generally speaking, there are several main techniques used by data mining software: clustering, classification, regression and association methods.

Clustering:

Clustering refers to the formation of data clusters that are grouped together by some sort of relationship that identifies that data as being similar. An example of this would be sales data that is clustered into specific markets.
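As a rough illustration of clustering, the sketch below groups one-dimensional sales figures with a very small k-means loop. The article does not name a specific clustering algorithm, so k-means is an assumption here, and the sales numbers and starting centroids are made up.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Tiny k-means for 1-D data; `centroids` are the starting guesses."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


if __name__ == "__main__":
    # Hypothetical weekly sales from stores in two different markets.
    sales = [10, 12, 11, 50, 52, 49]
    centers, groups = kmeans_1d(sales, centroids=[0, 60])
    print(centers)
    print(groups)
```

With these figures the small and large values end up in separate groups, which is the kind of "market" split described above.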

Classification:

Classification groups data by applying a known structure, such as a set of predefined categories, to the data being examined. This method works well for categorical information and uses one or more algorithms such as decision tree learning, neural networks and "nearest neighbor" methods.
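The "nearest neighbor" idea can be sketched in a few lines: a new record receives the label that is most common among the labeled records closest to it. The training points and the "low"/"high" labels below are invented for illustration.

```python
import math
from collections import Counter


def knn_classify(training, point, k=3):
    """Return the majority label among the k training points nearest to `point`.

    `training` is a list of ((x, y), label) pairs.
    """
    by_distance = sorted(training, key=lambda item: math.dist(item[0], point))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical customers: (visits per month, average basket size) -> spend category.
    training = [
        ((1, 1), "low"), ((1, 2), "low"), ((2, 1), "low"),
        ((8, 8), "high"), ((9, 8), "high"), ((8, 9), "high"),
    ]
    print(knn_classify(training, (2, 2)))
    print(knn_classify(training, (7, 9)))
```

The known structure here is the set of labels already attached to the training records; the algorithm only decides which existing category a new record belongs to.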

Regression:

Regression is well suited to numerical information: it looks at the numerical data and attempts to find a mathematical formula that fits that data.

New data can then be plugged into the formula, which results in predictive analysis.
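A minimal sketch of this fit-then-predict cycle, assuming the simplest case of a straight line fitted by ordinary least squares (the article does not say which formula is used, and the advertising and sales numbers are made up):

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Plug a new x value into the fitted formula."""
    return slope * x + intercept


if __name__ == "__main__":
    # Hypothetical data: advertising spend (x) against units sold (y).
    spend = [1, 2, 3, 4]
    units = [3, 5, 7, 9]
    slope, intercept = fit_line(spend, units)
    print(slope, intercept)
    print(predict(slope, intercept, 5))
```

Real data rarely lies exactly on a line, so in practice the fitted formula gives an estimate rather than an exact value.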

Association:

Often referred to as "association rule learning," this method is popular and entails the discovery of interesting relationships between variables in the data warehouse (where the data is stored for analysis). Once an association "rule" has been established, predictions can then be made and acted upon. An example of this is shopping: if people buy a particular item then there may be a high chance that they also buy another specific item (the store manager could then make sure these items are located near each other).
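The shopping example can be made concrete with the two measures commonly used to judge an association rule: support (how often the items appear together) and confidence (how often the second item appears when the first one does). The baskets below are invented for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    matching = sum(1 for basket in transactions if items <= set(basket))
    return matching / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated chance of seeing `consequent` in a basket that has `antecedent`."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


if __name__ == "__main__":
    # Hypothetical shopping baskets.
    baskets = [
        {"bread", "butter"},
        {"bread", "butter", "milk"},
        {"bread"},
        {"milk"},
    ]
    print(support(baskets, {"bread", "butter"}))
    print(confidence(baskets, {"bread"}, {"butter"}))
```

A rule such as "bread implies butter" is usually kept only if both numbers exceed thresholds chosen by the analyst; those thresholds are a judgment call and are not specified in this article.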

Data Mining and the Business Intelligence Stack:

Business intelligence refers to the gathering, storing and analyzing of data for the purpose of making intelligent business decisions. Business intelligence is commonly divided into several layers, all of which constitute the business intelligence "stack."

The BI (business intelligence) stack consists of a data layer, an analytics layer and a presentation layer.

The analytics layer is responsible for data analysis, and it is in this layer that data mining occurs within the stack. Other elements of the analytics layer are predictive analysis and KPI (key performance indicator) formation.

Data mining is a critical part of business intelligence: it reveals key relationships between groups of data, which are then displayed to end users via data visualization (part of the BI stack's presentation layer). Individuals can then quickly view these relationships in graphical form and take action based on the data being displayed.

Source: http://ezinearticles.com/?Data-Mining-Basics&id=5120773