Hossein Zahed

Web Developer, Entrepreneur, Software Educator

Data Mining Techniques

Several major data mining techniques have been developed and used in data mining projects in recent years, including association, classification, clustering, prediction and sequential patterns. We will briefly examine each of these techniques with an example to give a good overview of them.

 

Association

Association is one of the best-known data mining techniques. In association, a pattern is discovered based on the relationship between a particular item and other items in the same transaction. For example, the association technique is used in market basket analysis to identify which products customers frequently purchase together. Based on this data, businesses can run corresponding marketing campaigns to sell more products and increase profit.
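As a rough illustration of market basket analysis, the sketch below counts how often pairs of products appear in the same transaction (support) and how often one product appears given another (confidence). The transactions and the function names are invented for this example; real projects usually rely on a dedicated algorithm such as Apriori rather than this brute-force count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented purely for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def pair_support(baskets):
    """Return the fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so that every pair is stored in one canonical order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: n / total for pair, n in counts.items()}


def confidence(baskets, antecedent, consequent):
    """Estimate how often `consequent` is bought when `antecedent` is bought."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    both = sum(1 for b in with_antecedent if consequent in b)
    return both / len(with_antecedent)


support = pair_support(transactions)
print(support[("bread", "butter")])
print(confidence(transactions, "butter", "bread"))
```

A pair with high support and high confidence is a candidate for a joint promotion, which is the kind of marketing decision described above.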

 

Classification

Classification is a classic data mining technique based on machine learning. Basically, classification is used to assign each item in a set of data to one of a predefined set of classes or groups. Classification methods make use of mathematical techniques such as decision trees, linear programming, neural networks and statistics. In classification, we build software that can learn how to classify the data items into groups. For example, we can apply classification in an application that, “given all past records of employees who left the company, predicts which current employees are likely to leave in the future.” In this case, we divide the employee records into two groups, “leave” and “stay”, and then ask our data mining software to classify each employee into one of the groups.
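To make the “leave” or “stay” example more concrete, here is a minimal sketch of the simplest possible decision tree, a single split (often called a decision stump). It assumes each employee record has one hypothetical numeric feature, a satisfaction score, which is not part of the original example. The records are invented, and a real system would use many features and a proper library.

```python
# Hypothetical past records: (satisfaction score from 0 to 10, known outcome).
past_records = [
    (2, "leave"), (3, "leave"), (4, "leave"), (5, "stay"),
    (6, "stay"), (7, "stay"), (8, "stay"), (3, "stay"),
]


def classify(score, threshold):
    """Apply the learned rule: low scores are predicted to leave."""
    return "leave" if score <= threshold else "stay"


def train_stump(records):
    """Pick the threshold that misclassifies the fewest past records."""
    best_threshold = None
    best_errors = len(records) + 1
    for threshold in sorted({score for score, _ in records}):
        errors = sum(
            1 for score, label in records
            if classify(score, threshold) != label
        )
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


threshold = train_stump(past_records)

# Classify current employees (again hypothetical) with the learned rule.
for score in (2, 9):
    print(score, classify(score, threshold))
```

The key point is that the two classes are fixed in advance and the software only learns the rule that separates them.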

 

Clustering

Clustering is a data mining technique that automatically forms meaningful or useful clusters of objects that have similar characteristics. Unlike classification, where objects are assigned to predefined classes, clustering defines the classes itself and then puts objects in them. To make the concept clearer, take a library as an example. In a library, books cover a wide range of topics. The challenge is how to arrange those books so that readers can pick up several books on a specific topic without hassle. By using clustering, we can keep books that share some kind of similarity in one cluster, or on one shelf, and label it with a meaningful name. Readers who want books on a topic only need to go to that shelf instead of searching the whole library.
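One common way to form such clusters automatically is k-means, sketched below. This is only one of many clustering methods and is chosen here for brevity. It assumes each book has already been reduced to two hypothetical numeric features. The data points, the naive initialisation and the function name are all made up for this example.

```python
import math


def k_means(points, k, iterations=10):
    """Group 2-D points into k clusters with a basic k-means loop."""
    centroids = list(points[:k])  # naive initialisation: the first k points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Step 1: assign each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 2: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return assignments, centroids


# Hypothetical books, each described by two made-up topic scores.
books = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
assignments, centroids = k_means(books, k=2)
print(assignments)
```

No shelf labels are given to the algorithm. It only receives the number of shelves, k, and works out the grouping from the similarity of the books.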

 

Prediction

Prediction, as its name implies, is a data mining technique that discovers the relationships among independent variables and the relationships between dependent and independent variables. For instance, prediction analysis can be used in sales to predict future profit: if we consider sales as the independent variable, profit can be the dependent variable. Then, based on historical sales and profit data, we can draw a fitted regression curve that is used for profit prediction.
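The sales and profit example can be sketched with ordinary least squares. The code below assumes the simplest case, a straight line rather than a general curve, and uses invented historical figures. The function names are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict the dependent variable for a new value of x."""
    return slope * x + intercept


# Hypothetical historical data: sales (independent) and profit (dependent).
sales = [100, 200, 300, 400, 500]
profit = [11, 23, 31, 44, 52]

slope, intercept = fit_line(sales, profit)
print(predict(600, slope, intercept))
```

Once the line is fitted to past data, predicting profit for a new sales figure is a single multiplication and addition.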

 

Sequential Patterns

Sequential pattern analysis is a data mining technique that seeks to discover similar patterns in transaction data over a business period. The uncovered patterns are used for further business analysis to recognize relationships among data.
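A very small sketch of the idea: given each customer's purchases in time order, count how often one item is bought before another. The purchase histories and the function name are invented for illustration, and the brute-force count stands in for the dedicated sequence mining algorithms that real projects would use.

```python
from collections import Counter

# Hypothetical purchase histories, one time-ordered list per customer.
sequences = [
    ["phone", "case", "charger"],
    ["phone", "charger"],
    ["laptop", "mouse"],
    ["phone", "case"],
]


def ordered_pair_support(histories):
    """Return the fraction of histories in which item a precedes item b."""
    counts = Counter()
    for history in histories:
        seen_pairs = set()  # count each ordered pair once per customer
        for i, first in enumerate(history):
            for later in history[i + 1:]:
                if first != later:
                    seen_pairs.add((first, later))
        counts.update(seen_pairs)
    total = len(histories)
    return {pair: n / total for pair, n in counts.items()}


patterns = ordered_pair_support(sequences)
print(patterns[("phone", "case")])
```

Unlike the association example, order matters here: a phone followed by a case is a different pattern from a case followed by a phone.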

The Differences Between Data, Information and Knowledge

We frequently hear the words Data, Information and Knowledge used as if they are the same thing.

You hear people talking about the Internet as a “vast network of human knowledge” or that they’ll “e-mail through the data.”

By defining what we mean by data, information and knowledge, and how they interact with one another, it should be much easier to tell them apart.

 

Data

Data is/are the facts of the World. For example, take yourself. You may be 5ft tall, have brown hair and blue eyes. All of this is “data”. You have brown hair whether this is written down somewhere or not.

Human beings have used data for as long as we’ve existed to form knowledge of the world. In many ways, data can be thought of as a description of the World. We can perceive this data with our senses, and then the brain can process it.

Until we started using information, all we could use was data directly. If you wanted to know how tall I was, you would have to come and look at me. Our knowledge was limited by our direct experiences. 

 

Information

Information allows us to expand our knowledge beyond the range of our senses. We can capture data in information, then move it about so that other people can access it at different times.

Here is a simple analogy for you.

If I take a picture of you, the photograph is information. But what you look like is data.

I can move the photo of you around, send it to other people via e-mail etc. However, I’m not actually moving you around – or what you look like. I’m simply allowing other people who can’t directly see you from where they are to know what you look like. If I lose or destroy the photo, this doesn’t change how you look.

So, in the case of the lost tax records, the CDs were information. The information was lost, but the data wasn’t. Mrs Jones still lives at 14 Whitewater road, and she was still born on 15th August 1971.

 

Knowledge

Finally, let’s look at Knowledge. Knowledge is what we know. Think of this as the map of the World we build inside our brains. Like a physical map, it helps us know where things are, but it contains more than that. It also contains our beliefs and expectations. “If I do this, I will probably get that.” Crucially, the brain links all these things together into a giant network of ideas, memories, predictions, beliefs, etc.

It is from this “map” that we base our decisions, not the real world itself. Our brains constantly update this map from the signals coming through our eyes, ears, nose, mouth and skin.

You can’t currently store knowledge in anything other than a brain, because a brain connects it all together. Everything is inter-connected in the brain. Computers are not artificial brains. They don’t understand what they are processing, and can’t make independent decisions based upon what you tell them.

There are two sources that the brain uses to build this knowledge - information and data.

The Infogineering Model explains how these interact…

 

Why does it matter that people mix them up?

When people confuse data with information, they can make critical mistakes. Data is always correct (I can’t be 29 years old and 62 years old at the same time) but information can be wrong (there could be two files on me, one saying I was born in 1981, and one saying I was born in 1948).

Information captures data at a single point. The data changes over time. The mistake people make is thinking that the information they are looking at is always an accurate reflection of the data.

By understanding the differences between these, you can make better decisions based on accurate facts.


In Brief

Data: Facts, a description of the World

Information: Captured Data and Knowledge

Knowledge: Our personal map/model of the World

Gene which sparked human brain leap identified

Scientists have identified the gene which may have driven the crucial step in evolution where man learned to talk.

 

 

By duplicating itself two and a half million years ago the gene could have given early human brains the power of speech and invention, leaving cousins such as chimpanzees behind.

The gene, known as SRGAP2, helps control the development of the neocortex – the part of the brain responsible for higher functions like language and conscious thought.

Having extra copies slowed down the development of the brain, allowing it to forge more connections between nerve cells and in doing so grow bigger and more complex, researchers said.

In a study published in the journal Cell, the scientists reported that the gene duplicated about 3.5 million years ago to create a "daughter" gene, and again a million years later, creating a "granddaughter" copy.

Although humans and chimpanzees separated six million years ago, we still share 96 per cent of our genome and the gene is one of only about 30 which have copied themselves since that time.