Hossein Zahed

Web Developer, Entrepreneur, Software Educator

Data Mining Techniques

Several major data mining techniques have been developed and used in recent data mining projects, including association, classification, clustering, prediction and sequential patterns. We will briefly examine each of them with an example to give a good overview.

 

Association

Association is one of the best-known data mining techniques. In association, a pattern is discovered based on the relationship between a particular item and other items in the same transaction. For example, the association technique is used in market basket analysis to identify which products customers frequently purchase together. Based on this data, businesses can run corresponding marketing campaigns to sell more products and make more profit.
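As a rough illustration of market basket analysis, the sketch below computes two common association measures, support and confidence, for every pair of items. The transactions, the thresholds and the helper names `Support` and `Confidence` are hypothetical; it assumes a .NET console project with top-level statements (C# 9 or later), and real projects would typically use a dedicated algorithm such as Apriori rather than checking all pairs.

```csharp
using System;
using System.Linq;

// Hypothetical market-basket data: each inner array is one transaction.
string[][] transactions =
{
    new[] { "bread", "milk" },
    new[] { "bread", "butter", "milk" },
    new[] { "bread", "butter" },
    new[] { "milk", "eggs" },
};

// Support: fraction of transactions that contain every item in the itemset.
static double Support(string[][] data, params string[] itemset) =>
    (double)data.Count(t => itemset.All(i => t.Contains(i))) / data.Length;

// Confidence of the rule "antecedent => consequent":
// support(antecedent together with consequent) / support(antecedent).
// The antecedent is assumed to appear in at least one transaction.
static double Confidence(string[][] data, string antecedent, string consequent) =>
    Support(data, antecedent, consequent) / Support(data, antecedent);

// Print every two-item rule that clears both (arbitrary, hypothetical) thresholds.
const double MinSupport = 0.5, MinConfidence = 0.6;
var items = transactions.SelectMany(t => t).Distinct().OrderBy(item => item).ToList();
foreach (var a in items)
{
    foreach (var b in items.Where(x => x != a))
    {
        double support = Support(transactions, a, b);
        double confidence = Confidence(transactions, a, b);
        if (support >= MinSupport && confidence >= MinConfidence)
            Console.WriteLine($"{a} => {b}: support {support:F2}, confidence {confidence:F2}");
    }
}
```

With this toy data, the rules bread => butter, bread => milk, butter => bread and milk => bread pass both thresholds; a business might read "butter => bread" (confidence 1.0) as a hint for a joint promotion.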

 

Classification

Classification is a classic data mining technique based on machine learning. Basically, classification is used to assign each item in a set of data to one of a predefined set of classes or groups. Classification methods make use of mathematical techniques such as decision trees, linear programming, neural networks and statistics. In classification, we build software that can learn how to classify data items into groups. For example, we can apply classification to the task: "given all past records of employees who left the company, predict which current employees are likely to leave in the future." In this case, the past employee records are divided into two groups, "leave" and "stay", and the data mining software learns from them how to classify current employees into one of these groups.
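Decision trees are one of the methods named above. The sketch below shows the simplest possible decision tree, a single-split "decision stump", learning a "leave"/"stay" rule from past records. The feature (years since last promotion), the records and the `LearnThreshold` helper are all hypothetical, and it assumes a .NET console project with top-level statements (C# 9 or later); a real model would use many features and a proper library.

```csharp
using System;
using System.Linq;

// Hypothetical past records: (years since last promotion, did the employee leave?).
(double Years, bool Left)[] history =
{
    (0.5, false), (1.0, false), (1.5, false), (2.0, false),
    (3.0, true),  (3.5, true),  (4.0, false), (5.0, true),
};

// Learn a decision stump, i.e. a decision tree with a single split:
// try every observed value as a threshold and keep the one that
// misclassifies the fewest past records under the rule
// "predict leave when Years >= threshold".
static double LearnThreshold((double Years, bool Left)[] records)
{
    double best = double.NaN;
    int fewestErrors = int.MaxValue;
    foreach (double candidate in records.Select(r => r.Years).Distinct().OrderBy(y => y))
    {
        int errors = records.Count(r => (r.Years >= candidate) != r.Left);
        if (errors < fewestErrors)
        {
            fewestErrors = errors;
            best = candidate;
        }
    }
    return best;
}

double threshold = LearnThreshold(history);
Console.WriteLine($"learned rule: leave if years since last promotion >= {threshold}");

// Classify hypothetical current employees into the "leave" or "stay" group.
foreach (double years in new[] { 0.8, 2.5, 4.5 })
{
    string group = years >= threshold ? "leave" : "stay";
    Console.WriteLine($"{years} years -> {group}");
}
```

On this toy data the learned threshold is 3.0 (one training error, the record at 4.0 years), so the three current employees are classified as stay, stay and leave.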

 

Clustering

Clustering is a data mining technique that automatically groups objects with similar characteristics into meaningful or useful clusters. Unlike classification, where objects are assigned to predefined classes, clustering also defines the classes and then puts objects into them. To make the concept clearer, take a library as an example. A library holds books on a wide range of topics. The challenge is to arrange those books so that readers can find several books on a specific topic without hassle. Using clustering, we can keep books that share some kind of similarity in one cluster, or on one shelf, and label it with a meaningful name. Readers who want books on a topic then only need to go to that shelf instead of searching the whole library.
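To make the library example concrete, here is a sketch of k-means, one widely used clustering algorithm (the text does not name a specific one). Each book is reduced to a hypothetical pair of numeric topic scores, k is fixed at 2, and the two starting centers are hand-picked so the run is deterministic; real implementations usually pick starting points randomly or with k-means++. It assumes a .NET console project with top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;

// Hypothetical 2-D feature vectors, for example two topic scores per book.
(double X, double Y)[] books =
{
    (1.0, 1.0), (1.5, 2.0), (1.0, 1.5), // one natural group
    (8.0, 8.0), (9.0, 8.5), (8.5, 9.0), // another natural group
};

static double SquaredDistance((double X, double Y) a, (double X, double Y) b) =>
    (a.X - b.X) * (a.X - b.X) + (a.Y - b.Y) * (a.Y - b.Y);

// Plain k-means: alternate between assigning each point to its nearest
// center and moving each center to the mean of its assigned points,
// until no point changes cluster (or maxIterations is reached).
static int[] KMeans((double X, double Y)[] points, (double X, double Y)[] initialCenters, int maxIterations = 100)
{
    var centers = initialCenters.ToArray(); // copy so the caller's array is untouched
    var assignment = new int[points.Length];
    for (int iter = 0; iter < maxIterations; iter++)
    {
        // Assignment step: index of the nearest center for every point.
        int[] next = points
            .Select(p => Enumerable.Range(0, centers.Length)
                .OrderBy(k => SquaredDistance(p, centers[k]))
                .First())
            .ToArray();
        if (iter > 0 && next.SequenceEqual(assignment))
            break; // converged: no point changed cluster
        assignment = next;

        // Update step: move each center to the mean of its members.
        for (int c = 0; c < centers.Length; c++)
        {
            var members = points.Where((p, i) => assignment[i] == c).ToArray();
            if (members.Length > 0)
                centers[c] = (members.Average(m => m.X), members.Average(m => m.Y));
        }
    }
    return assignment;
}

// Hand-picked starting centers (an assumption, for a deterministic run).
int[] labels = KMeans(books, new[] { books[0], books[3] });
for (int n = 0; n < books.Length; n++)
    Console.WriteLine($"book {n} at {books[n]} -> shelf {labels[n]}");
```

Note that the algorithm only produces shelf numbers; giving each shelf a "meaningful name", as in the library example, is still a human step.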

 

Prediction

Prediction, as its name implies, is a data mining technique that discovers relationships between independent variables, and between dependent and independent variables. For instance, prediction analysis can be used in sales to predict future profit: if we treat sales as the independent variable, profit can be the dependent variable. Then, based on historical sales and profit data, we can draw a fitted regression curve and use it for profit prediction.
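The simplest fitted regression curve is a straight line. The sketch below fits profit against sales by ordinary least squares and uses the line to forecast profit. The figures and the `FitLine` helper are hypothetical, and a linear relationship is an assumption; it targets a .NET console project with top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;

// Hypothetical historical data: sales (independent variable x)
// and profit (dependent variable y), in the same currency unit.
double[] sales  = { 10, 20, 30, 40, 50 };
double[] profit = {  3,  5,  7,  9, 11 };

// Ordinary least squares fit of a straight line y = intercept + slope * x:
// slope = sum((x - meanX) * (y - meanY)) / sum((x - meanX)^2).
static (double Intercept, double Slope) FitLine(double[] x, double[] y)
{
    double meanX = x.Average(), meanY = y.Average();
    double sxy = x.Zip(y, (xi, yi) => (xi - meanX) * (yi - meanY)).Sum();
    double sxx = x.Sum(xi => (xi - meanX) * (xi - meanX));
    double slope = sxy / sxx;
    return (meanY - slope * meanX, slope);
}

var fit = FitLine(sales, profit);
Console.WriteLine($"profit = {fit.Intercept:F2} + {fit.Slope:F2} * sales");

// Use the fitted line to predict profit for a hypothetical future sales figure.
double plannedSales = 60;
Console.WriteLine($"predicted profit for sales of {plannedSales}: {fit.Intercept + fit.Slope * plannedSales:F2}");
```

Because this toy data lies exactly on the line profit = 1 + 0.2 × sales, the fit recovers intercept 1 and slope 0.2 and predicts a profit of 13 for sales of 60; real data would scatter around the line.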

 

Sequential Patterns

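Sequential pattern analysis is a data mining technique that seeks to discover similar patterns in transaction data over a business period. Unlike association, which looks at items within the same transaction, it takes into account the order in which transactions occur over time; for example, it might reveal that customers who buy a camera often buy a memory card in a later purchase. The uncovered patterns are used for further business analysis to recognize relationships among data.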
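A basic building block of sequential pattern analysis is measuring how many customers' histories contain a given ordered pattern. The sketch below does this with a simple subsequence check; the purchase histories and the helpers `ContainsInOrder` and `SequenceSupport` are hypothetical, and full algorithms (such as GSP or PrefixSpan) additionally search for which patterns are frequent. It assumes a .NET console project with top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;

// Hypothetical purchase histories: each array is one customer's
// transactions in time order (one item per transaction for simplicity).
string[][] customers =
{
    new[] { "camera", "memory card", "tripod" },
    new[] { "camera", "tripod" },
    new[] { "memory card", "camera" },
    new[] { "camera", "case", "memory card" },
};

// True when `pattern` occurs in `sequence` in the same order,
// not necessarily consecutively (a subsequence match).
static bool ContainsInOrder(string[] sequence, string[] pattern)
{
    int matched = 0;
    foreach (var item in sequence)
        if (matched < pattern.Length && item == pattern[matched])
            matched++;
    return matched == pattern.Length;
}

// Support of a sequential pattern: share of customers whose history contains it.
static double SequenceSupport(string[][] histories, params string[] pattern) =>
    (double)histories.Count(h => ContainsInOrder(h, pattern)) / histories.Length;

Console.WriteLine($"camera then memory card: {SequenceSupport(customers, "camera", "memory card"):F2}");
Console.WriteLine($"memory card then camera: {SequenceSupport(customers, "memory card", "camera"):F2}");
```

The order matters: on this toy data "camera then memory card" has support 0.50, while the reverse order has only 0.25, which is exactly the distinction that plain association analysis would miss.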

The Differences Between Data, Information and Knowledge

We frequently hear the words Data, Information and Knowledge used as if they are the same thing.

You hear people talking about the Internet as a “vast network of human knowledge” or that they’ll “e-mail through the data.”

Defining what we mean by data, information and knowledge, and how they interact with one another, should make the distinction much easier to understand.

 

Data

Data is/are the facts of the World. For example, take yourself. You may be 5ft tall, have brown hair and blue eyes. All of this is “data”. You have brown hair whether this is written down somewhere or not.

Human beings have used data to form knowledge of the world for as long as we’ve existed. In many ways, data can be thought of as a description of the World. We perceive this data with our senses, and then the brain processes it.

Until we started using information, all we could use was data directly. If you wanted to know how tall I was, you would have to come and look at me. Our knowledge was limited by our direct experiences. 

 

Information

Information allows us to expand our knowledge beyond the range of our senses. We can capture data in information, then move it about so that other people can access it at different times.

Here is a simple analogy for you.

If I take a picture of you, the photograph is information. But what you look like is data.

I can move the photo of you around, send it to other people via e-mail, etc. However, I’m not actually moving you around, or what you look like. I’m simply allowing people who can’t see you directly from where they are to know what you look like. If I lose or destroy the photo, this doesn’t change how you look.

So, in a case where CDs holding tax records are lost, the CDs were information. The information was lost, but the data wasn’t: Mrs Jones still lives at 14 Whitewater Road, and she was still born on 15th August 1971.

 

Knowledge

Finally, let’s look at Knowledge. Knowledge is what we know. Think of it as the map of the World we build inside our brains. Like a physical map, it helps us know where things are, but it contains more than that: it also contains our beliefs and expectations (“If I do this, I will probably get that.”). Crucially, the brain links all these things together into a giant network of ideas, memories, predictions, beliefs, etc.

It is on this “map”, not on the real world itself, that we base our decisions. Our brains constantly update this map from the signals coming through our eyes, ears, nose, mouth and skin.

You can’t currently store knowledge in anything other than a brain, because a brain connects it all together. Everything is inter-connected in the brain. Computers are not artificial brains. They don’t understand what they are processing, and can’t make independent decisions based upon what you tell them.

The brain uses two sources to build this knowledge: information and data.

The Infogineering Model explains how these interact…

 

Why does it matter that people mix them up?

When people confuse data with information, they can make critical mistakes. Data is always correct (I can’t be 29 years old and 62 years old at the same time) but information can be wrong (there could be two files on me, one saying I was born in 1981, and one saying I was born in 1948).

Information captures data at a single point in time, but the data changes over time. The mistake people make is thinking that the information they are looking at is always an accurate reflection of the data.

By understanding the differences between these, you can make better decisions based on accurate facts.


In Brief

Data: Facts, a description of the World

Information: Captured Data and Knowledge

Knowledge: Our personal map/model of the World

YouTube: 72 hours of video are uploaded every minute

The YouTube team recently celebrated its 7th birthday, and announced an amazing statistic: 72 hours of video are uploaded to the site every minute. And if you think that's a lot, the amount of video being watched in that minute is 1,000 times greater.

Just a few weeks ago the news that YouTube was doing 60 hours per minute created a sensation on the Internet, and this increase shows how fast the site continues to grow. We got a little more detail on these numbers from Christian Kaiser, YouTube's engineering director.

The 72-hour figure is an average, of course: there are busy times and quiet periods as different time zones and regions wake up, upload video and log off. So although 72 hours per minute is a good way to sum it up, YouTube might be processing far more than that at any given time.

And they're not flubbing the numbers, either. That 72-hour figure doesn't include the various versions that YouTube encodes (high definition, 3-D, etc.). People are simply uploading three times as much video as they were two years ago.

So what accounts for this huge increase in video? It's pretty much just more use across the board, says Kaiser: 

We’ve got more video coming from more people in more places, as well as bringing in thousands of full-length films and original programming to YouTube. As examples, mobile now contributes to three hours of our total upload time, and we’ve signed MGM, Paramount, Walt Disney Studios and other studios to bring thousands of movies to YouTube.

And how about viewership stats? They say that users are watching 3 billion hours of video every month. In case you're wondering, that's about 70,000 hours being watched each minute.

There's a lot of competition out there, though, and YouTube makes a big target. Facebook's own video numbers are rising meteorically as well. Smaller and more specialized sites like Vimeo and Ustream are growing quickly too, as more and better video is uploaded from DSLRs and mobile phone cameras. But with its existing popularity and Hollywood partners, YouTube probably still has a few good years left on top.

 

 

Half of US Cellular Subscribers Own Smartphones

The smartphone juggernaut continues in the U.S. According to a new Nielsen report, 49.7 percent of mobile subscribers owned smartphones as of February. That's up from 36 percent a year ago. 

Two-thirds of those who got a new phone in the last three months chose a smartphone over a feature phone, the research firm says.  

Android-based phones lead the U.S. smartphone market with a 48 percent share, while Apple's iPhone is at 32 percent, and BlackBerry is at 11.6 percent. 

"Among recent acquirers who got their smartphone within the last three months, 48 percent of those surveyed in February said they chose an Android and 43 percent bought an iPhone," the research firm said. Only 5 percent opted for a BlackBerry.

Another report, from comScore earlier this month, said that Android is on nearly half the phones carried by the country's 101 million smartphone subscribers, while about another third use iPhones.

RIM, which makes BlackBerry, has continued to decline as a favorite among U.S. purchasers. In the summer of 2011, Nielsen pegged RIM's U.S. market share at about 20 percent.

In Nielsen's "State of the Media: The Mobile Media" report, released in December and based on a survey of 25,000 mobile customers, the research firm said smartphone ownership is predominant among those age 18 to 34.

Nearly two-thirds (64 percent) of 25- to 34-year-olds and 53 percent of 18- to 24-year-olds own smartphones, "and they have led in smartphone penetration compared to other age groups since 2009."

ASP.NET Ajax Timer Control

Timer controls allow you to perform postbacks at set intervals. When used together with UpdatePanels, which is the most common approach, a Timer allows timed partial updates of your page, but it can also be used to post back the entire page. In this chapter we will focus on using timers with UpdatePanels, so if you haven't already read the chapter on UpdatePanels, please do so now.

Here is a small example of using the Timer control. It simply updates a timestamp every 5 seconds.

<%@ Page Language="C#" AutoEventWireup="true" CodeFile="Default.aspx.cs"
Inherits="_Default" %>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
    <title>Timers</title>
</head>
<body>
    <form id="form1" runat="server">
        <asp:ScriptManager ID="ScriptManager1" runat="server" />
        <asp:Timer runat="server" id="UpdateTimer" interval="5000" 
ontick="UpdateTimer_Tick" />
        <asp:UpdatePanel runat="server" id="TimedPanel" updatemode="Conditional">
            <Triggers>
                <asp:AsyncPostBackTrigger controlid="UpdateTimer" eventname="Tick" />
            </Triggers>
            <ContentTemplate>
                <asp:Label runat="server" id="DateStampLabel" />
            </ContentTemplate>
        </asp:UpdatePanel>
    </form>
</body>
</html>

We need only a single event handler, which you should add to the page class in your CodeBehind file (Default.aspx.cs, as referenced by the CodeFile attribute):

protected void UpdateTimer_Tick(object sender, EventArgs e)
{
    DateStampLabel.Text = DateTime.Now.ToString();
}

This is all very simple. We have a normal UpdatePanel, which carries a Trigger reference to our new Timer control. This means that the panel is updated when the Timer "ticks", that is, when it fires the Tick event. The Timer control uses the interval attribute to define the number of milliseconds to wait before firing the Tick event. As you can see from our CodeBehind listing, we simply update the DateStampLabel each time the Timer fires. This could be done more efficiently with a simple piece of JavaScript that updates the time on the client side instead of involving the server; the example is only meant to demonstrate the potential of the Timer control.

Another approach is to place the Timer inside the UpdatePanel. Doing so saves us from defining a trigger, but be aware that the behavior differs depending on whether the Timer is inside or outside an UpdatePanel. When a Timer is inside an UpdatePanel, the Timer is not re-constructed until the UpdatePanel has been fully updated. That means that if you have a Timer with an interval of 60 seconds and the update takes 5 seconds, the next event won't be fired 60 seconds after the previous one, but 65 seconds after. On the other hand, if the Timer is outside the UpdatePanel, the user will only see the updated content of the panel for 55 seconds before it's updated again.
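A minimal sketch of that inside-the-panel layout, reusing the control IDs and the UpdateTimer_Tick handler from the example above, could look like the fragment below. It is a markup fragment meant to replace the ScriptManager, Timer and UpdatePanel inside the form, not a complete page. The explicit Triggers section is omitted on the assumption that the panel keeps its default ChildrenAsTriggers="true", under which a Timer inside the ContentTemplate acts as a trigger for its panel.

```aspx
<asp:ScriptManager ID="ScriptManager1" runat="server" />
<asp:UpdatePanel runat="server" id="TimedPanel" updatemode="Conditional">
    <ContentTemplate>
        <%-- Timer inside the panel: no <Triggers> section is needed, and the
             next interval only starts once the panel's update has finished. --%>
        <asp:Timer runat="server" id="UpdateTimer" interval="5000"
            ontick="UpdateTimer_Tick" />
        <asp:Label runat="server" id="DateStampLabel" />
    </ContentTemplate>
</asp:UpdatePanel>
```

The CodeBehind handler stays the same; only the placement of the Timer, and therefore the timing behavior described above, changes.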

You should always remember that even though partial updates are not as heavy on the server as real postbacks, the hosting server is still contacted, and when using timers you may get a lot of partial postbacks, which can slow things down. Always use intervals that are as long as possible, and consider whether contacting the server is really necessary.