Data cleaning removes unnecessary data from the data set. It attempts to fill in missing values, smooth out noise while identifying outliers, and correct inconsistencies in the data. It is usually an iterative two-step process consisting of discrepancy detection and data transformation.
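The two steps named above can be illustrated with a minimal sketch. It assumes a single numeric column in which missing values are stored as None; the function name clean_column, the median fill, and the 1.5 × IQR outlier rule are illustrative choices, not something the text prescribes.

```python
from statistics import median, quantiles

def clean_column(values, k=1.5):
    """Hypothetical helper: detect discrepancies, then transform the data."""
    observed = [v for v in values if v is not None]

    # Discrepancy detection: compute the IQR fence from the observed values.
    q1, _, q3 = quantiles(observed, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr

    # Data transformation: fill each missing value with the median.
    fill = median(observed)
    filled = [fill if v is None else v for v in values]

    # Report values that fall outside the fence as outliers.
    outliers = [v for v in filled if v < low or v > high]
    return filled, outliers

# Assumed sample data: one missing age and one implausible age.
ages = [23, 25, None, 24, 26, 120, 25]
filled, outliers = clean_column(ages)
print(filled)
print(outliers)
```

Because the process is iterative, a flagged outlier such as 120 would normally be reviewed and corrected or removed, and the detection step run again.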
5.3.4 Data analysis
Data analysis, also known as analysis of data or data analytics, is a process of inspecting, cleansing, transforming and modeling data with the goal of discovering useful information, suggesting conclusions and supporting decision-making. It has multiple facets and approaches, encompassing diverse techniques.
5.3.5 Apriori Algorithm
The Apriori algorithm is used to find frequent item-sets.
It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item-sets as long as those item-sets appear sufficiently often in the database. The frequent item-sets determined by the Apriori algorithm can be used to determine association rules.
It works in two different steps:
1) Systematically identify item-sets that occur frequently in the data set with a support greater than a pre-specified threshold.
2) Calculate the confidence of all possible rules given the frequent item-sets and keep only those with a confidence greater than a pre-specified …
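The two steps can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names apriori and rules, the sample baskets, and the thresholds 0.5 and 0.8 are all assumptions, and the sketch keeps item-sets and rules whose value is greater than or equal to the threshold.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Step 1: return {item-set: support} for all frequent item-sets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the item-set.
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([i]) for t in transactions for i in t}
    frequent, k = {}, 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join: merge frequent (k-1)-item-sets into candidate k-item-sets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: keep a candidate only if all its (k-1)-subsets are frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

def rules(frequent, min_confidence):
    """Step 2: derive rules X -> Y and keep those that are confident enough."""
    out = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for x in map(frozenset, combinations(itemset, r)):
                conf = supp / frequent[x]  # support(X and Y) / support(X)
                if conf >= min_confidence:
                    out.append((x, itemset - x, supp, conf))
    return out

# Assumed toy point-of-sale data.
baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
frequent = apriori(baskets, min_support=0.5)
found = rules(frequent, min_confidence=0.8)
print(found)  # one rule: butter -> bread (support 0.5, confidence 1.0)
```

The prune step is what distinguishes Apriori from a brute-force count: a candidate such as {bread, milk, butter} is discarded without scanning the data because its subset {milk, butter} is not frequent.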
It contains if-then rules supported by the data. Market basket analysis is an application of association rules that deals with the contents of point-of-sale transactions of large retailers. It identifies the relationships among the attributes present in the database and relates one item to another.
Managers of any shop or department store would like to understand the buying behavior of their customers. A market basket analysis system helps managers understand which sets of items customers are likely to purchase. Association rule mining goes a step beyond searching for frequent item-sets: the item-sets are processed into information that the user can read. It shows the correlation between data items and analyses the information in terms of support and confidence. This information helps in making further decisions. It extracts important correlations among the data present in the database.
An association rule is an implication expression of the form X → Y, where X and Y are disjoint item-sets. The strength of an association rule can be measured in terms of its support and