Improved Algorithm for Mining of High Utility patterns in one phase Based on Map Reduce Framework on Hadoop
Abstract
High utility itemset mining from a database refers to the
discovery of itemsets with high utility, such as profits. Although
a number of relevant algorithms have been proposed in recent
years, they suffer from generating a very large number of
candidate itemsets for high utility itemsets. Such a large number
of candidate itemsets degrades mining performance in terms of
execution time and space requirements. Earlier work demonstrates
this for two-phase candidate generation, an approach that suffers
from scalability issues due to the huge number of candidates. This
paper presents an efficient approach that generates high utility
patterns in one phase without generating candidates. Our
experiments use a linear data structure; our pattern growth
approach searches a reverse set enumeration tree and prunes the
search space by utility upper bounding. High utility patterns are
also identified by a closure property and a singleton property. In
this work we present a new approach that extends these algorithms
to overcome their limitations using the MapReduce framework on
Hadoop. Experimental results show that the proposed algorithms not
only reduce the number of candidates effectively but also
substantially outperform other algorithms in terms of runtime,
especially when databases contain many long transactions.
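
To make the notions of utility and utility upper bounding concrete, here is a minimal Python sketch. It is not the algorithm proposed in this paper: the abstract does not specify which upper bound is used, so the sketch assumes the transaction-weighted utility (TWU), a standard bound from the two-phase literature, and imitates the map and reduce steps with plain functions rather than the Hadoop API. The data, the profit table, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "d": 4},
]
# Hypothetical unit profit per item (external utility).
profit = {"a": 5, "b": 2, "c": 1, "d": 3}


def transaction_utility(tx):
    """Total utility of one transaction: sum of quantity * unit profit."""
    return sum(qty * profit[item] for item, qty in tx.items())


def itemset_utility(itemset, txs):
    """Exact utility of an itemset over the transactions containing all its items."""
    total = 0
    for tx in txs:
        if all(item in tx for item in itemset):
            total += sum(tx[item] * profit[item] for item in itemset)
    return total


def map_phase(tx):
    """Map step: emit (item, transaction utility) for every item in the transaction."""
    tu = transaction_utility(tx)
    return [(item, tu) for item in tx]


def reduce_phase(pairs):
    """Reduce step: sum the emitted values per item, giving each item's TWU."""
    twu = defaultdict(int)
    for item, tu in pairs:
        twu[item] += tu
    return dict(twu)


def promising_items(txs, min_util):
    """Keep items whose TWU (assumed upper bound) reaches the threshold."""
    pairs = [pair for tx in txs for pair in map_phase(tx)]
    twu = reduce_phase(pairs)
    return {item for item, bound in twu.items() if bound >= min_util}
```

With a threshold of 20, `promising_items(transactions, 20)` keeps `a`, `b`, and `c` and prunes `d`, whose TWU is 14. Because the TWU of an item bounds the utility of every itemset containing it, no itemset with `d` can reach the threshold, so that part of the search space can be skipped without computing exact utilities.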
Copyright © IJETT, International Journal on Emerging Trends in Technology