Handling Various Issues in Text Classification: A Review
Abstract
Text classification assigns a document, depending upon its content, to
predefined categories. This helps in providing conceptual views of
document collections and has important real-world applications. Text
classification is a preliminary requisite of text retrieval and text
understanding systems. A text retrieval system retrieves text in reply
to a user-defined query, while a text understanding system transforms
text so that it can produce summaries, answer questions, or extract
data. This survey provides a brief review of the generic text
classification process, the phases of that process, the existing work
on text classification, and the various methods and algorithms for
effective text classification.
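A minimal sketch can help make the generic classification process concrete. It assumes three common phases: preprocessing, feature extraction as word counts, and training followed by classification. It is not the method of any particular work surveyed here. Tokenization is assumed to be simple lowercase word splitting, and the algorithm is multinomial Naive Bayes with add-one (Laplace) smoothing, chosen as one representative method. The names `tokenize` and `NaiveBayesTextClassifier` and the toy training documents are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Preprocessing phase (assumed): lowercase and keep runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Hypothetical multinomial Naive Bayes classifier with add-one smoothing."""

    def fit(self, docs, labels):
        # Training phase: count documents per category and words per category.
        self.class_doc_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)  # feature extraction: bag of words
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(docs)
        return self

    def log_score(self, doc, label):
        # Unnormalized log P(label) + sum of log P(word | label).
        score = math.log(self.class_doc_counts[label] / self.total_docs)
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for token in tokenize(doc):
            if token in self.vocab:  # skip words never seen in training
                score += math.log((self.word_counts[label][token] + 1) / denom)
        return score

    def predict(self, doc):
        # Classification phase: choose the highest-scoring predefined category.
        return max(self.class_doc_counts, key=lambda c: self.log_score(doc, c))


# Toy usage with made-up documents and two predefined categories.
train_docs = [
    "the striker scored a late goal in the match",
    "the team won the football league",
    "the central bank raised interest rates",
    "stock markets fell as inflation rose",
]
train_labels = ["sports", "sports", "finance", "finance"]
clf = NaiveBayesTextClassifier().fit(train_docs, train_labels)
print(clf.predict("a goal in the final match"))     # sports
print(clf.predict("interest rates and inflation"))  # finance
```

The classifier works in log space so that multiplying many small word probabilities does not underflow. A practical system would likely add stop-word removal, stemming, and feature selection between preprocessing and training.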
Copyright © IJETT, International Journal on Emerging Trends in Technology