Handling Various Issues in Text Classification: A Review

Sayali Rasane, D. V. Patil


Text classification is the process of assigning documents to
predefined categories based on their content. It helps provide
conceptual views of document collections and has important
real-world applications. Text classification is a preliminary
requisite of text retrieval and text understanding systems. A text
retrieval system retrieves text in response to a user-defined query,
while a text understanding system transforms text to produce
summaries, answer questions, or extract data. This survey provides a
brief review of the generic text classification process and its
phases, the existing work on text classification, and the various
methods and algorithms for effective text classification.





Copyright © IJETT, International Journal on Emerging Trends in Technology