Handling Various Issues in Text Classification: A Review

Sayali Rasane; D. V. Patil

Abstract

Text classification is the process of assigning documents to
predefined categories on the basis of their content. It provides
conceptual views of document collections and has important
real-world applications. Text classification is a preliminary
requisite of text retrieval and text understanding systems. A text
retrieval system retrieves texts in response to a user-defined query,
while a text understanding system transforms text to produce
summaries, answer questions, or extract data. This survey briefly
reviews the generic text classification process and its phases,
existing work on text classification, and the methods and algorithms
used for effective text classification.
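
As a rough illustration of assigning documents to predefined categories by content, the sketch below trains a small multinomial Naive Bayes classifier with add-one smoothing. The choice of Naive Bayes, the whitespace tokenizer, the function names, and the toy training sentences are all assumptions made for this example; they are not taken from the survey.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple assumption)."""
    return text.lower().split()


def train(labeled_docs):
    """Collect per-category document and word counts from (text, category) pairs."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, category in labeled_docs:
        doc_counts[category] += 1
        for token in tokenize(text):
            word_counts[category][token] += 1
            vocab.add(token)
    return doc_counts, word_counts, vocab


def classify(text, model):
    """Return the category with the highest log posterior under Naive Bayes."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in sorted(doc_counts):
        # Log prior: fraction of training documents in this category.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[category][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical toy corpus with two predefined categories.
TRAINING_DOCS = [
    ("the team won the match", "sports"),
    ("a late goal decided the game", "sports"),
    ("the market closed higher today", "finance"),
    ("shares fell as the bank reported losses", "finance"),
]

if __name__ == "__main__":
    model = train(TRAINING_DOCS)
    print(classify("the bank shares closed higher", model))
```

A real system would add the preprocessing and feature selection phases that the survey reviews; this sketch only shows the final step of mapping content to a category.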
