An Approach to Enhance Text Categorization through Shrinkage in a Hierarchy of Modules

Authors

  • Takudzwa Fadziso, Chinhoyi University of Technology

DOI:

https://doi.org/10.18034/abcjar.v8i2.562

Keywords:

Text Categorization, Shrinkage, Naïve Bayes, Hierarchy of modules

Abstract

Most organizations, in carrying out their activities, design and produce a large volume of electronic documents as an essential element of their external and internal operations. When documents are organized into a large number of subject categories, the categories are frequently arranged in a hierarchy. Newsgroup and Yahoo databases are two of the cases studied. This article shows that the accuracy of a naïve Bayes text classifier can be significantly improved by taking advantage of a hierarchy of categories. A statistical technique known as shrinkage is adopted, which smooths the parameter estimates of a data-sparse child category with those of its parent in order to obtain more robust parameter estimates. Test results on three real-world datasets from Yahoo, UseNet, and shared web pages show improved performance, with about a 29% reduction in error over the conventional flat classifier.
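The shrinkage idea described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, a two-level hierarchy (a sparse child category and its parent), toy word counts, and fixed mixing weights; the function names `ml_estimate` and `shrink`, the example vocabulary, and the weight values are all hypothetical and are not taken from the article. In practice the weights would normally be fitted to data rather than set by hand.

```python
from collections import Counter

def ml_estimate(counts, vocab):
    """Maximum-likelihood word distribution from raw word counts."""
    total = sum(counts.values())
    return {w: (counts[w] / total if total else 0.0) for w in vocab}

def shrink(path_counts, weights, vocab):
    """Mix word estimates along a child-to-root path with a uniform term.

    path_counts: list of Counters, child first, root last.
    weights: one mixing weight per Counter plus a final weight for the
             uniform distribution; the weights must sum to 1.
    """
    assert len(weights) == len(path_counts) + 1
    assert abs(sum(weights) - 1.0) < 1e-9
    estimates = [ml_estimate(c, vocab) for c in path_counts]
    uniform = 1.0 / len(vocab)
    return {
        w: sum(lam * est[w] for lam, est in zip(weights, estimates))
           + weights[-1] * uniform
        for w in vocab
    }

# Toy counts (assumed): a data-sparse child category and its parent.
child = Counter({"goal": 2, "match": 1})
parent = Counter({"goal": 5, "match": 5, "team": 10})
vocab = sorted(set(child) | set(parent) | {"stock"})

# Fixed weights chosen by hand for illustration: child, parent, uniform.
weights = [0.5, 0.4, 0.1]
smoothed = shrink([child, parent], weights, vocab)

for word in vocab:
    print(f"{word}: {smoothed[word]:.3f}")
```

In this sketch the word "team" never occurs in the child's training data, so its maximum-likelihood estimate there is zero, yet the smoothed estimate is positive because the parent category contributes its own estimate. The smoothed values still form a probability distribution, so they can be used directly as the per-class word probabilities in a naïve Bayes classifier.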

Author Biography

  • Takudzwa Fadziso, Chinhoyi University of Technology
    Institute of Lifelong Learning and Development Studies, Chinhoyi University of Technology, ZIMBABWE

Published

2019-12-31

How to Cite

Fadziso, T. (2019). An Approach to Enhance Text Categorization through Shrinkage in a Hierarchy of Modules. ABC Journal of Advanced Research, 8(2), 123-130. https://doi.org/10.18034/abcjar.v8i2.562
