Welcome to Journal of Beijing Institute of Technology
Volume 23, Issue 1
JIANG Yun-chen, LUO Sen-lin, HAN Lei, PAN Li-min. Text retrieval algorithm that decreases confusion[J]. JOURNAL OF BEIJING INSTITUTE OF TECHNOLOGY, 2014, 23(1): 108-116.

Text retrieval algorithm that decreases confusion

  • Received Date: 2012-12-18
  • Confusion between similar texts limits precision in text retrieval. To overcome this problem, a new text retrieval algorithm that decreases confusion (DCTR) is proposed. The algorithm constructs a search template that represents the user's search intention through positive and negative training. Using the prior probabilities in the template, the supported probability and anti-supported probability of each text in the text library are estimated for discrimination. The search results are then ranked according to the similarity between each retrieved text and the template. The complexity of DCTR is close to that of term frequency-inverse document frequency (TF-IDF). Its ability to distinguish confusable texts improves, and the quality of the results increases, as the number of training rounds grows.
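
The abstract describes a template trained on positive and negative examples and then used to score each text in the library. The abstract does not give the paper's formulas, so the following is only a minimal sketch of that general idea, not the authors' DCTR algorithm. It assumes a bag-of-words model with add-one smoothing and a log-ratio score; the names `SearchTemplate`, `train`, `score`, and `rank` are hypothetical.

```python
import math
from collections import Counter


class SearchTemplate:
    """Hypothetical template: term counts from positive and negative texts."""

    def __init__(self, smoothing=1.0):
        self.pos = Counter()        # terms seen in positive training texts
        self.neg = Counter()        # terms seen in negative training texts
        self.smoothing = smoothing  # additive smoothing (an assumption)

    def train(self, text, positive):
        """Add one training text to the positive or negative side."""
        counts = self.pos if positive else self.neg
        counts.update(text.lower().split())

    def _prob(self, counts, term, vocab_size):
        """Smoothed probability of a term under one side of the template."""
        total = sum(counts.values())
        return (counts[term] + self.smoothing) / (total + self.smoothing * vocab_size)

    def score(self, text):
        """Log 'supported' evidence minus log 'anti-supported' evidence."""
        vocab = set(self.pos) | set(self.neg)
        size = max(len(vocab), 1)
        support = 0.0
        anti_support = 0.0
        for term in text.lower().split():
            if term not in vocab:
                continue  # terms never seen in training carry no evidence
            support += math.log(self._prob(self.pos, term, size))
            anti_support += math.log(self._prob(self.neg, term, size))
        return support - anti_support


def rank(template, library):
    """Return the library sorted from most to least supported."""
    return sorted(library, key=template.score, reverse=True)


# Usage: "jaguar" is ambiguous, so a negative example separates the senses.
template = SearchTemplate()
template.train("jaguar speed cat habitat", positive=True)
template.train("jaguar car engine price", positive=False)
library = ["jaguar car price list", "weather report", "jaguar cat in its habitat"]
for text in rank(template, library):
    print(round(template.score(text), 3), text)
```

In this toy setting the shared term "jaguar" contributes nothing to the score, so the ranking is decided by the terms that differ between the positive and negative examples. Further training calls only update the two counters, which is one simple way a template could sharpen with more training rounds.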
