Introduction to Information Retrieval (English Edition)_Manning (US), Raghavan (US), Schütze (Germany)_9787115218247

Introduction to Information Retrieval (English Edition)

List price: ¥69

Zhongjiao price: ¥54.51 (21% off)

In stock: 0

Series: Turing Original Edition Computer Science Series


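Introduction to Information Retrieval (English Edition) is a textbook on information retrieval that aims to present a modern approach to the subject from a computer science perspective. Starting from basic concepts, it covers web search as well as text classification and text clustering, and gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents, of methods for evaluating such systems, and of the application of machine learning methods to text collections.
All important ideas in the book are explained with examples and illustrated with figures. Introduction to Information Retrieval (English Edition) is well suited as an introductory textbook for "information retrieval" courses for senior undergraduates and graduate students in computer science and related fields, and is equally suitable for researchers and professionals.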

As recently as the 1990s, studies showed that most people preferred getting information from other people rather than from information retrieval (IR) systems. Of course, in that time period, most people also used human travel agents to book their travel. However, during the last decade, relentless optimization of information retrieval effectiveness has driven web search engines to new quality levels at which most people are satisfied most of the time, and web search has become a standard and often preferred source of information finding. For example, the 2004 Pew Internet Survey (Fallows 2004) found that "92% of Internet users say the Internet is a good place to go for getting everyday information." To the surprise of many, the field of information retrieval has moved from being a primarily academic discipline to being the basis underlying most people's preferred means of information access. This book presents the scientific underpinnings of this field, at a level accessible to graduate students as well as advanced undergraduates.
Information retrieval did not begin with the Web. In response to various challenges of providing information access, the field of IR evolved to give principled approaches to searching various forms of content. The field began with scientific publications and library records but soon spread to other forms of content, particularly those of information professionals, such as journalists, lawyers, and doctors. Much of the scientific research on IR has occurred in these contexts, and much of the continued practice of IR deals with providing access to unstructured information in various corporate and governmental domains, and this work forms much of the foundation of our book.

    An example information retrieval problem
    A fat book that many people own is Shakespeare's Collected Works. Suppose you wanted to determine which plays of Shakespeare contain the words Brutus AND Caesar AND NOT Calpurnia. One way to do that is to start at the beginning and to read through all the text, noting for each play whether it contains Brutus and Caesar and excluding it from consideration if it contains Calpurnia. The simplest form of document retrieval is for a computer to do this sort of linear scan through documents. This process is commonly referred to as grepping through text, after the Unix command grep, which performs this process. Grepping through text can be a very effective process, especially given the speed of modern computers, and often allows useful possibilities for wildcard pattern matching through the use of regular expressions. With modern computers, for simple querying of modest collections (the size of Shakespeare's Collected Works is a bit under one million words of text in total), you really need nothing more.
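    The linear scan described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four one-line "plays" are invented stand-ins for the real texts, and the names PLAYS, contains_word, and linear_scan are hypothetical.

```python
import re

# Invented one-line stand-ins for the plays (NOT the real texts).
PLAYS = {
    "Julius Caesar": "Brutus and Caesar speak; Calpurnia warns Caesar.",
    "Antony and Cleopatra": "Antony recalls how Brutus struck Caesar.",
    "Hamlet": "Polonius says he played Caesar and Brutus killed him.",
    "Macbeth": "Macbeth meets the witches.",
}


def contains_word(text, word):
    """Return True if `word` occurs in `text` as a whole word (case-sensitive)."""
    return re.search(r"\b" + re.escape(word) + r"\b", text) is not None


def linear_scan(plays):
    """Scan every play in full and keep those matching
    Brutus AND Caesar AND NOT Calpurnia."""
    return [
        title
        for title, text in plays.items()
        if contains_word(text, "Brutus")
        and contains_word(text, "Caesar")
        and not contains_word(text, "Calpurnia")
    ]


if __name__ == "__main__":
    print(linear_scan(PLAYS))
```

    Every query rereads every document, so the cost grows with the total size of the collection.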
    But for many purposes, you do need more:
    1. To process large document collections quickly. The amount of online data has grown at least as quickly as the speed of computers, and we would now like to be able to search collections that total in the order of billions to trillions of words.
    2. To allow more flexible matching operations. For example, it is impractical to perform the query Romans NEAR countrymen with grep, where NEAR might be defined as within 5 words or within the same sentence.
    3. To allow ranked retrieval. In many cases, you want the best answer to an information need among many documents that contain certain words.
    The way to avoid linearly scanning the texts for each query is to index the documents in advance. Let us stick with Shakespeare's Collected Works, and use it to introduce the basics of the Boolean retrieval model. Suppose we record for each document, here a play of Shakespeare's, whether it contains each word out of all the words Shakespeare used (Shakespeare used about 32,000 different words). The result is a binary term-document incidence matrix, as in Figure 1.1. Terms are the indexed units (further discussed in Section 2.2); they are usually words, and for the moment you can think of them as words, but the information retrieval literature normally speaks of terms because some of them, such as perhaps I-9 or Hong Kong, are not usually thought of as words.
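    As a rough sketch of the incidence-matrix idea, the Python below stores each term's row as an integer bit vector (bit i is set when document i contains the term) and answers Brutus AND Caesar AND NOT Calpurnia with bitwise operations. The document texts are invented placeholders, splitting on \w+ is a simplifying assumption about what counts as a term, and the names DOCS, build_incidence, and boolean_query are hypothetical.

```python
import re

# Invented one-line stand-ins for the plays (NOT the real texts).
DOCS = [
    ("Julius Caesar", "Brutus and Caesar speak; Calpurnia warns Caesar."),
    ("Antony and Cleopatra", "Antony recalls how Brutus struck Caesar."),
    ("Hamlet", "Polonius says he played Caesar and Brutus killed him."),
    ("Macbeth", "Macbeth meets the witches."),
]


def build_incidence(docs):
    """Map each term to an int whose bit i is 1 if document i contains it."""
    matrix = {}
    for i, (_, text) in enumerate(docs):
        for term in set(re.findall(r"\w+", text)):
            matrix[term] = matrix.get(term, 0) | (1 << i)
    return matrix


def boolean_query(matrix, docs, required, excluded):
    """AND together the rows of `required` terms, then mask out any
    document containing an `excluded` term. Returns matching titles."""
    all_docs = (1 << len(docs)) - 1  # bit vector with every document set
    hits = all_docs
    for term in required:
        hits &= matrix.get(term, 0)
    for term in excluded:
        hits &= all_docs & ~matrix.get(term, 0)  # complement within the collection
    return [title for i, (title, _) in enumerate(docs) if (hits >> i) & 1]


if __name__ == "__main__":
    m = build_incidence(DOCS)
    print(boolean_query(m, DOCS, ["Brutus", "Caesar"], ["Calpurnia"]))
```

    The matrix is built once, in advance; each query then touches only the rows for its own terms instead of rescanning the texts.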
