Information Retrieval

Publication date: 2009-10  Publisher: Posts & Telecom Press  Authors: Grossman (US) and Frieder (US)  Pages: 332

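Summary

This book is an excellent textbook for information retrieval courses. It covers the concepts, principles, and algorithms of information retrieval in detail, including retrieval strategies, retrieval utilities, cross-language information retrieval, query processing, integrating structured data and text, parallel information retrieval, and distributed information retrieval, and it gives many worked examples that illustrate the algorithms.

The book has both depth and breadth, and all of its material is presented in terms of current technology. It is an ideal textbook for undergraduate and graduate students in computer science, information management, and related fields, and a good reference for researchers and practitioners in information retrieval.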

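About the Author

David A. Grossman holds a Ph.D. from George Mason University and currently teaches in the Department of Computer Science at the Illinois Institute of Technology. He previously served as a project manager at an advanced technology services center and an office of research and development in the U.S. government. His main research areas include information retrieval, the integration of structured and unstructured data, and data mining.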

Table of Contents

1. INTRODUCTION
2. RETRIEVAL STRATEGIES
  2.1  Vector Space Model
  2.2  Probabilistic Retrieval Strategies
  2.3  Language Models
  2.4  Inference Networks
  2.5  Extended Boolean Retrieval
  2.6  Latent Semantic Indexing
  2.7  Neural Networks
  2.8  Genetic Algorithms
  2.9  Fuzzy Set Retrieval
  2.10 Summary
  2.11 Exercises
3. RETRIEVAL UTILITIES
  3.1  Relevance Feedback
  3.2  Clustering
  3.3  Passage-based Retrieval
  3.4  N-grams
  3.5  Regression Analysis
  3.6  Thesauri
  3.7  Semantic Networks
  3.8  Parsing
  3.9  Summary
  3.10 Exercises
4. CROSS-LANGUAGE INFORMATION RETRIEVAL
  4.1  Introduction
  4.2  Crossing the Language Barrier
  4.3  Cross-Language Retrieval Strategies
  4.4  Cross Language Utilities
  4.5  Summary
  4.6  Exercises
5. EFFICIENCY
  5.1  Inverted Index
  5.2  Query Processing
  5.3  Signature Files
  5.4  Duplicate Document Detection
  5.5  Summary
  5.6  Exercises
6. INTEGRATING STRUCTURED DATA AND TEXT
  6.1  Review of the Relational Model
  6.2  A Historical Progression
  6.3  Information Retrieval as a Relational Application
  6.4  Semi-Structured Search using a Relational Schema
  6.5  Multi-dimensional Data Model
  6.6  Mediators
  6.7  Summary
  6.8  Exercises
7. PARALLEL INFORMATION RETRIEVAL
  7.1  Parallel Text Scanning
  7.2  Parallel Indexing
  7.3  Clustering and Classification
  7.4  Large Parallel Systems
  7.5  Summary
  7.6  Exercises
8. DISTRIBUTED INFORMATION RETRIEVAL
  8.1  A Theoretical Model of Distributed Retrieval
  8.2  Web Search
  8.3  Result Fusion
  8.4  Peer-to-Peer Information Systems
  8.5  Other Architectures
  8.6  Summary
  8.7  Exercises
9. SUMMARY AND FUTURE DIRECTIONS
References
Index

Chapter Excerpt

  3.4.1  D'Amore and Mah

  Initial information retrieval research focused on n-grams as presented in [D'Amore and Mah, 1985]. The motivation behind their work was the fact that it is difficult to develop mathematical models for terms since the potential for a term that has not been seen before is infinite. With n-grams, only a fixed number of n-grams can exist for a given value of n. A mathematical model was developed to estimate the noise in indexing and to determine appropriate document similarity measures.

  D'Amore and Mah's method replaces terms with n-grams in the vector space model. The only remaining issue is computing the weights for each n-gram. Instead of simply using n-gram frequencies, a scaling method is used to normalize the length of the document. D'Amore and Mah's contention was that a large document contains more n-grams than a small document, so it should be scaled based on its length.

  To compute the weights for a given n-gram, D'Amore and Mah estimated the number of occurrences of an n-gram in a document. The first simplifying assumption was that n-grams occur with equal likelihood and follow a binomial distribution. Hence, it was no more likely for n-gram "ABC" to occur than "DEE". The Zipfian distribution that is widely accepted for terms is not true for n-grams. D'Amore and Mah noted that n-grams are not equally likely to occur, but the removal of frequently occurring terms from the document collection resulted in n-grams that follow a more binomial distribution than the terms.

  D'Amore and Mah computed the expected number of occurrences of an n-gram in a particular document. This is the product of the number of n-grams in the document (the document length) and the probability that the n-gram occurs. The n-gram's probability of occurrence is computed as the ratio of its number of occurrences to the total number of n-grams in the document. D'Amore and Mah continued their application of the binomial distribution to derive an expected variance and, subsequently ...
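The expected-occurrence step described in the excerpt can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, character trigrams are used for simplicity, the occurrence probability is estimated from a small reference string standing in for the collection, and the variance is the standard binomial variance N·p·(1 − p). The final weight formula is not reproduced because the excerpt breaks off before stating it.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_probabilities(text, n):
    """Estimate each n-gram's probability as count / total n-grams in `text`."""
    grams = char_ngrams(text, n)
    total = len(grams)
    return {gram: count / total for gram, count in Counter(grams).items()}


def expected_and_variance(doc_length, p):
    """Binomial mean and variance for an n-gram with probability `p`
    in a document containing `doc_length` n-grams."""
    expected = doc_length * p
    variance = doc_length * p * (1 - p)
    return expected, variance


# Assumption: a toy reference string stands in for the document collection.
collection = "abcabcabd"
probs = ngram_probabilities(collection, 3)

document = "abcab"
doc_length = len(char_ngrams(document, 3))  # document length in n-grams

expected, variance = expected_and_variance(doc_length, probs["abc"])
print(f"expected={expected:.3f} variance={variance:.3f}")
```

Measuring document length in n-grams rather than characters follows the excerpt's definition of length as "the number of n-grams in the document"; an observed count can then be compared against the expected value to judge how unusual an n-gram is for a document of that size.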

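Media Attention and Reviews

  "This book covers the latest research results, its language holds up to close scrutiny, and it includes many carefully prepared examples. It is well suited as a first-choice textbook for graduate and undergraduate information retrieval courses."
  W. Bruce Croft, Distinguished Professor, Department of Computer Science, University of Massachusetts Amherst

  "I recommend this book as a first-choice textbook for computer science students. It is also suitable for SEO professionals and Web developers who want to apply search techniques, algorithms, and heuristics in their projects."
  Dr. E. Garcia, information technology and services consultant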

Editor's Recommendation

  With the rise of search engine companies such as Google and Baidu, information retrieval has become an exciting and popular research field.

  Information Retrieval: Algorithms and Heuristics (English edition, 2nd edition) describes ad hoc information retrieval from a developmental perspective, discusses the latest algorithms for retrieval over large-scale data, covers inference networks and system efficiency in detail, and gives a detailed, workable example for each method. It also integrates techniques for processing structured and unstructured data, which other textbooks do not offer.

  The second edition adds language models for IR and cross-language retrieval, and discusses many current topics such as XML, P2P information retrieval, duplicate document detection, parallel document clustering, fusion of different retrieval strategies, and information mediators.

  The book balances breadth across the discipline with depth on each topic and tracks the latest developments. It is a well-known work in the field of information retrieval and has been adopted as a textbook by many well-known universities, such as Princeton University and Rutgers University.

User Reviews (3 total)

  •   Pretty good and worth reading; the algorithms are well written.
  •   Hehe, nice.
  •   It helped my studies a lot.
