LIBR 558 Information Retrieval Systems: Structures and Algorithms -- COURSE SYLLABUS (3)
Program: School of Library, Archival and Information Studies / MLIS
Year: 2008-2009
Course Schedule: Tuesday, 2:00 - 4:50 p.m.
Location: IKBLC 460
Instructor: Dr. Edie Rasmussen
Office location: Room 477, SLAIS, Irving K. Barber Learning Centre
Office phone: 604-827-5486
Office hours: Thursday, 10:00 a.m. – noon
E-mail address: edie.rasmussen@ubc.ca
Course website: http://www.slais.ubc.ca/courses/libr558/08-09-wt2/index.htm
Course Goal: To provide an introduction to the methods used in the storage and retrieval of textual, pictorial, graphic, and voice data.
Course Objectives:
Upon completion of this course students will:
- understand the complexity of information retrieval;
- understand the functions of an information retrieval system;
- be able to understand and measure the contribution of the components of an information retrieval system to its performance;
- be able to isolate the factors which optimize the information retrieval process;
- be aware of current issues in information retrieval, including search engines.
Course Topics:
- Documents and queries
- Information retrieval models
- Evaluating information retrieval systems
- Implementing information retrieval systems
- Improving effectiveness of information retrieval systems
- Multimedia information retrieval systems
- Information retrieval on the WWW
- Users and information retrieval
Format of the course: Lectures, presentations, guest speakers, lab sessions
Required and Recommended Reading:
Readings by Week:
Week 1: Information Retrieval Systems and their Design
Griffiths, J.-M. and King, D.W. (2002). US information retrieval system evolution and evaluation (1945-1975). IEEE Annals of the History of Computing 24(3): 35-55. [Available at http://portal.acm.org/citation.cfm?id=604214 ]
Lesk, M. (1996). The Seven Ages of Information Retrieval. UDT Occasional Papers #5. [Available at http://www.ifla.org/VI/5/op/udtop5/udt-op5.pdf ]
Week 2: Documents and Queries; Representing Document Content
Croft, B., Metzler, D., and Strohman, T. (in press). Ch. 4: Processing Text. In: Search Engines: Information Retrieval in Practice. [Available at http://www.pearsonhighered.com/croft1epreview/samples.html ]
Manning, C.D., Raghavan, P., and Schütze, H. (2008). Ch. 2: The term vocabulary and postings lists. In: Introduction to Information Retrieval. Cambridge: Cambridge University Press. Pp. 18-44. [Available at http://nlp.stanford.edu/IR-book/pdf/02voc.pdf ]
Week 3: Information Retrieval Models I (Boolean, Vector Space)
Manning, C.D., Raghavan, P., and Schütze, H. (2008). Ch. 1: Boolean retrieval. In: Introduction to Information Retrieval. Cambridge: Cambridge University Press. Pp. 1-17. [Available at http://nlp.stanford.edu/IR-book/pdf/01bool.pdf ]
Frants, V.I., Shapiro, J., Taksa, I. and Voiskunskii, V.G. (1999). Boolean Search: Current State and Perspectives. Journal of the American Society for Information Science 50(1): 86-95. [Available at http://www3.interscience.wiley.com/cgi-bin/fulltext/30002143/PDFSTART ]
Week 4: Information Retrieval Models II (Probabilistic, Language Modelling)
Crestani, F., Lalmas, M., Van Rijsbergen, C.J., and Campbell, I. (1998). “Is This Document Relevant? . . . Probably”: A Survey of Probabilistic Models in Information Retrieval. ACM Computing Surveys 30(4): 528-552. [Available at http://delivery.acm.org/10.1145/300000/299920/p528-crestani.pdf?key1=299920&key2=1275588221&coll=GUIDE&dl=GUIDE&CFID=14528348&CFTOKEN=44641854 ]
Lemur Project Tutorials: Starting Out: Overview: Language Models and Information Retrieval. [Available at http://www.lemurproject.org/tutorials/begin_overview-3.php?version=print ]
Liu, X. and Croft, W.B. (2005). Statistical language modeling for information retrieval. Annual Review of Information Science and Technology 39(1): 1-31.
Week 5: Measuring Effectiveness of IR Systems
Croft, B., Metzler, D., and Strohman, T. (in press). Ch. 8: Evaluating Search Engines. In: Search Engines: Information Retrieval in Practice. [Available at http://www.pearsonhighered.com/croft1epreview/samples.html ]
Harman, D.K., and Voorhees, E.M. (2006). TREC: An Overview. Annual Review of Information Science and Technology 40: 113-155. [Available at http://www3.interscience.wiley.com/journal/109882836/abstract ]
Mizzaro, S. (1997). Relevance: the whole history. Journal of the American Society for Information Science 48(9): 810-832.
Robertson, S. (2008). On the history of evaluation in IR. Journal of Information Science 34(4): 439-456. [Available at http://jis.sagepub.com/cgi/content/abstract/34/4/439 ]
Tague-Sutcliffe, J. (1992). The pragmatics of information retrieval experimentation, revisited. Information Processing & Management 28(4): 467-490.
Voorhees, E. (2007). TREC: Continuing information retrieval’s tradition of experimentation. Communications of the ACM 50(11): 51-54. [Available at http://portal.acm.org/citation.cfm?doid=1297797.1297822 ]
Week 6: Improving Effectiveness of IR Systems
Efthimiadis, E. (1996). Query expansion. Annual Review of Information Science and Technology 31: 121-187.
Harman, D. (1992). “Relevance feedback and other query modification techniques”. In: Frakes, W.B. and Baeza-Yates, R. (eds.), Information Retrieval: Data Structures & Algorithms. Englewood Cliffs, NJ: Prentice-Hall. Pp. 241-263.
Manning, C.D., Raghavan, P., and Schütze, H. (2008). Ch. 9: Relevance feedback and query expansion. In: Introduction to Information Retrieval. Cambridge: Cambridge University Press. Pp. 162-177. [Available at http://nlp.stanford.edu/IR-book/pdf/09expand.pdf ]
Salton, G. & Buckley, C. (1990). Improving retrieval performance by relevance feedback. Journal of the American Society for Information Science 41: 288-297.
Week 7: Implementing IR Systems
Zobel, J. and Moffat, A. (2006). Inverted files for text search engines. ACM Computing Surveys 38(2): 1-56.
Week 8: Information Retrieval on the Web
Manning, C.D., Raghavan, P., and Schütze, H. (2008). Ch. 19: Web Search Basics. In: Introduction to Information Retrieval. Cambridge: Cambridge University Press. Pp. 385-404. [Available at http://nlp.stanford.edu/IR-book/pdf/19web.pdf ]
Ntoulas, A., Najork, M., Manasse, M. and Fetterly, D. (2006). Detecting spam Web pages through content analysis. WWW 2006. Pp. 83-92. [Available at http://portal.acm.org/citation.cfm?doid=1135777.1135794 ]
Witten, I.H. (2008). Searching…in a Web. Journal of Universal Computer Science 14(10): 1739-1762. [Available at http://www.cs.waikato.ac.nz/~ihw/papers/08-IHW-SearchinginWeb.pdf ]
Week 10: Multimedia Information Retrieval
Datta, R., Joshi, D., Li, J. and Wang, J.Z. (2008). Image retrieval: ideas, influences and trends of the new age. ACM Computing Surveys 40(2): 5:1-5:60.
Enser, P. (2008). The evolution of visual information retrieval. Journal of Information Science 34(4): 531-546. [Available at http://jis.sagepub.com/cgi/content/abstract/34/4/531 ]
Smeulders, A.W.M., Worring, M., Santini, S., Gupta, A. and Jain, R. (2000). Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(12): 1349-1380. [Available at http://portal.acm.org/citation.cfm?id=357873 ]
Week 11: Related Information Retrieval Problems (Summarization, Classification, Question Answering, Text Mining)
Fan, W., Wallace, L., Rich, S. and Zhang, Z. (2006). Tapping the power of text mining. Communications of the ACM 49(9): 76-82.
Hearst, M. (2003). What is Text Mining? Essay available at http://people.ischool.berkeley.edu/~hearst/text-mining.html
Roussinov, D., Fan, W. and Robles-Flores, J. (2008). Beyond keywords: automated question answering on the Web. Communications of the ACM 51(9): 60-65. [Available at http://portal.acm.org/citation.cfm?id=1378743 ]
Sparck Jones, K. (1999). Automatic summarizing: factors and directions. In: Advances in Automatic Text Summarization. Cambridge, MA: MIT Press. Pp. 1-12. [Available at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.54.9907 ]
Week 12: Users and Information Retrieval
Hearst, M.A. (1999). Chapter 10: “User Interfaces and Visualization”. In: Modern Information Retrieval (Baeza-Yates, R. & Ribeiro-Neto, B., eds.). New York: ACM. Pp. 257-323. [Available at http://people.ischool.berkeley.edu/~hearst/irbook/print/chap10.pdf ]
Shneiderman, B. and Plaisant, C. (2004). Chapter 14: Information Search and Visualization. In: Designing the User Interface. 4th ed. Boston: Pearson. Pp. 559-607.
Recommended (Basic Textbooks):
- Baeza-Yates, R. & Ribeiro-Neto, B. (1999). Modern Information Retrieval. New York: ACM.
- Chowdhury, G.G. (1999). Introduction to Modern Information Retrieval. London: Library Association.
- Frakes, W.B. and Baeza-Yates, R. (eds.) (1992). Information Retrieval: Data Structures & Algorithms. Englewood Cliffs, NJ: Prentice-Hall.
- Korfhage, R.R. (1997). Information Storage and Retrieval. New York: John Wiley.
- Manning, C.D., Raghavan, P., and Schütze, H. (2008). Introduction to Information Retrieval. Cambridge: Cambridge University Press.
- Witten, I.H., Moffat, A., and Bell, T.C. (1999). Managing Gigabytes: Compressing and Indexing Documents and Images. 2nd ed. San Francisco, CA: Morgan Kaufmann.
Course Assignments and Weight in Relation to Final Course Mark:
Assignment | Weight
A short paper discussing the basic issues in automatic analysis of text for information retrieval | 10%
Midterm exam | 30%
A short paper analyzing the issues involved in information retrieval in a nontraditional or non-textual environment | 10%
Tutorial or project (group work optional): either develop and present a brief tutorial explaining some aspect of IR, or implement an information retrieval software package and evaluate it using a standard text collection and standard metrics | 40%
Participation | 10%
Course Schedule:
Week | Date | Topic | Readings/Assignment
1 | Jan 6 | Introduction to course; information retrieval systems and their design | See weekly readings above
2 | Jan 13 | Documents and queries; representing document content |
3 | Jan 20 | Information retrieval models I (to be rescheduled) |
4 | Jan 27 | Information retrieval models II | Short paper I
5 | Feb 3 | Measuring effectiveness of IR systems |
6 | Feb 10 | Improving effectiveness of IR systems |
- | Feb 17 | READING WEEK |
7 | Feb 24 | Implementing IR systems |
8 | Mar 3 | Information retrieval on the WWW |
9 | Mar 10 | Mid-term examination | Mid-term examination
10 | Mar 17 | Multimedia information retrieval (to be rescheduled) |
11 | Mar 24 | Related information retrieval problems | Short paper II
12 | Mar 31 | Users and information retrieval |
13 | Apr 7 | Course summary; project/paper presentations | Final papers/projects
Attendance: The calendar states: “Regular attendance is expected of students in all their classes (including lectures, laboratories, tutorials, seminars, etc.). Students who neglect their academic work and assignments may be excluded from the final examinations. Students who are unavoidably absent because of illness or disability should report to their instructors on return to classes.”
Regular, on-time attendance in class is an important and required part of this course. There is no single textbook for the course, and many of the readings are difficult, so interpretation of the material in class is important. Handouts of PowerPoint presentations will be distributed during class, but if you cannot attend you should obtain notes from another class member. In particular, some material required for the mid-term examination may be presented only in class. Repeated absences or tardiness will result in a lower course mark.
Disability Accommodation: The University accommodates students with disabilities who have registered with the Disability Resource Centre [ http://www.students.ubc.ca/access/drc.cfm ]. You must register with the Disability Resource Centre to be granted special accommodations for any ongoing conditions.
Religious Accommodation: The University accommodates students whose religious obligations conflict with attendance, submitting assignments, or completing scheduled tests and examinations. Please let your instructor know in advance, preferably in the first week of class, if you will require any accommodation on these grounds. UBC policy on Religious Holidays can be found at: http://www.universitycounsel.ubc.ca/policies/policy65.pdf
Other Accommodations: Students who plan to be absent for varsity athletics, family obligations, or other similar commitments cannot assume they will be accommodated, and should discuss their commitments with the instructor before the course drop date.
Academic Dishonesty: Please review the UBC Calendar Academic regulations for the University policy on cheating, plagiarism, and other forms of academic dishonesty: http://www.students.ubc.ca/calendar/index.cfm?tree=3,54,111,959. Also visit and review the contents of these two resources for useful information on avoiding plagiarism and on correct documentation practice: Plagiarism Resource Centre: For Students: http://www.library.ubc.ca/home/plagiarism/welcome.html and Plagiarism Avoided: Taking Responsibility For Your Work: http://www.arts.ubc.ca/Plagiarism_Avoided.373.0.html. Students are held responsible for knowing and following all University regulations regarding academic dishonesty. If a student does not know how to cite a source properly or what constitutes proper use of a source, it is the student's personal responsibility to obtain the needed information and to apply it within University guidelines and policies. If evidence of academic dishonesty is found in a course assignment, previously submitted work in this course may be reviewed for possible academic dishonesty and grades modified as appropriate. University policy requires that all incidents of academic dishonesty be forwarded to the Dean’s office for review and possible action.
Evaluation: UBC marking policies are followed. All assignments will be awarded letter grades using the evaluative criteria given on the SLAIS web site. Prior arrangements must be made with the instructor for assignment extensions. Late penalties may be imposed; these will be discussed when extensions are requested.
Written & Spoken English Requirement: Written and spoken work may receive a lower mark if it is, in the opinion of the instructor, deficient in English.
Course Discussion List: You must sign up for the course Internet discussion list:
Send a message to majordomo@interchange.ubc.ca with a blank subject line and the message “subscribe l-558 end”. Class assignments, clarifications, answers to questions, and general announcements will be made via the course discussion list. If you have questions for me personally, send them to edie.rasmussen@ubc.ca. If the answers seem to be of general interest, I will send them to the course discussion list, without any identification of the source of the query.




