WSDM2011 Tutorials

Tutorial Co-chairs

  • Qing Li, City University of Hong Kong, Hong Kong

Tutorial Information

  • Date: Wednesday, February 9, 2011
  • Time:
    • Morning Session from 8:30 am to 12:00 pm with a coffee break from 10:00 am to 10:30 am
    • Afternoon Session from 1:30 pm to 5:00 pm with a coffee break from 3:00 pm to 3:30 pm
    • Lunch from 12:00 pm to 1:30 pm
  • Venue: Please see the schedule below.

Morning Sessions (8:30 am to 12:00 pm)

  • Tang Room II, 3/F: Crowdsourcing 101: Putting the "Wisdom of the Crowd" to Work for You (Omar Alonso and Matthew Lease)
  • Ming Room I, 4/F: Introduction to Display Advertising (Andrei Broder, Vanja Josifovski and Jayavel Shanmugasundaram)

Afternoon Sessions (1:30 pm to 5:00 pm)

  • Tang Room II, 3/F: Exploiting Statistical and Relational Information on the Web and in Social Media (Lise Getoor and Lilyana Mihalkova)
  • Ming Room I, 4/F: Web Retrieval: The Role of Users (Ricardo Baeza-Yates and Yoelle Maarek)


  • Title: Crowdsourcing 101: Putting the “Wisdom of the Crowd” to Work for You
  • Presenters:
    • Omar Alonso, Microsoft, USA
    • Matthew Lease, University of Texas at Austin, USA
  • Date: Wednesday, February 9, 2011
  • Time: 8:30 am to 12:00 pm
  • Venue: Tang Room II, 3/F

Crowdsourcing has emerged in recent years as an exciting new avenue for leveraging the tremendous potential and resources of today’s digitally connected, diverse, distributed workforce. Generally speaking, crowdsourcing describes the outsourcing of tasks to a large group of people instead of assigning them to an in-house employee or contractor. Crowdsourcing platforms such as Amazon Mechanical Turk and CrowdFlower have gained particular attention as active online marketplaces for reaching and tapping into this still largely under-utilized workforce. Crowdsourcing offers intriguing new opportunities for accomplishing different kinds of tasks or achieving broader participation than previously possible, as well as for completing standard tasks more accurately, in less time, and at lower cost. Unlocking the potential of crowdsourcing in practice, however, requires a three-part understanding of principles, platforms, and best practices. This tutorial will introduce the opportunities and challenges of crowdsourcing while discussing these three issues, providing attendees with a basic foundation for applying crowdsourcing to their own particular tasks.


  • Title: Introduction to Display Advertising
  • Presenters:
    • Andrei Broder, Yahoo! Research, Santa Clara, CA, USA
    • Vanja Josifovski, Yahoo! Research, Santa Clara, CA, USA
    • Jayavel Shanmugasundaram, Yahoo! Research, Santa Clara, CA, USA
  • Date: Wednesday, February 9, 2011
  • Time: 8:30 am to 12:00 pm
  • Venue: Ming Room I, 4/F

Web advertising supports a large swath of the Internet ecosystem. It brings revenue to countless publishers that rent space on their pages for advertising, from small mom-and-pop shops to major search engines. It also provides valuable traffic to numerous commercial Web sites and has fueled the development of Web search engines. Today, Web advertising increasingly impacts the world outside the Internet by shaping the attitudes of numerous users. Computational advertising is a new scientific discipline that aims to formalize the problem of finding the best ad for a given user in a given context. In traditional advertising (for example, in print magazines), the number of venues is small, the cost per venue is higher, and little or no personalization is possible. In contrast, online advertising offers billions of opportunities (page views) and hundreds of millions of ads, and it is possible to provide personalization with quantifiable results. This brings advertising into the realm of the other “computational” sciences. An overview of the current state of computational advertising can be found in [...].

Display advertising is one of the two major advertising channels on the Web (the other being search advertising). Display advertising on the Web usually takes the form of graphical ads placed on publishers’ Web pages. There is no explicit user query, and ad selection is based either on the page where the ad is placed (contextual targeting) or on the user’s past activities (behavioral targeting). In both cases, sophisticated text analysis and learning algorithms are needed to provide relevant ads to the user.
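Contextual targeting can be illustrated, in heavily simplified form, as matching the words on a page against the words of each candidate ad. The sketch below assumes bag-of-words cosine similarity, a technique chosen here for brevity rather than what production systems or the presenters use; the page text, ad texts, and function names are all hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-case the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def pick_contextual_ad(page_text: str, ads: dict) -> str:
    """Return the id of the ad whose text is most similar to the page text."""
    page = bag_of_words(page_text)
    return max(ads, key=lambda ad_id: cosine(page, bag_of_words(ads[ad_id])))


# Hypothetical page and ad inventory.
page = "Best hiking boots for mountain trails"
ads = {
    "boots": "Waterproof hiking boots on sale",
    "loans": "Low interest car loans",
}
chosen = pick_contextual_ad(page, ads)
```

Here the "boots" ad is selected because it shares the terms "hiking" and "boots" with the page, while the "loans" ad shares none. Real contextual systems add term weighting, semantic classes, and learned relevance models on top of such lexical matching.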

Display advertising includes both a brand awareness component, where the advertiser aims to promote awareness of a brand or a product, and a direct response component, where the advertiser aims for a click or conversion that leads to a visit to the advertiser’s Web site or other downstream economic activity. In addition, advertisers can choose one of several payment types: CPM (Cost Per Mille, i.e., per 1,000 impressions/user visits), CPC (Cost Per Click), or CPA (Cost Per Action/Conversion, where an action may be, for instance, filling out a form or making a purchase). Dealing with multiple objectives and payment types again requires sophisticated learning algorithms to enable conversion and comparison between the payment types.
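One common way to compare the payment types is to convert every bid into an expected cost per mille (eCPM), that is, the expected revenue per 1,000 impressions. The sketch below assumes that click and action probabilities have already been estimated by some learning model; the ads, bids, and probabilities are invented for illustration, and this is not presented as the presenters' method.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str
    payment_type: str      # "CPM", "CPC", or "CPA"
    bid: float             # price per 1,000 impressions (CPM), per click (CPC), or per action (CPA)
    p_click: float = 0.0   # assumed model estimate: probability that an impression yields a click
    p_action: float = 0.0  # assumed model estimate: probability that an impression yields an action


def ecpm(ad: Ad) -> float:
    """Expected revenue per 1,000 impressions, so all payment types share one scale."""
    if ad.payment_type == "CPM":
        return ad.bid
    if ad.payment_type == "CPC":
        return 1000 * ad.p_click * ad.bid
    if ad.payment_type == "CPA":
        return 1000 * ad.p_action * ad.bid
    raise ValueError(f"unknown payment type: {ad.payment_type}")


# Hypothetical candidates competing for the same impression.
ads = [
    Ad("brand", "CPM", bid=2.00),
    Ad("travel", "CPC", bid=0.50, p_click=0.003),
    Ad("retailer", "CPA", bid=20.00, p_action=0.0002),
]
best = max(ads, key=ecpm)
```

In this example the CPA ad wins (an eCPM of about 4.0, versus 2.0 for the CPM ad and 1.5 for the CPC ad) even though its action probability is the smallest. The ranking is only as good as the probability estimates, which is why accurate click and conversion prediction matters so much here.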

Finally, in display advertising, advertisers can choose to buy ads on a guaranteed basis many months in advance (these are typically CPM buys). For instance, an advertiser can request 100 million impressions during Super Bowl 2011, and the publisher guarantees these impressions ahead of time (even though the users have not actually shown up yet!). In essence, purchasing on a guaranteed basis is like purchasing goods on a futures market. Advertisers can also choose to buy on a non-guaranteed basis (these can be CPM, CPC, or CPA buys), in which case they pay only for each impression, click, or conversion actually delivered. Many of the mechanisms required to support these forms of buying, such as traffic forecasting, ad selection, and pricing, are just starting to attract the attention of the research community, and there is ample opportunity for impactful research in this area.


  • Title: Exploiting Statistical and Relational Information on the Web and in Social Media
  • Presenters:
    • Lise Getoor, University of Maryland, College Park, USA
    • Lilyana Mihalkova, University of Maryland, College Park, USA
  • Date: Wednesday, February 9, 2011
  • Time: 1:30 pm to 5:00 pm
  • Venue: Tang Room II, 3/F

The popularity of Web 2.0, characterized by a proliferation of social media sites, and Web 3.0, with more richly semantically annotated objects and relationships, brings to light a variety of important prediction, ranking, and extraction tasks. The input to these tasks is often best seen as a (noisy) multi-relational graph, such as the graph of the Web itself; the click graph, defined by user interactions with Web sites; and the social graph, defined by friendships and affiliations on social media sites.
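To make the idea of a multi-relational graph concrete, the sketch below stores Web-graph, click-graph, and social-graph edges in a single structure, distinguished by a relation label. All node and relation names are hypothetical, and the sketch covers only representation and a simple relational query; the statistical relational learning and inference methods the tutorial covers would operate on top of data like this.

```python
from collections import defaultdict


class MultiRelationalGraph:
    """A directed graph whose edges carry a relation label (e.g. "links_to", "clicked", "friend")."""

    def __init__(self) -> None:
        # Maps (source node, relation) -> set of target nodes.
        self._edges = defaultdict(set)

    def add_edge(self, source: str, relation: str, target: str) -> None:
        self._edges[(source, relation)].add(target)

    def neighbors(self, source: str, relation: str) -> set:
        # .get avoids inserting empty entries into the defaultdict on lookups.
        return set(self._edges.get((source, relation), ()))

    def follow(self, sources: set, relation: str) -> set:
        """All nodes reachable from any of `sources` by one edge of the given relation."""
        result = set()
        for s in sources:
            result |= self.neighbors(s, relation)
        return result


g = MultiRelationalGraph()
g.add_edge("alice", "friend", "bob")                  # social graph
g.add_edge("alice", "friend", "carol")
g.add_edge("bob", "clicked", "news.example")          # click graph
g.add_edge("carol", "clicked", "shop.example")
g.add_edge("alice", "clicked", "blog.example")
g.add_edge("news.example", "links_to", "shop.example")  # Web graph

# Pages clicked by Alice's friends: one query that crosses the social and click graphs.
friends_clicks = g.follow(g.neighbors("alice", "friend"), "clicked")
```

Keeping all relations in one structure is what allows queries, and later probabilistic models, to combine evidence across the social, click, and link graphs rather than treating each in isolation.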

This tutorial will provide an overview of statistical relational learning and inference techniques, motivating and illustrating them using web and social media applications. We will start by briefly surveying some of the sources of statistical and relational information on the web and in social media and will then dedicate most of the tutorial time to an introduction to representations and techniques for learning and reasoning with multi-relational information, viewing them through the lens of web and social media domains. We will end with a discussion of current trends and related fields, such as privacy in social networks and probabilistic databases.


  • Title: Web Retrieval: The Role of Users
  • Presenters:
    • Ricardo Baeza-Yates, Yahoo! Research, Spain
    • Yoelle Maarek, Yahoo! Research, Spain
  • Date: Wednesday, February 9, 2011
  • Time: 1:30 pm to 5:00 pm
  • Venue: Ming Room I, 4/F

Web retrieval methods have evolved through three major steps in the last decade or so. They started from standard document-centric IR in the early days of the Web, then made a major step forward by leveraging the structure of the Web, using link analysis techniques for both crawling and ranking. A more recent step forward, no less important but perhaps more discreet, has been to bring the user into this equation in two ways:

  • Implicitly, through the analysis of usage data captured by query logs and by session and click information in general; the goal here is to improve ranking as well as to measure users’ happiness and engagement.
  • Explicitly, by offering novel interactive features; the goal here is to better answer users’ needs.

This half-day tutorial will cover the challenges associated with the implicit and explicit roles of users in Web retrieval. More specifically, we will review and discuss challenges associated with two types of activities, namely:

  • Usage data analysis and metrics - It is critical to monitor how users take advantage of and interact with Web retrieval systems, as this implicit relevance feedback, aggregated at a large scale, can approximate quite accurately the level of success of a given feature. Here we have to consider not only click statistics but also the time spent on a page, the number of actions per session, etc.
  • User interaction - Given the intrinsic problems posed by the Web, the key challenge for the user is to conceive a good query to submit to the search system, one that leads to a manageable and relevant answer. The retrieval system must complete search requests fast and return relevant results, even for poorly formulated queries, which are common on the Web. Web retrieval engines thus interact with the user at two key stages:
    • Expressing a query: Human beings have needs or tasks to accomplish, which are frequently not easy to express as “queries”. Queries, even when expressed in a more natural manner, are just a reflection of human needs and are thus, by definition, imperfect. This phenomenon could be compared to Plato’s cave metaphor, where shadows are mistaken for reality.
    • Interpreting results: Even if the user is able to perfectly express a query, the answer might be split over thousands or millions of Web pages or not exist at all. In this context, numerous questions need to be addressed. Examples include: How do we handle a large answer? How do we rank results? How do we select the documents that really are of interest to the user? Even in the case of a single document candidate, the document itself could be large. How do we browse such documents efficiently?
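The usage-data signals named in the first item above (click statistics, time spent on a page, number of actions per session) could be aggregated from logs along the lines of the sketch below. The log record format, event types, and metric names are hypothetical; real query logs are far richer, and interpreting such aggregates as measures of a feature's success is itself nontrivial.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class LogEvent:
    session_id: str
    kind: str                   # hypothetical event types: "query" or "click"
    dwell_seconds: float = 0.0  # time spent on the clicked page; used only for clicks


def usage_metrics(events: list) -> dict:
    """Aggregate simple implicit-feedback signals over a batch of log events."""
    queries = [e for e in events if e.kind == "query"]
    clicks = [e for e in events if e.kind == "click"]
    sessions = {e.session_id for e in events}
    return {
        "clicks_per_query": len(clicks) / len(queries) if queries else 0.0,
        "mean_dwell_seconds": mean(e.dwell_seconds for e in clicks) if clicks else 0.0,
        "actions_per_session": len(events) / len(sessions) if sessions else 0.0,
    }


log = [
    LogEvent("s1", "query"),
    LogEvent("s1", "click", dwell_seconds=30.0),
    LogEvent("s1", "click", dwell_seconds=90.0),
    LogEvent("s2", "query"),
    LogEvent("s2", "query"),
    LogEvent("s2", "click", dwell_seconds=12.0),
]
metrics = usage_metrics(log)
```

For this toy log the sketch reports 1.0 clicks per query, a mean dwell time of 44.0 seconds, and 3.0 actions per session. At a large scale, comparing such aggregates before and after a change is one way of approximating whether a feature helps users.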


2/9 Workshops and tutorials
2/9 Reception
2/10 Opening ceremony
2/10 Keynote I, oral and poster presentations
2/11 Keynote II, oral and poster presentations
2/11 Excursion and banquet
2/12 Oral presentations
2/12 Closing ceremony