MAVIR Seminar: P. Mika / B. Liu

Peter Mika (Yahoo! Research Barcelona): SearchMonkey, Micro-formats and Yahoo! APIs

Bing Liu (University of Illinois at Chicago): Web Content Mining and NLP

Monday 16, Tuesday 17 and Wednesday 18 November 2009, at 15:00


Venue

Salón de Grados
ETSI Industriales, UNED
c/ Juan del Rosal, 12
28040 Madrid

Directions and maps



SearchMonkey, Micro-formats and Yahoo! APIs

Speaker: Peter Mika (Yahoo! Research Barcelona)

Schedule: two sessions, on Monday 16/11/2009 and Tuesday 17/11/2009, from 15:00 to 18:00

Abstract: While current search techniques aim at ever more sophisticated methods for searching over hypertext, the Semantic Web promises to break new boundaries in search by transforming the content itself into a form that is more easily processable by machines.

In this talk, we will discuss some of the possible technologies for annotating content for machine processing and showcase some of the ways that semantic annotations can improve the search experience for users. This talk is intended for researchers and developers with minimal Semantic Web or Information Retrieval experience, and could also be of interest to publishers of Web content. We will focus on the Web and the formats used on it, although similar techniques are applicable to search in other contexts such as enterprise search, desktop search or mobile search.

Part 1
In this first part of the talk, we will describe the basic concepts of the Semantic Web and the various ways in which metadata can be published on the Web. We will pay particular attention to ways of embedding metadata inside HTML pages, including microformats, RDFa, and microdata (HTML5). We illustrate these formats with practical examples and show some of the tools that can be used to facilitate the process of annotation. We also provide some statistics on how and where they are being used and share some of our experiences with data on the Web.

Part 2
In the second part, we discuss the notion of semantic search, and in particular the different roles that semantics can play in various parts of the IR process, from document processing to query analysis, ranking and result presentation. We show some of the existing prototypes for semantic search from both the research and the commercial areas. We also touch upon toolkits and APIs that can be used to build custom semantic search systems. We introduce the main conferences and venues for semantic search and current efforts for evaluating semantic search systems. We close with a discussion of active research topics in the field.

Bio: Peter Mika is a researcher and data architect at Yahoo! Research in Barcelona. He received his BS in computer science from Eötvös Loránd University and his MSc and PhD in computer science (cum laude) from Vrije Universiteit Amsterdam. His interdisciplinary work in social networks and the Semantic Web earned him a Best Paper Award at the 2005 International Semantic Web Conference and a First Prize at the 2004 Semantic Web Challenge. He has been co-chair of the Semantic Web Challenge since 2007. Mika is the youngest member elected to the editorial board of the Journal of Web Semantics. He is the author of the book 'Social Networks and the Semantic Web' (Springer, 2007). In 2008 he was selected as one of "AI's Ten to Watch" by the editorial board of the IEEE Intelligent Systems journal.

Web Content Mining and NLP

Speaker: Bing Liu (University of Illinois at Chicago)

Schedule: Wednesday 18/11/2009, from 15:00 to 16:30

Abstract: Web mining aims to develop a new generation of techniques to effectively extract and/or mine useful information or knowledge from the Web. It consists of Web usage mining, Web structure mining, and Web content mining. Web usage mining refers to the discovery of user access patterns from Web usage logs. Web structure mining tries to discover useful knowledge from the structure of hyperlinks. Web content mining aims to extract/mine useful information or knowledge from Web page contents. In this talk, I will focus on Web content mining. In the past few years, there has been a rapid expansion of activities in Web content mining. I will introduce some of the main mining tasks/problems and state-of-the-art techniques for solving these problems. Topics include data/information extraction, information integration, information synthesis, and opinion mining or sentiment analysis. Throughout the talk, I will pay special attention to connections between these tasks and NLP, and discuss how NLP researchers can contribute to solving these problems. It is now well recognized that the huge data volume and the rich content of the Web present a golden opportunity and a stage for NLP researchers. Web content mining calls for close collaboration between NLP researchers and researchers in other fields, e.g., data mining, machine learning, information retrieval and databases. Such collaborations are likely to produce major scientific breakthroughs and also have significant industrial impact.

Bio: Bing Liu is a professor of Computer Science at the University of Illinois at Chicago (UIC). He obtained his PhD in Artificial Intelligence from the University of Edinburgh. Before joining UIC in 2002, he was with the National University of Singapore. He has published extensively in the fields of data mining, Web mining and opinion mining in leading conferences and journals. His research has focused on classification based on associations, interestingness in data mining, learning from positive and unlabeled examples, Web data/information extraction, and opinion mining and sentiment analysis. He has also written a textbook titled “Web Data Mining: Exploring Hyperlinks, Contents and Usage Data”. In terms of professional service, Liu has served as an associate editor of IEEE Transactions on Knowledge and Data Engineering and of SIGKDD Explorations, and is on the editorial boards of several other journals. He has served or is serving as program chair of the IEEE International Conference on Data Mining (ICDM-2010), ACM Conference on Web Search and Data Mining (WSDM-2010), ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2008), SIAM Conference on Data Mining (SDM-2007), ACM Conference on Information and Knowledge Management (CIKM-2006), and Pacific Asia Conference on Data Mining (PAKDD-2002), and as area chair of the International World Wide Web Conference (WWW-2005, WWW-2010) in charge of the data mining track. In addition, he has served extensively as a program committee member and senior program committee member of leading conferences in data mining, Web technologies and natural language processing.


