Introduction to Data Mining (COMP 4710) & Advanced Data Mining (COMP 7860)
Data Mining: Concepts and Techniques by Jiawei Han; Micheline Kamber; Jian Pei
Data Mining: Concepts and Techniques provides the concepts and techniques for processing gathered data or information for use in various applications. Specifically, it explains data mining and the tools used in discovering knowledge from the collected data, a process also referred to as knowledge discovery from data (KDD). It focuses on the feasibility, usefulness, effectiveness, and scalability of techniques for large data sets. After describing data mining, this edition explains the methods of getting to know, preprocessing, processing, and warehousing data. It then presents information about data warehouses, online analytical processing (OLAP), and data cube technology. Then, the methods involved in mining frequent patterns, associations, and correlations for large data sets are described. The book details the methods for data classification and introduces the concepts and methods for data clustering. The remaining chapters discuss outlier detection and the trends, applications, and research frontiers in data mining. This book is intended for Computer Science students, application developers, business professionals, and researchers who seek information on data mining.
* Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects.
* Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields.
* Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of your data.
Publication Date: 2011-06-22
Introduction to Data Mining by Pang-Ning Tan; Michael Steinbach; Vipin Kumar; Anuj Karpatne
Introduction to Data Mining, 2nd Edition, introduces the fundamental concepts and algorithms of data mining. It gives a comprehensive overview of the background and general themes of data mining and is designed to be useful to students, instructors, researchers, and professionals. Presented in a clear and accessible way, the book outlines fundamental concepts and algorithms for each topic, thus providing the reader with the necessary background for the application of data mining to real problems. The text helps readers understand the nuances of the subject, and includes important sections on classification, association analysis, and cluster analysis. This edition improves on the first edition of the book, published over a decade earlier, by addressing the significant changes in the industry as a result of advanced technology and data growth.
Call Number: QA76.9.D343 T35 2019
Publication Date: 2018-01-04
Introduction to Data Mining by Vipin Kumar; Michael Steinbach; Pang-Ning Tan
Introduction to Data Mining presents fundamental concepts and algorithms for those learning data mining for the first time. Each major topic is organized into two chapters, beginning with basic concepts that provide necessary background for understanding each data mining technique, followed by more advanced concepts and algorithms.
Call Number: QA 76.9 D343 T35 2005
Publication Date: 2005-05-02
Advances in Distributed and Parallel Knowledge Discovery by Hillol Kargupta (Editor); Philip Wing Keung Chan (Editor); Vipin Kumar (Foreword by)
Knowledge discovery and data mining (KDD) deals with the problem of extracting interesting associations, classifiers, clusters, and other patterns from data. The emergence of network-based distributed computing environments has introduced an important new dimension to this problem--distributed sources of data. Traditional centralized KDD typically requires central aggregation of distributed data, which may not always be feasible because of limited network bandwidth, security concerns, scalability problems, and other practical issues. Distributed knowledge discovery (DKD) works with the merger of communication and computation by analyzing data in a distributed fashion. This technology is particularly useful for large heterogeneous distributed environments such as the Internet, intranets, mobile computing environments, and sensor networks. When the data sets are large, scaling up the speed of the KDD process is crucial. Parallel knowledge discovery (PKD) techniques address this problem by using high-performance multiprocessor machines. This book presents introductions to DKD and PKD, extensive reviews of the field, and state-of-the-art techniques.
Contributors: Rakesh Agrawal, Khaled AlSabti, Stuart Bailey, Philip Chan, David Cheung, Vincent Cho, Joydeep Ghosh, Robert Grossman, Yi-ke Guo, John Hale, John Hall, Daryl Hershberger, Ching-Tien Ho, Erik Johnson, Chris Jones, Chandrika Kamath, Hillol Kargupta, Charles Lo, Balinder Malhi, Ron Musick, Vincent Ng, Byung-Hoon Park, Srinivasan Parthasarathy, Andreas Prodromidis, Foster Provost, Jian Pun, Ashok Ramu, Sanjay Ranka, Mahesh Sreenivas, Salvatore Stolfo, Ramesh Subramonian, Janjao Sutiwaraphun, Kagan Tummer, Andrei Turinsky, Beat Wüthrich, Mohammed Zaki, Joshua Zhang.
Call Number: QA 76.9 D5 A345 2000
Publication Date: 2000-08-28
Big Data by Kuan-Ching Li (Editor); Hai Jiang (Editor); Laurence T. Yang (Editor); Alfredo Cuzzocrea (Editor)
As today's organizations are capturing exponentially larger amounts of data than ever, now is the time for organizations to rethink how they digest that data. Through advanced algorithms and analytics techniques, organizations can harness this data, discover hidden patterns, and use the newly acquired knowledge to achieve competitive advantages. Presenting the contributions of leading experts in their respective fields, Big Data: Algorithms, Analytics, and Applications bridges the gap between the vastness of Big Data and the appropriate computational methods for scientific and social discovery. It covers fundamental issues about Big Data, including efficient algorithmic methods to process data, better analytical strategies to digest data, and representative applications in diverse fields, such as medicine, science, and engineering. The book is organized into five main sections:
* Big Data Management--considers the research issues related to the management of Big Data, including indexing and scalability aspects
* Big Data Processing--addresses the problem of processing Big Data across a wide range of resource-intensive computational settings
* Big Data Stream Techniques and Algorithms--explores research issues regarding the management and mining of Big Data in streaming environments
* Big Data Privacy--focuses on models, techniques, and algorithms for preserving Big Data privacy
* Big Data Applications--illustrates practical applications of Big Data across several domains, including finance, multimedia tools, biometrics, and satellite Big Data processing
Overall, the book reports on state-of-the-art studies and achievements in algorithms, analytics, and applications of Big Data. It provides readers with the basis for further efforts in this challenging scientific field that will play a leading role in next-generation database, data warehousing, data mining, and cloud computing research. It also explores related applications in diverse sectors, covering technologies for media/data communication, elastic media/data storage, cross-network media/data fusion, and SaaS.
Publication Date: 2015-02-23
Data Mining by Mehmed Kantardzic
This book reviews state-of-the-art methodologies and techniques for analyzing enormous quantities of raw data in high-dimensional data spaces, to extract new information for decision making. The goal of this book is to provide a single introductory source, organized in a systematic way, that guides readers in the analysis of large data sets through the explanation of basic concepts, models, and methodologies developed in recent decades. If you are an instructor or professor and would like to obtain instructor's materials, please visit http://booksupport.wiley.com . If you are an instructor or professor and would like to obtain a solutions manual, please send an email to: firstname.lastname@example.org
Call Number: QA 76.9 D343 K36 2011
Publication Date: 2011-08-16
Data Mining by Margaret H. Dunham
Thorough in its coverage from basic to advanced topics, this book presents the key algorithms and techniques used in data mining. An emphasis is placed on the use of data mining concepts in real-world applications with large database components. Includes unique chapters on Web mining, spatial mining, temporal mining, and prototypes and DM products. A separate case studies section highlights real-world applications. An excellent reference book for computer database professionals and researchers.
Call Number: QA 76.9 D343 D86 2003
Publication Date: 2002-08-22
Data Mining: Next Generation Challenges and Future Directions by Hillol Kargupta (Editor); Anupam Joshi (Editor); Krishnamoorthy Sivakumar (Editor); Yelena Yesha (Editor)
Data mining, or knowledge discovery, has become an indispensable technology for businesses and researchers in many fields. Drawing on work in such areas as statistics, machine learning, pattern recognition, databases, and high performance computing, data mining extracts useful information from the large data sets now available to industry and science. This collection surveys the most recent advances in the field and charts directions for future research. The first part looks at pervasive, distributed, and stream data mining, discussing topics that include distributed data mining algorithms for new application areas, several aspects of next-generation data mining systems and applications, and detection of recurrent patterns in digital media. The second part considers data mining, counter-terrorism, and privacy concerns, examining such topics as biosurveillance, marshalling evidence through data mining, and link discovery. The third part looks at scientific data mining; topics include mining temporally-varying phenomena, data sets using graphs, and spatial data mining. The last part considers web, semantics, and data mining, examining advances in text mining algorithms and software, semantic webs, and other subjects.
Call Number: QA 76.9 D343 D3835 2004
Publication Date: 2004-11-19
Data Mining, 4th ed. by Ian H. Witten; Eibe Frank; Mark A. Hall; Christopher J. Pal
Data Mining: Practical Machine Learning Tools and Techniques, Fourth Edition, offers a thorough grounding in machine learning concepts, along with practical advice on applying these tools and techniques in real-world data mining situations. This highly anticipated fourth edition of the most acclaimed work on data mining and machine learning teaches readers everything they need to know to get going, from preparing inputs, interpreting outputs, and evaluating results to the algorithmic methods at the heart of successful data mining approaches. Extensive updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including substantial new chapters on probabilistic methods and on deep learning. Accompanying the book is a new version of the popular WEKA machine learning software from the University of Waikato. Authors Witten, Frank, Hall, and Pal include today's techniques coupled with the methods at the leading edge of contemporary research. Please visit the book companion website at http://www.cs.waikato.ac.nz/ml/weka/book.html . It contains:
* PowerPoint slides for Chapters 1-12, a very comprehensive teaching resource with many slides covering each chapter of the book
* An online appendix on the Weka workbench, again a very comprehensive learning aid for the open source software that goes with the book
* The table of contents, highlighting the many new sections in the 4th edition, along with reviews of the 1st edition, errata, etc.
Features of the book:
* Provides a thorough grounding in machine learning concepts, as well as practical advice on applying the tools and techniques to data mining projects
* Presents concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods
* Includes a downloadable WEKA software toolkit, a comprehensive collection of machine learning algorithms for data mining tasks, in an easy-to-use interactive interface
* Includes open-access online courses that introduce practical applications of the material in the book
Call Number: QA 76.9 D343 W58 2011
Publication Date: 2016-10-01
Knowledge Discovery in Databases by Gregory Piatetsky-Shapiro (Editor); William Frawley (Editor)
Knowledge Discovery in Databases brings together current research on the exciting problem of discovering useful and interesting knowledge in databases. It spans many different approaches to discovery, including inductive learning, Bayesian statistics, semantic query optimization, knowledge acquisition for expert systems, information theory, and fuzzy sets. The rapid growth in the number and size of databases creates a need for tools and techniques for intelligent data understanding. Relationships and patterns in data may enable a manufacturer to discover the cause of a persistent disk failure or the reason for consumer complaints. But today's databases hide their secrets beneath a cover of overwhelming detail. The task of uncovering these secrets is called "discovery in databases." This loosely defined subfield of machine learning is concerned with discovery from large amounts of possibly uncertain data. Its techniques range from statistics to the use of domain knowledge to control search. Following an overview of knowledge discovery in databases, thirty technical chapters are grouped in seven parts which cover discovery of quantitative laws, discovery of qualitative laws, using knowledge in discovery, data summarization, domain-specific discovery methods, integrated and multi-paradigm systems, and methodology and application issues. An important thread running through the collection is reliance on domain knowledge, starting with general methods and progressing to specialized methods where domain knowledge is built in.
Gregory Piatetsky-Shapiro is Senior Member of Technical Staff and Principal Investigator of the Knowledge Discovery Project at GTE Laboratories. William Frawley is Principal Member of Technical Staff at GTE and Principal Investigator of the Learning in Expert Domains Project.
Call Number: QA 76.9 D3 K56 1991
Publication Date: 1991-12-30
Principles of Data Mining by David J. Hand; Heikki Mannila; Padhraic Smyth
The growing interest in data mining is motivated by a common problem across disciplines: how does one store, access, model, and ultimately describe and understand very large data sets? Historically, different aspects of data mining have been addressed independently by different disciplines. This is the first truly interdisciplinary text on data mining, blending the contributions of information science, computer science, and statistics. The book consists of three sections. The first, foundations, provides a tutorial overview of the principles underlying data mining algorithms and their application. The presentation emphasizes intuition rather than rigor. The second section, data mining algorithms, shows how algorithms are constructed to solve specific problems in a principled manner. The algorithms covered include trees and rules for classification and regression, association rules, belief networks, classical statistical models, nonlinear models such as neural networks, and local "memory-based" models. The third section shows how all of the preceding analysis fits together when applied to real-world data mining problems. Topics include the role of metadata, how to handle missing data, and data preprocessing.
Call Number: QA 76.9 D343 H38 2001
Publication Date: 2001-08-17
The Top Ten Algorithms in Data Mining by Xindong Wu (Editor); Vipin Kumar (Editor)
Identifying some of the most influential algorithms that are widely used in the data mining community, The Top Ten Algorithms in Data Mining provides a description of each algorithm, discusses its impact, and reviews current and future research. Thoroughly evaluated by independent reviewers, each chapter focuses on a particular algorithm and is written by either the original authors of the algorithm or world-class researchers who have extensively studied the respective algorithm. The book concentrates on the following important algorithms: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. Examples illustrate how each algorithm works and highlight its overall performance in a real-world application. The text covers key topics--including classification, clustering, statistical learning, association analysis, and link mining--in data mining research and development as well as in data mining, machine learning, and artificial intelligence courses. By naming the leading algorithms in this field, this book encourages the use of data mining techniques in a broader realm of real-world applications. It should inspire more data mining researchers to further explore the impact and novel research issues of these algorithms.
Call Number: QA 76.9 D343 T66 2009
Publication Date: 2009-04-09
Data Quality and High-Dimensional Data Analytics by Al; Chee-Yong Chan
Poor data quality is known to compromise the credibility and efficiency of commercial and public endeavours. The importance of managing data quality has also increased manifold as the diversity of sources, formats, and volume of data grows. This volume addresses data quality in light of collaborative information systems, where data creation and ownership are increasingly difficult to establish.
Call Number: QA 76.9 E95 I57 2008
Publication Date: 2009-02-19
Encyclopedia of Data Warehousing and Mining by John Wang
This encyclopedia offers thorough exposure to the issues of importance in the changing field of data warehousing and mining. It informs decision makers, problem solvers, and data mining specialists in business and other settings with over 300 entries on theories, methodologies, functionalities, and applications.
Call Number: QA 76.9 D37 E52 2009
Publication Date: 2008-08-31
Encyclopedia of Database Systems by Ling Liu (Editor); M. Tamer Özsu (Editor)
This revised and expanded edition of Encyclopedia of Database Systems provides easy access to crucial concepts relevant to all aspects of very large databases, data management, and database systems, including areas of current interest and research results of historical significance. This comprehensive reference is organized alphabetically and each entry presents basic terminology, concepts, methods and algorithms, key results to date, references to the literature, and cross-references to other entries. Topics for the encyclopedia--including areas of current interest as well as research results of historical significance--were selected by a distinguished international advisory board and written by world-class experts in the field. New entries that reflect recent developments and technological advances in very large databases include: big data, big data technology, cloud computing, cloud data centers, business analytics, social networks, ranking, trust management, query over encrypted data, and more. Entirely new entries include database systems, relational database systems, databases, multimedia databases, bioinformatics, workflow systems, and web data management. Encyclopedia of Database Systems, 2nd edition, is designed to meet the needs of researchers, professors, graduate and undergraduate students in computer science and engineering. Industry professionals, from database specialists to software developers, will also benefit from this valuable reference work.
Publication Date: 2018-11-10
Frequent Pattern Mining by Charu C. Aggarwal (Editor); Jiawei Han (Editor)
This comprehensive reference consists of 18 chapters from prominent researchers in the field. Each chapter is self-contained, and synthesizes one aspect of frequent pattern mining. An emphasis is placed on simplifying the content, so that students and practitioners can benefit from the book. Each chapter contains a survey describing key research on the topic, a case study and future directions. Key topics include: Pattern Growth Methods, Frequent Pattern Mining in Data Streams, Mining Graph Patterns, Big Data Frequent Pattern Mining, Algorithms for Data Clustering and more. Advanced-level students in computer science, researchers and practitioners from industry will find this book an invaluable reference.
Visual Analytics and Interactive Technologies by Qingyu Zhang; Richard Segall; Mei Cao
Large volumes of data and complex problems inspire research in computing and data, text, and Web mining. However, analyzing data is not sufficient, as it has to be presented visually with analytical capabilities. Visual Analytics and Interactive Technologies: Data, Text and Web Mining Applications is a comprehensive reference on concepts, algorithms, theories, applications, software, and visualization of data mining, text mining, Web mining and computing/supercomputing. This publication provides a coherent set of related works on the state-of-the-art of the theory and applications of mining, making it a useful resource for researchers, practitioners, professionals and intellectuals in technical and non-technical fields.