SYLLABUS MIS 6093-Independent Study in MIS-Data Mining

Instructor: Dr. Richard S. Segall

Office: 216 Business Building

Phones: 972-3416 Extension 159 (Office) and

               931-9642 (Home: Answering Machine available)

Class Times: MIS 4093: 2:00PM-4:50PM  Thursdays  (BU 308)

Other Classes: 10:00AM-10:50AM MWF & 2:00PM-3:15PM MW

                          & 6:30PM-9:20PM Thursdays (second half of semester).

Office Hours: 9:30AM-10:00AM & 11:00AM-11:30AM MWF and 3:15PM-5:30PM  MWF  1:00PM-2:00PM MWR

                 6:00PM-6:30PM Thursdays (second half of semester), and other times by appointment.

E-mail Addresses: RSEGALL@MAIL.ASTATE.EDU (Office)  and  

                                RSSEGALL@AOL.COM  (Home)

 

Course Web Site Address: http://business.astate.edu/econdsci/index.html & click on “Faculty” and “Richard Segall”

Text Book Web Site Address: http://www-faculty.cs.uiuc.edu/~hanj/DM_Book.html

http://www-courses.cs.uiuc.edu/~cs497jh/papers/supplementarylist.htm (Cases for Han & Kamber)

Texts:  1.) Han & Kamber (H&K): Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers (MKP)

                 ISBN 1-55860-489-8, Copyright 2001.

             2.) Westphal and Blaxton (W&B) Data Mining Solutions: Methods and Tools for Solving Real World

                  Problems, John Wiley & Sons, Inc., ISBN   0-471-25384-7, Copyright 1998.

             3.) Additional handouts as distributed to class and/or made available on course web site.

Prerequisites: MIS 2403 or equivalent, or knowledge of databases

Proposed Catalogue Description of Course: Concepts in data mining techniques with emphasis on applying the concepts to obtain solutions for different business situations.

 

Overview of Course: One of the purposes of this course is to help you become more aware of the concepts, trends and tools available for data mining in the current Information Systems world. The course will discuss the fundamentals of data mining concepts and techniques. Another primary objective of this course is to expose students to some of the software available for data mining solutions. Through class lectures, homework, student team presentations of current relevant literature either distributed in class or available on the World Wide Web, and a semester team project, the students will become familiar with this new area of information systems.

Outline and Course Schedule:

Readings to be assigned from text by Han and Kamber (H&K) are to be selected from the following:

Chapter 1: Introduction

1.1 What motivated data mining? Why is it important?

1.2 So, what is data mining?

1.3 Data mining-on what kind of data?

1.4 Data mining functionalities-what kinds of patterns can be mined?

1.5 Are all of the patterns interesting?

1.6 Classification of data mining systems

1.7 Major issues in data mining

Chapter 2: Data Warehouse and OLAP Technology for Data Mining

2.1 What is a data warehouse?

2.2 A multidimensional data model

2.3 Data warehouse architecture 

2.4 Data warehouse implementation

2.5 Further development of data cube technology

2.6 From data warehousing to data mining

Chapter 3: Data Preparation

3.1 Why preprocess the data?

3.2 Data cleaning 

3.3 Data integration and transformation 

3.4 Data reduction

3.5 Discretization and concept hierarchy generation 

 

 

Chapter 4: Data Mining Primitives, Languages, and System Architectures

4.1 Data mining primitives: what defines a data mining task? 

4.2 A data mining query language

4.3 Designing graphical user interfaces based on a data mining query language

4.4 Architecture of data mining systems

Chapter 5: Concept Description: Characterization and Comparison

5.1 What is concept description?

5.2 Data generalization and summarization-based characterization

5.2.1 Attribute-oriented induction

5.3 Analytical characterization: analysis of attribute relevance

5.4 Mining class comparisons: discriminating between different classes

5.5 Mining descriptive statistical measures in large databases

5.6 Discussion

Chapter 6: Mining Association Rules in Large Databases

6.1 Association rule mining

6.2 Mining single-dimensional Boolean association rules from transactional databases

6.3 Mining multilevel association rules from transaction databases 

6.4 Mining multidimensional association rules from relational databases and data warehouses

6.5 From association mining to correlation analysis

6.6 Constraint-based association mining

Chapter 7: Classification and Prediction

7.1 What is classification? What is prediction?

7.2 Issues regarding classification and prediction

7.3 Classification by decision tree induction 

7.4 Bayesian classification 

7.5 Classification by back-propagation

7.6 Classification based on concepts from association rule mining

7.7 Other classification methods

7.8 Prediction

7.9 Classifier accuracy

Chapter 8: Cluster Analysis

8.1 What is cluster analysis?

8.2 Types of data in clustering analysis

8.3 A categorization of major clustering methods

8.4 Partitioning methods

8.5 Hierarchical methods

8.6 Density-based methods

8.7 Grid-based methods

8.8 Model-based clustering methods 

8.9 Outlier analysis

Chapter 9: Mining Complex Types of Data

9.1 Multidimensional analysis and descriptive mining of complex data objects

9.2 Mining spatial databases 

9.3 Mining multimedia databases

9.4 Mining time-series and sequence data

9.5 Mining text databases 

9.6 Mining the World-Wide Web 

9.6.4 Web usage mining

Chapter 10: Data Mining Applications and Trends in Data Mining

10.1 Data mining applications

10.2 Data mining system products and research prototypes

10.3 Additional themes on data mining

10.4 Social impacts of data mining 

10.5 Trends in data mining

Appendix A An Introduction to Microsoft's OLE DB for Data Mining

Appendix B An Introduction to DBMiner

 

 

 

 

Readings to be assigned from text by Westphal & Blaxton (W&B) to be selected from the following:

Section I: Defining the Data Mining Approach

Chapter 1: What is Data Mining?

Chapter 2: Understanding Data Mining

Chapter 3: Defining the Problems to be Solved

Section II: Data Preparation and Analysis

Chapter 4: Accessing and Preparing the Data

Chapter 5: Visual Methods for Analyzing Data

Chapter 6: Non-Visual Analytical Methods

Section III: Assessing Data Mining Tools and Technologies

Chapter 7: Link Analysis Tools

Chapter 8: Landscape Visualization Tools

Chapter 9: Quantitative Data Mining Tools

Chapter 10: Future Trends in  Data Mining

Section IV: Case Studies (Student Presentations are also to be selected from these Chapters as well as from the Readings List cited earlier.)

Chapter 11: Mapping the Human Genome

Chapter 12: Telecommunication Services

Chapter 13: Banking and Finance

Chapter 14: Retail Data Mining

Chapter 15: Financial Market Data Mining

Chapter 16: Money Laundering and Other Financial Crimes

Since this is a graduate course, AACSB accreditation standards require that graduate students be evaluated against higher requirements than undergraduate students enrolled in the same course. Hence, in addition to the above, graduate students will be required to write a ten (10) page paper* (excluding additional pages as needed for references, tables and figures) on an application(s) of Data Mining of their selection. The grade for this paper will be a letter grade that will be included in the Written Homework assignment scores out of a possible 200 points.

 

* Please refer to the graduate school catalogue for policies regarding plagiarism. All sources are expected to be cited in this paper.

 


Grading Policy: The course grade will be determined by the following:

Three (3) Exams (In-class & Take-Home parts) and any quizzes   = 45%

Written Homework Assignments & ten (10) page Paper             = 22%

Team Project and Class Presentations                           = 18%

Final Exam (In-class & Take-Home part)                         = 15%

 

Course Policies:

1.)   Portions of the exams and/or final exam may be composed of take-home portions because the nature of the course requires use of computers, and thus cannot be completed in-class. It is expected that you DO YOUR OWN WORK on these, as well as other assigned homeworks. Copying homework or submitting identical computer files or printouts from other student(s) is not allowed!!!

 

     Similarly, submitting homework or take-home exam answers that are copied verbatim from the text is considered plagiarism. Submitting verbatim text from any web pages without citations, or from other software modules, on take-home exams, homework, or the semester team project is also considered plagiarism. Those involved in collusion on assigned homework and take-home exam portions, or in plagiarism as described above, will be penalized!!! You are responsible for knowing the contents of the “ASU Student Handbook” and its stated “Academic Integrity Policies.”

 

2.)   Although no explicit policy or statement regarding class attendance is written in the ASU 2002-2003 Graduate Bulletin, it is expected that graduate students attend all classes and abide by policies similar to those stated on page 41 of the ASU 2001-2002 Undergraduate Bulletin:

       “Students should attend each lecture, recitation, and laboratory session of every course on which they are enrolled. Students who miss a class session should expect to make up missed work or receive a failing grade on missed work. Make-up policy is at the discretion of the instructor.”

     “Students enrolled in junior and senior level courses (numbered 3000 or 4000) will not be assigned a grade of F solely for failing to attend classes. However, instructors shall set forth at the beginning of the semester their expectations with regard to make-up policy for work missed, class participation, and other factors that may influence course grades.”

 

3.)   Exceptions to this rule as stated above will be made for excusable absences as documented in writing, such as medical excuses, death in family, etc. In summary, you are responsible for everything discussed or presented in class regardless of whether you were in attendance. Attendance is essential, especially when Student Presentations are to be made, as questions need to be directed to presenters.

     Those with perfect attendance will be given the benefit of the doubt on borderline grade cases at the end of the term (example: between 89.99 and 90.00 as shown in item (13.) below).

 

4.)   If you come to class late and find the classroom empty, do not assume that class is canceled. Assume that the class is meeting in one of the other computer rooms for hands-on instruction or some other classroom. You are expected to check the computer room(s) to locate the class and announcements on blackboard or classroom doors indicating location of the class.

 

Hence, similar to university policy stated on page 41 of the ASU 2001-2002 Undergraduate Bulletin, class attendance will be taken daily.

 

5.)   Class attendance will be taken regularly with sign-up sheets to be passed around the room. It is your responsibility to see that the daily attendance sheets are signed! Class attendance sheets may be passed around the class twice during the class period, i.e. once at the start of class and once later in class. In such cases, signing the attendance sheet only once would NOT constitute full attendance for that class!!!

 

6.)  Point deductions for the overall class average earned for the term will be made according to the

        following scale: of one-half point deduction for each unexcused class absence beyond first:

             Perfect Attendance:       0 Points

             One Absence:            1.0 Points

             Two Absences:           2.0 Points

             Three Absences:         3.0 Points

             Four Absences:          4.0 Points

             Five Absences:          5.0 Points

             Six Absences:           6.0 Points

             Etc.

Example: 90.53 overall course average with 2 absences yields adjusted overall course average of 89.03 and B course grade.

7.) All assignments submitted for this course are expected to be word-processed. All assignments for

       this course having diagrams are expected to be computer generated (i.e. NOT hand drawn!).

       Assignments not following this requirement will either be not accepted or receive severe point

       deductions.  As a general rule, assignments not word-processed will not be accepted for grading!!!

       Software available in computer labs includes: MS Word with its toolbar symbols, and for

       presentations: MS PowerPoint, WinEdge, Visio Professional (pre-made components useful in Diagrams),

       SmartDraw (for self-made diagrams) downloadable from www.smartdraw.com, etc.

 

8.) All homework, take-home portions of exams, and the take-home portion of the final are expected to be handed in on time!!! Late homework will not be accepted unless there is a valid excuse, e.g. illness. If it is known in advance that you will be unable to attend class when homework or a take-home exam is due, you are advised to submit it in advance, send it by e-mail (keeping copies of any computer files for yourself), or FAX hard copies to the Department of Economics & Decision Sciences at (870)910-8187 with Dr. Segall's name as addressee so that it will arrive by the due date.

 

      The instructor, however, will not be responsible for lost submissions not handed in with the class, e.g. e-mail sent by error without attachment files or sent to an incorrect e-mail address, a fax transmission that does not get delivered to the instructor, or work due in class that was instead placed in the office door basket or under the door. You are advised to make back-up files of all homework and take-home exams submitted.

 

9.) Every semester, there are a few students who attempt to hand in the entire semester’s worth of missing homework collectively at the end of the semester to be graded for credit. THIS IS TOTALLY UNACCEPTABLE!!! Also, due in part to the immense amount of homework papers that need to be graded weekly for all of Dr. Segall’s classes, this is not feasible.

 

10.) Make-up exams or quizzes will only be given for a valid reason (e.g. illness, death in family) and need to be made up as soon as possible. For example, a missed exam #1 with a documented reason does not constitute a valid reason for a make-up of exam #1 at the end of the semester, when the answers have already been discussed in class. No make-up would be allowed in such a situation.

 

11.) Formula  for Grade Determination for Homework Contribution to Grade:

      = ((Total homework points earned)/(Total homework points assigned)) X 22

 

12.) Formula for "Approximate"*  Grade Determination for each Exam** contribution to Grade:

     = ((Total exam points earned)/(Total exam points possible)) X 15**

*approximate because any quiz points are also added to "Total exam points possible."

**Both in-class and take-home parts
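
As a rough illustration of the formulas in items (11.) and (12.), the sketch below scales the points earned in one component by that component's weight. The function name and the sample point totals are hypothetical and are not taken from course records; only the weights 22 and 15 come from the formulas above.

```python
def component_contribution(points_earned, points_possible, weight):
    """Scale a component's earned points to its share of the course grade."""
    if points_possible <= 0:
        raise ValueError("points_possible must be positive")
    return (points_earned / points_possible) * weight


# Hypothetical totals: 170 of 200 homework points, 88 of 100 points on one exam.
homework_part = component_contribution(170, 200, 22)  # weight from item (11.)
exam_part = component_contribution(88, 100, 15)       # per-exam weight from item (12.)
print(homework_part, exam_part)
```

Any quiz points would simply be added to both the earned and the possible exam totals before calling the function, which is why item (12.) describes the exam figure as approximate.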

 

13.) A straight curve is intended to be used for course grade determination:

      90.00-100       A,

      80.00-89.99     B,

      70.00-79.99     C,

      60.00-69.99     D,

      0-59.99          F.

Since the quizzes and exams are primarily intended to be open-book and open-notes, and take-home exams are given with ample time, the above straight curve is intended to be strictly adhered to!!! There may, however, be some closed-book parts on matching terms with their corresponding definitions.
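
The scale in item (13.) can be sketched as a simple lookup. This is only an illustration: the function name is hypothetical, and it assumes the overall average is compared against each lower bound without any rounding, so that 89.99 remains a B. The benefit of the doubt for perfect attendance is an instructor judgment and is not modeled here.

```python
def letter_grade(course_average):
    """Map an overall course average (0-100) to a letter grade."""
    if not 0 <= course_average <= 100:
        raise ValueError("course_average must be between 0 and 100")
    # Lower bounds are taken from the scale in item (13.); no rounding is applied.
    for lower_bound, letter in ((90.0, "A"), (80.0, "B"), (70.0, "C"), (60.0, "D")):
        if course_average >= lower_bound:
            return letter
    return "F"


print(letter_grade(89.99), letter_grade(90.00))
```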

 

14.) “Double-rounding” will NOT be employed in computing averages of components of final course

         grade. That is,  rounding of overall Homework average to be added to rounding of overall average of

         Exams will NOT be used in computing final grade for the course!!! As indicated above, it is unlikely

         that a grading curve will be employed at the end of the term, but would be used only if deemed

         appropriate. In this unlikely event, no curve will be employed for any individual exam because the

         composite set of exam scores for the entire term must be considered along with the other components.

         If a grading curve is used during the term, it will be constructed in such a way that “clusters” of

         scores will be grouped together to receive the same letter grade.

         PLEASE NOTE THAT NO GRADING CURVE WAS EVER EMPLOYED WHEN ANY

         COURSE WAS TAUGHT BY DR. SEGALL IN PREVIOUS SEMESTERS AT ASU!!! 

 

15.) It is impossible to discuss in class the contents of all of the assigned readings for the course. However      

       you will be responsible for the contents of all of the assigned readings for examinations and assignments.

 

16.) You are responsible for everything posted on the course web page and for periodically checking the course web site for updated and recent postings at the hotlink of MIS 4093/MIS 6093 at the web site address of: http://business.astate.edu/econdsci/index.html (Click on “Faculty” & “Segall”). You are responsible for printing out and bringing to class copies of the PowerPoint slides used in lecture for taking class notes. When clicking on the MIS 4093/MIS 6093 hotlink, additional hotlinks will be available, such as by individual chapter, for viewing and printing these lecture materials.

 

17.) The Final Exam most likely cannot be comprehensive in nature due to the immense amount of readings and topics covered in this course. Most likely the Final Exam will emphasize the material from the later part of the course. If points remain when writing the Final Exam and no additional questions on that material can be written to complete the total points required, a few review questions may be included. The exact contents of the Final Exam will be announced in class prior to the Final Exam.

 

18.) Per University Regulations, no grades will be given over the phone or by e-mail.

 

19.) Use of cell phones during lecture time and exams will not be tolerated. Please be

      considerate of your classmates and the instructor.

 

20.) Students with disabilities are responsible for informing Dr. Segall, with written documentation, of their special needs for exams.

 

21.) The instructor reserves the right to make any necessary changes to the course syllabus as

     stated, and would announce any necessary changes in class at the appropriate time(s).

------------------------------------------------------------------------------------------------------------------------------------------------------------

 

 

 

 

 

 

RULES FOR CLASS TEAM PRESENTATIONS:

1.      Student Class Team Presentations will be based upon current events in Data Mining available on web sites provided by Dr. Segall, or Case Studies for the Han and Kamber text provided by the publisher at the text web site of http://www-courses.cs.uiuc.edu/~cs497jh/papers/supplementarylist.htm, or Case Studies contained within Chapters 11-16 of Westphal & Blaxton, or handouts provided by Dr. Segall. One of the purposes of these Student Team presentations is to provide more in-depth insight into the applications of Data Mining as Case Studies, insight into the state of the art of Data Mining as provided by current web postings, and to relate these to the respective chapter(s) in our Texts we are discussing in lecture.

 

2.      Students are expected to do two (2) team presentations for the semester. If time permits, which depends on class size, extra credit will be awarded for any extra presentations as points to be added to the total homework points earned.

 

3.      Teams of presenters are to be formed by students selecting their teammates as a first priority. Sign-up notices will be written on Class Attendance Sheets. Dr. Segall will then assign those students as needed to form complete teams.

 

4.      Sign-up for each student’s second presentation can not be made until a complete cycle has been made of the class of the complete set of first presentations.

 

5.      Each team member is to speak for 5 minutes. Hence the total time for each team will be equal to:

     (number of presenters) times (5 minutes per presenter) + 5 minutes for Questions and Answers from the audience.
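
The timing rule above is simple arithmetic and can be sketched as follows. The function and constant names are hypothetical; the two 5-minute figures are the ones stated in the rule.

```python
MINUTES_PER_PRESENTER = 5  # speaking time for each team member
QA_MINUTES = 5             # questions and answers from the audience


def team_time_minutes(number_of_presenters):
    """Total time slot for one team, in minutes."""
    if number_of_presenters < 1:
        raise ValueError("a team needs at least one presenter")
    return number_of_presenters * MINUTES_PER_PRESENTER + QA_MINUTES


# Example with a hypothetical team of three presenters.
print(team_time_minutes(3))
```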

 

6.      The contents of each presentation should consist of:

(a.)              a summary of the highlights of the Case Study or material to be presented.

(b.)              Indication of how the Case Study or article relates to the Chapter(s) in Han & Kamber being discussed in lecture. Some of the case studies will be from chapters 11 to 16 of text Westphal & Blaxton.

(c.)               Potential Take-Home Exam questions as discussed below. The set of questions for the entire team should preferably be presented at the end of the presentation. However, for some of the in-depth case studies it may be more appropriate to present each presenter’s question at the end of his or her portion of the team’s presentation. The potential exam questions should either be numbered consecutively or labeled as author’s questions, e.g. “Laurie’s question”.

 

7.      Each presenter must present one (1) essay or problem (e.g. table or figure) question per presenter based on the material of their class presentation each of which would be a potential question on the Take-Home part of an Exam. That is, the potential Take-Home exam questions should be focused on the contents of the presentation. The potential exam question can also relate to material in Han & Kamber or Westphal & Blaxton, but still must pertain to the contents of their presentation. The potential exam questions can be essay, create a table, etc. but NOT short answer (e.g. NOT T/F or MC or fill-in blank). That is, short answers of five (5) or six (6) words would be totally unacceptable as potential Take-Home Exam questions.

 

8.      Only if class presentations are based on web sites no longer available on the web, or on current literature not available on the web, will Dr. Segall distribute copies of the appropriate materials to each of the team members prior to the team's class presentation.

 

9.      Presentations are expected to be made in Microsoft PowerPoint. Each team is required to give Dr. Segall a diskette of the team presentation during the class of the presentation for posting on the course web page. This diskette is NOT to have the answers to the proposed student Exam questions, even though the student presentation given in class may include slides with acceptable answers.

 

10.   Each presenter will be given a “Team Presentation Peer Evaluation Form” to complete on the day of his or her Team Presentation. In this form, each presenter will evaluate and grade the performance and contribution of the OTHER team members of their team, i.e. they will be grading everyone else but themselves. These forms are to be completed and handed in to Dr. Segall by the end of the class on which the team presentation was given, and will be held in strict confidence. Only letter grades of A, B, C, D, and F will be used on this form.

 

11.   Presentation Grades will not be made available until the end of the semester on distributed 

     summary sheets because your presentation grade will not only be made relative to the

     quality of your contribution to your team’s presentation but also relative to the quality of

     the presentations made by the other teams.

 

12.   Only letter grades will be assigned to presentations, where:

     A+=100, A=95, A-=90, B=85, B-=80, C=75, C-=70, D=65, D-=60, F+=50, F=25, F-=0.

     That is, presentation grades such as 83, 91, etc. would not be possible.
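
The letter-to-points scale in rule 12 amounts to a fixed lookup table, sketched below. The names are hypothetical, and the table contains only the letters listed in the rule; any other letter is rejected rather than guessed.

```python
PRESENTATION_POINTS = {
    "A+": 100, "A": 95, "A-": 90,
    "B": 85, "B-": 80,
    "C": 75, "C-": 70,
    "D": 65, "D-": 60,
    "F+": 50, "F": 25, "F-": 0,
}


def presentation_points(letter):
    """Return the numeric score for a presentation letter grade."""
    try:
        return PRESENTATION_POINTS[letter]
    except KeyError:
        raise ValueError(f"{letter!r} is not on the presentation grade scale") from None


print(presentation_points("A-"))
```

Because only these values exist, intermediate scores such as 83 or 91 cannot occur.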

 

 



 SOURCES FOR SEMESTER TEAM PROJECT

 

CD-ROM of Westphal and Blaxton text has the following software modules available for download.  See Appendix “What’s on CD-ROM” on pages 597-602 for descriptions of each of these software modules:

 

  1.       Analyst’s Notebook
  2.       Cross Graphs
  3.        Daisy
  4.        DataVista