HST 952 Computing for Biomedical Scientists Fall 2003

Course Information


Level and Units: 
Grad H; 3 (lecture)-0 (lab)-9 (prep) 
Time and Place: 
Tuesdays/Thursdays 11 a.m. - 12:30 p.m. 
Location: MEC Room 334, Tosteson Medical Education Center
Harvard Medical School (260 Longwood Avenue)
Exception: MEC Room 333 on Sept. 4 and Sept. 25
Instructors: 
Omolola Ogunyemi, Aziz Boxwala, Qing Zeng 
For general correspondence and homework submission, send e-mail to ta952@dsg.harvard.edu
cc: rclacson@mit.edu
Office Hours: 
Tuesdays 10-11 a.m., by request
Course mailing list: 
hst952@yahoogroups.com 
Textbooks (required): 
Java: An Introduction to Computer Science and Programming, 3rd edition 
Author: Walter Savitch 
Prentice Hall, 2003 

Modern Database Management, 6th edition 
Authors: Fred R. McFadden, Jeffrey Hoffer, Mary B. Prescott 
Addison-Wesley, 2001

Supplementary texts/readings: 
Foundations of Computer Science 
Authors: Alfred Aho, Jeffrey Ullman 
W. H. Freeman and Co., 1995

Handbook of Medical Informatics 
Authors: Jan H. van Bemmel, Mark A. Musen 
Springer-Verlag, 1997 

Principles of Database and Knowledge-Base Systems, Volume 1
Author: Jeffrey D. Ullman 
W. H. Freeman and Co., 1988

Relevant papers or readings selected by the instructors from:

Knowledge Representation: Logical, Philosophical, and Computational Foundations 
Author: John F. Sowa 
Brooks/Cole Publishing Co., 1999

Note: The textbooks and readings listed will be on reserve at Harvard's Countway Library of Medicine. Some of the texts can also be found at MIT's Barker Engineering Library or at Harvard's Gordon McKay Applied Sciences Library.


Course description

This course introduces abstraction as an important mechanism for problem decomposition and solution formulation in the biomedical domain, and examines computer representation, storage, retrieval, and manipulation of biomedical data. As part of the course, we will briefly examine the effect of programming paradigm choice on problem-solving approaches, and introduce data structures and algorithms. We will also examine knowledge representation schemes for capturing biomedical domain complexity and principles of data modeling for efficient storage and retrieval. The final project involves building a medical information system that encompasses the different concepts taught in the course.

Computer science basics covered in the first part of the course are integral to understanding the topics covered in the latter part and to completing the assigned homework.

Goals:
With this course, we hope to provide a foundation for scientists interested in using computers to solve biomedical problems. Computing with biomedical data poses unique challenges with respect to data volume, complexity, and uncertainty in both the data and the domain knowledge. Students taking this course should come away with a grounding in abstraction for problem decomposition and solution formulation, in data modeling, and in information management. These skills are key to the analysis, development, and proper design of information systems.

Prerequisites:
A biomedical background and an interest in computing

Programming will be done in Java; no prior familiarity with Java is assumed.

Topics

The following is an outline of topics to be covered in this course.

Part 1: Introduction to Computing

(11 classes)

Omolola Ogunyemi

  1. Overview: How it all fits together
  2. Solving problems with the computer
  3. Algorithmic programming paradigms
  4. Imperative Programming: Approaches for abstracting model state (and defining program structure)
  5. Object-Oriented Programming concepts
  6. Java language details
  7. Computational Processes and Abstraction

Part 2: Data and Knowledge Representation

Qing Zeng
  1. Domain Ontology
    1. Definition
      1. Ontology = catalog of the types that exist in a domain D, from the perspective of a person using language L
      2. Informal ontology = types undefined or defined in natural language
      3. Formal ontology = types organized into partial order (lattice)
      4. Axiomatized formal ontology = axioms provide additional constraints on types
      5. Prototyped formal ontology = types characterized by comparison with prototypical individuals; similarity based on distance functions
    2. Motivation: need to represent medical data/knowledge
      1. Example of a patient case
      2. Intended applications
        • Patient care (reporting, summaries, trends)
        • Financial, management, clinical research, expert systems, information retrieval, vocabulary discovery
      3. Difficulty in representation
        • Lack of consensus
        • Comprehensiveness
          • Drugs, lab tests, signs, symptoms, narrative reports
        • Cost
        • Proprietary interests
    3. Distinction
      1. Definition
      2. Seven distinctions (Peirce and Whitehead)
        • Abstract/Physical
        • Independent/Relative/Mediating
        • Continuant/Occurrent
    4. Trees
    5. Axioms
    6. Unified Medical Language System
    7. Conceptual graphs
  2. Logic
    1. Propositional Logic
      1. Variable
      2. Connective
      3. Boolean algebra
      4. Tautology and deduction
    2. Predicate Logic
      1. Statement
        • Predicate
        • Arguments
      2. Quantifiers
  3. Special topics
    1. Natural language semantics
      1. Language Analysis
        • Morphology
        • Syntax
        • Semantics
      2. Concepts and Relations
      3. Resolving Ambiguities
    2. Uncertainty
      1. Sources of uncertainty
      2. Representation Schemes
        • Fuzzy logic
        • Probability
        • Non-monotonic logic…
      3. Example: Temporal Granularity as in TSQL
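The "Tautology and deduction" item under Propositional Logic can be made concrete with a truth-table check. The following is a minimal sketch, not course material: it assumes a formula over n variables is represented as a Java `Predicate<boolean[]>` and simply enumerates all 2^n truth assignments. The class and method names are hypothetical.

```java
import java.util.function.Predicate;

public class TautologyCheck {
    // Returns true if the formula evaluates to true under every one of the
    // 2^n assignments of truth values to its n variables.
    static boolean isTautology(int n, Predicate<boolean[]> formula) {
        for (int mask = 0; mask < (1 << n); mask++) {
            boolean[] v = new boolean[n];
            for (int i = 0; i < n; i++) {
                v[i] = ((mask >> i) & 1) == 1; // bit i of mask is the value of variable i
            }
            if (!formula.test(v)) {
                return false; // found a falsifying assignment
            }
        }
        return true;
    }

    // Material implication: (a -> b) is equivalent to (!a || b).
    static boolean implies(boolean a, boolean b) {
        return !a || b;
    }

    public static void main(String[] args) {
        // Modus ponens as a formula: ((p -> q) && p) -> q
        boolean modusPonens = isTautology(2, v -> implies(implies(v[0], v[1]) && v[0], v[1]));
        // Affirming the consequent: ((p -> q) && q) -> p, which is not valid
        boolean affirming = isTautology(2, v -> implies(implies(v[0], v[1]) && v[1], v[0]));
        // Law of the excluded middle: p || !p
        boolean excludedMiddle = isTautology(1, v -> v[0] || !v[0]);
        System.out.println(modusPonens + " " + affirming + " " + excludedMiddle);
    }
}
```

Checking a truth table takes time exponential in the number of variables, so this brute-force approach only suits small formulas; it is shown because it follows directly from the definition of a tautology.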

Part 3: Data management, querying, and retrieval

Aziz Boxwala
  1. Data concepts and modeling
    1. Nature of data
      1. Proxies and reality, metadata, data
      2. Relationships among data
    2. Data models
      1. User models, conceptual models, physical models
      2. Modeling the meaning of data
        • Degree, dependencies, time, uniqueness, generalization-specialization, aggregation
      3. Modeling methods
        • ER models, Relational models, Object-oriented models, Hierarchical and network models
  2. Relational model
    1. Maintaining integrity of data
      1. Uniqueness and Keys
        • Primary and foreign keys
      2. Domain of data
      3. Normalization of models
        • Functional dependencies among data
        • Normal forms
    2. Implementing a relational database
      1. Transforming logical models to physical models
        • Tables and views
        • CASE tools
      2. Data dictionaries
      3. Joins and queries
        • Inner and outer joins
        • Query optimization
      4. SQL
        • DDL, DML, DQL
      5. Security models in RDBMS
  3. Special topics
    1. Overview of object-oriented data management
      1. Review of classes and objects, inheritance, encapsulation
      2. Object-relational DBMS
      3. Object-oriented databases
    2. Modeling for analytical processing of data
      1. Star-join schema
        • Comparisons with transactional models
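Part 3 lists inner and outer joins. As an illustrative sketch only, the Java below performs a nested-loop join over two hypothetical in-memory tables, PATIENT(id, name) and LAB_RESULT(patient_id, test); the table names, columns, and sample rows are invented for the example, and the records used here require Java 16 or later. The inner join corresponds roughly to `SELECT ... FROM patient p JOIN lab_result l ON p.id = l.patient_id`, and the left outer join to the same query written with `LEFT OUTER JOIN`.

```java
import java.util.ArrayList;
import java.util.List;

public class JoinSketch {
    // Hypothetical tables: PATIENT(id, name) and LAB_RESULT(patient_id, test)
    record Patient(int id, String name) {}
    record LabResult(int patientId, String test) {}
    // One row of a joined result; test is null when the outer join finds no match
    record Row(int id, String name, String test) {}

    // Inner join on Patient.id = LabResult.patientId:
    // keeps only the pairs that satisfy the join condition.
    static List<Row> innerJoin(List<Patient> patients, List<LabResult> labs) {
        List<Row> out = new ArrayList<>();
        for (Patient p : patients) {
            for (LabResult l : labs) {
                if (p.id() == l.patientId()) {
                    out.add(new Row(p.id(), p.name(), l.test()));
                }
            }
        }
        return out;
    }

    // Left outer join: every patient appears at least once; patients with
    // no matching lab result are padded with null (playing the role of SQL NULL).
    static List<Row> leftOuterJoin(List<Patient> patients, List<LabResult> labs) {
        List<Row> out = new ArrayList<>();
        for (Patient p : patients) {
            boolean matched = false;
            for (LabResult l : labs) {
                if (p.id() == l.patientId()) {
                    out.add(new Row(p.id(), p.name(), l.test()));
                    matched = true;
                }
            }
            if (!matched) {
                out.add(new Row(p.id(), p.name(), null));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Patient> patients = List.of(new Patient(1, "Smith"), new Patient(2, "Jones"));
        List<LabResult> labs = List.of(new LabResult(1, "glucose"), new LabResult(1, "potassium"));
        System.out.println(innerJoin(patients, labs));
        System.out.println(leftOuterJoin(patients, labs));
    }
}
```

With the sample data, the inner join yields two rows (both for patient 1), while the left outer join yields three, the extra row pairing patient 2 with a null test. A nested loop is the simplest join algorithm; a relational DBMS's query optimizer may instead choose a hash join or a sort-merge join.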

Assignments, Exams, and Grading

There will be weekly homework assignments, consisting of programming exercises in Java. Assignments are generally due one week after they are distributed. Assignments submitted up to one week after the due date will receive an automatic deduction of 10 points (i.e., if you submit your homework up to a week after the deadline, the maximum score you can receive is 90/100). Assignments submitted between one and two weeks after the deadline will receive an automatic deduction of 20 points. Assignments submitted more than two weeks after the deadline will automatically receive a score of 0. Please speak to the instructors if you believe you will need more than three weeks to complete an assignment.
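The late policy amounts to a small piece of arithmetic, sketched below. The sketch assumes, beyond what the policy states, that lateness is counted in whole days, that a submission exactly 7 or 14 days late falls in the earlier bracket, and that the deduction is subtracted from the raw score. The class and method names are hypothetical.

```java
public class LatePolicy {
    static final int DAYS_PER_WEEK = 7;

    // Recorded score out of 100 for a homework with the given raw score,
    // submitted daysLate days after the deadline (0 or less means on time).
    // Assumption: boundaries are inclusive, so exactly 7 days late is "up to one week".
    static int adjustedScore(int rawScore, int daysLate) {
        int deduction;
        if (daysLate <= 0) {
            deduction = 0;                         // on time
        } else if (daysLate <= DAYS_PER_WEEK) {
            deduction = 10;                        // up to one week late
        } else if (daysLate <= 2 * DAYS_PER_WEEK) {
            deduction = 20;                        // between one and two weeks late
        } else {
            return 0;                              // more than two weeks late
        }
        return Math.max(0, rawScore - deduction);  // never report a negative score
    }

    public static void main(String[] args) {
        System.out.println(adjustedScore(100, 3));  // full marks, 3 days late
        System.out.println(adjustedScore(85, 10));  // 85 raw, 10 days late
        System.out.println(adjustedScore(100, 15)); // more than two weeks late
    }
}
```

Under these assumptions the three calls in `main` give 90, 65, and 0, matching the 90/100 ceiling described for work up to a week late.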

The final grade will be based on homework (50%), a midterm exam (20%), and a final project (30%). The midterm exam will be open book, open notes. Your class participation will also be considered in determining your final letter grade.
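The grade weighting is a weighted sum, which the sketch below computes. It assumes each component is scored out of 100 and that the homework component is the plain mean of the individual assignment scores; neither detail is specified in the syllabus. Class participation is left out because it is not given a numeric weight. The class and method names are hypothetical.

```java
public class FinalGrade {
    // Weights taken from the syllabus: homework 50%, midterm 20%, final project 30%.
    static final double HOMEWORK_WEIGHT = 0.50;
    static final double MIDTERM_WEIGHT = 0.20;
    static final double PROJECT_WEIGHT = 0.30;

    // Assumption: all inputs are percentages in [0, 100], and the homework
    // component is the unweighted mean of the assignment scores.
    static double numericGrade(double[] homeworkScores, double midterm, double project) {
        double sum = 0;
        for (double s : homeworkScores) {
            sum += s;
        }
        double homeworkMean = homeworkScores.length == 0 ? 0 : sum / homeworkScores.length;
        return HOMEWORK_WEIGHT * homeworkMean
                + MIDTERM_WEIGHT * midterm
                + PROJECT_WEIGHT * project;
    }

    public static void main(String[] args) {
        System.out.println(numericGrade(new double[] {90, 80, 100}, 85, 95));
    }
}
```

For example, homework scores of 90, 80, and 100 (mean 90), a midterm of 85, and a project of 95 give 0.5 × 90 + 0.2 × 85 + 0.3 × 95 = 45 + 17 + 28.5 = 90.5.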

On-line Resources

Collaboration Policy

The instructors believe that collaboration is an important part of your educational experience, and you are encouraged to discuss the course assignments with your fellow students and to form study groups where appropriate.

To avoid crossing the line between collaboration and cheating/plagiarism, we require that you develop your own solution to each assignment. You may discuss strategies for solving programming assignments with your classmates and obtain assistance from them in debugging code that you have written. However, all code that you write and submit must be your own code with the following exception: When a homework assignment depends on code developed for a previous assignment, the instructors will provide solutions to the previous assignment that you may modify and extend to complete the new assignment. If it is determined that several people have cheated on an assignment, the grade for the assignment will be divided among all involved. For example, if four people cheat and the work obtains a score of 100/100, each person will get an individual score of 25/100.

Please reference sources that contribute to your homework solutions.

No collaboration is permitted on exams.