HST 952: Computing for Biomedical Scientists
Fall 2003
Course Information
- Course announcements
- Course calendar
- Level and Units:
- Grad H; 3 (lecture)-0 (lab)-9 (prep)
- Time and Place:
- Tuesdays/Thursdays 11 a.m. - 12:30 p.m.
Location: MEC Rm 334, Tosteson Medical Education Center, Harvard Medical School (260 Longwood Avenue)
Exception: MEC, Room 333 on Sept. 4 and Sept. 25
- Instructors:
- Omolola Ogunyemi, Aziz Boxwala, Qing Zeng
For general correspondence and homework submission, send e-mail to ta952@dsg.harvard.edu (cc: rclacson@mit.edu)
- Office Hours:
- Tuesdays 10-11 a.m. as requested
- Course mailing list:
- hst952@yahoogroups.com
- Textbooks (required):
- Java: An Introduction to Computer Science and Programming, 3rd edition. Author: Walter Savitch. Prentice Hall, 2003
- Modern Database Management, 6th edition. Authors: Fred R. McFadden, Jeffrey Hoffer, Mary B. Prescott. Addison-Wesley, 2001
- Supplementary texts/readings:
- Foundations of Computer Science. Authors: Alfred Aho, Jeffrey Ullman. W. H. Freeman and Co., 1995
- Handbook of Medical Informatics. Authors: Jan H. van Bemmel, Mark A. Musen. Springer-Verlag, 1997
- Principles of Database and Knowledge-Base Systems, Volume 1. Author: Jeffrey D. Ullman. W. H. Freeman & Co., 1988
- Relevant papers or readings selected by instructors from: Knowledge Representation: Logical, Philosophical, and Computational Foundations. Author: John F. Sowa. Brooks/Cole Pub Co., 1999
Note: The textbooks/readings listed will be on reserve at Harvard's Countway
Library of Medicine. Some of the texts can also be found at MIT's Barker
Engineering Library, or at Harvard's Gordon McKay Applied Sciences Library.
Course description
This course introduces abstraction as an important
mechanism for problem decomposition and solution formulation in the biomedical
domain, and examines computer representation, storage, retrieval, and
manipulation of biomedical data. As part of the course, we will briefly examine
the effect of programming paradigm choice on problem-solving approaches, and
introduce data structures and algorithms. We will also examine knowledge
representation schemes for capturing biomedical domain complexity and principles
of data modeling for efficient storage and retrieval. The final project involves
building a medical information system that encompasses the different concepts
taught in the course.
Computer science basics covered in the first part of the course are integral
to understanding topics covered in the latter part, and for completing the
assigned homework.
Goals:
With this course, we hope to provide a foundation for
scientists interested in using computers for solving biomedical problems.
Computing with biomedical data poses unique challenges with respect to data
volume, complexity, and uncertainty in data and in domain knowledge. Students
taking this course should come away with a grounding in abstraction for problem
decomposition and solution formulation, data modeling, and information
management. The latter two are key to analysis, development, and proper design of
information systems.
Prerequisites:
A biomedical background and an interest in computing.
Programming will be done in Java; no prior familiarity with Java is assumed.
Topics
The following is an outline of topics to be covered in this
course.
Part 1: Introduction to Computing
(11 classes)
Omolola Ogunyemi
- Overview: How it all fits together
- Solving problems with the computer
- Abstraction: creating a model for a problem and devising appropriate
methods (algorithms) that can be automatically applied to solve the model
problem
- Computational process: the solution to the model problem (and its
subproblems)
- Data: the information processed by a computational process
- Computer program: a description of a computational process
- Programming language: the notation used for describing the computational
process
- Algorithmic programming paradigms
- Imperative programming
- Programming is issuing commands (imperatives)
- Variables and assignment are used to change state, and serve as an
analog of the concept of a modifiable store supported by real computers
- Lends itself easily to procedural and object-oriented programming
styles
- Examples of programming languages that naturally support this
paradigm: C, C++, Java, Pascal, Fortran
- Declarative programming
- Programming is defining/declaring a solution
- Notion of state that reflects the underlying computer architecture is
not an important consideration
- Functional and logic programming are declarative programming
approaches
- Examples of programming languages that support this paradigm: Lisp,
Scheme, ML, Prolog
- Imperative Programming: Approaches for abstracting model state (and
defining program structure)
- Procedural Programming
- Commands are expressed as procedures/functions
- Distinctions are maintained between data (state) and operations on
data (behavior)
- To carry out a task, issue a sequence of appropriate commands --
find/create appropriate procedures that modify data (state) and call them
- Object-Oriented Programming
- Data (state) and functions that manipulate the data (behavior) are
grouped together into a modular unit called an object
- Commands are issued by objects
- A task is carried out by creating a structured network of the
appropriate objects and having the objects interact by exchanging messages
- Object-Oriented Programming concepts
- Encapsulation
- Polymorphism
- Inheritance
- Java language details
- Data types
- Expressions
- Control statements
- Computational Processes and Abstraction
- Procedural Abstraction
- Data Abstraction
- Lists
- Binary trees
- (Brief) Algorithms for searching
- (Brief) Algorithms for sorting
- (Brief) Complexity of algorithms (Big-Oh notation, etc.)
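As a rough illustration of the object-oriented concepts listed in Part 1 (encapsulation, inheritance, polymorphism), here is a minimal Java sketch. The class names (`LabTest`, `GlucoseTest`, `PotassiumTest`, `OopDemo`) and the numeric thresholds are hypothetical and chosen only for illustration; they are not taken from the course materials and are not clinical guidance.

```java
// Hypothetical example classes; none of these names come from the course materials.
abstract class LabTest {
    // Encapsulation: state is private and reachable only through methods.
    private final String name;
    private final double value;

    LabTest(String name, double value) {
        this.name = name;
        this.value = value;
    }

    String getName() { return name; }
    double getValue() { return value; }

    // Each subclass supplies its own range check.
    abstract boolean isAbnormal();

    String report() {
        return name + " = " + value + (isAbnormal() ? " (abnormal)" : " (normal)");
    }
}

// Inheritance: GlucoseTest reuses the state and report() defined in LabTest.
class GlucoseTest extends LabTest {
    GlucoseTest(double mgPerDl) { super("Glucose", mgPerDl); }

    // Illustrative threshold only.
    boolean isAbnormal() { return getValue() < 70.0 || getValue() > 100.0; }
}

class PotassiumTest extends LabTest {
    PotassiumTest(double mmolPerL) { super("Potassium", mmolPerL); }

    // Illustrative threshold only.
    boolean isAbnormal() { return getValue() < 3.5 || getValue() > 5.0; }
}

class OopDemo {
    public static void main(String[] args) {
        // Polymorphism: the same report() call dispatches to each subclass's isAbnormal().
        LabTest[] tests = { new GlucoseTest(85.0), new PotassiumTest(5.6) };
        for (int i = 0; i < tests.length; i++) {
            System.out.println(tests[i].report());
        }
    }
}
```

The loop in `main` treats every element as a `LabTest` and never asks which subclass it holds; that is the "objects interact by exchanging messages" idea from the outline in its simplest form.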
Part 2: Data and Knowledge Representation
Qing Zeng
- Domain Ontology
- Definition
- Ontology = catalog of types that exist in a domain D from perspective
of some person using language L
- Informal ontology = types undefined or defined in natural language
- Formal ontology = types organized into partial order (lattice)
- Axiomatized formal ontology = axioms provide additional constraints on
types
- Prototyped formal ontology = prototypical individuals provide
characteristics. Similarity based on distance functions.
- Motivation: need to represent medical data/knowledge
- Example of a patient case
- Intended applications
- Patient care (Reporting, Summary, trend)
- Financial, Management, Clinical Research, expert system,
information retrieval, vocabulary discovery
- Difficulty in representation
- Lack of consensus
- Comprehensiveness
- Drugs, lab tests, signs, symptoms, narrative reports
- Cost
- Proprietary interests
- Distinction
- Definition
- Seven distinctions (Peirce and Whitehead)
- Abstract/Physical
- Independent/Relative/Mediating
- Continuant/Occurrent
- Trees
- Axioms
- Unified Medical Language System
- Conceptual graphs
- Logic
- Propositional Logic
- Variable
- Connective
- Boolean algebra
- Tautology and deduction
- Predicate Logic
- Statement
- Quantifiers
- Special topics
- Natural language semantics
- Language Analysis
- Morphology
- Syntax
- Semantics
- Concepts and Relations
- Resolving Ambiguities
- Uncertainty
- Source of uncertainty
- Representation Schemes
- Fuzzy logic
- Probability
- Non-monotonic logic…
- Example: Temporal Granularity as in TSQL
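To make the "Tautology and deduction" item under Propositional Logic concrete, the following is a minimal Java sketch that checks a two-variable formula against every row of its truth table. The `Formula` interface and `TruthTable` class are hypothetical names introduced for this illustration, and the sketch assumes formulas over exactly two variables, p and q.

```java
// Hypothetical names throughout; this is an illustration, not course code.
interface Formula {
    boolean eval(boolean p, boolean q);
}

class TruthTable {
    // Material implication: a -> b is equivalent to (not a) or b.
    static boolean implies(boolean a, boolean b) {
        return !a || b;
    }

    // A formula over p and q is a tautology if it holds under all four assignments.
    static boolean isTautology(Formula f) {
        boolean[] values = { false, true };
        for (int i = 0; i < values.length; i++) {
            for (int j = 0; j < values.length; j++) {
                if (!f.eval(values[i], values[j])) {
                    return false;
                }
            }
        }
        return true;
    }

    // Modus ponens written as a single formula: ((p -> q) and p) -> q
    static Formula modusPonens() {
        return new Formula() {
            public boolean eval(boolean p, boolean q) {
                return implies(implies(p, q) && p, q);
            }
        };
    }

    // Affirming the consequent: ((p -> q) and q) -> p, which fails when p is false and q is true.
    static Formula affirmingTheConsequent() {
        return new Formula() {
            public boolean eval(boolean p, boolean q) {
                return implies(implies(p, q) && q, p);
            }
        };
    }

    public static void main(String[] args) {
        System.out.println("modus ponens: " + isTautology(modusPonens()));
        System.out.println("affirming the consequent: " + isTautology(affirmingTheConsequent()));
    }
}
```

Exhaustive enumeration is only practical for a handful of variables, since the table doubles in size with each variable added.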
Part 3: Data management, querying, and retrieval
Aziz Boxwala
- Data concepts and modeling
- Nature of data
- Proxies and reality, metadata, data
- Relationship among data
- Data models
- User models, conceptual models, physical models
- Modeling the meaning of data
- Degree, dependencies, time, uniqueness,
generalization-specialization, aggregation
- Modeling methods
- ER models, Relational models, Object-oriented models, Hierarchical
and network models
- Relational model
- Maintaining integrity of data
- Uniqueness and Keys
- Domain of data
- Normalization of models
- Functional dependencies among data
- Normal forms
- Implementing a relational database
- Transforming logical models to physical models
- Tables and views
- Case tools
- Data dictionaries
- Joins and queries
- Inner and outer joins
- Query optimization
- SQL
- Security models in RDBMS
- Special topics
- Overview of object-oriented data management
- Review of classes and objects, inheritance, encapsulation
- Object-relational DBMS
- Object-oriented databases
- Modeling for analytical processing of data
- Star-join schema
- Comparisons with transactional models
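The difference between inner and outer joins listed in Part 3 can be sketched in plain Java over in-memory arrays. This is a naive nested-loop illustration under stated assumptions, not how a relational DBMS evaluates or optimizes a join; the class name `JoinDemo`, the two-column row layout, and the sample data are all hypothetical. The sketch uses generics, which postdate the Java 1.4 release named under On-line Resources.

```java
import java.util.ArrayList;
import java.util.List;

class JoinDemo {
    // Joins patients {id, name} with results {patientId, test} on the id column.
    // With leftOuter = false the output mirrors a SQL INNER JOIN.
    // With leftOuter = true, patients that have no results are kept and padded
    // with null, which mirrors a SQL LEFT OUTER JOIN.
    static List<String[]> join(String[][] patients, String[][] results, boolean leftOuter) {
        List<String[]> rows = new ArrayList<String[]>();
        for (String[] p : patients) {
            boolean matched = false;
            for (String[] r : results) {
                if (p[0].equals(r[0])) {
                    rows.add(new String[] { p[0], p[1], r[1] });
                    matched = true;
                }
            }
            if (!matched && leftOuter) {
                rows.add(new String[] { p[0], p[1], null });
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample data: patient 2 has no results on file.
        String[][] patients = { { "1", "Ada" }, { "2", "Ben" } };
        String[][] results = { { "1", "glucose" }, { "1", "potassium" } };
        System.out.println("inner join rows: " + join(patients, results, false).size());
        System.out.println("left outer join rows: " + join(patients, results, true).size());
    }
}
```

With this sample data the inner join drops the patient who has no results, while the left outer join keeps that patient with a null in the test column.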
Assignments, Exams, and Grading
There will be weekly homeworks,
consisting of programming assignments in Java. Assignments are generally due one
week after they are distributed. Assignments submitted up to one week after the
due date will get an automatic deduction of 10 points (i.e., if you submit your
homework up to a week after the deadline, the maximum score you can receive is
90/100). Assignments submitted between one and two weeks after the deadline will
get an automatic deduction of 20 points. Assignments submitted more than 2 weeks
after the deadline will receive a score of 0 automatically. Please speak to the
instructors if you believe you will need more than 3 weeks to complete an
assignment.
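For reference, the late-submission rule above can be written as a small Java method. This is only a sketch of one reading of the policy: it assumes lateness is counted in whole days (1 to 7 days is "up to one week", 8 to 14 days is "between one and two weeks") and that a score cannot drop below zero. The class and method names are hypothetical.

```java
class LatePolicy {
    // Returns the score after the late deduction.
    // Assumption: daysLate counts whole days past the due date; 0 or less means on time.
    static int adjustedScore(int rawScore, int daysLate) {
        if (daysLate <= 0) {
            return rawScore;               // on time: no deduction
        }
        if (daysLate > 14) {
            return 0;                      // more than two weeks late
        }
        int deduction = (daysLate <= 7) ? 10 : 20;
        return Math.max(0, rawScore - deduction);  // assumed floor of zero
    }

    public static void main(String[] args) {
        System.out.println(adjustedScore(100, 3));   // up to one week late
        System.out.println(adjustedScore(100, 10));  // between one and two weeks late
        System.out.println(adjustedScore(100, 20));  // more than two weeks late
    }
}
```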
The final grade will be based on homeworks (50%), a mid-term exam (20%) and a
final project (30%). The midterm exam will be open book, open notes. Your class
participation will also be considered in determining your final letter grade.
On-line Resources
- Java 2 standard edition version 1.4 downloads
- Sun's Java documentation (JDK 1.4)
- Companion web-site for the Java textbook. Has links to interactive quizzes for each chapter of the book
- Corrections of errors in the textbook
- Sun's Java tutorial
- On-line Java tutor (assumes little or no programming background)
- Usenet newsgroups:
- comp.lang.java.* (the * is a wild card that matches any characters;
there are 16 different Java-related newsgroups under the
comp.lang.java hierarchy)
- sci.med.informatics
You can read these newsgroups through Google's free server or through a
particular institution's server (e.g. news.mit.edu, news.dfci.harvard.edu, etc.)
Collaboration Policy
The instructors believe that collaboration is an
important part of your educational experience, and you are encouraged to discuss
the course assignments with your fellow students and to form study groups where
appropriate.
To avoid crossing the line between collaboration and cheating/plagiarism, we
require that you develop your own solution to each assignment. You may discuss
strategies for solving programming assignments with your classmates and obtain
assistance from them in debugging code that you have written. However, all
code that you write and submit must be your own code with the following
exception: When a homework assignment depends on code developed for a previous
assignment, the instructors will provide solutions to the previous assignment
that you may modify and extend to complete the new assignment. If it is
determined that several people have cheated on an assignment, the grade for the
assignment will be divided among all involved. For example, if four people cheat
and the work obtains a score of 100/100, each person will get an individual
score of 25/100.
Please reference sources that contribute to your homework solutions.
No collaboration is permitted on exams.