PPT presentation - Tempus Project Site

Teaching Bioinformatics
Nevena Ackovska
Ana Madevska - Bogdanova
Outline

Motivation

Agent based approach

Theoretical background

Practical work

Concluding remarks
Motivation

Subject idea: Train students in building
modern Intelligent Systems. The subject
offers topics that include modeling the real
world, Data Mining, Robotics, Bioinformatics
and many more.

Cover the methodology used in teaching the
Bioinformatics part of the Intelligent Systems
course.

Extended to other useful applications.
Building intelligent systems

To build an intelligent system, a system that
can cope with the ever-changing environment,
one needs knowledge of many areas of
today's science:

artificial intelligence,
robotics,
material science,
cognitive science,
and a lot of knowledge that we can draw from
biological systems, which are true representatives
of the class of intelligent systems.
Framework: The Cell as an Agent


The basic framework of teaching Intelligent Systems is the
fact that every intelligent system is an agent. The agent
cannot be observed separately from its environment.
The intelligence is observed in interaction between the
agent and the environment.
[Diagram: ENVIRONMENT and AGENT, linked by Situation and Behavior]
Self - functioning intelligent systems

The best functioning intelligent systems are the living
systems.

Basic representative - the biological cell.

The cell can respond to environmental changes with
three types of response: behavior (for example
movement), production (biosynthesis of material, e.g.
proteins), and cell division and multiplication.

Many scientists argue that the intelligent behavior of
the cell is encoded in its genetic system.
Modeling cell processes and actors

To model processes that happen in the
cells we can use:
biochemical knowledge,
the linguistic metaphor,
the manufacturing system metaphor,
the software perspective.
For the purposes of the Intelligent Systems
course, we are mainly concerned with the
basic biochemical knowledge, and the
modeling of the genetic processes is
done using a linguistic approach.
Theoretical background:
Terminology

It is important for the students to understand
the terminology and basic processes behind
the biological problems.

There are two different types of biological
sequences studied in this class: DNA/RNA
and amino acids (proteins).
Deoxyribonucleic Acid (DNA)

is the basis for the building blocks encoding
the information of life.

A single-stranded DNA molecule, called a
polynucleotide or oligomer, is a chain of
small molecules called nucleotides.

There are four different nucleotides, or
bases: adenine (A), cytosine (C), guanine
(G) and thymine (T).

It was important for students to understand
that by stringing together a simple alphabet of
four characters we can get enough
information to create a complex organism!
Ribonucleic Acid (RNA)

is similar to DNA in the fact that it is
constructed from nucleotides.

Instead of thymine (T), an alternative base
uracil (U) is found in RNA.

Three of the major RNA molecules involved
in protein synthesis are messenger RNA
(mRNA), transfer RNA (tRNA) and ribosomal
RNA (rRNA).
Proteins

Proteins are polypeptides that have a
three-dimensional structure.

They are built from 20 different amino acids.

They can be described through four different
hierarchical levels.

The first one, the primary structure, is the
sequence of amino acids constituting the
polypeptide chain. This is the level that the
students considered during the course.
Central dogma

The flow of genetic information. DNA directs
the synthesis of RNA, and RNA then in turn
directs the synthesis of protein - Central
Dogma of Molecular Biology.
DNA → RNA → protein
Practical student work

Since the linguistic viewpoint of genetics
processes is very natural for computer
analysis, we use it for modeling in the
students’ projects.

Two types of problems to be solved:
1. Build a module that simulates the
biosynthesis of proteins.
2. Use the complementary principle in genetics in
order to obtain 3D spatial forms of the actors in
the process of biosynthesis of the proteins.
Problem 1: Genetic Turing machines
Input tape: DNA
ATGAAGCCTATTTCGCTAACCAAACATTACGGTGG....
RNA polymerase (with its moving direction)
Interface tape: mRNA
AUGAAGCCUAUUUCGCUAACCAAACAUU
Ribosome (with its moving direction)
Output tape: protein
MetLysProIle
Solutions to problem 1
Solutions consisted mostly of 5 types of
modules:
1. Check validity of DNA file:
a DNA file consists of only 4 letters: A, C, T and
G. It has to have a starting triplet ATG, from
which transcription begins.
2. Obtain mRNA file:
transcribe from the DNA alphabet to the RNA alphabet.
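
Modules 1 and 2 can be sketched in Python as follows. This is only an illustration, not the students' code: the names `is_valid_dna` and `transcribe` are hypothetical, the input is a string rather than a file, and the start triplet ATG is assumed to be the first three letters. Transcription is modeled as replacing T with U, which matches the two tapes shown for Problem 1 (ATGAAG... becomes AUGAAG...).

```python
DNA_ALPHABET = set("ACGT")

def is_valid_dna(dna: str) -> bool:
    """Module 1 (assumed rules): only A, C, G, T, and the string starts with ATG."""
    return bool(dna) and set(dna) <= DNA_ALPHABET and dna.startswith("ATG")

def transcribe(dna: str) -> str:
    """Module 2: rewrite the DNA alphabet into the RNA alphabet (T becomes U)."""
    if not is_valid_dna(dna):
        raise ValueError("not a valid DNA string")
    return dna.replace("T", "U")

print(transcribe("ATGAAGCCTATT"))  # AUGAAGCCUAUU
```

A solution that searches for the first ATG anywhere in the file, instead of requiring it at the start, would be an equally reasonable reading of the rule.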
Solutions to problem 1 (continued)
3. Check validity of mRNA file:
an mRNA file consists of only 4 letters: A, C, U and G. It has to
have the starting triplet AUG. It has to have an ending
triplet (UAA, UAG or UGA). Between the starting
and the ending triplet, the number of RNA letters
must be divisible into triplets.
4. Obtain protein file:
translate from the polynucleotide language (mRNA
alphabet) to the polypeptide language (protein
alphabet).
5. Check validity of protein file:
protein files are built from the 20 letters of the amino
acid alphabet. They start with the amino acid Met.
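
Modules 3 to 5 could look roughly like the sketch below. The function names are hypothetical and strings stand in for files. The codon table is the standard genetic code written compactly in U, C, A, G order, with `*` marking the three stop triplets. One-letter amino acid codes are used here as an assumption; the slide's MetLysProIle corresponds to MKPI.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, listed in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
STOP_CODONS = {"UAA", "UAG", "UGA"}

def is_valid_mrna(mrna: str) -> bool:
    """Module 3: alphabet, AUG start, stop-triplet end, whole number of triplets."""
    return (
        len(mrna) >= 6
        and len(mrna) % 3 == 0
        and set(mrna) <= set(BASES)
        and mrna.startswith("AUG")
        and mrna[-3:] in STOP_CODONS
    )

def translate(mrna: str) -> str:
    """Module 4: read triplets from AUG and stop at the first stop codon."""
    if not is_valid_mrna(mrna):
        raise ValueError("not a valid mRNA string")
    protein = []
    for i in range(0, len(mrna), 3):
        amino = CODON_TABLE[mrna[i:i + 3]]
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)

def is_valid_protein(protein: str) -> bool:
    """Module 5: only the 20 amino acid letters, starting with Met (M)."""
    return protein.startswith("M") and set(protein) <= set(AMINO_ACIDS) - {"*"}

print(translate("AUGAAGCCUAUUUAA"))  # MKPI
```

The input in the last line is the first four triplets of the interface tape from Problem 1 with a stop triplet UAA appended, so that it passes the validity check.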
Problem 2: Complementary principle to
build spatial forms



Using the complementary principle in
genetics in order to obtain 3D spatial forms
of the actors in the process of biosynthesis
of the proteins.
This principle enables (among many other
things) an RNA molecule to build its secondary
structure.
The complementary principle of the RNA
molecule (using the linguistic metaphor)
states that letter A is complementary to letter
U (and vice versa), and the letter C is
complementary to letter G (and vice versa).
Complementary principle

This principle enables complementary
substrings to fold and produce the
secondary structure of the RNA string.
…CCCUUAUAGGCCCAUCAUAAGGCC
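
The pairing rule can be written down in a few lines of Python. This is a sketch with hypothetical names (`complement`, `can_pair`). One assumption goes beyond the slide: when a single strand folds back on itself, the two paired segments run in opposite directions, so the second segment is compared against the reversed complement of the first.

```python
# A pairs with U, and C pairs with G (in both directions).
PAIRS = {"A": "U", "U": "A", "C": "G", "G": "C"}

def complement(rna: str) -> str:
    """Replace every letter by its complementary letter."""
    return "".join(PAIRS[base] for base in rna)

def can_pair(a: str, b: str) -> bool:
    """True if b is the reverse complement of a, so the two can form a stem."""
    return b == complement(a)[::-1]

print(complement("UUAU"))        # AAUA
print(can_pair("UUAU", "AUAA"))  # True
```

Both UUAU and AUAA occur in the example string above, which is why that stretch can fold back on itself.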
Observations on students’ work

The students of Computer science naturally
cope with the problems that consist of
strings, files, their input, processing and
output result.

When the problems discussed above were
stated as typical genetics problems, they
could not easily be understood.

Strings and alphabets of any kind: RNA
string, DNA string, protein string or alphabet
and so on are very easily understood and
processed to obtain the needed result.
Observations on students’ work

The folding of the RNA structure was not
understandable until some students
discovered that the problem could be
postulated as:
find substring A1 and substring A2 in the RNA
file, such that A1 is a complementary string to
A2.
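
Stated this way, the search is a plain string problem. The sketch below is one possible brute-force version, not the students' solution. The function names, the fixed substring `length`, and the `min_gap` parameter (a minimum number of unpaired letters between A1 and A2, set to 3 here) are all assumptions made for illustration. A2 is matched as the reverse complement of A1, since the two halves of a folded strand run in opposite directions.

```python
PAIRS = {"A": "U", "U": "A", "C": "G", "G": "C"}

def reverse_complement(rna: str) -> str:
    """Complement every letter and read the result backwards."""
    return "".join(PAIRS[base] for base in reversed(rna))

def find_complementary_pairs(rna: str, length: int, min_gap: int = 3):
    """Return (i, j) start positions where rna[i:i+length] can pair with rna[j:j+length]."""
    found = []
    for i in range(len(rna) - length + 1):
        target = reverse_complement(rna[i:i + length])
        # Only look to the right of A1, leaving room for the loop.
        j = rna.find(target, i + length + min_gap)
        while j != -1:
            found.append((i, j))
            j = rna.find(target, j + 1)
    return found

print(find_complementary_pairs("GGGAAAACCC", 3))  # [(0, 7)]
```

On the visible part of the example string from the complementary principle slide, a search with `length=4` reports, among others, the pair UUAU (position 3) and AUAA (position 16).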

Once the problems were postulated in
linguistic terminology, the solutions were
easily obtainable.
Students’ observations

The students observed how nature, e.g. the cell,
solves the problem of passing information
about some new situation that happened in
its environment.

The Computer science students who built
graphical simulations of the protein
biosynthesis problem observed that it is not only
information that is circulating in the cell.
They observed that there is extensive
material circulation going on in the biological
cell.
Conclusions



Teaching bioinformatics in the computer science
curriculum has great potential.
Two types of bioinformatics problems were
postulated to and solved by the computer
science students: the problem of generating a
protein string from the DNA file, and the problem
of RNA secondary structure.
The genetics problems that are postulated
from the linguistic viewpoint can be easily
modeled and solved with great success by
computer science students.
Conclusions

The “raw” geneticists’ terminology is more
difficult for the students to cope with.

Once the information processing part of the
cell was solved, the computer science
students realized that the processes in the
cell are not merely information passing
processes, but rather a synthesis of
information and material transformation
processes.

Fun to work on interesting projects
Questions
Transcription: DNA → RNA
DNA    RNA
A  →  U
C  →  G
G  →  C
T  →  A
Based on the principle of complementarity.
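
The table translates directly into a character mapping. The sketch below uses Python's built-in `str.maketrans` and `str.translate`; the name `transcribe_template` is hypothetical. It treats the input as the DNA strand being copied and maps it letter by letter, ignoring strand direction. This is a different view from the tapes shown for Problem 1, where the DNA and mRNA strings differ only in T versus U.

```python
# Letter-by-letter mapping from the table: A->U, C->G, G->C, T->A.
DNA_TO_RNA = str.maketrans("ACGT", "UGCA")

def transcribe_template(template_dna: str) -> str:
    """Build the RNA letters complementary to a DNA template string."""
    if set(template_dna) - set("ACGT"):
        raise ValueError("DNA may contain only A, C, G and T")
    return template_dna.translate(DNA_TO_RNA)

print(transcribe_template("TACTTC"))  # AUGAAG
```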
Solutions to problem 2

Check validity of RNA file

Find complementary RNA substrings