2006/03/29
- DB got corrupted (again!) during weekend shutdown. Information-content and
GO similarity measures must be recalculated. Replacement of HD in servers. Consider
migrating the DB to several smaller DBs (to reduce update time).
- Implementing new measure of similarity: compounds in a reaction are divided
into substrates and products; the similarity of two reactions according to their
compounds maps substrates to substrates and products to products only.
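A minimal sketch of the substrate/product-aware measure described above, in Python for illustration only (the project code is Perl). The entry does not say how compound similarities are combined, so the best-match averaging, the dictionary layout, and the names `best_match_avg` and `reaction_compound_similarity` are all assumptions:

```python
from typing import Callable, Dict, Sequence

Sim = Callable[[str, str], float]


def best_match_avg(xs: Sequence[str], ys: Sequence[str], sim: Sim) -> float:
    """Average, over both sets, of each compound's best similarity in the other set.

    Assumption: this combination rule is illustrative, not the project's formula.
    """
    if not xs or not ys:
        # Two empty sides count as identical; one empty side as fully dissimilar.
        return 1.0 if not xs and not ys else 0.0
    best_xs = [max(sim(x, y) for y in ys) for x in xs]
    best_ys = [max(sim(x, y) for x in xs) for y in ys]
    return (sum(best_xs) + sum(best_ys)) / (len(xs) + len(ys))


def reaction_compound_similarity(r1: Dict[str, Sequence[str]],
                                 r2: Dict[str, Sequence[str]],
                                 sim: Sim) -> float:
    """Compare substrates only with substrates and products only with products."""
    s = best_match_avg(r1["substrates"], r2["substrates"], sim)
    p = best_match_avg(r1["products"], r2["products"], sim)
    return (s + p) / 2.0


def exact(a: str, b: str) -> float:
    """Toy compound similarity: 1.0 for identical identifiers, else 0.0."""
    return 1.0 if a == b else 0.0


# Hypothetical reactions with placeholder compound identifiers.
forward = {"substrates": ["A", "B"], "products": ["C"]}
similar = {"substrates": ["A"], "products": ["C"]}
reverse = {"substrates": ["C"], "products": ["A"]}
```

With the toy `exact` similarity, `forward` and `similar` score high because their sides line up, while `similar` and `reverse` score 0.0 even though they share the same compounds, since substrates are never matched against products.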
2006/03/16
- new pathways already on the web. Queries now take approx.
30 secs for 8 organisms and 24 pathways.
- updating info on several metabolism-related pathways (SQLite
database over 200MB)
2006/03/15
- organism Anopheles gambiae added.
2006/03/14
- graphic file can be downloaded as .eps by clicking on the image
- enzyme info distances and reaction enzyme info distances precalculated
in DB
2006/03/13
- alpha value was de-activated, some distances were negative: fixed
- web server v0.53: new organisms and pathways added, functionality of "Select/Deselect all" buttons improved
2006/03/12
- bugs fixed in not_common_reaction_similitude
- web server v0.52: KEGG taxonomy of pathways included
2006/03/11
- maximum distance criterion is not supported in Bio::Tree::DistanceFactory, removed from web interface
- web server v0.51: neighbor-joining clustering method added
- web server v0.5: KEGG taxonomy of organisms included
2006/03/08
- web server v0.4.1: branch lengths included
- graphic tree display enabled
2006/03/07
- web server v0.4: included optimizations for calculation of
reaction distances: 8 paths & 8 orgs in 5 secs.
2006/03/06
- graphic tree display temporarily disabled
- web server v0.3: reaction distances are now stored in
SQLite DB.
2006/03/04
2006/03/03
- started populating the DB. Table OrganismReaction was
necessary (objects we are measuring in ReactionDistance are not
Reaction but OrganismReaction).
2006/03/02
- a DB created through the SQLite shell seems to have some
compatibility problems when accessed from Perl scripts using DBI.
- problems overriding the system's old Perl modules: to be considered for
the mirror @ UPC.
2006/03/01
- current DB schema could lead to wrong retrieval of enzyme sets in
a reaction. Assuming the set of enzymes for a given reaction is the same
for a specific organism, this problem can be solved by including
"organism_id" as a FK in table Reaction and removing the (now) useless
table OrganismReaction (the assumption holds for current dataset, KEGG
documentation does not specify anything).
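A rough sketch of the schema change proposed above, using Python's sqlite3 module for illustration only (the project accesses SQLite from Perl via DBI). Only the table names Organism and Reaction and the column "organism_id" come from the entry; the ReactionEnzyme table, the other columns, and all identifiers in the sample rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE Organism (
    organism_id TEXT PRIMARY KEY
);
-- A reaction row is now specific to one organism (organism_id as FK).
CREATE TABLE Reaction (
    reaction_id TEXT NOT NULL,
    organism_id TEXT NOT NULL REFERENCES Organism(organism_id),
    PRIMARY KEY (reaction_id, organism_id)
);
-- Hypothetical link table: enzymes of a reaction in a given organism.
CREATE TABLE ReactionEnzyme (
    reaction_id TEXT NOT NULL,
    organism_id TEXT NOT NULL,
    ec_number   TEXT NOT NULL,
    PRIMARY KEY (reaction_id, organism_id, ec_number),
    FOREIGN KEY (reaction_id, organism_id)
        REFERENCES Reaction(reaction_id, organism_id)
);
""")

# Placeholder data: the same reaction has different enzyme sets per organism.
conn.executemany("INSERT INTO Organism VALUES (?)", [("org_a",), ("org_b",)])
conn.executemany("INSERT INTO Reaction VALUES (?, ?)",
                 [("R1", "org_a"), ("R1", "org_b")])
conn.executemany("INSERT INTO ReactionEnzyme VALUES (?, ?, ?)",
                 [("R1", "org_a", "1.1.1.1"),
                  ("R1", "org_b", "1.1.1.1"),
                  ("R1", "org_b", "2.7.1.1")])


def enzymes_for(conn, reaction_id, organism_id):
    """Return the sorted enzyme set of one reaction in one organism."""
    rows = conn.execute(
        "SELECT ec_number FROM ReactionEnzyme "
        "WHERE reaction_id = ? AND organism_id = ? ORDER BY ec_number",
        (reaction_id, organism_id),
    ).fetchall()
    return [r[0] for r in rows]
```

Keying Reaction on (reaction_id, organism_id) is what keeps the enzyme set of a reaction from mixing across organisms at retrieval time.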
2006/02/28
- adding tables Organism, OrganismEnzyme, OrganismReaction,
OrganismCompound.
- DB starting to get complex: DIA database schema.
2006/02/27
- adding tables CompoundReactionDistance and EnzymeReactionDistance
to support pre-calculation of reaction distances.
- re-writing update-enzymes.pl: now includes calculation of
enzymes in each organism from pathway files, calculation of which enzyme
distances should be computed, computation of enzyme distances, and update
of enzymes/enzyme distances in the DB.
2006/02/24
- adding tables Compound, CompoundDistance, Reaction and
ReactionDistance: everything to be pre-calculated.
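A minimal sketch of what a pre-calculated distance table and its lookup could look like, in Python with sqlite3 for illustration only (the project uses Perl/DBI). The table name CompoundDistance is from the entry above; the column names, the canonical ordering of each pair, and the helper functions are assumptions:

```python
import sqlite3


def open_distance_db(path=":memory:"):
    """Open the DB and create the (hypothetical) CompoundDistance layout."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS CompoundDistance ("
        " compound_a TEXT NOT NULL,"
        " compound_b TEXT NOT NULL,"
        " distance   REAL NOT NULL,"
        " PRIMARY KEY (compound_a, compound_b))"
    )
    return conn


def store_distance(conn, a, b, distance):
    """Store a symmetric distance once, with the pair in sorted order."""
    a, b = sorted((a, b))
    conn.execute(
        "INSERT OR REPLACE INTO CompoundDistance VALUES (?, ?, ?)",
        (a, b, distance),
    )


def lookup_distance(conn, a, b):
    """Return the pre-calculated distance, or None if the pair is missing."""
    if a == b:
        return 0.0  # assumption: a compound is at distance 0 from itself
    a, b = sorted((a, b))
    row = conn.execute(
        "SELECT distance FROM CompoundDistance"
        " WHERE compound_a = ? AND compound_b = ?",
        (a, b),
    ).fetchone()
    return None if row is None else row[0]
```

Storing each unordered pair once halves the table size, and the primary key on the pair keeps lookups indexed; ReactionDistance could follow the same pattern.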
2006/02/23
- web server v0.2: enzyme distances are now stored as a SQLite DB.
Queries are processed in roughly half the time: 22 secs for 8 orgs, 8 paths.
Enzymatic similarity of two reactions is now calculated in O(3n) in the number
of enzymes; before it was O(n*n). Computation of the similarity of sets of
reactions can equally be reduced if we pre-calculate the similarity of
reactions.
2006/02/22
- first test with enzyme distances stored in SQLite DB: works, but
extremely slow, something is wrong.
- more problems in www fixed: environment lacks most tools and
Perl version in /usr/bin is older than it should be.
- "SQLite.so: symbol fdatasync not found" problem in www fixed.
DBD::SQLite Makefile should be changed to include two extra flags
(-posix4 and -lrt) in gcc options (LDDLFLAGS and LDFLAGS). This
problem seems to be exclusive to Solaris machines (once again, bravo!).
- installed SQLite and DBD::SQLite. All enzyme distances (and
probably reaction distances as well) will be moved to a DB to be
accessed from the CGI.
2006/02/21
- faster processing of queries: only pathway files of selected organisms
are loaded (query with 8 orgs & 8 paths: 45 secs).
- semi-automatic synchronization with KEGG: pathway files obtained from
KEGG through scripts. Enzyme files obtained from pathway files. Distances not
yet calculated obtained from enzyme files.
2006/02/16