The ChENL Project Work Log


August 9, 2007

ChENL has grown quite a bit in the last few months.  It can now read
and write some file types for DL_POLY and Gaussian, allowing me to
create many nifty analysis tools.  It also just recently got the start
of a Qt/OpenGL molecule viewer, which will let me (finally) get some
PRETTY 2D contour plots soon.  It is halfway towards LAPACK
integration, and it works on Mac OS X (Darwin) too.

Basically I want ChENL to become a major platform for my future work.
Right now that means being able to manipulate and visualize MD and
quantum chemistry data, and soon that may mean fresh code to actually
do MD and ab initio calculations.  I'll soon need to put some
screenshots online and substantially beef up the README and the web
pages.

May 12, 2007

I started the next layer of abstraction codenamed "pillars".  This
layer's purpose is to encapsulate very basic physics, math, and
chemistry data types: fundamental constants, atoms, molecules,
particles, waves, angles, vectors, etc.

The first part of pillars is a set of basic wrappers for values and
angles; these keep track of numerical precision and later will be able
to warn or abort when roundoff error is likely to corrupt results.
This being C++, these checks are not compiled into the release code;
instead the wrappers reduce directly to long doubles.  Of course, like
the Logger, this level of code has a lot of #ifdef DEBUG in it, but I'm
hoping that no other layers on top will need any.
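
A minimal sketch of how such a wrapper could reduce to a plain long
double in release builds.  The names Value, raw(), and suspect() are
hypothetical, not the actual pillars API, and the cancellation test is
only a crude stand-in for real precision tracking.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

#ifdef DEBUG
// Debug build: a real class that carries a "suspect" flag with the number.
class Value {
public:
    Value(long double v = 0.0L) : v_(v), suspect_(false) {}

    friend Value operator+(const Value& a, const Value& b) {
        return combine(a, b, a.v_ + b.v_);
    }
    friend Value operator-(const Value& a, const Value& b) {
        return combine(a, b, a.v_ - b.v_);
    }
    friend long double raw(const Value& x) { return x.v_; }
    friend bool suspect(const Value& x) { return x.suspect_; }

private:
    // Crude heuristic (an assumption): flag a result that is far smaller
    // than its operands, since most significant digits have cancelled.
    static Value combine(const Value& a, const Value& b, long double result) {
        Value r(result);
        const long double big = std::max(std::fabs(a.v_), std::fabs(b.v_));
        const long double tol =
            std::sqrt(std::numeric_limits<long double>::epsilon());
        r.suspect_ = a.suspect_ || b.suspect_ || std::fabs(result) < big * tol;
        return r;
    }

    long double v_;
    bool suspect_;
};
#else
// Release build: the wrapper disappears and only the number remains.
typedef long double Value;
inline long double raw(Value x) { return x; }
inline bool suspect(Value) { return false; }
#endif
```

Code written against Value, raw(), and suspect() compiles unchanged
with or without -DDEBUG, which keeps the #ifdef in one place.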

In other news, I found a second ab initio package licensed under GPLv2
called "psi3".  It is hosted on Sourceforge and at initial check looks
quite good.  It's in C and C++ (and reasonably modern C++ at that).  I
just might see what kind of work is necessary to dig into that
project...

April 6, 2007

Well, I've got a decent Logger, I18n, some basic exception classes and
a template for future work.  All that's left are the XML schemas for
the message reference and localized strings and I'll be able to cut
marrow.

April 5, 2007

Hmm, the substitution code in I18NMessage was pretty trivial.  I'll add
a few more things before the 'marrow' cut...probably do that early
next week.

April 3, 2007

The base is almost ready for its first file release.  It has a bare
minimum of enterprise-class "Hello world": I18n, logging, a base
exception class, and base classes for application info/warning/error
messages.  Once I get the string substitution code working in
I18NMessage._translate() I'll cut it.
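
A rough sketch of what the string substitution might look like.  The
"{N}" placeholder syntax and the substitute() helper are assumptions
made for illustration; the log does not say what format
I18NMessage._translate() actually uses.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: replaces "{0}", "{1}", ... in tmpl with args[0],
// args[1], ...  Unknown or malformed placeholders are copied through
// unchanged so that they stay visible in the output.
std::string substitute(const std::string& tmpl,
                       const std::vector<std::string>& args) {
    std::string out;
    std::size_t pos = 0;
    while (pos < tmpl.size()) {
        const std::size_t open = tmpl.find('{', pos);
        if (open == std::string::npos) {
            out.append(tmpl, pos, std::string::npos);  // no more placeholders
            break;
        }
        out.append(tmpl, pos, open - pos);             // text before '{'
        const std::size_t close = tmpl.find('}', open + 1);
        if (close == std::string::npos) {
            out.append(tmpl, open, std::string::npos); // unterminated '{'
            break;
        }
        const std::string key = tmpl.substr(open + 1, close - open - 1);
        const bool numeric = !key.empty() &&
            key.find_first_not_of("0123456789") == std::string::npos;
        const std::size_t index =
            numeric ? std::strtoul(key.c_str(), 0, 10) : args.size();
        if (index < args.size())
            out += args[index];
        else
            out.append(tmpl, open, close - open + 1);
        pos = close + 1;
    }
    return out;
}
```

Numbered placeholders let a translator reorder the arguments within a
localized string without any change to the calling code.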

Neat!  I put in the C++ demangler code from GNU binutils (which
conveniently is licensed GPL/LGPL so I can use it).  Now my
stacktraces look even better.

April 1, 2007

Things have changed, again.  I could not solve the performance
problems of the Lisp code; in fact I was already using the fastest
available implementations of all the functions in the pipeline.  After
a quick test showed that sscanf() was on the order of 1,000 times
faster than parse-float I decided that I could not in good faith
continue this project in Lisp.  (It also didn't help that my code was
morphing slightly between SBCL and CMUCL.)  I like Lisp, I learned
some valuable lessons, and for web apps and business logic I'll take
it over Java, but for quantum chemistry I need to go with a platform
that provides a reasonable subset of syntactical goodness, but at much
higher performance and with some guarantees that it will be available
on high performance clusters.  So: C++.
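
For reference, the C side of that comparison can be sketched as
follows.  parse_floats() is a hypothetical helper, not the test
program mentioned above; it pulls whitespace-separated floats out of
one line of text with sscanf(), using %n to learn how far each call
advanced.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical helper: returns every leading float found in `line`.
// Parsing stops at the end of the string or at the first token that is
// not a number.
std::vector<double> parse_floats(const char* line) {
    std::vector<double> values;
    double x = 0.0;
    int consumed = 0;
    // %lf reads one double; %n stores the number of characters consumed
    // so far and does not count towards sscanf's return value.
    while (std::sscanf(line, "%lf%n", &x, &consumed) == 1) {
        values.push_back(x);
        line += consumed;
    }
    return values;
}
```

On very long lines strtod() may be the better choice, since some C
libraries scan the whole remaining string on every sscanf() call.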

I've almost got the base logger, exception, and i18n classes set up.
I always start with those since essentially every function depends on
them and getting them right pays off a lot in the end.

I'm using Boost and the standard library (a.k.a. STL) as much as
possible: shared_ptr, auto_ptr, string, exception.  I don't want to be
passing raw pointers around (which I saw mpqc doing) and I don't want
to be like C.  So far it's working out OK.  Boost.Build isn't too bad;
I kind of like how it supports debug and release builds right out of
the box.

While I'm here, let me outline what goes in the library.  Logging,
i18n, math (BLAS), chemistry model (atoms/molecules, common
properties, orbitals and basis sets, etc.), MPICH, and file formats
(especially PDB).  I'll eventually build some applications out of
these tools for Hartree-Fock/MP2/MP4 with 3-21g/6-31g/6-311g/etc.  I
figure once that's done I'll begin advertising.

July 28, 2006

I've started working on a new-and-improved project plan for ChENL.
Basically, I've begun doing actual quantum chemistry calculations at
school and been exposed to Gaussian, DL_POLY, Cerius2, XCrySDen,
Ghemical (actually one of my favorites but I can't use it very often),
etc.  I've discovered that the QC industry, as it were, has three kinds
of software:

1.  Proprietary software that ranges in price from very affordable
    (Gaussian2003) to "ouch!" and can be quite powerful.  However, the
    EULA terms can also be very problematic (see
    http://www.bannedbygaussian.org).  Often the source code IS
    provided to these programs because they run on highly custom
    hardware.

2.  Proprietary software that is free in price but strictly limited in
    its distribution.  Sadly, several of these programs (note: the
    industry likes to use the word "codes" instead of "programs" -- I
    as a CS major prefer programs) were/are funded by public-sector
    dollars.

3.  Truly FOSS software.

Unfortunately the software in category #3 is not used enough by the
PhDs to get solid traction against #1/#2.  (I'm genuinely
surprised that the community hasn't gravitated to the GPL-type
offerings, considering how many people would benefit (e.g. more
papers).  But anyway...)  The problem with software in category #1
that runs on the backend (besides total dependence for your career on
a fickle vendor) is that it's VERY unintuitive once you scratch below
the surface: you get file formats that look like they came straight
from 1961 with NO comments inside.  Ick.

I've decided to try my hand at creating a framework for these kinds
of problems.  I'm not even going to try to displace any of the
well-tested stuff out there; I'm just going to push for a somewhat
novel-for-quantum-chemistry approach.

I'm going to stick with Lisp for the mainline code using non-Lisp
libraries where appropriate (Octave, ODBC, matrices, etc.).  This is
an odd decision and I'm not completely comfortable with it...thinking
out loud...  I've already encountered some significant performance
issues in a simple program that reads a flat ASCII file filled with
floats, parses them, and does some easy calculations.  The parsing
part is what takes forever, though; the calculation part is fast
enough.  I'm hoping that with a larger project underway I'll be able
to get over the "hump" of figuring out how to make Lisp faster, and
once that's done things will be OK again.  Basically if I stick with
Lisp I suffer on performance, but then I NEVER have to worry about
"how can I make the language do this neat idea X?"  20 years from now,
that may prove to be a very valuable feature.

January 19, 2006

I've decided to move the project over to Lisp as the primary language
(CMUCL first, then SBCL).

November 4, 2005

Got DataTable2D mostly functional.  I can point it at a CSV file and
ask for linear and polynomial interpolation using any two columns as X
and Y.  Unfortunately the data has to be ordered on load, e.g. I can't
do a "scatterplot".
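
The ordering requirement is what makes a fast bracket lookup possible.
Here is a minimal sketch of linear interpolation over two columns,
assuming the X values are strictly increasing; interpolate_linear() is
a hypothetical free function, not the DataTable2D API.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: linear interpolation of y at x.
// Assumes xs is strictly increasing and xs.size() == ys.size().
double interpolate_linear(const std::vector<double>& xs,
                          const std::vector<double>& ys, double x) {
    if (xs.size() < 2 || xs.size() != ys.size())
        throw std::invalid_argument("need two or more matching X/Y points");
    if (x < xs.front() || x > xs.back())
        throw std::out_of_range("x lies outside the tabulated range");

    // Binary search for the first tabulated X greater than x; this only
    // works because the data is sorted.
    std::size_t hi = std::upper_bound(xs.begin(), xs.end(), x) - xs.begin();
    if (hi == xs.size())
        hi = xs.size() - 1;  // x equals the last tabulated X
    const std::size_t lo = hi - 1;

    const double t = (x - xs[lo]) / (xs[hi] - xs[lo]);
    return ys[lo] + t * (ys[hi] - ys[lo]);
}
```

Unordered "scatterplot" data would need a sort on load, or a different
lookup, before a bracket search like this could be used.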

October 25, 2005

Finally figured out why I *hated* the Units class: it was hard to
construct, didn't have support for temperature-like conversions, and
was in general a messy first try.  Now I've stubbed in a new API that
will make it a breeze to initialize new Units based on old ones and
retain the "tree" so that 1 foot = 12 inches and 1 inch = 0.0254
meter.  Broke a lot of code of course, but once this round is done
I'll be set to start using unit conversions throughout the rest of it.

October 24, 2005

Stubbing in classes for compounds and fluids.  I'm hoping to use this
for my Thermo II homework due next week.

October 23, 2005

Just made DataSeries mostly functional.  It can load CSV files and can
now be passed to the standard statistics functions.  Just need to add
a few more functions and I can start using it in place of Excel for a
lot of engineering-like computations.

Still would like to get my unit conversions to work though, then I'd
have something really unique.  Right now I'm stuck in "puny
implementation of common algorithms" land.

October 20, 2005

The good news is all the code I've put in for polynomial stuff works
great and my homework rules.  The bad news is the effing test was all
about memorizing formulas such that all the work I put into actually
making a functional program wasted my time in studying for the exam.
(fumes)

October 12, 2005

The first class (Matrix) is about 80% done now.  I'm now stubbing in a
simple Polynomial class to do root- and answer-finding for me.

I also inserted a simple error class that can be thrown.  For most of
my applications, the setup needs to be self-checking: am I
manipulating the right dimensions to get the result I want?  My
typical first pass will be to set up a problem including units, try
out the manipulation, and see if any functions barf.  Once that works,
for speed I can define NO_UNIT_CONVERSIONS, recompile, and get fast
results.  Since Unit and Measurement are the base classes for the
setup, I want them to assert().  The higher-level structures will use
exceptions.  One day I'd like my error class to include a functional
stack trace, but for now it at least identifies the file and line.
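
An error class that records the file and line can be sketched with
the standard __FILE__ and __LINE__ macros.  The names ChenlError and
CHENL_THROW are hypothetical; the log does not name the real class.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical sketch of an exception that remembers where it was thrown.
class ChenlError : public std::runtime_error {
public:
    ChenlError(const std::string& message, const char* file, int line)
        : std::runtime_error(format(message, file, line)),
          file_(file), line_(line) {}

    const char* file() const { return file_; }
    int line() const { return line_; }

private:
    // Builds "file:line: message" for what().
    static std::string format(const std::string& message,
                              const char* file, int line) {
        std::ostringstream out;
        out << file << ":" << line << ": " << message;
        return out.str();
    }

    const char* file_;  // points at the __FILE__ string literal
    int line_;
};

// The macro captures the location of the throw site, not of the class.
#define CHENL_THROW(message) throw ChenlError((message), __FILE__, __LINE__)
```

A caller writes CHENL_THROW("dimension mismatch") and a handler that
catches std::exception sees the location as part of what().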

Too bad I won't get much use out of this during this semester, but I
figure with methodical addition I'll have a basic library for next
term that might make my homework manipulations significantly easier.

October 7, 2005

I'm taking a numerics course and had a large assignment with matrices
this week.  I've now added basic matrix operations, and also a flag to
turn off unit conversion code.  I need to fix units.cpp so the unit
operations actually work; right now I'm creating some kind of infinite
loop in multiply.

February 1, 2004

Project started.  Registered on SourceForge, awaiting approval.  LGPL.