Financial Cryptography: Comment on my War On SQL

Comments: my War On SQL

There was an article around mid-90s about how rdbms & sql set the computing business back (at least) 20 yrs.

disclaimer: i was in sjr in 70s & 80s as well as handling some of the technaology transfer to endicott for sql/ds ... misc. past post mentioning original rdbms/sql
http://www.garlic.com/~lynn/subtopic.html#systemr

this is old post about jan92 meeting in ellison's conference room ... one of the people mentioned claimed to have done much of the technology transfer from endicott back to stl for (mainframe) db2.

as an aside, two other people also named at the same meeting ... later left and show up at small client/server startup responsible for something called commerce server (the startup had also invented this technology called "SSL" they wanted to use). We were brought to consult because they wanted us to do payment transactions ... the result is now frequently referred to as "electronic commerce"

during those early years of electronic commerce, RDBMS based webserver tended to have a significantly larger number of problems ... use of rdbms significantly increased the skill level and effort and the added complexity significantly increased the probability of mistakes, security vulnerabilities ... you name it ... it was always much worse.

i do some playing with sqlite3 because it is used by mozilla ... and i do a whole bunch of processing with firefox file (outside of firefox and backup information in various non-relational formats)

Posted by Lynn Wheeler at November 7, 2009 10:53 PM

"Cross-object transactions are tricky."

I use a "parallel update" routine for changing multiple files atomically. You give it a set of distinct file names, and for each file name you specify both a new and an old value. The routine automatically locks the files in ASCII order to avoid deadlock, then verifies that all the files still have their specified old values. If so, then all the new values are written and the locks released. If not, then no changes are made, and the caller must repeat its logic from scratch. The caller does this in an infinite loop until the parallel update finally succeeds. Statistically it will *always* eventually succeed, but you can abort after 1000 tries or so if you like.

This works extremely well in a highly concurrent system. I have a Perl test suite where you specify any number of processes (e.g. 3000) and any number of files (e.g. 5). Each process chooses two files at random, subtracting 1 from one file and adding 1 to the other. Each process loops until its parallel update succeeds. Often the update succeeds on the first try, but clashes occur more frequently with a larger number of processes and a smaller number of files.

After all the child processes are done, the main test program verifies that the sum of all the file values is precisely 0. (Some will be negative, some positive.)

The parallel update routine also does convenient things like auto-creating directory paths when it needs to create a new file. I also have it automatically delete a file when it is set to a null value, and delete any enclosing directories which become empty as a result. That way you never have to worry about "mkdir" and "rmdir".

I plan to use this technique in the Loom.cc software soon. Right now loom.cc uses a single locked GDBM file (a simple key-value store written by the good folks at Gnu). Using parallel update will allow massively concurrent updates with no single point of locking. In a real system clashes are rare -- but utterly disastrous if they do occur and you don't handle them correctly. Parallel update is a simple and powerful concept here.

When I roll this out at loom.cc you can look at the source code and see the "update" routine in all its glory, along with the demanding stress test. (It's written in Perl, but "beautiful" Perl, not script-kiddie Perl.)

Posted by Patrick Chkoreff at November 8, 2009 08:45 AM

You forget, sir, our old conversation re securing the communications between app and DB... No secure JDBC/ ODBC drivers afaik... And the dependency on the black box to secure its end of the deal...

Posted by AC2 at November 10, 2009 06:52 AM

for the fun of it:

Developers: The NoSQL Ecosystem
http://developers.slashdot.org/story/09/11/09/2335214/The-NoSQL-Ecosystem

and

NoSQL Ecosystem
http://www.rackspacecloud.com/blog/2009/11/09/nosql-ecosystem/

for something completely different ... old post in comp.database.theory on 3value logic
http://www.garlic.com/~lynn/2003g.html# How to cope with missing values - NULLS?

now their is something over dispute between rdbms and xml database. original markup language, GML was invented at the science center in 1969 (precursor to sgml, html, xml, etc). GML (generalized markup language) actually stands for the first letters of last names of the inventors. science center was also responsible for early virtual machine systems (cp40, cp67, vm370 ... gml original ran on cms under cp67). misc. past posts mentioning science center
http://www.garlic.com/~lynn/subtopic.html#545tech

some number of people transferred from science center to sjr ... where the original rdbms/sql implementation went on under vm370 ("L" from gml did some amount of work on "blobs" in r-star time-frame).

Posted by Lynn Wheeler at November 10, 2009 02:03 PM

...
Some people object to the NoSQL term because it sounds like we’re defining ourselves based on what we aren’t doing rather than what we are. That’s true, to a degree, but the term is still valuable because when a relational database is the only tool you know, every problem looks like a thumb. NoSQL is making people aware that there are other options out there. But we’re not anti-relational-database for when that really is the best tool for the job; it’s “Not Only SQL,” rather than “No SQL at all.”
...

Posted by NoSQL Ecosystem at November 10, 2009 03:06 PM

"Applications traditionally changed faster than relational data, but social data changes faster than applications"

Posted by seen on the net by Gunnar at November 11, 2009 09:31 AM

Have peps forgoton the real reason why we had to "normalise" not the later excuse?

Back in 1980 core memory was measured in Kbytes and magnetic media likewise or if you where very wealthy Mbytes. As for CPU speed well that was still measured in Mhz and as for flops you went to second silicon in the form of lookup tables or maths coprocessors.

Normalisation is and always will be a form of compression. "Data" is compressable for many reasons but... In compressing by normalisation you optomise out some data relations in favour of others.

It is why flat file DBs still have advantages over relational systems when it comes to either complex or ad hock enquiries.

Have a look at some SQL and you will see that the first thing being done is to re-build to get at relashionships that where normalised out...

Yes OO does have some advantages over relational but the problem has been shifted not removed. The problem now is methods...

In 20years I expect we will be having a conversation about "My War with OO". And guess what the "flat file DB" will still be there faithfully holding it all up. As was once said "the more things change the more they stay the same" ;)

Posted by Clive Robinson at November 13, 2009 10:00 AM

I started out programming in the real world on DEC's VMS using both DEC's Basic (a nifty language) and C (just for low level stuff).

I miss the built-in RMS or record management system. It was a sort of a flat file scheme that DEC provided interfaces to. I was in a multi-project/multi-programmer team and it worked quite well for us. Good times.

Posted by Purpleslog at May 14, 2010 02:58 PM

FiteClub, a london finance-tech evening get-together, says:

"NOSQL is presently a hot topic, with developers finding that traditional relational database management systems aren't always appropriate to their needs. An emergent definition is that NOSQL should stand for Not Only SQL, with databases that bring the best of both worlds - NOSQL for the speed and scalability of eventually consistent architecture, and SQL for where it's needed to integrate with legacy systems (or just the query language of choice for developers on the edge of the system)."

Posted by FiteClub at June 1, 2010 05:01 PM