admin's blog

Database Meets Revision Control

Any developer who has worked with HIPAA compliancy knows that the law is murky at best and the fed doesn't publish a programmers' guide to make your life any easier.  However, one of the cardinal rules is the requirement to keep track of who sees HIPAA data, who modifies it and when this was done.  Another is that if you delete/update patient data you need to log what was deleted/updated in order to provide an audit trail, if only for the lawyers.  Failure to do so can subject a company to some pretty draconian penalties.

This creates a challenge on the database side because SQL UPDATE obliterates a record's history.  There are a few potential solutions, such as maintaining a changelog which such updates are written based on table triggers.   I've done this but the log of atomic changes can grow immense.  It's also difficult to reconstruct a large record based on potentially dozens or even hundreds of changes to records which must be retained for up to six years. That's how a traditional RCS would handle rollbacks but it's not practical inside the confines of a database.

SSH Tunnel for PostgreSQL Connection on AWS EC2

AWS has become a fairly ubiquitous hosting option for small companies.  But developers typically work on local dev setups outside the host and occasionally need access to the main dev and staging databases located on AWS.  How do you do this and still run in a secure environment?  One thing you don't do is poke a hole in AWS' firewall and run PostgreSQL's port in the open.

tags: 

Setting up a hybrid Google Apps mail account

I've run my own mail server since, well, the UUCP days. I used to host a lot of mailiing lists so over the past 20+ years I've run Sendmail, Exim, Qmail and Postfix. They're all different but they share one thing in common: unless running mail servers is your hobby, they're not fire-and-forget applications, especially in the high-spam, high-malware, post-Snowden environment today.  Maintaining a mail server is a chore. You walk a fine line between being buried in UCE and blackholing your Uncle Rich.

I stopped running mailing lists several years ago and since then I've wanted to outsource my mail servers to a reliable third party host.  When I first experienced Google Apps I knew that's where I wanted to be. I don't understand what Google is doing but it's the best mail handler I've used.  I rarely get spam in my Gmail but I've never had a false negative that I'm aware of.  On Gmail, you don't have to muck with Baysian filter settings or install RBLs.  It just works.

The problem is that I create a different email address for every web site I use so I have well over 700 aliases and Google Apps only supports 30 per user with no options to increase that number.  One alternative is to use Google Groups for aliases but that presents its own set of problems. Then a friend of mine, Jesse, told me about yet another alternative.

What Jesse does is keep the MX for his domain and runs his own mail server. But all his local server does is act like an alias forwarding agent.  When mail arrives for jesse@jessedomain.com it consults its alias database and forwards the mail on to his Google Apps account and one of the restricted number of Google aliases.  Google doesn't have MX for his domain but it's set up to send mail as XX@jessedomain.com.

This is exactly what I wanted for myself and in fact tried a couple of times to get it to work.  It failed because I handed my primary MX to Google Apps. The first part of the trick is not to do that.  Keep your MX or use Google only as a fallback MX.

The question is if mail arrives at my server as steve@somedomain.com how can I forward it to a different steve@somedomain.com on Google Apps?  That's the second trick.

tags: 

SMB+SSH: Ubuntu server and OSX client

The title above is pretty close to the Google search query I used in vain to find a recipe for tunneling an OSX Samba client to an Ubuntu 14.04 server. Hopefully this post will save someone the hours I spent trying to set this up.

In the end, like so many Unix projects, the answer turned out to be simple. All that's needed is a configured and functioning Unix/Linux Samba and SSH server .   Everything else is on the client side.

I'm not unfamiliar with Samba. I ran it for years between a FreeBSD Unix server and Windows XT workstation. It had its quirks, and still does. When I dumped Windows for a shiny, new Mac Pro in 2009, I switched to NFS. But with each successive OSX upgrade, NFS got flakier to the point where it became useless so I returned to Samba.  But Samba is inherently insecure outside of a trusted LAN so for out-of-office occasions I started using SSHFS. Unfortunately, SSHFS relies on deprecated, third-party software on the OSX side and it was s..l..o..w.  My PHP Storm IDE was grinding through directory refreshes after Git checkouts.

With the release of OSX Mavericks 10.9, Apple announced that it was dumping yet another networking protocol -- it's own greybeard AFP. To replace it, they embraced SMB2. Or... ta da... Samba. Technically, SMB2 isn't officially Samba however OSX has unofficially supported Samba clients for several operating system releases.  Samba(tm) (the Unix server) is actually a product of the open source team at Samba.org.  SMB is an acronym for Server Message Block, which is a proprietary Microsoft protocol. Samba is built to the published white paper spec for SMB.  

Optimizing a Result Set Pager

It's ubiquitous on data driven web sites: the result set pager.  We've all used them whether we built them from scratch or used one provided by the framework.

Pagers are by nature performance suckers because we're asking the database to re-run the same query for each "page", slicing off just one set of contiguous rows for each page. If your result set is 10,000 rows long but you're only paging through them 10 rows at a time, that's potentially 1,000 database requests to view the entire set.

But it's worse than that because in order to provide those nifty pager controls, like those in the image above, the software has to know how many rows are in the larger result set so it can do the math to populate the navigation for those page numbers.  In other words, using the above example the software needs to know that there is a Page 14 to jump to.

A little background first.  Internally, garden variety pagers are pretty much the same.  They request a fixed number of rows to display per page, like 10.  That becomes your LIMIT filter in the database query:

SELECT * FROM people LIMIT 10;

To create the page navigation you need to do some math to generate the OFFSET.  For instance, using a page size of 10, the query for a Page 3 display would look like this:

SELECT * FROM people OFFSET 20 LIMIT 10;

Embedding a View in a Node

A couple of people asked me how I embedded the client logo slideshow inside the content area of the "Brooklyn Technical Services has over 20 years' experience..." node on the front page here.

Actually, that was the easy part.  The hard part was building that slideshow, mainly because I hadn't done it with Drupal 7 before.   After several false starts I found a great video tutorial on Views Slideshow by a Toronto developer who walks you through the process, step by step.  He moves a little quickly but that's what the pause button is for.

tags: 

Fixing OG Forum (Organic Groups forums)

Organic Groups is a module -- a broad suite of modules actually -- allowing you to create and maintain private groups inside of the Drupal content management system.  While Drupal has a powerful roles and taxonomy system with lots of third-party node access tools, Organic Groups (OG) is an out-of-the-box solution that doesn't require much knowledge of Drupal internals to get yourself up and run

tags: 

Holiday fun: Designing a stained glass Christmas tree with GlassEye 2000

One of my hobbies is constructing stained glass, which is something I got into out of necessity while restoring an old house in Brooklyn.  The cost for replacing or, worse, restoring old stained glass panels was frightening enough that I took some classes to learn how to do it myself.  Fortunately, I learned that working with stained glass is somewhat the same as woodwork joinery so the transition wasn't too difficult once I learned the tools and their tricky techniques.

However, Rembrandt I ain't. I can visualize things pretty well but there's a bridge out somewhere between my left and right brain. With woodworking, I usually wind up head jamming the fabrication. It works 90% of the time. The other 10% is handled by my hard-won skills in making dumb mistakes look like I meant to do that. But this ad hoc process doesn't work for stained glass construction, where you need to have a completed design and pieces cut before you start soldering things together.

Finding duplicate records in a database: the SQL HAVING clause

One issue I run across occasionally is a table with duplicate entries such as two entries for the same company in an accounts payable system.  This can create embarrassing problems with billing if ACME Inc #1 is 90 days overdue because someone credited a payment, and now a credit, to ACME Inc #2.

Pages