selecting the best bluray audio codec in postgres through sql order by statements

Today's adventure in coding was to continue moving Blu-ray support into my DVD ripper software. The next step was getting the audio codecs prioritized the way I want when remuxing the exported tracks.

Blu-rays support a number of advanced audio codecs, in addition to the same ones found on DVDs. When I'm ripping DVDs or Blu-rays, I like to choose the preferred audio codec based on the best format available.

My DVD archiving tools (dart) use PostgreSQL as the backend. I won’t go into the details of my setup because that would take a very, very long time … but if you want to see the database schema for whatever reason, here it is.

In my audio table, I keep track of the formats with a short string. Here are the names, listed in my order of preference:

  • lpcm – LPCM uncompressed audio
  • truhd – Dolby TrueHD (Atmos)
  • dtshd-ma – DTS-HD Master Audio
  • dtshd – DTS-HD
  • dts – DTS
  • ac3 – Dolby Digital

Normally what happens in a disc structure is that the Blu-ray will have the audio tracks indexed with the best quality first. So the first one would be dtshd-ma, then a dts with 6 channels, then a stereo dts, etc.

That’s not *always* the case though, and since I’m pedantic, I hunt out the examples where they are not in that order.

I'm dumping the query here because I'm likely going to want it again. It looks for all tracks where the audio index (ix) of one of the higher-quality formats is greater than 1, implying that it comes after a lower-quality one (dts, ac3).

SELECT id, track_id, ix, format, langcode FROM audio WHERE track_id IN (SELECT track_id FROM audio WHERE format IN ('lpcm', 'dtshd-ma', 'truhd', 'dtshd') AND ix > 1) ORDER BY track_id, ix;

I’m trimming the results here to show samples. Here’s one where Dolby Digital is the first English track, and Dolby TruHD is the second.

    id | track_id | ix | format | langcode 
 62551 |    48679 |  1 |    ac3 | eng
 62552 |    48679 |  2 |  truhd | eng
 62555 |    48679 |  5 |    ac3 | eng

So, the question is, when selecting the audio track, how do I get the one with the best quality — or, more specifically, the one with the codec I want when multiple are available?

Normally if I was just doing a straight grab, without wanting too much complexity, I’d simply select the first English audio track and assume that’s the correct one. A safe assumption, but this is me.

As far as the dataset goes, I've got about 100 Blu-rays, and running this query across all their audio tracks turns up only 14 title tracks that are "out of order." So, a very small chance of it happening.

Anyway … this post is all about how to write a query that returns the tracks in the preferred order I want.

And here's the answer: in your ORDER BY clause, you add one expression per format, and sort descending on whether the format matches.

Writing the query will probably make more sense:

SELECT id, track_id, ix, format, langcode FROM audio WHERE track_id = 48679 AND langcode = 'eng' AND active = 1 ORDER BY format = 'lpcm' DESC, format = 'truhd' DESC, format = 'dtshd-ma' DESC, format = 'dtshd' DESC, format = 'dts' DESC, format = 'ac3' DESC, ix;

Doing an ORDER BY on a column comparison sorts on true or false (1 or 0). By default it sorts ascending (ASC), which would put the matches last since false sorts before true, so flip it around to DESC.

Now if I ran that same query with the track id in my first example, here’s the new result set:

    id | track_id | ix | format | langcode 
 62552 |    48679 |  2 |  truhd | eng
 62551 |    48679 |  1 |    ac3 | eng
 62555 |    48679 |  5 |    ac3 | eng

You can see that the truhd one got bumped to the top!
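If you want to poke at the trick without a Postgres database handy, here's a self-contained sketch using Python's sqlite3, loaded with the sample rows from the first example (leaving out the active flag):

```python
import sqlite3

# Toy copy of the audio table; column names follow the post, data is the
# first example's result set.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE audio (id INTEGER, track_id INTEGER, ix INTEGER, format TEXT, langcode TEXT)")
con.executemany(
    "INSERT INTO audio VALUES (?, ?, ?, ?, ?)",
    [
        (62551, 48679, 1, "ac3", "eng"),
        (62552, 48679, 2, "truhd", "eng"),
        (62555, 48679, 5, "ac3", "eng"),
    ],
)

# Each (format = '...') comparison yields 1 or 0; DESC floats the matches
# up, and the chain of comparisons acts as a priority list.
rows = con.execute("""
    SELECT id, ix, format FROM audio
    WHERE track_id = 48679 AND langcode = 'eng'
    ORDER BY format = 'lpcm' DESC,
             format = 'truhd' DESC,
             format = 'dtshd-ma' DESC,
             format = 'dtshd' DESC,
             format = 'dts' DESC,
             format = 'ac3' DESC,
             ix
""").fetchall()
print(rows)  # truhd first, then the ac3 tracks in ix order
```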

Here's a second example. Even though the audio track index is already in the order I prefer, I'm going to switch around my preferences for the sake of this documentation.

Here’s the original result set:

    id | track_id | ix |  format  | langcode 
 60444 |    46895 |  1 |    truhd | eng
 60445 |    46895 |  2 | dtshd-ma | eng
 60446 |    46895 |  3 |      ac3 | eng
 60451 |    46895 |  8 |      ac3 | eng

And this time I want to prioritize the DTS-HD audio codecs over Dolby’s.

The new query, where I just change the preference order:

SELECT id, track_id, ix, format, langcode FROM audio WHERE track_id = 46895 AND langcode = 'eng' AND active = 1 ORDER BY format = 'lpcm' DESC, format = 'dtshd-ma' DESC, format = 'dtshd' DESC, format = 'truhd' DESC, format = 'dts' DESC, format = 'ac3' DESC, ix;

And here’s the new result set:

    id | track_id | ix |  format  | langcode 
 60445 |    46895 |  2 | dtshd-ma | eng
 60444 |    46895 |  1 |    truhd | eng
 60446 |    46895 |  3 |      ac3 | eng
 60451 |    46895 |  8 |      ac3 | eng

There is another way to do it, which is using CASE statements inside your query. That’s what I do on DVDs where there’s only DTS or Dolby Digital. That’s another story, though.
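For the record, a CASE-based version looks something like this. It's a sketch in Python's sqlite3 with made-up rows, ranking dts ahead of ac3 as on a DVD, not the actual query I use:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE audio (id INTEGER, track_id INTEGER, ix INTEGER, format TEXT)")
con.executemany("INSERT INTO audio VALUES (?, ?, ?, ?)", [
    (1, 10, 1, "ac3"),
    (2, 10, 2, "dts"),
    (3, 10, 3, "ac3"),
])

# A single CASE maps each codec to a numeric rank; lower ranks sort first,
# and ix breaks ties between tracks with the same codec.
rows = con.execute("""
    SELECT id, ix, format FROM audio
    WHERE track_id = 10
    ORDER BY CASE format
                 WHEN 'dts' THEN 1
                 WHEN 'ac3' THEN 2
                 ELSE 3
             END,
             ix
""").fetchall()
print(rows)  # the dts track first, then the ac3 tracks by ix
```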

So, now when I rip and remux my Blu-rays, I can keep only the audio tracks with preferred codecs that I like.

Oh, and if you want to see the app I use to copy my Blu-rays, it's bluray_copy, in my bluray_info github repo, which I also wrote.

The irony of all of this is my audio receiver only supports 5.1 channels. But I can’t imagine how that’s relevant. ;) Enjoy your audio!

standardizing on booleans in mysql

Okay, so here's a question I would normally pose to a mailing list, but since my email setups are so jacked up at the moment, there isn't a good way for me to subscribe and ask one, so I'll just post this to my blog instead and hope for some input.  Not that I have anything against mailing lists, mind you; it's just that I don't like setting up an email account and subscribing when I'd post maybe three times a year, and lurk and read the rest of the time.

Anyway.  At work, I was looking at cleaning up one database table of a project I’m working on, and I noticed that we have three ways that we are storing boolean values in the table:

  • unsigned tinyint, which presumably would only be set to 0 or 1
  • char(1), which also should be set to 0 or 1
  • enum('y','n')

I, personally, always prefer the tinyint route.  Not really for a technical reason so much as a historical one … it’s just kind of the first one I picked.   What I would really like is if MySQL had a *real* boolean type field similar to postgresql, where the values can be TRUE, FALSE, ‘t’ or ‘f’.  It makes things so much easier.

MySQL will accept BOOL as a column type when creating a column, but its implementation is a bit jacked in my opinion.  It creates a nullable tinyint(1) column.  That gives a huge range of possible values, and doesn't really come close to a binary option set at all.

mysql> create table test (steve bool);
Query OK, 0 rows affected (0.00 sec)

mysql> desc test;
+-------+------------+------+-----+---------+-------+
| Field | Type       | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| steve | tinyint(1) | YES  |     | NULL    |       |
+-------+------------+------+-----+---------+-------+
1 row in set (0.00 sec)

I did a bit of research, since enum now seems like the most reasonable option, as it limits you to a strict subset of values.  The only question I have is: how well would that index?  Would it be faster scanning the table for enums or integers?  That's where I'm not sure.  It turns out that an enum storing up to 255 values only uses 1 byte (assuming I'm interpreting this MySQL reference book correctly).  A tiny integer uses the same size.  So it seems like they should both be pretty optimal, but I dunno.

Any thoughts?

mysql ordering by string with possible blank entries

I just found a workaround to something I’ve always preferred to do in SQL. I’m using MySQL 5 at work, and I had a query where I would order the entries by a column that is a varchar. Since there was the possibility for this column to be blank, it would display all rows with a blank entry first, and then alphabetically from there.

So, the order would be something like: '', '', '1', '2', '3', 'A', 'B', 'C'.

What I really wanted was to display the blank ones last, since I wasn't interested in those. I poked through the available string functions to see if I could conjure up a hack, and ASCII() works great: it returns the ASCII value of the first character in the string, and if the string is empty, it returns zero. That's all I needed: a binary flag to order by first.

Here’s a sample query then:

SELECT string FROM table ORDER BY ! ASCII(string), string;

And the result would be: '1', '2', '3', 'A', 'B', 'C', '', ''.

Perfect. :)
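The same trick works outside MySQL, too. SQLite has no ASCII() function by default, but comparing against the empty string produces the same binary flag; a quick sketch in Python:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (s TEXT)")
con.executemany("INSERT INTO t VALUES (?)",
                [("",), ("B",), ("1",), ("",), ("A",), ("2",), ("3",), ("C",)])

# (s = '') evaluates to 1 for blanks and 0 otherwise, so sorting it
# ascending pushes the blanks last -- same effect as MySQL's
# ORDER BY ! ASCII(s), s.
rows = [r[0] for r in con.execute("SELECT s FROM t ORDER BY s = '', s")]
print(rows)  # ['1', '2', '3', 'A', 'B', 'C', '', '']
```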

postgres and mysql comparison paper

I've been job hunting, and while my dream job would be somewhere that uses PostgreSQL, I am having an extremely hard time finding anyone that uses it. So, I think my chances might be better actually getting a company to convert to it instead. To that end, I've started outlining a draft of a paper that I can present to lead programmers, database administrators, and management alike on the pros of using PostgreSQL over MySQL. If anyone has ideas I could add, I would appreciate it.

Here’s the general principles I already plan on covering: foreign key support, data types, transactions, shell interface, ANSI SQL support, table types, general features, history, licensing, abstraction layers (using PHP).

Also, and I don’t mean to sound like I’m spreading FUD, but it occurred to me this morning that I’ve never heard anyone say that MySQL is better than PostgreSQL.

Anyway, ideas welcome. I’ll post my progress as I get the paper put together. This is something I’ve been meaning to do for a long time.

prepared statements and stored procedures

I’m still working on cleaning up the import scripts for GPNL, and I’m going to have to start using PHP’s PDO database layer to connect to an SQLite3 database at one point.

I haven’t used it yet, but I had heard it was coming in PHP 5 for a while. Personally, I’ve always used PEAR::DB and was quite happy with that.

I’m still not sold on using the new layer anyway, but I figured I’d do some reading while I am getting ready to use it in this very small instance that I’m implementing.

On the docs page, I found a great summary of why prepared statements and stored procedures are handy and helpful. In short: they save you time on queries you have to repeat a lot, by pre-compiling the part that is common to all of them, so the database really only processes the new data, and thus uses fewer resources.

I hadn't played with prepared statements much until a few weeks ago, but I've slowly started using them in my import scripts. Performance-wise, I've only seen about a 15 to 20 percent speed increase. The thing I like most about them, though, is that I don't have to escape my strings anymore. That's a nice little advantage I can live with.

Anyway, PHP's PDO documentation page has a nice writeup as well, and instead of trying to summarize it myself any more, I'll just quote it verbatim:

Many of the more mature databases support the concept of prepared statements. What are they? You can think of them as a kind of compiled template for the SQL that you want to run, that can be customized using variable parameters. Prepared statements offer two major benefits:

  • The query only needs to be parsed (or prepared) once, but can be executed multiple times with the same or different parameters. When the query is prepared, the database will analyze, compile and optimize its plan for executing the query. For complex queries this process can take up enough time that it will noticeably slow down your application if you need to repeat the same query many times with different parameters. By using a prepared statement you avoid repeating the analyze/compile/optimize cycle. In short, prepared statements use fewer resources and thus run faster.
  • The parameters to prepared statements don’t need to be quoted; the driver handles it for you. If your application exclusively uses prepared statements, you can be sure that no SQL injection will occur. (However, if you’re still building up other parts of the query based on untrusted input, you’re still at risk).
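The PHP docs aside, the binding idea is easy to see in any driver. Here's a minimal sketch with Python's sqlite3 and a made-up table, just to illustrate the reuse and no-escaping points:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE scores (name TEXT, score INTEGER)")

# One statement with placeholders, executed many times; the driver binds
# each parameter set, so strings never need manual escaping.
stmt = "INSERT INTO scores (name, score) VALUES (?, ?)"
con.executemany(stmt, [("alice", 10), ("bob", 20), ("o'brien", 30)])

# The embedded quote in o'brien is handled by the driver, not by us.
rows = con.execute(
    "SELECT name FROM scores WHERE score > ? ORDER BY name", (15,)
).fetchall()
print(rows)  # [('bob',), ("o'brien",)]
```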

nice mysql vs postgres summary

I was googling for a postgresql image I could use when I found this page, a nice short summary on the differences between MySQL and PostgreSQL with an emphasis on development policy.

I should mention that I’m linking to it because I agree with the author and also because I’m biased towards PostgreSQL. I prefer postgres not because of fanboyism, but because of experience and years of using both databases.

I was actually lucky enough to be trained to use PostgreSQL as the first database I ever used, and everything after that has never been able to duplicate its feature set. Since my first tech job, I’ve worked with Access, MySQL, SQL Server 2000 and SQLite.

Anyway, I love postgres. If you’ve never given it a chance, and you are looking for more advanced features, check it out. It’s all that and a box of girl scout cookies. I tell you what.

potential postgres schema for lds-scriptures 3.0

Well, that was fast. I looked at the schema last night for the MDP Scriptures project, and started cleaning it up, and it went really quickly. I’ve got a postgres dump all ready for review, and this is probably the configuration I’ll use for the next release.

The major change was that I added a new table for the chapters. It seems a little odd having the chapter number in a table all its own, but for a normalized database schema it makes perfect sense. The only thing I don't like is that now you have to INNER JOIN across four tables just to get all the information. Most of the time you won't need anything but book + chapter + verse, which is only three tables. I did create a sample view called view_verses which pulls them all together so you can easily run a select on some format like 'Gen 1:1'. The downside is that even that view is CPU intensive, so I may have to look at changing some things around.
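To make the join chain concrete, here's a rough sketch of that four-table layout and view, built in Python's sqlite3. The table and column names here are my illustrative guesses, not the actual release schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE volumes (volume_id INTEGER PRIMARY KEY, volume_title TEXT);
    CREATE TABLE books (book_id INTEGER PRIMARY KEY,
                        volume_id INTEGER REFERENCES volumes,
                        book_title TEXT);
    CREATE TABLE chapters (chapter_id INTEGER PRIMARY KEY,
                           book_id INTEGER REFERENCES books,
                           chapter_number INTEGER);
    CREATE TABLE verses (verse_id INTEGER PRIMARY KEY,
                         chapter_id INTEGER REFERENCES chapters,
                         verse_number INTEGER, verse_text TEXT);

    -- Pulls everything together, at the cost of joining all four tables.
    CREATE VIEW view_verses AS
        SELECT vol.volume_title, b.book_title, c.chapter_number,
               v.verse_number, v.verse_text
        FROM verses v
        INNER JOIN chapters c ON c.chapter_id = v.chapter_id
        INNER JOIN books b ON b.book_id = c.book_id
        INNER JOIN volumes vol ON vol.volume_id = b.volume_id;

    INSERT INTO volumes VALUES (1, 'Old Testament');
    INSERT INTO books VALUES (1, 1, 'Genesis');
    INSERT INTO chapters VALUES (1, 1, 1);
    INSERT INTO verses VALUES (1, 1, 1, 'In the beginning...');
""")
row = con.execute("SELECT * FROM view_verses").fetchone()
print(row)  # ('Old Testament', 'Genesis', 1, 1, 'In the beginning...')
```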

Aside from that basic view, I’ve decided I’m not going to put all my fun ideas for functions and views in the packaged release. Instead, I’ll just have them either as a separate release, or just post them on the website since I’m sure they will evolve.

One really cool thing about postgres that I love is that you can have overloaded functions. I started playing with them a while back on this database, and came up with some cool concepts. One idea I want to implement is being able to run a SELECT statement using a BETWEEN on two verses. An example query would be: SELECT * FROM view_verses WHERE verse_id BETWEEN verse('Gen.', 1, 5) AND verse('Genesis', 12); where the verse() function would be overloaded to take between one and three arguments: book, chapter and verse.
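Postgres overloading aside, the closest quick analogue in Python is default parameters. Here's a toy sketch of the idea; the id-packing scheme and the little book lookup are invented for illustration, not the project's actual design:

```python
# Mimic an overloaded verse() that accepts one to three arguments.
def verse(book, chapter=1, verse_number=1):
    # Map a few book names/abbreviations to a book number (illustrative only).
    books = {"Gen.": 1, "Genesis": 1, "Ex.": 2, "Exodus": 2}
    book_id = books[book]
    # Pack book/chapter/verse into a single sortable integer verse id.
    return book_id * 1_000_000 + chapter * 1_000 + verse_number

# The BETWEEN verse('Gen.', 1, 5) AND verse('Genesis', 12) bounds become:
low, high = verse("Gen.", 1, 5), verse("Genesis", 12)
print(low, high)  # 1001005 1012001
```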

It's pretty cool all the stuff you can do with postgres, and that's definitely where I'll be focusing my attention in getting the goodies done.

Anyway, if you want to download this test schema, it's available here. As always, feedback is welcome.

gentoo packages that need lovin'

I mentioned not too long ago that I was working on getting portage details crammed into postgresql, and here is the end result.

GPNL is meant to be a QA tool for treecleaners to use, making it easier to find packages and ebuilds that … well, need some lovin'.

Though it’s primarily intended for quality assurance, I’ve written the frontend to be hopefully pretty generic so anyone can browse the portage tree and just see some interesting statistics all around. There’s still a lot more to be done on the website, but I think it’s to a point right now where it’s at least ready for some public consumption.

One thing I'm excited about is setting up the advanced search page, where you'll be able to run all kinds of funky queries. I'm going to be adding some more QA checks as well, once I get some time. Getting this much done was quite a lot of work, though, and I'm probably going to take a break and focus more on other things for a while. However, if anyone has some reasonable feature requests, I'm all ears.

Oh, also the source code for the database schema and the import scripts is available online. I'll set up SVN access and some documentation on the db layout sometime soon, not to mention how to get it working (short howto: emerge php, pkgcore, postgresql and portage-utils).

Also, a huge shout out to marienz and ferringb who put together pkgcore and my little python scripts that made importing the data incredibly simple. Thanks, guys. :)

lds-scriptures postgresql update snapshot

I have not worked on my LDS Scriptures project in a very long time.  In fact, annual releases are becoming an embarrassing reality.  However, I've been itching to update the project for a while now, since I've learned quite a lot more about databases since my last release.

I had an interesting idea for some custom RSS feeds that pull down scriptures (which I'll get into at a later time), but I quickly realized the current db schema wasn't quite up to par. I started poking around the postgres one, and it was much easier than I imagined to fix it up quite a bit.  I dropped all the tables and recreated them, this time with primary and foreign keys, and I got rid of some columns that were completely unnecessary.  I renamed some too, along with other general cleanup.

I tarballed the snapshot once I was finished.  This isn’t what I’m going to release as the final 3.0 version, since I still need to add things like views and more indexes, but this is the direction I’m going to be heading.  You can download the snapshot tarball here.  Feedback is welcome, as always. :)

death to sql server (part 4)

I’ve written about this before, but to rehash … the date functions inside SQL server suck. What’s really weird is that there’s an undocumented way to retrieve out certain datetime formats, and even that is inconsistent in its numbering scheme.

The way to pull them out is by running "SELECT CONVERT(VARCHAR, GETDATE(), @x);" where @x is a positive integer. If you can find the right integer, you can save time and pull out something directly like '11-10-2006' as your variable.

One of the problems you'll run into, though, is that you can't just loop from 1 up to some integer. Only some of them return something, and the ones that don't just throw an SQL error, so you get to hunt down which integers work.

Well, digging for them manually once is something I don’t want to repeat, so I wrote a query statement to pull out some of them. This could be a handy reference inside your database somewhere.

DECLARE @x int;
SET @x = 1;
WHILE @x < 132 BEGIN
IF ((@x > 0 AND @x < 15) OR (@x > 20 AND @x < 25) OR (@x > 99 AND @x < 115) OR @x IN (126, 130, 131))
SELECT @x AS style, CONVERT(VARCHAR, GETDATE(), @x) AS formatted;
SELECT @x = (@x + 1);