gentoo packages that need lovin'

I mentioned not too long ago that I was working on getting portage details crammed into postgresql, and here is the end result.

GPNL is meant to be a QA tool for treecleaners to use, making it easier to find packages and ebuilds that … well, need some lovin.

Though it’s primarily intended for quality assurance, I’ve written the frontend to be hopefully pretty generic so anyone can browse the portage tree and just see some interesting statistics all around. There’s still a lot more to be done on the website, but I think it’s to a point right now where it’s at least ready for some public consumption.

One thing I’m excited about is setting up the advanced search page, where you’ll be able to run all kinds of funky queries. I’m going to be adding some more QA checks as well, once I get some time. Getting this much done though was quite a lot of work though, and I’m probably going to take a break and focus more on other things for a while. However, if anyone has some reasonable feature requests, I’m all ears.

Oh, also the source code for the database schema and the import scripts is available online. I’ll setup SVN access and some documentation on the db layout sometime soon, not to mention how to get it working (short howto: emerge php, pkgcore, postgresql and portage-utils).

Also, a huge shout out to marienz and ferringb who put together pkgcore and my little python scripts that made importing the data incredibly simple. Thanks, guys. :)

lds-scriptures postgresql update snapshot

I have not worked on my LDS Scriptures project in a very long time.  In fact, annual realeases are becoming an embarrassing reality.  However, I’ve been itching to update the project for a while now, since I’ve been learning quite a lot more about databases since my last release.

I had an interesting idea for doing some custom RSS feeds of pulling down scriptures (which I’ll get into at a later time), but I quickly realized the current db schema wasn’t quite up to par. I started poking around the postgres one, and it was much easier than I imagined to fix it up quite a bit.  I dropped all the tables and recreated them, this time with primary and foreign keys, and I got rid of some columns that were completely unnecessary.  I renamed some too, along with the other general cleanup.

I tarballed the snapshot once I was finished.  This isn’t what I’m going to release as the final 3.0 version, since I still need to add things like views and more indexes, but this is the direction I’m going to be heading.  You can download the snapshot tarball here.  Feedback is welcome, as always. :)

death to sql server (part 4)

I’ve written about this before, but to rehash … the date functions inside SQL server suck. What’s really weird is that there’s an undocumented way to retrieve out certain datetime formats, and even that is inconsistent in its numbering scheme.

The way to pull them out is by running “SELECT CONVERT(VARCHAR, GETDATE(), @x);” where @x is a positive integer. If you can find the right integer, you can save time and pull out something directly like ’11-10-2006′ as your variable.

One of the problems you’ll run into though is that you can’t just do 1 through $integer. Only some of them return something, and the ones that don’t just throw an SQL error, so you get to hunt down which integers return something.

Well, digging for them manually once is something I don’t want to repeat, so I wrote a query statement to pull out some of them. This could be a handy reference inside your database somewhere.

DECLARE @x int;
SET @x = 1;
WHILE @x 0 AND (@x 20 AND @x <25) OR (@x > 99 AND @x < 115) OR @x IN(126,130,131)) BEGIN
SELECT @x = (@x +1);


postgresql functions

One thing I love about (advanced) databases is that you can write functions. They speed up the query time quite a bit, and you can do fun stuff like IF … ELSE … THEN statements.

In working on getting portage into postgres, part of the problem I’m trying to solve is find out where QA issues are so they can be fixed. Unfortunately, in the early stages of my little script, it always assumes that the everything is working correctly across the board, so I’ll write my queries assuming the foreign keys won’t break. In reality, that doesn’t happen, and I end up killing a transaction with hundreds of thousands of statements because there’s 25 queries that break.

So, I had to write a postgres function to do the checking for me. This is going to be absolutely boring to those db gurus out there, but this is still slightly new to me, so I’m really enjoying it.

use_id integer;
SELECT id FROM use WHERE name = $2 AND
(package IS NULL OR package = $3) LIMIT 1 INTO use_id;
INSERT INTO ebuild_use (ebuild, use) VALUES ($1, use_id);

A really simple function, I know. All it does is run a SELECT statement to see if my foreign key is going to break *before* running the INSERT statement. That way, I can continue on my happy way with my transaction, and at the same time, if I want to turn on ‘qa mode’ when inserting the data, I can check for a false return on the recordset, and know which ebuilds need attention.

Pretty cool stuff, I think. Also, for the record, pgaccess is a *great* little GUI tool to quickly and easily edit your functions.

portage in postgresql

Well, I’m bored, so I figured I’d spill the beans on a project I’ve been keeping under wraps for a while.

I’ve been working on getting everything about the portage tree into postgresql so you can run all kinds of queries. What kinds of queries? How many ebuilds use eclass ‘eutils’ and USE flag ‘alsa’ and are in ‘video’ herd and amd64 is masked but x86 isn’t. That kind. Funky ones. :)

I must say, I really love postgresql even though I haven’t been using it regularly for a long time, I’m quickly getting back into it. The simplicity, the standards, the power, the tools … postgres has it all. Ahh, fanboyism.

Anyway, getting the details of the ebuilds was made incredibly easy thanks to marienz and ferringb and their work on pkgcore (and a custom python script). After that, it was just a matter of parsing the information and setting up the schemas. My importer is written in PHP and the class to import / read the data is still in its slightly butt-ugly stage. It can use some cleaning up, for sure. The database layout is going to be where the real optimizations are though. I’m going to work on setting up some good views so it will be easy to query. Right now, here’s the list of tables I have setup: arch, category, ebuild, ebuild_arch, ebuild_eclass, ebuild_homepage, ebuild_license, ebuild_use, eclass, eclass_use, herd, license, package and use. All of them can already be populated by the scripts except for eclass_use and herd. I haven’t setup the dependency ones yet, though that’ll be pretty simple too.

So there’s my big announcement. Woots. I’m working on creating the SQL to import everything right now (which takes a long time), and once that’s done, I’ll throw up a db dump somewhere. There’s still lots to be done, like finishing the import scripts and setting up some webpages to browse the tree, but it shouldn’t be too hard. I’m definately over the worst of it.

sql vs. sql

Once again, UPHPU has had a minor stir about which database is faster / better / stronger, PostgreSQL or MySQL. All fanboyism aside, who really cares?

You want to know the way to *really* speed up your database? Normalization is probably going to be the largest factor. After that, use views, stored procedures, indexes, transactions and well-written queries, and your database is going to fly amazingly fast.

At work we have a large server we call “Zeus” because it is incredibly large. I won’t even go into specs because you wouldn’t believe me even if I told you. When I first started working here, the database running on it was incredibly slow. At first I blamed it all on the database software we are using (you can search my blog if you really wanna know which one it is. Hint: it’s neither of the two mentioned above), but as we cleaned up the databases and tables by removing columns that were complete cruft and then doing everything I mentioned above, this puppy flies. In fact, our “dev” database, which is running on nothing more than an Athlon XP 1800+ runs just as fast as our beast-monster does.

That’s how you get a fast database — doing things the right way. Who would have thought?

I have to apologize for the elitist feel of this post, but my point is this … the only magic bullet in improving performance is going to be quality code and design. Just replacing your database with something else isn’t going to make the speed fairy sprinkle your application with love.

death to sql server (part 3)

It’s been a good while that I’ve ranted and raved against MS SQL Server. I’ve actually found a few bugs since then, I just haven’t documented them.

My beef this time around is that SQL Server 2000 doesn’t have a TRIM() function. What’s even more strange, is that they do have RTRIM and LTRIM functions. So, if you want to trim something you just run “UPDATE table SET foo = RTRIM(LTRIM(foo));” Brilliant!

Just another reason to love open source software — you get incremental bug fixes and feature upgrades. This would have been fixed in any other database a long time ago, but since new features == new versions released every five years with Microsoft, you’re just plain stuck with what you paid for the first time around.

Yes, I know I’m trolling but really … what defense do they have? I know I can create a UDF to do the work for me, but that’s besides the point that I’m right and prefer laziness. Plus, it’s one thing to say “MS suxxors” and another to point out why their software is so shoddy. And if there’s one thing I enjoy doing, it’s educating people on why they’re wrong and Gentoo is the extreme answer to everything.

Now go emerge postgres, ya jerk.