High Performance MySQL 307
High Performance MySQL | |
author | Zawodny & Balling |
pages | 294 |
publisher | O'Reilly |
rating | Excellent |
reviewer | Steve Suehring |
ISBN | 0596003064 |
summary | Optimization, Backups, Replication, Load Balancing & More |
With a title like High Performance MySQL, the reader should not expect assistance with tasks such as installation or the basics of SQL. The authors acknowledge that the book isn't meant for the MySQL beginner. Even so, the book does spend a little time going over a refresher of MySQL. If the reader is looking for a book on the basics of MySQL, though, I'd recommend picking up a different title. I might be so brazen as to suggest looking at the book on MySQL that I wrote a couple years ago, but I'll leave it for the reader to find an appropriate title for their level of knowledge with MySQL.
The book is broken out into ten chapters, beginning with a chapter on the basics of MySQL as seen from someone with some level of MySQL experience. (And since I am someone with some level of MySQL knowledge, this chapter didn't seem to be too painful of a read.) That statement begs for a further explanation. I find it painful to read books that claim to have advanced topics but find it necessary to spend (or waste) time by giving yet another explanation of how to type ./configure; make; make install in the beginning chapter. In this regard, I find High Performance MySQL to be quite good at providing the appropriate level of explanation for the audience.
The first pages alone are a great starting point for those who are wondering about the performance as it relates to the different versions of MySQL. A discussion of the binary versions of MySQL versus the source code versions gives the reader a good understanding of the implications of this important decision. Choosing correctly between the versions of MySQL can help to maximize performance.
Though only the first chapter is titled "Back to Basics," most chapters include some basic or background information pertinent to that chapter. For example, both the query and index performance-related chapters include sections with the word 'Basic' in their titles. These two chapters, "Index Performance" and "Query Performance," cover topics often misunderstood when considering performance on a database. Naturally, any database administrator worth their pay (insert your own joke here) will tell you that an index can help to improve performance. But High Performance MySQL takes it a step further by diving into why you might use a particular index given your database needs. This type of practical information is common throughout the book. Rather than just give the reader a listing of the indexes (which are mostly common among RDBMS anyway), the authors choose to tell you why you might choose one method over another.
Other chapters within High Performance MySQL include chapters on server performance tuning, replication, load balancing, and high-availability, backup and recovery, and security. I've followed MySQL forums and mailing lists over the years and some of the more common questions relate to MySQL security. The chapter on security gives the reader a great explanation of the MySQL GRANT tables and includes a sidebar on the significance of 'localhost' in MySQL which nearly always seems to be a source of confusion, though maybe not for the reader of this particular book. Like other chapters, the security chapter includes tips for the underlying environment upon which MySQL operates. For example, regardless of how good the MySQL database permissions are configured, if another piece of software on the server isn't kept up to date then the entire server, including the MySQL data, could be compromised.
Looking back at the book, I might have laid the chapters out in a different order or provided more structure through sections of the book. However, this can be dismissed as personal opinion and is non-substantive anyway. I believe High Performance MySQL is an excellent choice for the MySQL administrator or developer not only looking to improve their database performance but also to improve his knowledge of MySQL.
You can purchase High Performance MySQL from bn.com. (And the book that Suehring alludes to is the MySQL Bible . Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
No Thanks (Score:4, Funny)
Re:No Thanks (Score:3, Insightful)
Apples and oranges (Score:5, Insightful)
High Performance and Incredible Data Integrity? You are comparing apples and oranges. Make up your mind. It's either "High Performance with MySQL" or "Incredible Data Integrity with PostgreSQL." You can't have your cake and eat it too. If you need real ACID (Atomicity, Consistency, Isolation, and Durability) then by all means have it, but realize that it isn't free. Please read about the relational model [wikipedia.org] and ACID [wikipedia.org]. Read about the set theory [wikipedia.org] and predicate calculus [wikipedia.org], about tuple calculus [wikipedia.org] and relational algebra [wikipedia.org]. This is complicated stuff which it is not cheap algorithmically. Most of people don't need ACID and should not be penalized by the algorithms that implement it. Those who need it will have to pay with performance, because there is only this much you can do in one cycle. A pure ACID relational model is not for everyone, some people will only need a persistent object store instead of a relational database. The most important thing to realize is that databases are not magic. They implement complicated algorithms in code that have to be run by the CPU and by definition cannot be a one-size-fits-all solution. So please don't oversimplify it by saying "I want everything with no paying for anything, and also a pony." Use the right tool for the right job.
Re:Apples and oranges (Score:5, Insightful)
In fact, it is very difficult to understand where MySQL fits in today's technological environment. On one hand you have truly powerful database servers that guarantee data safety. On the other hand you have simple embedded databases that run much faster than MySQL. Where does this leave MySQL? With nothing more than a lot of "it's Open Source!" momentum.
High Performance and Incredible Data Integrity?
I never said that. Please don't act like I did.
Re:Apples and oranges (Score:2, Interesting)
Oh? Read some of the other comments on this article and see :)
I agree that MySQL is overused. It's the M in LAMP, and so people assume its what they have to use for everything.
But what are these alternatives you mention that are so much better suited for every possible scenario? For me, MySQL is the perfect mix of speed and features. Give me some leads on other databases I should consider replacing it with,
Re:Apples and oranges (Score:4, Informative)
There is, of course, no tool that meets the needs of every possible scenario. Instead, there are tools that meet the needs of common scenarios. Most of what MySQL is being (improperly) used for could be easily replaced with PostgreSQL. Postgres offer similar performance, but takes a minor hit for data integrity. As a bonus, you gain sub-selects, stored procedures, SQL-92 compliance, and other features of a "real" SQL database.
MySQL is quite decent for something like a blog, but why even bother with a complete server in those cases? An embedded database such as Berkeley DB [sleepycat.com] or HSQL [sourceforge.net] (formerly Hypersonic) would provide better performance and would get rid of security issues inherent in running a complete database server.
In still other instances, SQL databases are misused for large object data storage. In many of these instances, an Object Database such as ObjectStore [objectstore.com] (or your favorite open source choice of the 1000+ options) will provide better performance, without sacrificing much in the way of database management. (Standard database management tools are usually insufficient for dealing with databases containing large amounts of LOBs.)
Basically, the choice in database and database technology should be carefully weighed against the application instead of saying "I know SQL and MySQL is 1337!"
Some other database options include:
DaffodilDB [daffodildb.com]
SAP DB [sapdb.org]
FireBird DB [sourceforge.net]
Cloudscape [cloudscape.com] (Soon to be open source)
Xindice [apache.org] (XML Database)
ObjectDB [objectdb.com]
DB4O [db4o.com]
Prevayler [prevayler.org]
Re:Apples and oranges (Score:4, Insightful)
Actually, MySQL has been moving in that direction for quite some time now. MySQL 4.1 has subqueries [mysql.com] and MySQL 5.0 has stored procedures [mysql.com].
Of course, MySQL still doesn't quite have the functionality of Postgress, especially in the data integrity department, so there's certainly truth to your claims. However, my point is that it's not standing still. It's certainly improving, so we can't just keep pointing out its past mistakes.
Re:Apples and oranges (Score:3, Insightful)
But that's fine. Not everyone needs all of the current features of PostgreSQL let alone all of the potential new ones.
HOWEVER, MySQL needs to work on data integrity post haste. "my new purple crayon" being logically equal to 0 is not a missing feature; it's a data loss bug. No error. Not even a warning. A bug in software that tries
Re:Apples and oranges (Score:5, Insightful)
I've got a few million reports that basically look like this: some header information of the form "variable=value" (about a dozen possible values of variable), and then one or more sections. Each section has a header in the same format as the report header, and from 0 to 5000 entries of the form "item:status:file", where name is one of about 5000 items, status is one of 5 status values, and file is the full path of a file containing information about the item.
I want these in a database. So, I've got a table that represents reports, which gets the information from the header. Another table represents report sections, and has the information from the section headers, and references the report table. A third table represents the "item:status:file" lines. I've actually got a bunch more tables, because the database is in third normal form, but the above is the gist of it.
With MySQL, I write the straightforward code to parse reports and insert into the database. It takes about a day to import all my reports, and querying them runs at decent speed.
With PostgreSQL, it starts out about an order of magnitude slower at importing, and after four days, when it is isn't anywhere near done, it is running even slower.
PostgreSQL does seem to query a bit faster than MySQL (but see below), but that slowness in getting the data in is a killer.
As far as queries go, PostgreSQL seems fast, if you've done "VACUUM ANALYZE" on the database. If you don't do that, it is incredibly slow. For example, if I have a simple table, indexed on an integer field, and do a select that was "WHERE that_field = 123", it is fast, and uses the index, but if I do "WHERE that_field IN (123, 456)", it doesn't use the index, unless you analyzed the table--it scans the whole freaking table. MySQL is smart enough to actually use an index without you having to run a special command first. Geesh!
So, here's what it comes down to, and why sooooo many people choose MySQL over PostgreSQL: PostgreSQL seems to require that you know a fair amount about database stuff to actually make it work well, even if you aren't trying to do fancy stuff. MySQL, when you are doing simple stuff, just works, and works well.
Oh, one more thing. When I was importing my few million records into MySQL, it barely loaded down my machine. With PostgreSQL, pretty early on, the disk light came on and stayed on, and doing things on the machine became painful. If that were a shared machine, say, a web server with shared hosting, replacing the MySQL in LAMP with PostgreSQL would be a disaster.
Re:Apples and oranges (Score:5, Informative)
There is an option to change the transactionality of standalone commands, too, so that they implicitly start a transaction that doesn't end until you COMMIT it.
VACUUM is something of a contentious point among PostgreSQL users, certainly. On the one hand, I can't think of an algorithmic way to automatically schedule statistic updates, and apparently tracking them in real time causes more of a performance hit than is needed. On the other, having to set up an external task (either a cron task or vacuumd) rather than being able to configure postmaster to vacuum itself on a schedule or other parameters is kind of a pain.
PostgreSQL's default memory cache size is exceedingly small -- this is an issue with the default configuration that can lead to excessive disk reads and writes.
In short, pgsql does require somewhat more effort to initially configure and maintain. On the other hand, for a lot of people it's very much worth that effort.
Re:Apples and oranges (Score:4, Insightful)
Actually, I did have the complete processing for each report in a begin/commit pair. That was one of the reasons I was experimenting with PostgreSQL...with MySQL there was the annoyance of having to make sure the whole report was going to be OK before inserting any of it, so I couldn't just add to each table as I processed the report, because I might find out near the end that the report is bogus and want to discard it.
In short, pgsql does require somewhat more effort to initially configure and maintain. On the other hand, for a lot of people it's very much worth that effort
Right...and for a lot of people, they need something the works in the common case without that effort, and so there's always going to be an important place for MySQL.
What I'd really like to see is a good document included with PostgreSQL that says "So..you've got a big application that just works fine in MySQL, but you want to convert? Here's what you do" that covers all these things.
Re:Apples and oranges (Score:3, Insightful)
That extra effort to configure and maintain is why we went with MySQL. It's hella fast (and yes, that *is* a technical term), isn't plagued by the insecurities of MS SQL Server, and it's a breeze to install and configure -- don't even get me started on PostgreSQL's requirement that it be run from a non-Administrator account under Windows when EVERYONE with a default installation of Windows from NT 4.0 through XP is runn
Re:Apples and oranges (Score:2)
Plus MySQL isn't free and Postgresql is
Re:Apples and oranges (Score:2)
- It needs to run on windows (95-XP)
- It must be allowed to be distributed with closed source apps.
- It should be automatically installed and running when the user
clicks 'OK' a few times in the installer that also installs the closed source app.
For the commercial licence fee you can do this with MySQL.
We used Paradox and Access before MySQL, and compared to those MySQL is a blessin
Re:Apples and oranges (Score:3, Insightful)
It fits your requirements perfectly without requiring the commercial license. Of course the installer bit depends also on your application, but it shouldn't be too hard to do. You may want to check out the pginstaller project here: http://pgfoundry.org/projects/pginstaller/ [pgfoundry.org].
No brainer. (Score:3, Informative)
Re:Apples and oranges (Score:3, Informative)
Well, HSQLDB is embedded, but only inside a java application
True, but that's why I said I'd need to know more about his application.
It persists data in a flat file that it loads on start.
Not quite. HSQLDB uses an SQL script of the transactions to recreate the database. However, this script is significantly reduced in size if you use CACHED tables.
If you've got more than 1000 records
Again, depends on the data and the application. 1000 records is usually not a whole lot to ca
Re:Apples and oranges (Score:4, Interesting)
I prefer to FAPP
(FreeBSD, Apache, PostgreSQL & PHP)
but that makes me sound a bit of a wanker
The thing about MYSQL and speed is that "it's really quick at returning rows" isn't the only metric of speed.
Let's see how fast you can work out how to do an INTERSECT!
or how about a huge join with LIMIT 5, mysql (and postgres) do the whole join and then return the 5 records. With Postgres you can utilize stored procedures to do the join *after* the required records have been worked out and return the rows via a stored procedure. I got a 3 minute query down to 3 seconds like that !
You may as well turn your LAMPs off, they are dim.
Re:Apples and oranges (Score:2)
Tripping on ACID (Score:4, Insightful)
As for the parent post, he's just bought into the illusion that database gurus think at a transcendental level. It is a common ailment. Object gurus, business gurus, religious gurus, political gurus are all treading on the same clouds. It is a common ailment that usually cures itself with time.
Re:Tripping on ACID (Score:4, Informative)
Personally, I don't just stop with ACID, I make sure I have an audit trail from start to finish and I make a barage of tests that show that the audit trails add up. Even with ACID I never buy the illusion that there will never be a problem with the transaction.
ACID may be about data integrity. My point is that people who have an ACID database tend to stop thinking about data integrity.
Sorry, but I don't buy into the illusion that anyone is creating perfect programs. I have had to deal with people who thought they wrote perfect programs. My experience is that the programs written by buzz word spouting gurus is generally very poor quality.
Re:Apples and oranges (Score:4, Insightful)
I would never use MySQL for financial transactions but for non-monetary usage it's a good general purpose database as long as your don't require complex business rules filled with lots of cascading actions.
Re:Apples and oranges (Score:2, Informative)
I guess it's kind of humorous then that PostgreSQL beats the pants off of MySQL in any kind of moderate DB usage. MySQL can do a damn fine job for simple INSERTs and SELECTs but throw a hundred users at it or a few WHERE or ORDER BY clauses (or all of the above) and MySQL shows its true colours: Made by Fischer-Price. Throw in complex queries including subselects or try using views and... oh right, I forgot... It's not got any kind of real relational power behind it... my mistake!
Re:Apples and oranges (Score:4, Informative)
MySQL's InnoDB offers the same level of data integrity as PgSQL does, and about the same performance. Where it wins is in the flexibility to choose table-by-table whether you want InnoDB, MyISAM, or the new NDBCLUSTER. There's also the universal availability and familiarity when it comes to the mysql protocol and libraries.
As far as performance.. you might want to look at numbers on well designed threading systems such as Solaris or Linux 2.6. Most of PostgreSQL's gains are erased by MySQL 4.1's excellent code when it is combined with those OS's proper threading.
The only place I see PgSQL winning the performance war is the query optimizer.
PgSQL will be doomed to a BSDesque life. Touted by too many people with no social skills, nobody will ever be able to see how good it is, because they just get shouted down when asking questions.
Re:Apples and oranges (Score:3, Informative)
Re:Apples and oranges (Score:3, Informative)
Last time I checked, Dark Age of Camelot usually has more than a few hundred users, and they are using MySQL. So much for your amusing theory.
Re:Apples and oranges (Score:3, Informative)
Re:Apples and oranges (Score:3, Informative)
I'm sure Postgre is great and all, but the evidence does seem to indicate that MySQL is suited for at least some intense applications.
Re:Apples and oranges (Score:3, Informative)
It probably says more about you than about me that you think that that list of "gotchas" (many of which are no longer valid, or are otherwise insignificant) is somehow difficult to keep in mind. None of the professional programmers I work with have any difficulty dealing with the quirks of the MySQL database environment, especially considering that that list of gotchas is tiny compared to the list of d
Re:Apples and oranges (Score:4, Informative)
A recent example involves BLOBs and MySQL. The basic feature request is to be able to attach various binary files to other data. One of the ways to do that is with a BLOB. So I create a simple test so we can compare this approach with various other approaches.
Code up the test, and attach a file. Pull the file back out, and try and read it back in. Doesn't work, the file is corrupted. At first I figure I'm interfacing with the database wrong, since I've never used BLOBs before. So I muck around with the API some, still no go.
Then I realize I should check how much data is going in and how much is going out. Turns out I'm trying to attach a 70KB file, and I'm getting 64KB back. A little bit of research tells me that MySQL BLOB types only support up to 64KB. Oops. Change to a MEDIUMBLOB (I think) and then it works.
Turns out instead of flagging it as an error to insert more data into the field than it can contain, MySQL just trunucates it. (So if you try and do something like insert "MYSQL SUCKS" into a CHAR(5) column, it'll say it successfully inserted one row. A select will then get "MYSQL" back.)
Some page out there has a really nice list of things that MySQL will do with bad data. Besides trunucating values, it has some interesting ways of handling bad numeric values.
Re:Apples and oranges (Score:2, Insightful)
Re:No Thanks (Score:2, Funny)
say what you will but for our companies needs I guarantee you that MySQL is more than suffecient...we don't have a lot of changes on the db....plain and simple...a waste of money!..
But then you propbably know more about my companies db than I do so I will shut up now.
Re:No Thanks (Score:3, Insightful)
Give me a heart attack, why don't you?!
Re:No Thanks (Score:2, Insightful)
MySQL - I smell flames? (Score:5, Funny)
MySQL is better than PostgreSQL
Redhat is better than Debian
Windows XP is better than MAC OS X
Bush is better than Kerry
Conservatives are beter than Liberals
Americans are better than Europeans
etc. etc. etc.
Re:MySQL - I smell flames? (Score:5, Funny)
Emacs is better than vi
Abortion is a woman's right
Gun control means using both hands
Christianity sucks, the bible is just a book
Re:MySQL - I smell flames? (Score:2)
Abortion is a woman's right Gun control means using both hands
Those two statements don't fit the initial set while the other do; they're conservative/reactionary while the pattern is "cool lib/alt vs. ignorant blind conservatism."
P.S. I'm all for the initial set... actually that's why I noticed what's out of tune ;-) ... and I own a mac... therefore I'm kewel ;-) and know better... ;-)
Oh I've heard of this (Score:5, Funny)
Sometimes MySQL indexes are bad... (Score:2, Interesting)
I hope there's something about that in the book...but then again, this is 2004. Who still reads (non-electronic) books about technical topics?
Re:Sometimes MySQL indexes are bad... (Score:5, Insightful)
Re:Sometimes MySQL indexes are bad... (Score:2)
Of course indexes can hurt performance. If not, you would just make every field in a table an index.
Hopefully this book is made for people who know this already...
Re:Sometimes MySQL indexes are bad... (Score:3, Informative)
Because using an index requires random data access which is more expensive than sequential access. There is a point where doing an index scan takes more time than just going over all the records in a table (example: small tables, condition matches many rows, etc). Any sane query planner should/will choose the faster method.
It Sounds Pretty Basic (Score:3, Insightful)
The important thing is, does it tell you how to set up and manage highly distributed system managing gigabytes of data under high transaction volumes?
If it doesn't, and just discusses the general issues to consider under such conditions, it is worthless for those that are already professionals. Professionals know how to consider these things, they just want concrete examples so they don't have to spend months of reading and tinkering to figure it out on their own.
Unfortunately, reading the review above doesn't even let me know whether or not the book has anything in it at a level that I might want. However, if I was writing the review I'd probably order the paragraphs differently and use some different words. That might just be personal preference.... :p
Re:It Sounds Pretty Basic (Score:3, Interesting)
The article did mention that it covers some related issues: "Other chapters within High Performance MySQL include chapters on server performance tuning, replication, load balancing, and high-availability, backup and recovery, and security."
But I agree, the review doesn't give any indication whether the book spends all its time repeating whats already ava
Re:It Sounds Pretty Basic (Score:5, Interesting)
My background is in Electrical Engineering, but as part of my duty I am in charge of maintaining our company's data base. I'm probably the most qualified person here to do it too; but my knowledge of databases doesn't extend much farther than select/insert/update. Sound to me like this book could help me learn a thing or two about what I'm "administrating".
Re:It Sounds Pretty Basic (Score:2, Funny)
I believe you missed something. The book is about MySQL.
Re:It Sounds Pretty Basic (Score:2, Interesting)
More importantly, what self-respecting professional would be caught dead using MySQL?
No data integrity
Completely non-standard SQL
No extensibility in the engine (functions, stored procedures, etc.)
No subselects
Weird handling of '0' vs. null
etc.
Back when MySQL was created, it was a great option for when you needed a simple database to
Re:It Sounds Pretty Basic (Score:2)
Sometimes the lack of caring about data integrity (and therefore using less locks on a high
Re:It Sounds Pretty Basic (Score:4, Informative)
That is huge. I used MySQL about 6 months ago (though it wasn't the most recent version.) I was very disappointed. I use subqueries extensively. Since these were not available, I ended up with messy and slow joins.
The non-standard SQL isn't too bad to figure out as long as you have google. There needs to be a SQL translator java applet or something.
In addition, the version that I used didn't support views. Is this fixed now?
Regardless, I gave it a chance. The next time I need database services for personal use I'm going with PostgreSQL.
Re:It Sounds Pretty Basic (Score:3, Informative)
MySQL's SQL is quite standard, and while there are differences [mysql.com], they are either the lack of a small number of features (like sub-selects, which you go into below) or very baroque trade-offs, which most users will not encounter.
I suspect you're instead running into assumptions about what is and is not "standard" based on what you've used in the past. MySQL follows the ANSI SQL standard [mysql.com] as closely as all of the other databases I've used, having its own small quirks and LOT
Re:It Sounds Pretty Basic (Score:3, Informative)
That is to say, if you:
Then that's the same thing as:
The only difference being that for sub-selects, since it's done server-side, you can optimize by evlauating the sub-select lazily. Some databases that support sub-selects do not do
Re:It Sounds Pretty Basic (Score:5, Informative)
No data integrity
MySQL has a modular design, allowing you to use any of several database backends, each with different design priorities. If data integrity is a priority for you, RTFM and pick the right backend.
Completely non-standard SQL
This is true of pretty much all DBMS's. MySQL's development cycle is quite delightfully quick, meaning that if people want certain features added, it doesn't take long to get them added.
No extensibility in the engine (functions, stored procedures, etc.)
Ummm... MySQL has those.
No subselects
Yes it does, as of version 4.1. It's not their fault if you're more than a full release behind.
Weird handling of '0' vs. null
The special handling of null forces you to write slightly less braindead code. This kind of handholding is becoming popular in the design of modern programming languages like Java and C#, with the recognition that the later in your development cycle you find a bug, the more it costs to fix it. Of course, there are still times when the programmer needs very explicit control, like when writing an OS kernel or a device driver. If you're writing an OS kernel or a device driver in MySQL, you've got bigger problems.
Re:It Sounds Pretty Basic (Score:4, Informative)
As I understand it, InnoDB (which I assume is what you're referring to) costs money to acquire and as such is NOT part of the standard database engine. It's really just another database engine using MySQL as a facade.
This is true of pretty much all DBMS's. MySQL's development cycle is quite delightfully quick, meaning that if people want certain features added, it doesn't take long to get them added.
MySQL is the ONLY database that won't execute: Instead, you have to execute: Do you have ANY idea how much that screws up those of us in the database tools business?
No extensibility in the engine (functions, stored procedures, etc.)
Ummm... MySQL has those.
In 5.0 Alpha. i.e. Not here yet.
No subselects
Yes it does, as of version 4.1. It's not their fault if you're more than a full release behind
Again, 4.1 is a non-production release. The latest stable is 4.0.
The special handling of null forces you to write slightly less braindead code.
So you're saying that inserting zeros instead of nulls (or no value at all) for auto-increment columns leads to less brain-dead code? All it does for me is make my database tools that much more complicated to code.
Re:It Sounds Pretty Basic (Score:3, Interesting)
Do they actually have constraints yet? For all the RDBMS apps that I can think of, I wouldn't go without constraints. Last I heard, MySQL thinks 2004-02-31 is a date.
Also, I'm a little wary of the mix-and-match feature policy, perhaps because I don't understand it. Are there any features not available in the InnoDB tables? Can I do master-master replication and everything, all with transactions? I would be really troubled if
Re:It Sounds Pretty Basic (Score:3, Insightful)
>The special handling of null forces you to write slightly less braindead code.
Way to spin it, dude. Here's another way to look at it: The retardo handling of null in MySQL makes your work as a programmer harder. It's just a pissing match argument: only a wuss needs those training-wheels features. Real Programmers never make mistakes.
create table foo (bar integer not null, biz integer not null);
insert into foo (bar) values (1);
select * from foo;
+-----+-----+
| bar |
Re:It Sounds Pretty Basic (Score:3, Insightful)
It is generally faster than Postgres,
It handles large datatypes more easily than Postgres (although Postgres, while odd, has some more flexibility)
Postgres has an odd 8k limit per row (probably fixed by now though)
Supports selects across databases.
It has good replication support (with 4.x an
Re:It Sounds Pretty Basic (Score:3, Insightful)
When PostreSQL came on the scene I attempted to use it, but there was little in the way of documentation. I went the MySQL route for my little database projects because it had better docs at the time. I never decided MySQL was "1337" and that all other databases were "evil".
It had good docs, was cheap, and was good enough to get the job done.
M
Re:It Sounds Pretty Basic (Score:2)
I have (I think) put the following things in cron:
Vacuum full in the weekend.
Analyze every night.
Reindex once a week.
Cluster every night.
And I have the vacuum daemon Running.
(And a backup every night)
Are there any things I should do manually?
I can't find any good postgres books about these issues. Ok, the newsgroups are ok, but a book would be still welcomed.
(I have two postgresql books, but both of them don't really tell things I
Re:It Sounds Pretty Basic (Score:2)
Of course, some of us never need to vacuum because we never delete any data.
Re:It Sounds Pretty Basic (Score:5, Informative)
Up until I picked up this book, I was not comfortable with using MySQL for the extremely high performance apps I have to write. Yes, they are web-based (meaning they fit into the MySQL appropriateness factor) but things like backups and scalability really scared me off of it.
This book talks about indexes but doesn't focus on them as the magic bullet. Rather it is very specific with lots of need to know items of importance.
Some things that it covers are:
1. How to setup replication with different architectures including master/slave, master/slave/slave (another level), replication rings, dual master replication with slaves and more.
2. RAID (software vs. hardware), RAID types, IDE vs. SCSI. I know. I'm sure you've thought of these things already but most people will find something new here because there is a lot on it. It is quite dense.
3. Selection of an OS. Seriously. It actually compares threading models on OSes compared to performance with MySQL. This is great stuff.
4. Selection of a filesystem on an OS. Yes, nitty gritty. Includes talk about journaling vs non-journaling benefits and warnings.
5. Load balancing mySQL. How to do this through a load balancer. Why and how it's different than load balancing web servers.
6. How to backup live using replication.
7. I love this: Detailed information on how the replication system actually works. I love understanding the inner workings because then I know how/why things went wrong.
8. The benefits of myISAM/InnoDB table types including how they store/retrieve data, the features available, etc.
I've been optimizing databases for web use for some time and this isn't a book you should be tired of because it claims "high performance." Okay, maybe I'm a geek, but this book kept me up at night figuring out how I should be architecting a system.
Re:It Sounds Pretty Basic (Score:2)
Simple, just read a few pages on the MySQL documentation. [mysql.com]
2. RAID (software vs. hardware), RAID types, IDE vs. SCSI. I know. I'm sure you've thought of these things already but most people will find something new here because there is a lot on it. It is quite dense.
What on earth does this have to do with MySQL specifically?
to you maybe Re:It Sounds Pretty Basic (Score:4, Informative)
1. How to setup replication with different architectures including master/slave, master/slave/slave (another level), replication rings, dual master replication with slaves and more.
Simple, just read a few pages on the MySQL documentation. [mysql.com]
Some people prefer printed book form, as some publishers are aware.
2. RAID (software vs. hardware), RAID types, IDE vs. SCSI. I know. I'm sure you've thought of these things already but most people will find something new here because there is a lot on it. It is quite dense.
What on earth does this have to do with MySQL specifically?
You might benefit from learning about it if you are trying to optimise an SQL server?
And furthermore, if you dont know about _RAID_, and need to be told the benefits of SCSI/Raid, do you really think you need MySQL clustering ?
Well, once the reader has read these chapters they can make this judgement. You've manage to make this judgement for EVERYONE without reading the chapter for which I congratulate you.
Any PC/Server with enough ram will push queries like a maniac. Also, check #4...
queries... I think this book isn't just a book about getting fast queries.
3. Selection of an OS. Seriously. It actually compares threading models on OSes compared to performance with MySQL. This is great stuff.
Great, now that i've bought this book, i'm supposed to _CHANGE_ OS in my server farm, just to gain a few more queries/sec ?
Steady there! Whoa! This isn't advertised as a book for people who are happy with what they have and know they did it right by instinct. This could just be a book for people who want to learn something before they decide what to do NEXT.
4. Selection of a filesystem on an OS. Yes, nitty gritty. Includes talk about journaling vs non-journaling benefits and warnings.
Filesystems? Oboy, anyone who knows about SQL performance knows that if you access _DISK_ during execution, you are screwed anyway.
And whats going to happen to your data when you power down? I suppose you've been committing all your data to files on the disk? It might just be useful to know about the benefits and dangers of the various filesystems.
Anyone who knows anything about indexing knows that btrees and rtrees are optimised for pulling data off random access media. There's better index forms if you are going to be ram based. I forget what they are because all my data is disk based.
5. Load balancing mySQL. How to do this through a load balancer. Why and how it's different than load balancing web servers.
This seems like a decent chapter.
6. How to backup live using replication.
DOH! We are talking backup here, right? Backup means archiving a copy of the database. MySQL allows you to use the magic command "cp" (or scp, or rsync, or..) Best # so far
Totally wrong. If you want to back up a database that is being modified, you need to lock it and that would stop updates. This certainly needs a chapter.
You could read a few pages on mysql.com about this instead if this book offends you. It sounds like you might need to. Some people prefer web sites to books.
7. I love this: Detailed information on how the replication system actually works. I love understanding the inner workings because then I know how/why things went wrong.
The updated node sends commands to the other machines to update as well. If something goes wrong, you could rollback the transaction.
If there are transactions? I can use mysqlbinlog and convert to ascii logs, edit the logs, and replay the edited logs. If you didn't know about this, maybe you acknowledge the need for a chap
Re:It Sounds Pretty Basic (Score:2)
1. Comparison of storage engine features (e.g. myISAM, InnoDB, BDB and Heap) including lock granularity, file split-up, referential integrity, caching, compression, etc.
2. RAID MyIsam Table for overcoming OS file size limitations.
3. Benchmarking strategies including how to write custom benchmarks useful to you.
4. Indexing structures: B-tree, vs. hash vs. the geographic R-tree indexes. Of cours
Re:It Sounds Pretty Basic (Score:2)
Re: (Score:2, Informative)
Check your SQL.... (Score:5, Informative)
Er, anyhow, as an apology, here's an open source SQL analyzer [postgresql.org].
Re:Check your SQL.... (Score:5, Informative)
> just blame it on the engine.
Yup. There's a really good article by Stephane Faroult on OnLamp about writing better SQL... it's right here [onlamp.com].
After reading that article, I went through some code I'd written and found some places where I was using DISTINCT incorrectly in exactly the way he described.
Nearly worthless review (Score:2, Informative)
Having said that, I'm intrigued by the title and will probably investigate the book anyway simply because it's a topic that directly pertains to some upcoming projects on my calendar.
mysql backup (Score:5, Interesting)
Plain mysqldump does not export consistent data across tables. While you could lock all tables for the duration of backup, that's not likely going to be possible if you're aiming for "high performance". Mysqlhotcopy only works for myisam tables and simply tar-ing up data files is just asking for a disaster to happen. So how do you make a consistent backup without setting up a dedicated replication slave for this purpose?
(No, this is not a troll; I am in fact looking for a good way to back up mysql data on my server)
Re:mysql backup (Score:3, Informative)
I've never had to do this but your problem interested me. Think about it, how could you possibly make the database spit out a perfectly frozen-in-time set of tables without actually freezing in order to make the set?
It's not like you'd be able to tell the database "Make me a backup of the database as it was at is r
Re:mysql backup (Score:3, Informative)
This is an example of why people who have used real databases are annoyed by mysql's undeserved popularity. People who have never seen a DB before build applications and processes around mysql's limitations and think that's the way things are.
Why would you store
Re:mysql backup (Score:2)
Yeah, you are right, I've never heard of MVCC, but the transaction thing makes perfect sense to me.
I should have realized we'd left my area of expertise when the guy said he wasn't using MyISAM tables.
Re:mysql backup (Score:2)
Re:mysql backup (Score:4, Informative)
In principle, using LVM 2 system it's easy. Get database to consistent state on disk, take LVM snapshot of the logical volume involved, resume database, run dump on the snapshot, take out the snapshot when no longer needed.
Beauty is that the size of snapshot is size of delta that is caused during the backup; therefore, this is practical for almost any sized database.
Re:mysql backup (Score:5, Informative)
The caveat: You need to set up replication. This is a good idea anyways for a high performance system in case the master crashes and you need another system available quickly.
It works like this:
1. Set up replication.
2. Do a locked table backup on the slave.
The nice things are the details like how the new version of MySQL uses two threads on the slave. One to copy the transactions to the local drive and the other to apply the transactions. Because of this, your local transaction log is up to date even if the transactions haven't been applied which subsequently means you don't have missing transactions if there is a crash on the master during the backup.
This is seriously a good book. Okay, you might want to use PostgreSQL for its feature set but if you are doing write few read mostly with the web as the application, this book seriously gives you the confidence to use MySQL. Up until I read this book, I didn't know enough about MySQL performance implications to trust it for scalability. Now I know what I need to do to get it.
Move the problem away (Score:4, Informative)
Put your MySQL data onto a Logical Volume Manager volume. Use the LVM snapshot mechanism to insure a consistent view of the data during backup. Here's how:
Create an LVM physical group with enough space to contain all your database, plus enough overcapacity to store any changes during the backup proceedure. So, let us say your database contains 10G of data, will take 10 hours to back up, and will, during that time, undergo revision to about 1G of that data - you create a PV of 11G or more.
Next, create a logical volume big enough to store your data - in the case of the numbers above, create a 10G logical volume. In that volume create your file system, and set up your database.
Now, when you wish to do a backup, lock out access to the database, and do a snapshot of the logical volume it is on. Then restore access to the database. This won't take very long at all.
Mount the snapshot read-only and back it up. If you need to back up through the MySQL, bind a server onto the read-only data.
Then release the snapshot.
During the time the snapshot exists (in this example, the 10 hours to do the backup), the real, live read/write file system may be updated as desired, as long as the total differences between it and the snapshot do not exceed the reserve capacity of the physical group - in the case of the example numbers above that would be 1G of total differences (NOT 1G of writes - change the same 1k record a billion times and it is still a 1K change).
That's one of the reasons the Big Boys (like Sun) pooh-poohed Linux - it did not have LVM. Now it does.
Re:Move the problem away (Score:2)
Re:Move the problem away (Score:2)
MySQL has a feature to do this automagically:
FLUSH TABLES WITH READ LOCK
Although you may need to consider the transaction running on the machine. I can run the FLUSH & LOCK just fine on the machines that do small I/O, but I have an external locking mechanism for machines that do large SELECTs or UPDATEs. For examp
Re:mysql backup (Score:2, Informative)
...
Description
pg_dump is a utility for backing up a PostgreSQL database. It makes consistent backups even if the database is being used concurrently. pg_dump does not block other users accessing the database (readers or writers).
...
How does Yahoo! Finance use MySQL? (Score:5, Interesting)
Re:How does Yahoo! Finance use MySQL? (Score:5, Informative)
"Yahoo! uses the MySQL database to power many of the services on Yahoo! Finance (finance.yahoo.com), a popular web site that provides the full spectrum of financial services and information. MySQL is behind Yahoo! Finance's high-traffic area of news headlines, which is aggregated from over 100 sources worldwide, as well as its stock charts, insider trading, SEC filings, conference calls and earnings reports, among others. "
"Prior to its switch to MySQL in 2000, Yahoo! Finance managed its database needs through homegrown flat files and Berkeley DB databases. But those solutions proved to be inflexible and not scalable enough for their needs, with the volume of data nearly doubling every few months. "Our needs vary considerably, and that is one reason that MySQL works well with us. It handles the high-volume, 260 million record tables just as well as the small, low-volume ones. One of our servers handled over a quarter of a billion queries in a month-and-a-half, and it still has capacity to spare.""
"Since Yahoo! Finance started using MySQL, the database's popularity has grown within the company, and many of Yahoo's other world-wide properties are now using MySQL too. "We have used MySQL far more than anyone expected," stated Zawodny. "We went from experimental to mission-critical in a couple of months. Once others saw it, they jumped on board.""
---
I wonder why many people here hate MySQL even when it has been proven to be a very competitive database. Is it because MySQL is a foreign product?
Re:How does Yahoo! Finance use MySQL? (Score:4, Insightful)
OK, that explains a lot. 250 mil. queries in 45 days is normal for a small company like us and definetely peanuts for Yahoo - i.e. I'd say that this allows the conclusion that nothing particularly important or demanding (i.e. heavily used compared to Yahoo's overall traffic) is built using MySQL at Yahoo. Just to put this into perspective, a little dual Xeon box here does more than 5 million *INSERTs* every day and a large number of SELECTs (we update each of our currently 2.4 million product prices at between 1 and 20 times per day - the exact number of INSERTs per day is currently unknown to me). We use PostgreSQL 7.4.5 though, so we don't consider this a particularly impressive feat ;-)
The biggest shocker (Score:2)
I guess its safe to say they aren't on the bleeding edge.
Re:How does Yahoo! Finance use MySQL? (Score:3, Insightful)
> needs through homegrown flat files and Berkeley DB databases. But those solutions
> proved to be inflexible and not scalable enough for their needs, with the
>volume of data nearly doubling every few months
So, a SQL database is better than flat files. Ergo, MySQL rules...?
I'm glad that Yahoo has found a solution that works (or at least that the person who said that thinks it does) but that doesn't say anything about wh
I bought it 3 weeks ago... (Score:5, Informative)
1) The number of typos in the book. Didn't anyone proofread it before it went to print?
2) How many features MySQL lacks. Don't get me wrong, I like it as its fast and easy to work with for web stuff, but I was surprised at how often I read "MySQL doesn't do x yet, but will in version y".
The chapter on replication was worth the cost of the book for me, but I suggest people considering buying the book should wait until the 2nd edition.
Re:I bought it 3 weeks ago... (Score:2)
./Configure (Score:3, Interesting)
Read only MySQL, right? (Score:4, Insightful)
The authors certainly could not be referring to a system that has medium to high levels of inserts or updates.
While MySQL is great for reads, it sucks balls with inserts and updates... even with InnoDB, even with gobs of RAM and 15K SCSI hard drives in Raid 0 configurations on Quad Opteron systems.
And what a sucky review too... Nothing about what the book addresses.
Re:Read only MySQL, right? (Score:3, Funny)
I'll admit that INSERTs/UPDATEs are slower than SELECT, but they shouldn't be that painful. My Database is approximate 50/50 read/write by byte, and 66/33 read/write by query. During peak updates, I 'load data infile' about 10 Gig of data in 2 hours. It uses about 2 of the 4 Sparc 450MHz processors. 'load data infile' is quite a bit more efficient for bulk loads that 1 million INSERT statements.
I'll assume you have a rea
My company implemented mysql clustering and failed (Score:4, Interesting)
A deal with the devil? (Score:4, Funny)
Industry insiders have hinted that an organization known as Infernal Industries has become a major investor in MySQL. Headed by one Baal Diablos II, this rising star has come to challenge Microsoft and Oracle for the title of most evil corporate entity.
When asked for comment, Mr. Diablos replied "we have 6,000 years of investment experience. We created the modern labor market with our first employees, Adam and Eve. Today, we are becoming the biggest player in IT products thanks to our efforts to bring this formidible experience to the table."
The new, high performance MySQL preemptively versioned at 6.66 comes with a complimentary familiar with no less than 20 years of SQL experience. When asked whether this new round of investment would cause the Roman Catholic Church to divest itself of MySQL products, the Holy See responded that it is strongly looking into PostegreSQL.
Safari Rules (Score:2, Informative)
Best $120/year I've ever spent, that. And it's a tax write-off.
MySQL Performance (Score:4, Interesting)
Re:MySQL Performance (Score:4, Interesting)
For example, say that you're running a shopping cart and want to guarantee that each available item has a unique identifier. Is it more efficient to make the "itemid" field a unique key, or to fire off two queries every time you want to add an item: the first to attempt to fetch a row with the newly-generated itemid and the second to insert the new data? Even more importantly, how thread-safe is your website code? Can you prove that it's impossible for someone to insert an itemid the instant after you've tried to verify that the exact same itemid hasn't already been used?
Again, you must and will put some sort of integrity checking into your system, unless you're really curious how well unemployment insurance pays in your city. Do you want to hand-roll your own half-assed solution, or would it be easier to say "here's a list of constraints, Mr. Database! Make sure I don't accidentally break them, would you?"
Re:MySQL Performance (Score:3, Interesting)
Maybe it's just me, but I can't conceive of any relational database scenario where maintaining referential integrity WOULDN'T be a requirement.
Garbage In, Garbage Out. Who cares how fast your queries run if the data has gotten munged?
Re:MySQL Performance (Score:4, Insightful)
OK, MySQL can be used as a cache. It's like Squid and Oracle is like Apache. You can use Oracle to handle your actual transactions, integrity constraints, etc etc, then periodically dump out the data into MySQL and generate the web pages from there.
I have Oracle databases with thousands of connected users, all doing both queries and transactions. Could you read the data quicker from MySQL? Probably. Can MySQL manage tens - sometimes hundreds - of thousands of locks at a time? Not a chance.
His blog (Score:2, Interesting)
OLAP? (Score:3, Interesting)
One argument that I have been turning to is that not that many people take the time to become great MySQL developers. It is easy to be good with MySQL, it is terribly difficult to be great with MySQL. Indexes alone are a problem for many people to grasp - the whole leftmost column rule is easily overlooked unless someone is having a problem with a specific query.
Books like this give me hope people will begin looking deeper at MySQL as a development platform and that a wider base of developers will begin to emerge.
Anyways, OLAP? Can this book be used to help people get there?
M
(P.S. I have built an almost-generic OLAP system for my company using a MySQL backend. I really want to know what other people's thoughts are.)
Re:High Performance? (Score:4, Insightful)
The engine is only going to run as good as the crap you (developers) put into it. I would go so far as to say the simply due to the pervasiveness of MySql there are for more horrible script examples out there than for PostgreSQL.
Personally, I use to correct tool for the job. BLOG? Hell ya use MySql. OLAP? Hell no. But you could make it work and work well if you wanted to.
Re:High Performance? (Score:2, Funny)