In a recent interview I was asked two specific Oracle questions, the first of which I messed up the answer, and the second I didn’t know the answer, but I have never found it that important to think about the specific terminology. Why? Because in many environments some things are far less significant operationally than they are developmentally – or in other words when it gets to a busy production database, then that creative developer applying elegance can be useful but that elegance can also cause a different performance problem to the issue that it was trying to solve.
So what were those questions?
Question 1: “What causes snapshot too old errors”? My answer: “When rollback runs out of space”. Now to me that is comprehensive because it covers the disk space, the fiddling with the undo retention parameter for the unwary, and also the fact that a SELECT requiring a consistent data set could run out of space because other competing DML (Data Manipulation Language – INSERT, UPDATE, DELETE, MERGE) statements have used up available rollback space, overwriting entries needed by the SELECT, rendering data at the start execution time of the SELECT as no longer available to the SELECT, and thus the SELECT cannot complete because it will want to retrieve data from rollback that has already changed – thus snapshot too old occurs. Still my answer of “when rollback runs out of space” is correct but I worded it incorrectly because perhaps I should have said when rollback entries matching the SELECT are no longer available. The solution offered by the questioner is that this is a tuning opportunity – maybe to a developer but to a production DBA one should increase available rollback space so that all operations are allowed, because operationally one doesn’t generally question development requirements unless your production environment is actually breaking. Why not tune the SELECT? One could but not every statement can or should be tuned, and there could be many other operational reasons why one should not tune a query in the form of other more pressing problems, and a developer might waste time focusing on something that might not be all that important.
Question 2: “What is the difference between an access predicate and a filter predicate”? My answer: “I have no idea. I have never heard of an access predicate”. Now to me in retrospection this is a very fine point about the tuning of queries using composite indexing (sometimes), and I have seen some databases that were very heavily over indexed and very slow, and full of critically large operational problems because they had 25+ indexes per table (many of them multiple column composites), and the databases had constant concurrency issues and even ugly things like deadlocks and customers complaining of lost activities and entries (transactions they are sure they executed but just seemed to vanish … deadlocks). This result is a database with a seeming inability to process concurrent transactions because of ill configured sequences and indexing disk space that was edging towards getting to occupy the same amount of space used by tables. Why that’s an issue is a potentially separate topic but suffice it to say that indexing should be making things faster by reducing i/o and if there’s as much index space occupied as tables then it might indicate the opposite and for busy OLTP then every row change will change lots of indexes at the same time.
So an access predicate tells a developer that an explain plan filtered in (or out) rows when reading an index rather than when reading the table, and a filter predicate filters when reading the table. There’s a really simple explanation of access and filter predicates here. Also the difference between access and filter predicate is no big deal – it’s just a label for something that’s common sense anyway in that a table is accessed through an index and in reality a database is best not including all filter columns in searches because it is unnecessary (Oracle doesn’t need it) because of the fundamental fact that query speed is far more dependent on the nature of data than anything else. So technically reading an index that filters rows during a range index scan (reading 2 to many index entries) can help a query to filter at the index rather than the table level, and so it encourages developers to potentially use many and more composite indexing, and most likely on fields that have nothing to do with referential integrity. So when you as a DBA or troubleshooter start digging into a database looking for those things that developers do that you would prefer they didn’t, when it comes to indexing, from a high level you’re looking for three things, because indexing is much more complex than just reading the index or the table:
- Is referential integrity in place and are the FKs indexed, and do tables have PKs, and are those PK and FKs surrogates or natural keys, and are surrogates unique across each table or are they duplicated down the hierarchy? A lot of questions yes? To some they all have meaning and if not to you reading this then you might want to research each of these topics individually – they’re all conceptual and architectural in nature but they are also very important.
- Are indexes “nailed” on for reporting? The word nailed is deliberately used because I hate nails – screws are much more effective, I don’t hit my thumb with the hammer, and I can easily unscrew them and start all over again). This scenario might often look like many large composite indexes on single tables that don’t seem to belong. Reports generally read lots of data and users expect a lack of speed – do not sort your reporting with indexing – get rid of the indexes and allow Oracle to sort your reports during i/o operations (there are better ways to split i/o).
- Numerous composite indexes on fields that probably include PK and FK fields, and rarely are there lots of fields in the composites. These indexes are common in shared schema SaaS/multi-tenant database designs, where large clients and small clients require small searches even on static data to have indexing that allows things like strictly controlled indexed access into tables. These types of indexes are also common where rapid small scale reporting is required, that is more like on screen listings of transactional data, as opposed to reports. Over indexing can cause big problems in operational non SaaS and SaaS environments with concurrency – maybe think about partitioning those multiple tenants (not hash partitioning as it’s for splitting large i/o operations), and perhaps think about using histogram statistics (inherent) – also Oracle 12c is a multi-tenant database and has some interesting features.
In short, if it looks like there is too much indexing then there probably is. Then think about why your database is like it is (there could be a good reason), in order to figure out how to address the issue if it needs addressing, and what solution you should use. Bear in mind that more indexing may be the solution but watch concurrency levels in OLTP environments – just don’t try to index every query and watch out for developers trying to duplicate indexing with separate function based duplications. Also be aware that full table scans are not always bad, but most importantly focus first on the issues that have the most impact as in the difference between an access and a filter predicate making very little difference in a user loaded operational database – find something that does make a difference – it’s all about understanding the data, the application behavior, operational user load – it’s not about elegance … which leads to this:
Elegance is for developers to worry about, but for ops people to ignore, which is really the problem in and the point about this blog entry, in that the creative and operational people are really doing two different things and often working against each other – they need to learn to sometimes play in the same sand box and that often requires leadership and an understanding of both development and operations to make creative elegance service end users properly – if you’re going to get bogged down into details then make sure they are useful details.
Personally, when interviewing DBAs I prefer to ask question that tell me if the person I am talking to is what I like to call development-centric, operationally-based, or both – both is rare as most people can’t wear more than a single hat. Developers are creative and open to new ideas but production people are consistent and reliable, and they don’t get creative and experimental because one simply doesn’t do that on the operational side of the business – creativity is not even a part of the operational reliability personality profile – it’s not even fear; in fact a manager might want to worry about an adventurous DBA or a fearful developer. What I’m really looking for are people like myself who can wear more than one hat, sometimes even at the same time. So I like to test for familiarity and ask questions that seek general understanding, such as such as how do redo logs, archive logs and datafiles fit together? What connects them? And what happens when they get out of sync? And how can they be (or are they) resynchronized?
Question 3: “What do you do about high CPU?” My answer: “A lot! There are many different answers and it requires drill-down, but there can be multiple causes – please be more specific about the problem you are having otherwise I will have to guess at and drag on about multiple possible scenarios”. In truth, this is not a good question because there isn’t really a good answer to it.
Question 4: “What is split brain”? My answer: “This is a situation where a cluster has a node failure and the nodes come up as more than one database, resulting in more than one database accessing the same data (the same tablespaces because disk groups can be shared), where same data access from more than one clustered database can and probably will corrupt the database”. Some Interview Questions to Check Oracle RAC Exposure.
Question 5: “What are the different types of RMAN scenarios”? My answer: “Be more specific please”, after I had talked about recovery and high speed backup scenarios I was asked about point in time recovery, which is also not rocket science. Look for understanding of backup and recovery scenarios and if someone has performed lots of recoveries it might call their operational database administration skills into question. If you really want to gauge understanding of RMAN ask 1) How can you speed up recovery when you have to restore after a disk failure, describing each step you take without expecting manual based precision of RMAN commands (someone who has memorized the commands doesn’t know how to use the manuals and might crack in a situation that requires out of the box thinking, which many fire fighting situations do), and 2) ask how best to speed up RMAN backups using Oracle RAC including all configuration on Oracle config files and on disk.
Here is an older posting referring to some Oracle RAC interview questions I devised.
Once again, for developers, elegance isn’t everything and speed isn’t always about the logical, because operationally it can be more about physical i/o. For operational DBAs, developers are there to experiment, which means that they are sometimes right and sometimes wrong but if you don’t allow changes to filter through then competing products will steal your customers
And when interviewing a prospective Oracle person, firstly, focus on what they understand rather than if they have specific details tucked away in their heads – most smart DBAs will prefer to understand and look specific details up in the extensive manuals when they encounter a problem. And secondly, be aware that developers interviewing operational people (and visa versa), may yield unanswered questions on the part of the interviewee in that ops people do not have things in their heads that developers do (and visa versa). Thirdly, and I’ve seen this more than once, a tech. will ask me about a solution to a problem that he/she has been trying to solve for 6 months – am I supposed to have a solution for a knotty and insoluble problem in a 5 second time window? I’m not Houdini Lastly, if you’re recruiting a person who has an understanding of both development and operations, expect the unexpected because not every interviewee knows everything about the more specialized fields (this is why knowledge worker management is hard because leaders have to expect their charges to know more specifics than they do), and you want to avoid distracting yourself from missing out on the best best people. Be careful about settling for a “good-fit” that fits in, counteracts diversity, thinks the same way as all your current employees, and stifles your ability to innovate in the long run when you might need that creative spark the most from a single many hats person thinking out of the box – remember that groups of people tend to be conservative when allowed because the diversity bit is harder to deal with, creativity is harder to manage, change usually stems from innovation and that scares people – people hate change even when it’s slow but often change is required for competitive survival – especially in IT because it changes faster than most sane minds can even catch their breath to think about. Above all, figure out your goal first when recruiting rather than just feeling your way through a process.
If you ask an interviewee a question that you have been wracking your brains trying to solve for hours or days, then don’t expect even the seasoned expert to have a sensible answer in 30 seconds. Quite often how you ask questions determines how it is answered and asking a question you don’t know the answer to is not going to get you an answer you might actually want to hear – like for example, you have been barking up the wrong tree for the last few weeks; unless of course you trying to show up a potential competitor.
When you want to understand use of GoldenGate and DataGuard (standby databases), you might want to ask about understanding of the ways in which DataGuard and Replication work, and how they are similar they are and in what ways specifically.