m5
Oct 1, 2001
Personally I think it's risky to perform relational comparisons on fields that might be null. That's what COALESCE and ISNULL (MSSQL) and CASE are for.

SQL does not short-circuit logical expressions, or, at least, MSSQL doesn't.
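For example, a minimal sketch (hypothetical table and column names) that normalizes NULLs before the comparison:
code:
-- treat a NULL discount as zero so the comparison can't silently drop rows
SELECT order_id
  FROM orders
 WHERE COALESCE(discount, 0) < 10
-- MSSQL-specific alternative: WHERE ISNULL(discount, 0) < 10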

Also, for your SQL reformatting needs: the online SQL formatter.

m5
Oct 1, 2001

Victor posted:

I don't think there is some solid limit -- the 8000 number just comes from the maximum storage length of char, varchar, binary, and varbinary in SQL 2000 and earlier (2005 has [n]varchar(max) and varbinary(max)). Selecting that many IDs from a GUI is a bit questionable, so if it is possible, I would just put a limit on that (say, to 100 entries), and also have a "select all" option, in which case you wouldn't use IN. However, people can be retarded and require that users be allowed to select hundreds of entries, so this might not be an option. If so, I guess a temporary table would be the best approach, albeit ugly.

SQL Server seems to have issues with the size of individual "IN" clauses, but in our experience it works OK if the lists are split up into separate IN clauses that are connected with OR.

In our application, it isn't a case of idiot customers, and selecting thousands of things to operate on makes perfect sense.
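Roughly like this, with made-up names - one oversized list split into a couple of smaller IN lists ORed together:
code:
SELECT *
  FROM widgets
 WHERE widget_id IN (1, 2, 3, 5, 8)      -- first chunk of a few hundred IDs
    OR widget_id IN (13, 21, 34, 55, 89) -- next chunk, and so on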

m5
Oct 1, 2001

jwnin posted:

As a result, our development team thinks that the best thing to do is to split each customer out into its own database. The catch is that there are around 20,000 customers per year, and each customer database would be about 100 MB in size.

If what you're saying is that for any given customers there's a foozle table and a thingy table, and each customer may have 50 foozles and 1000 thingies, then one possibility is to look into clustering (if your DBMS supports it; MS SQL Server does). If you keep your customer ID on each table, and cluster on that (it does not have to be the primary key), then as long as you make sure to include customer ID when you do queries (incl your joins) then it's almost like having separate tables for each customer. It'll slow your inserts down a little, but not much, and your searches will be much faster.
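In SQL Server terms, a sketch of that (all table and column names are made up):
code:
-- cluster the per-customer tables on CustomerID so each customer's rows sit together on disk
CREATE CLUSTERED INDEX IX_Foozles_CustomerID ON Foozles (CustomerID)
CREATE CLUSTERED INDEX IX_Thingies_CustomerID ON Thingies (CustomerID)

-- and always carry CustomerID through your queries and joins
SELECT f.FoozleName, t.ThingyName
  FROM Foozles f
  JOIN Thingies t ON t.CustomerID = f.CustomerID AND t.FoozleID = f.FoozleID
 WHERE f.CustomerID = @CustomerID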

m5
Oct 1, 2001

Victor posted:

I'm assuming they aren't clicking thousands of items? My "idiot" comment was predicated upon that happening.

Well it wouldn't surprise me, but they certainly don't have to do that.

Actually we're kind-of dumb about how we do this. We provide "queue" views with filters and stuff, and things can be multi-selected or drag-selected or whatever. The easiest thing to do is to use the filters to get what you want and then click "select all". If we were smart, we'd couple the filtering directly onto the running of the operation.

(Actually most operations, for various reasons, have to go one-at-a-time anyway.)

m5
Oct 1, 2001

Dakha posted:

I'm extracting data from an oracle database using SQL+. One of the tables I need to get has several columns such as '/AB/CDE' and '/AB/CDF' etc.

However, when I attempt to execute:

SELECT KONP./AB/CDE, KONP./AB/CDF FROM KONP
WHERE....

it doesn't recognise the columns. Nor does:

SELECT /AB/CDE, /AB/CDF FROM KONP
WHERE....

Is this an escape character thing? I thought the escape character was backslash not slash. What should I do so that SQL+ can get the columns I need?

Square brackets maybe? I'm guessing because I'm not sure what you're saying - are those slashes supposed to mean something or are they part of the column names and not otherwise interesting? (If so then wtf)

So maybe
code:
SELECT [/AB/CDE], [/AB/CDF] from KONP WHERE ...

m5
Oct 1, 2001
With left joins like that, what helps is making sure you get your qualifying (filtering, in other words) criteria into the JOIN expressions. Join in an order such that your most restrictive conditions are evaluated as early as possible.

m5
Oct 1, 2001

Golbez posted:

I'm not sure what you mean by getting the qualifying criteria into the join expressions... In this case, this join is simply to fill out the info on getting the actual name of the Department instead of the Department ID, etc... Now, sometimes the WHERE will be only on Entries, but sometimes it will be on the DepartmentID, but even then, it's still searching for the DepartmentID in Entries, using the one in Departments only to join the DepartmentName.

I mean that if you're filtering on
code:
d.DepartmentName = 'SALES'
then the query plan isn't (necessarily) the same if you put that in the WHERE clause instead of the JOIN:
code:
SELECT e.EntryID, e.EntryText, d.DepartmentName, a.AgencyName, p.PhaseName
  FROM Departments d
       LEFT JOIN Entries e ON e.DepartmentID = d.DepartmentID
             AND d.DepartmentName = 'SALES'
       LEFT JOIN Agencies a ON d.DepartmentID = a.DepartmentID
       LEFT JOIN Phases p ON e.PhaseID = p.PhaseID
The book SQL Tuning has a bunch of extremely good stuff about query-writing. It's mostly database-agnostic: it's not one of those guides about how to get Oracle or MS SQL Server to do particular things, but instead about how to write SQL code to express query algorithms as efficiently as possible given what you know (and what the server can't know) about your data.

m5 fucked around with this message at 16:16 on Nov 27, 2007

m5
Oct 1, 2001
Check out AquaFold's "Aqua Data Studio" for a free, flawed, but overall useful tool that provides graphical "EXPLAIN" rendering for a variety of databases.

m5
Oct 1, 2001
In SQL Server you can do this:
code:
select foo, datepart(minute, timestamp) / 10
from whatever
group by foo, datepart(minute, timestamp) / 10
-- every non-aggregated column in the select list (foo here) has to appear in the GROUP BY

m5
Oct 1, 2001

deimos posted:

CoC SQL gods, I have a question: is there a way to apply an EAV-like schema without the SQL heresy that is EAV? I have a client asking for what is basically a product database whose entries can be compared. The problem is that there are many different products that I want to be able to define dynamically (and eventually offload this to their administrators). It's going to be used to track price and feature changes across telecommunication providers (this is for pricing and mostly marketing/decision-making reasons), and it's not going to be outward-facing, so performance is not so critical (few clients, some patience).

So basically I need to define my data models dynamically. I was thinking of using this sort of modeling.

An example of my Data Definition (only products themselves need to be dynamically defined):
Product1 (Voice lines)
- Company - this is really metadata
- Contract Length - same
- Actual Price
- Extra Line Price
- Facility MRC
- SLC Price
- Regulatory fees
- Total Fees
- Total Facility MRC plus Fees
- Notes

Product2 (Long Distance Plans)
- Company - this is really metadata
- Contract Length - same
- Local Calls
- LD US
- LD Int
- TF Services
- CCs
- Timeframe
- Per Minute Price
- Notes

An example of atoms:
Company 1's Product1:
- Company: Company1
- Contract Length: 0yr contract
- Actual Price: $350
- ELP: $310
- SLCP: $660
- RF: $30
- TF: $440
- TFPF: $900
- Notes: Does not include feature X or Y

Company 1's Product2:
- Company: Company1
- Contract Length: 0yr
- Local Calls: True
- LD US: True
- LD Int: True
- TF Services: True
- CCs: True
- Timeframe: 24hr (fractioned 30/6)
- Per Minute Price: $.05/min
- Notes: <blank>


My guess is that SQL is just not designed for this and it's better to use something like an RDF datastore for this, and if this is the case, does anyone have any resources for learning to use RDF?

Come up with a generalized model. If you're going to be able to implement some sort of comparison, you'll be doing that anyway.

m5
Oct 1, 2001

deimos posted:

Gah, forgot to mention: the comparison is only between companies on the same products. The problem with a generalized model is that the products are very different - some might have 35 different attributes (this is pretty much the case with wireless plans) and might grow dynamically (wireless plans are constantly adding features, think of alltel's favorites and other stuff).

Well, then in that case I'd pack the most "stable" attributes into a primary table. For some of the others, instead of a single lump of name/value pairs, maybe there could be separate tables for different sorts of optional attributes. You then left-join to those tables from the main one. Maybe.
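A rough sketch of that shape (all names hypothetical):
code:
-- the stable attributes live on the main table
CREATE TABLE products (
    product_id      INT PRIMARY KEY,
    company         VARCHAR(100),
    contract_length VARCHAR(20),
    notes           VARCHAR(1000)
)

-- one optional-attribute table per product family, keyed by product
CREATE TABLE voice_line_attributes (
    product_id       INT PRIMARY KEY REFERENCES products (product_id),
    actual_price     DECIMAL(10,2),
    extra_line_price DECIMAL(10,2),
    facility_mrc     DECIMAL(10,2)
)

-- comparisons left-join from the main table to whichever attribute tables apply
SELECT p.company, p.contract_length, v.actual_price, v.facility_mrc
  FROM products p
  LEFT JOIN voice_line_attributes v ON v.product_id = p.product_id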

m5
Oct 1, 2001

Triple Tech posted:

Is there a way to programmatically detect the existence of a table, to, let's say, conditionalize running a create (create if not exists)? This is Sybase I'm working on. Apparently there are stored procedures that list out all the tables, and I guess I could parse those... Are those stored procedures made by my coworkers or are those Sybase built-ins?

Isn't there a "sysobjects" table, or something like that? That's what we do in SQL Server.
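Something along these lines works in SQL Server; I believe Sybase's catalog looks much the same, but treat this as a sketch to verify:
code:
-- 'U' is the type code for a user table in sysobjects
IF NOT EXISTS (SELECT 1 FROM sysobjects WHERE name = 'my_table' AND type = 'U')
    CREATE TABLE my_table (id INT PRIMARY KEY)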

m5
Oct 1, 2001

fletcher posted:

code:
SELECT	COUNT(comment.id),
		picture.id
FROM	picture
JOIN	comment ON 
		picture.id = comment.pictureId 
WHERE	picture.owner = :id
GROUP BY	comment.pictureId
Why is this not returning the correct count? It's close, but still a few off for each picture. Am I grouping by the wrong column?

What happens when you group on "picture.id"? (What DBMS is that?)
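That is, something like this (same tables as in your query):
code:
SELECT picture.id, COUNT(comment.id)
  FROM picture
  JOIN comment ON picture.id = comment.pictureId
 WHERE picture.owner = :id
 GROUP BY picture.id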

m5
Oct 1, 2001

Bitruder posted:

Thanks for the previous reply!

Another question:
The following works fine when the subqueries only return 1 item, but when a subquery returns more than 1 items, I'd like to have all the results work by ORing them together, but it errors out now. Is there a way to treat multiple subquery rows as ORs?
code:
SELECT * FROM entries 
WHERE ID = (SELECT entryID FROM metadata 
                        WHERE fieldID = (SELECT ID FROM metafields WHERE name=:name))
EDIT:
SOLVED! Instead of =, I want "IN".

For some servers, expressing that as an "EXISTS" clause results in a better query plan.
code:
SELECT * FROM entries
 WHERE EXISTS (
   SELECT null FROM metadata
    WHERE metadata.entryID = entries.ID
      AND EXISTS (
        SELECT null FROM metafields
         WHERE metafields.ID = metadata.fieldID
           AND name = :name
      )
 )

m5
Oct 1, 2001

Freezy posted:

Yes, but then I would not know specifically which of orange/apple matched :)

I suppose I could take the matching myfield and parse it myself, outside of MySQL. It would just be nice to have a pure MySQL solution.

You could dynamically build the query. You'd need a "WHERE" clause predicate to select rows that match one or more of the search strings. You'd then have columns with a "CASE" clause for each input string separately, counting up ones or zeros depending on whether an individual string matches the row. Your result set would be a single row with separate values for each of the separate input strings.
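A sketch of the generated query, assuming the two search strings "orange" and "apple" from your example and a made-up table name:
code:
SELECT SUM(CASE WHEN myfield LIKE '%orange%' THEN 1 ELSE 0 END) AS orange_matches,
       SUM(CASE WHEN myfield LIKE '%apple%'  THEN 1 ELSE 0 END) AS apple_matches
  FROM mytable
 WHERE myfield LIKE '%orange%'
    OR myfield LIKE '%apple%'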

m5
Oct 1, 2001
As it happens, I implemented an event log system almost exactly like that. We've got over 300 different types of events, and maybe six or eight different kinds of event "targets".

One basic caveat: I would seriously recommend that you reconsider building in those FK constraints. They'll really load things down. In our system, we've got literally millions and millions of events - every day's batch processing causes many hundreds of thousands of new ones to be logged. Those constraints will make deletes take forever if you're not careful.

Also, if your event log is anything like an audit trail, why would you want entries in it to go away? Things that happened don't "un-happen" when the target is removed from the system.

Our events track the target info, the "actor" responsible (if any), the event type, the timestamp, and disposition information (our events can generate audit email, and we have to keep track of whether an event has been thusly processed). There's a head table to keep track of common attributes (like you said), and then auxiliary tables by event type for extra attributes. Our events are defined in an XML file, and some code generates Java classes and SQL code for maintaining stuff. The generation happens automatically as part of the build.

We used to keep FK constraints on the aux tables back to the head table, but that was causing us real problems. We've never kept FK constraints back to target objects.

m5
Oct 1, 2001

minato posted:

The events system was actually just an example to couch the general principle in and get it across, I'm not actually writing an events system. In my particular case, the FKs will not introduce significant overhead.

I was more concerned about what people thought of breaking 3rd-normal form in order to simplify the matter of referring to various different types, and whether people had actually used the EAV anti-design pattern I mentioned in Option 3 and regretted it. Thanks for the comments so far!

The EAV approach would be laughably inadequate at the scale my database operates at.

The normalized version is a little tricky to deal with when a system wants to read through events, like the mail sender I mentioned. What it does is operate over new events in batches. Over each batch (range of event IDs), it does a single query to get a count of events of each type that happen to be there. It can then construct a set of queries, one per type in the batch, to get the events of each type. The queries are built from SQL generated automatically from the XML. That gnarliness is all centralized in one piece of code, so nothing else in the system really needs to care about it.
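The per-batch counting query itself is nothing fancy; roughly this, with made-up table and column names:
code:
-- one query per batch: how many events of each type fall in this ID range?
SELECT event_type, COUNT(*) AS how_many
  FROM event_head
 WHERE event_id BETWEEN :batch_start AND :batch_end
 GROUP BY event_type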

When loading just a single event, of course, there's no problem because again the SQL is built straight from the event specification.

m5
Oct 1, 2001

Xae posted:

Yes, yes you should.

There is a reason why there is millions of dollars put into R&D. The database will do constraints, and transactions faster, more accurately, and just plain better than anything you can possibly conceive of.

Yes, but constraints are not free.

Example: if you have a column in a table that references another table, and you put a FK constraint on that, you'd better put an index on it too - even if you have no reason to do so. Why? Because if you don't, when you delete rows from the target table the server will have no choice but to do a table scan of the other table to make sure you're not violating the constraint.
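A minimal illustration with hypothetical tables - the FK alone doesn't give you this index, you have to add it yourself:
code:
-- child.parent_id references parent(parent_id);
-- without this index, deleting from parent scans child to check the constraint
CREATE INDEX IX_child_parent_id ON child (parent_id)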

Another example: in the case discussed above, with the head table and a heterogeneous collection of auxiliary tables, if each auxiliary table has a FK constraint back to the main table you'll have hell to pay when it's time to clean out a bunch of rows from the main table. (Actually what you'll have to do is drop all the constraints, clean out the crap, and then re-add the constraints - if you don't you'll be waiting days. This has happened to us and we now don't have those constraints.)

I believe that FK constraints are a good idea when appropriate and when you can afford them, but to blindly throw them into your schema whenever you have a cross-table reference is not wise.

m5
Oct 1, 2001

Xae posted:

A full scan - and only if you don't bother making an index - is a required thing. Otherwise you breach data integrity. You have orphaned records. You are now hoping that something else doesn't take the old PK and completely gently caress your system. I prefer to rely on something more than hope and dumb luck...

Any system in which deletes would cause so much strain that it would be unworkable with today's hardware probably isn't deleting anyway. They simply end-date the record, set a delete flag, or otherwise execute a logical delete. Data has value, and you don't just throw it away.

You're not familiar with the real world. The tables I'm talking about have tens of millions of rows in them. We delete stuff because we need to curb infinite growth in the database - I think it's well over 100GB now in backup form.

You only have orphaned records if the software has bugs in it, mostly. If there are a few orphaned records floating around among millions and millions of fine happy ones, I'm OK with that. I don't need my database to be a perfect shining jewel of integrity. I need it to mostly work, and to support the heavy load it's under with acceptable performance.

m5
Oct 1, 2001
Look, I don't know how to explain the situation other than to say that maintenance of foreign key constraints in the tables we have has sometimes become untenable.

I have not claimed that the code is perfect, nor that I expect no software errors. The issues are performance issues. We cannot buy a server much bigger than what we've got, and partitioning the database is more work than can be afforded now (possibly ever).

m5
Oct 1, 2001

Xae posted:

Part of the thing is that he is asking for design advice. You need to catch things like "will we need to partition" before you start making the system. If you don't, then you end up in a situation where you become trapped and can't fix it.

If he was already running such a system I wouldn't advise changing it, but since the system isn't yet built it is best to design it correctly, so that when the inevitable happens you are in the best position to deal with it.

You design following "the rules", then break where needed due to money, political or other "real" constraints.

Oh we knew we would have to partition, and we definitely can, but it's not going to be a small project. There's more to it than just the database part. (The overall situation is not the best one could hope for; retards are now in control of what we do.)

Note that what we've done is remove FK constraints where they became too burdensome. Where they're not, we still have them.

m5
Oct 1, 2001

schzim posted:

I'd like to include an example of what I did so some can explain
A) Why this was so slow
and
B) What a better method would have been.

INSERT INTO memberships (user_id, group_id) SELECT users.user_id, groups.group_id FROM users, groups WHERE users.user_name = '%PYTHONVAR' AND groups.name = '%PYTHONVAR'

As I parsed through the txt file and found a group, and then as I found a user, I would generate a query like this.

PS Overall it made like 70k writes into memberships ;)

You could have uploaded the (username, groupname) pairs into their own table and then run a single insert/select (which is what Victor was suggesting).
code:
insert into memberships (userid, groupid)
select u.user_id, g.group_id
  from namepairs np
  join users u on np.username = u.user_name
  join groups g on np.groupname = g.name
You'd save on the overhead of all those separate database interactions.

m5
Oct 1, 2001

Walked posted:

Woah, raddical :2bong:
Thanks :)

To bump it up, how would one get the most recent n entries? I know how to script it with multiple recursive queries from C#, but is there a way to run a single query to get the two highest IDs?

Some databases have pagination or "TOP" qualifiers. In SQL Server you can say
code:
select top 2 * from ... order by ID desc
The ORDER BY is what makes "top 2" return the two highest IDs rather than an arbitrary pair of rows.

m5
Oct 1, 2001

Walked posted:

edit: And is there any way that anyone knows how to run SQL statements in MSSQL one-by-one, without using their little query editor and having to hit F5 each time? CLI style basically.

We have a Java CLI tool we wrote. It's probably very easy to build something like that in almost any language with facilities for talking to the database, and it's an instructive little project.

m5
Oct 1, 2001
Funny. I had the same thought the other day. Our app is all Java, so something like StringTemplate (the ANTLR thing) or a simple home-grown one that used OGNL or something would make a lot of things nice.

m5
Oct 1, 2001

triplekungfu posted:

This question popped into my head while reading another thread, but I figured it was more suited here. Also, I'm using Hibernate/HQL but I suspect the issue is pretty much the same regardless.

Let's say we've got two classes/tables, A and B, is there any way to select between the two in a query using parameter substitution rather than string concatenation?

Huh? I mean, the answer is probably "yes", but I don't know what "select between the two" means. Are you talking about a join?

m5
Oct 1, 2001
... and that brings us back to the idea of using a templating system for plugging stuff into SQL statements.

m5
Oct 1, 2001

Quantum Jiggles posted:

Yeah, this is what I fear. The application is in fact doing this globally as a common pattern. I'll probably have to rip out the custom ORM layer and replace it with Hibernate or something that will allow intelligent queries, then rewrite all the inefficient methods. :sigh:

I don't know what your application does, but one architectural thing that can be a problem is for the app to use the ORM layer as a means of producing views of large aggregations of objects (like, tables). Doing a query per row to produce a tabular "inventory" view is never going to be as good as producing the view with a single query.

m5
Oct 1, 2001

nbv4 posted:

using another table for tags seems like too much work for such a simple tagging system that my project requires.

Well, that's the way you do things in SQL.

m5
Oct 1, 2001

nbv4 posted:

Then how would it be done?

The user types in their tags. PHP splits the string up at the commas. A recursive function goes through each tag, and checks to see if it's already in the database. If it's not, it adds it, then gets the auto_incremented value and sticks it into the item's 'tags' row. That's like 5 or 6 queries. Why do it that way when you can just stuff it all into a TEXT column and be done with it? I'm not writing an enterprise level project here where the database has to fulltext through millions of rows. Each tag exists within each user. None of these tags need to exist "globally".

If the tags are really "structural", then they can be discrete columns along with other stuff about the entity. If there's an uncertain, possibly expanding population of tags and tag values, however, then using another table is the "right" way to do it.

(Aside: why do you say that your tag-checking function is recursive? Are there tags within tags?)

There are a couple of ways to do the SQL, and it's not 5 or 6 queries. Your tag table would have a primary key of its own (probably), and then a column for the main entity ID, plus the tag name and value columns. How you manage the updates would depend on the way you load and save the entity. One possibility is that you load up the entity, manage the tag set as a map, and then save it all back. In that case, your SQL would (in one trip to the database) delete all the existing tag rows and then insert all the updated tags.
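A sketch of that shape (names are hypothetical, MySQL-flavored since that's what you're on), including the delete-then-insert save pattern:
code:
CREATE TABLE item_tags (
    tag_id  INT AUTO_INCREMENT PRIMARY KEY,
    item_id INT NOT NULL,
    name    VARCHAR(50) NOT NULL,
    value   VARCHAR(255)
);

-- saving an item's tag set: clear the old rows, then insert the current ones
DELETE FROM item_tags WHERE item_id = :item_id;
INSERT INTO item_tags (item_id, name, value)
VALUES (:item_id, 'color', 'orange'),
       (:item_id, 'size', 'large');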

You're welcome to jam all your tags together into a string if you like. I've gotten used to always thinking in terms of scalability, because that's the world I know.
