Victor
Jun 18, 2004

m5 posted:

Personally I think it's risky to perform relational comparisons on fields that might be null. That's what COALESCE and ISNULL (MSSQL) and CASE are for.
Ehh, with three-valued logic, unknown or true = true, unknown and false = false, which is sufficient for the code posted if the date is the only nullable criteria column. That being said, null will bite you in the rear end if you aren't careful (especially with subqueries that return a null). I want intellisense that tells me which columns are nullable, when I'm comparing different data types/lengths, and which columns are covered by indexes. Sigh, one can dream, right? Who knows how much intellisense will make it into SQL2008.
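To make the three-valued logic concrete, here's a quick sketch (variable names and dates made up):
code:
declare @d datetime, @start datetime
set @start = '20070101'   -- @d stays null

select  case when @d >= @start then 'included' else 'excluded' end,           -- unknown -> falls to else
        case when @d >= @start or 1 = 1 then 'included' else 'excluded' end,  -- unknown or true = true
        case when coalesce(@d, '17530101') >= @start then 'included' else 'excluded' end  -- null-safe default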

Victor
Jun 18, 2004
Last time I tried SQL Prompt, I found that it requires you to type ctrl-space or something to activate the dropdown (or the dropdown would appear way too often), wouldn't autocomplete after pressing space, and other such annoyances. Maybe I've just acclimated to VS2005's intellisense, but I really like it, more than any other intellisense I've used. Oh, I don't think SQL Prompt does autocompletion for INFORMATION_SCHEMA, which I use a lot.

Victor
Jun 18, 2004
Why not just do the following? Hopefully you have some character that is never in the title.
code:
replace(replace(replace(replace('^' + title, 
    '^The ', ''), 
    '^An ', ''), 
    '^A ', ''), 
    '^', '')
If you don't want to rely on some character never being in the title, then you could just use a case when...end statement.
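A rough sketch of that case approach, assuming the only articles you care about are A/An/The (Titles is a made-up table name):
code:
select  case
            when title like 'The %' then substring(title, 5, len(title))
            when title like 'An %'  then substring(title, 4, len(title))
            when title like 'A %'   then substring(title, 3, len(title))
            else title
        end as sort_title
from    Titles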

Edit: you're kinda screwed with Regex unless you want to use it to determine whether the first word is one of the above, and if so, find the index of the first space and grab the string after that space. (MySQL regexes only do matching, not capturing of groups or replacement.)

Victor
Jun 18, 2004
Find one of the many table-valued udfs that split strings, and then pass the ids as a delimited list.
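For example (dbo.Split, Orders, and OrderID are made-up names -- substitute whichever split udf you grab; most return a single column named value or similar):
code:
declare @ids varchar(1000)
set @ids = '3,17,42'

select  o.*
from    Orders o
inner join dbo.Split(@ids, ',') s on s.value = o.OrderID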

Victor
Jun 18, 2004

Satan's Scallion posted:

What we do now is bulk load the list of values to a temporary table, create an index, and then simply query with a join against this table.
Maybe there is no analog on Oracle, but in SQL Server, one would just create a one-column table with the clustered index on that column; no additional index would be needed. (Clustered indexes are simply indexes that define how the rows are physically ordered.)
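Roughly like this (table/column names invented):
code:
create table #ids (id int not null primary key clustered)

-- bulk load the id list into #ids, then join:
select  t.*
from    BigTable t
inner join #ids i on i.id = t.id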

Victor
Jun 18, 2004

chocojosh posted:

sp_executesql will not accept dynamic table names or field names to be entered.
Correct; all it allows is parametrization and outputting scalar results to variables in the calling code.

chocojosh posted:

I'm wondering if there's any "gotchas" for using temp tables or table variables (and what's the difference between the two).
Indexes cannot be created on table variables (beyond the ones that come with primary key/unique constraints). Table variables declared in the calling code cannot be accessed inside sp_executesql statements. Table variables are local to the calling code, while temporary tables are local to the current connection (#) or visible to all connections (##).
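Quick illustration of the two declarations (names arbitrary):
code:
declare @t table (id int not null primary key)   -- table variable: constraints only, no create index

create table #t (id int not null)                -- temp table: scoped to this connection
create index ix_t_id on #t (id)                  -- ...and it can be indexed after creation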

I want to know why Microsoft hasn't worked on passing tables around, or at least views that can be dynamically selected from without dynamic sql.

Victor
Jun 18, 2004

chocojosh posted:

I am choosing my table dynamically, based on an input parameter (I *know* you shouldn't do this due to lack of query caching). To do this requires using the exec statement which can make you a victim to sql injection.

sp_executesql protects you against sql injection; but, it does not allow you to choose your table name dynamically (it will only allow you to pass in parameters for your where clause).
As long as you're smart and do replace(varchar_variable, '''', ''''''), you're safe from SQL injection. This isn't to say that parametrization is bad, just that it isn't a must.

You can do dynamic SQL with sp_executesql just like you can with exec(sql). sp_executesql only allows scalar parameters, which may be what you were getting at.
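For reference, the parametrized form looks roughly like this (Widgets and the column name are invented); the table name itself would still have to be concatenated into the string:
code:
declare @sql nvarchar(4000), @name varchar(50)
set @name = 'whatever the user typed'

set @sql = N'select * from Widgets where name = @name'
exec sp_executesql @sql, N'@name varchar(50)', @name = @name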

Victor
Jun 18, 2004

pig-fukr posted:

They have. In 2008 you get table-valued parameters that can be passed around, provided you're willing to strongly type them and read-only access will do what you need:

http://blogs.techrepublic.com.com/datacenter/?p=168
Sick! Is there some place on the net where all the features of 2008 (new over 2005/2000) are described, and described well? I've seen several overviews of what is coming in 2008 (like intellisense, yay!), but I haven't yet seen the above. I just glanced at 2008 Books Online and it does not seem conducive to learning about the new features.

Victor
Jun 18, 2004

chocojosh posted:

replace(varchar_variable, '''', '''''')


Is that the *ONLY* thing I need to do to avoid SQL injection? Can you provide me with a link (not that I don't trust you, but I'd rather not leave security holes in my software due to my sloppyness)? If so, I may just use one long exec string instead of the table variable.
If you do that, there is no way I know of that one could get a single quote inside a string. As far as I know, that is the only way one can terminate a string. I don't have any papers or whatnot to back this up, but I haven't searched for them either. The only SQL injection exploits I've heard about [0] are from terminating quoted values, either by putting a single quote in there or by screwing with unicode characters in such a way as to fool the character conversion routines [1] into converting a non-single-quote character into a single-quote character after the replace operation above. As far as I know, this cannot happen inside sprocs (any possibly-insecure character conversions have already been done by the time the replace operation is called). It isn't a problem in .NET code AFAIK, because System.String is always represented in UTF-16.

[0] assuming that anything that doesn't go through the replace function above is strongly typed into something other than a string, e.g. int, float, date, etc.
[1] i.e., everything that processes the input from the initial text entry to the point where the database query optimizer gets its hands on the query

Xae posted:

You would need to check for ';' as well, since they might try to submit a compound statement. ie "Select * from sys;Drop X;"
Wrong. If you cannot escape the single quotes, your text will only ever be interpreted as a literal.

Victor
Jun 18, 2004
Hairy == slow? Ugly?

Victor
Jun 18, 2004
I don't think there is some solid limit -- the 8000 number just comes from the maximum storage length of char, varchar, binary, and varbinary in SQL 2000 and earlier (2005 has [n]varchar(max) and varbinary(max)). Selecting that many IDs from a GUI is a bit questionable, so if it is possible, I would just put a limit on that (say, 100 entries), and also have a "select all" option, in which case you wouldn't use in. However, people can be retarded and require that users be allowed to select hundreds of entries, so this might not be an option. If so, I guess a temporary table would be the best approach, albeit ugly.

Victor
Jun 18, 2004

m5 posted:

In our application, it isn't a case of idiot customers, and selecting thousands of things to operate on makes perfect sense.
I'm assuming they aren't clicking thousands of items? My "idiot" comment was predicated upon that happening.

Victor
Jun 18, 2004
Join illustrations.

Victor
Jun 18, 2004
If you have an index on owner, it's not very expensive. However, that makes it more expensive to insert/remove rows from your messages table. Whether this actually matters depends on details you have not provided.
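i.e., something along the lines of (assuming the table really is called messages):
code:
create index ix_messages_owner on messages (owner)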

Victor
Jun 18, 2004
I might make a thread about sweet SQL2005 features at some point, but I just wanted to get this out there: using cross apply/outer apply allows you to join a table-valued function to a table, using values from the table as input to the UDF. That's useful for, among other things, normalizing data with a CLR Regex Split function. I learned about this tonight, after getting pissed that there was no way to link a table-valued UDF to a table, since relations can't refer to each other outside of the join condition. SQL2005 to the rescue!

Victor
Jun 18, 2004
Let's say I have deployed a table-valued function called dbo.RegexSplit that returns a table containing the column value. Let's say I have a table that is (foo int not null, list_of_bars varchar(100) not null). I want to normalize that to (foo int not null, bar int not null). First, I tried
code:
insert  Normalized
select  foo,
        value
from    Denormalized
inner join dbo.RegexSplit(/* no access to Denormalized */) on /* nothing to put here */
The solution is cross apply:
code:
insert  Normalized
select  foo,
        cast(value as int)
from    Denormalized
cross apply dbo.RegexSplit(list_of_bars, ',')

Victor
Jun 18, 2004
Something tells me we should write up a tutorial on joins or create a FAQ entry. I've made the point m5 notes above before -- sometimes you just want a left/right join to fail but don't want the entire row eliminated from the result set.

Victor
Jun 18, 2004
Two columns: userId and contentId. The PK is both of those columns.
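In other words (table name made up):
code:
create table UserContent (
    userId    int not null,
    contentId int not null,
    primary key (userId, contentId)
)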

Victor
Jun 18, 2004
Have you googled your question?

Victor
Jun 18, 2004
Wow, I can see how reading EXPLAIN results would be a bit painful. Compare that to MS SQL Server query plans:

Victor
Jun 18, 2004
How does the MySQL example actually do what you think it is doing? You are grouping by UserID, but only getting one row back -- do you only have one user? If you are convinced that MySQL really is doing the proper thing, do post the results, because I would be very surprised.

The TSQL version can be fixed by only assigning to variables, and not retrieving data. However, be warned that, at least if you throw in an ORDER BY clause, SQL Server is not guaranteed to populate your variable properly. If you'd like, I can yank a script that generates pretty CREATE TABLE code from INFORMATION_SCHEMA, which assigns to a @sql variable in the select statement in the style above, and breaks (only returns one row) if I add an ORDER BY ORDINAL_POSITION, in SQL 2005 (SQL 2000 works fine with that ORDER BY).
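The variable-assignment pattern I mean looks roughly like this (SomeTable is made up); note the ORDER BY caveat above:
code:
declare @csv varchar(8000)
set @csv = ''

select  @csv = @csv + case when @csv = '' then '' else ', ' end + name
from    SomeTable
-- adding an ORDER BY here is where SQL 2005 stopped populating @csv reliably for me

select @csv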

Edit: if you're using SQL 2005, do this properly by writing an SQL CLR aggregate. I have two such aggregates: dbo.JoinWithPipe and dbo.JoinWithCommaSpace.

Victor
Jun 18, 2004
What version of SQL Server are you using? I realize I misread your first post after you posted sample results, so I'm pretty sure you can't do what you want with variable assignment, but I think you can do it another way. SQL2005 offers a lot of sweet functionality that could potentially make this a lot easier.

Victor
Jun 18, 2004
This guy says cursors are the best method for doing running totals.

Here are a few different ways to do running totals. I don't have time to explain them now; I can do that later if necessary. Only one of them requires SQL Server 2005; it uses recursive CTEs.
code:
-- play with these (turn them on)
set statistics profile off
set statistics time off

create table #data (a int not null, b int not null)
insert #data
    select a = 1, b = 1 union
    select 1, 2 union
    select 2, 1 union
    select 2, 2

;with 
ordered as (
    select  *, row_number = row_number() over (order by a, b)
    from    #data
),
running as (
    select  a, b, nest = 0, row_number, sum = b
    from    ordered
    where   row_number = 1
    union all
    select  cur.a, cur.b, prev.nest + 1, cur.row_number, prev.sum + cur.b
    from    ordered cur
    inner join running prev on prev.row_number + 1 = cur.row_number
)
select * from running

select  *, (select sum(b) from #data where a = t.a and b <= t.b or a < t.a)
from    #data t

select  t1.a, t1.b, sum(t2.b)
from    #data t1
inner join #data t2 on t2.a < t1.a or t2.a = t1.a and t2.b <= t1.b
group by t1.a, t1.b

declare @a int, @b int, @sum int
declare c cursor local fast_forward for
select  a, b from #data order by a, b

open c
set @sum = 0
create table #results (a int not null, b int not null, sum int not null)

fetch next from c into @a, @b

while @@fetch_status = 0
begin
    set @sum = @sum + @b
    insert  #results values (@a, @b, @sum)
    
    fetch next from c into @a, @b
end

select * from #results

drop table #results
drop table #data

Victor
Jun 18, 2004
That SQL is revolting, especially the table breaking part. The SQL formatter in the OP sucks, IMHO.

You have two tables in the first query that don't show up in the second query.
code:
select operation    = v_sp1operationtype.name,
       build        = dp_cbs_stack_version_revision.datapointvalue,
       arch         = v_processorarchitecture.name,
       sku          = v_vistasku.name,
       dateandtime  = h_sessionheader.clientsessionendtime,
       machine_name = dp_cbs_machine_name.datapointvalue,
       is_test      = ((h_sessionheader.flags & 16) / 16),
       hash         = hashbytes('SHA1',dp_cbs_machine_name.datapointvalue) + ...
       -- result
from   dp_cbs_machine_name
       inner join (v_sp1operationtype
       inner join (dp_cbs_stack_version_revision
       inner join (h_sessionheader
       inner join (v_processorarchitecture
       inner join (v_vistasku full 
       outer join v_sp1operationresult 
         on v_sp1operationresult.sessionid = v_vistasku.sessionid)
         on v_vistasku.sessionid = v_processorarchitecture.sessionid)
         on v_processorarchitecture.sessionid = h_sessionheader.sessionid)
         on h_sessionheader.sessionid = dp_cbs_stack_version_revision.sessionid)
         on dp_cbs_stack_version_revision.sessionid = v_sp1operationtype.sessionid)
         on v_sp1operationtype.sessionid = dp_cbs_machine_name.sessionid
where  dp_cbs_machine_name.datapointvalue != '(null)'
       and (h_sessionheader.clientuploadtime < (getdate() - .5))
       and v_sp1operationresult.name is null


select operation    = v_sp1operationtype.name,
       build        = dp_cbs_stack_version_revision.datapointvalue,
       -- arch
       -- sku
       dateandtime  = h_sessionheader.clientsessionendtime,
       machine_name = dp_cbs_machine_name.datapointvalue,
       -- is_test
       hash         = hashbytes('SHA1',dp_cbs_machine_name.datapointvalue) + ... ,
       result       = v_sp1operationresult.name
from   dp_cbs_machine_name
       inner join (v_sp1operationresult
       inner join (dp_cbs_stack_version_revision
       inner join (h_sessionheader
       inner join v_sp1operationtype
         on v_sp1operationtype.sessionid = h_sessionheader.sessionid)
         on h_sessionheader.sessionid = dp_cbs_stack_version_revision.sessionid)
         on dp_cbs_stack_version_revision.sessionid = v_sp1operationresult.sessionid)
         on v_sp1operationresult.sessionid = dp_cbs_machine_name.sessionid
where  dp_cbs_machine_name.datapointvalue != '(null)'

Victor
Jun 18, 2004
Let's say I run a query on the table Thread. Then I add an inner join to Post. Will not the number of results balloon?

Victor
Jun 18, 2004

Ryouga Inverse posted:

Look at the two tables in question. A given install will only have one arch and only one sku; it's impossible to install the x64 and x86 versions simultaneously, for example.

So, given that I'm only doing inner joins on tables that have one-to-one relationships with the initial table, is the same true?
Have you verified that they are one-to-one? The only way I can see this not being the culprit is if I don't fully understand the semantics of full outer join when used like you have in your example. You can figure this out (and report back to me): select sessionid from all the tables in the first query and tell me if any are ever null.

quote:

Is there a way to construct a query such that I only get the results returned by the first query, that aren't returned by the second? I look at the queries and what I come up with is already in there (the OperationResult.NAME IS NULL bit) but all I'm really doing is stumbling in the dark.
If you were using SQL 2005, you could use the except operator. Instead, follow this pattern:
code:
select  a, b
from    table_name_or_derived_table t1
left join table_name_or_derived_table t2 on 
        (t2.a = t1.a or t2.a is null and t1.a is null)
    and (t2.b = t1.b or t2.b is null and t1.b is null)
where   t2.a is null
    and t2.b is null
Use derived tables that contain your two massive queries:
code:
from (
massive query here
) t1

Victor
Jun 18, 2004
Ok, if you have SQL2005... just comment out the columns in each query that aren't found in the other, put them in the same order in each query, then stick an except in between the two queries and try to run it.

With derived tables, you'd literally replace "massive query here" with your massive query. However, using except is much cleaner and leaves much less room for error, so try that!
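The shape of it is just this (FirstSet/SecondSet standing in for your two queries or derived tables; the column lists must match in number, order, and type):
code:
select sessionid, operation, build, dateandtime, machine_name from FirstSet
except
select sessionid, operation, build, dateandtime, machine_name from SecondSet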

Victor
Jun 18, 2004
noonches, you have to be kidding me. You're advocating an RDBMS which returns different results based on configuration like indexes. That means I could attempt to speed up a platform backed by MySQL by adding an index, and all of a sudden break the application. How on earth could that be construed in any way other than retarded?

This query can be made to work:
code:
SELECT DISTINCT search FROM searches WHERE userid = 1 ORDER BY timestamp DESC LIMIT 2
I don't see why LIMIT 2 should cause an incorrect result; it's nonintuitive to me. I could blame it on MySQL being a toy RDBMS, but that doesn't help out the "OP". (freaking megathreads)

Try this:
code:
select  max(timestamp) as timestamp,
        search
from    searches
where   userid = 1
group by search
order by max(timestamp) desc
limit 2

Victor
Jun 18, 2004
Try the following:
code:
select  *
from (
    select  max(timestamp) as timestamp,
            search
    from    searches
    where userid = 1
    group by search
) t
order by timestamp desc
limit 2
Maybe even try putting the "order by" in the derived table. (SQL Server does not allow this, but perhaps MySQL requires it.)

Victor
Jun 18, 2004

Mashi posted:

My question is: What is the cleanest / most efficient way to retrieve this data as if it were stored the regular horizontal way?
I just learned that MS SQL Server does a single execute per subquery, per row returned, in some cases. In my example, converting the subquery to a join (which might require a derived table) cut the runtime from 1m27s to 2s. Who knows what other RDBMSes do; you did not specify one. (*cough* rules *cough*)

Victor
Jun 18, 2004

fletcher posted:

If you have a mysql database of hundreds of thousands of articles what is the best way to pull up the top 100 articles ordered by number of views?

I'm guessing something like this is pretty taxing on the db server since it does a full table scan?
Not if you index the views column. Now, depending on how often the number of views is updated, this index might be a performance problem. Try it out!

Victor
Jun 18, 2004
On EAV: if you have SQL2005, it becomes a lot prettier with PIVOT functionality. A lot prettier. Dunno about how badly it performs...
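A rough sketch of what PIVOT looks like against an EAV table (table, column, and attribute names all invented):
code:
select  entity_id, [color], [size], [weight]
from (
    select  entity_id, attribute, value
    from    eav
) src
pivot (
    max(value) for attribute in ([color], [size], [weight])
) p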

Dakha posted:

This is probably really easy to fix, but I'm trying select a list of material numbers, and their corresponding text descriptions. However, the table with the text descriptions has several entries for each material number, for different languages. I want to get the german description, unless it doesn't exist, in which case I want the english descriptions.

I'm currently using:

SELECT material_number, material_text
FROM materials
INNER JOIN mat_desc
ON materials.material_number = mat_desc.material_number
GROUP BY material_number, material_text;

This only works because language = 'DE' is ordered first by default, and 'EN' is ordered second. This seems pretty dodgy though. Is there a proper way of doing this?
code:
select  m.material_number,
        coalesce(de.material_text, en.material_text)
from    materials m
left join mat_desc de on de.material_number = m.material_number and de.language = 'DE'
inner join mat_desc en on en.material_number = m.material_number and en.language = 'EN'
This is if you can ensure the EN version exists. Otherwise, you need two left joins and some business logic for what to do when both joins fail.

Victor
Jun 18, 2004

deimos posted:

Yeah postgresql has the tablefuncs contrib for this, I was just wondering if anyone had a better idea. I am guessing I will make a product table and just foreign key an EAV-based table to it.
It really depends on your software. If you go the table inheritance approach, then it's possible that running DDL is part of standard application operation -- that's always viewed with extreme amounts of suspicion. However, just like any powerful functionality, there are ways to use it that will probably be OK.

quote:

Another question, do I separate the "design/definition" data and the actual data or do I put them in the same table with some sort of flag?
I would separate them. Otherwise, whenever you query for one type of data you have to filter out the other. That's just ugly. Not to mention that if you want to add a column to one set of data but not the other, you'd be screwed.

quote:

Since I don't think I'll ever have to do anything with the data other than load and save (in terms of RDBMS interaction, because I won't ever have to search it) I could even make it a blob and just use XML/JSON to store it (yes, this is relational heresy, I know, but I don't care).
If you never need the database to know more than that it's a blob of data, it's perfectly acceptable to store that blob of data in the DB as a blob. C.J. Date writes in Database in Depth that one could have columns (he calls them attributes) that store tables (he calls them relations). Blobs become a problem when you start running, say, XPath, in the DB, against XML data. That's when things start getting retarded. Then again, SQL Server appears to be heading towards making that easier; I wouldn't be surprised if they were to allow creating indexes on certain XPath expressions or something. The important thing is to document what's in the blob/xml. That way, if anyone ever has to screw with this data, they will fall in love with you instead of wanting to murder you in the coldest of blood.

Victor
Jun 18, 2004

Triple Tech posted:

1) When should business logic be written at the database level versus the programming language level? Or is it something like "always in the DB unless you can't"?
If you need to be able to run insert/update queries directly against the database that run through your business logic, you pretty much have to include the business logic in the DB. However, many people make web services so that you write to the DB exclusively through the web service. The problem is one of scalability: computing all the business rules in the DB can get expensive. Scaling databases that aren't read-only is hard. Moreover, it can be better to express business logic with non-SQL syntax. My company puts almost all of the business logic in the DB because we can and don't run into scalability issues. That way, our application that talks to the databases of different clients (with vastly different schemas) doesn't need to have a clue about the business rules.

quote:

2) My coworker just told me of this technique I've never heard before, so I wanted to know how common place it is. When importing a set of data, like many rows, you create a temporary table that's the same as the table you want to insert the stuff into, then you use some crazy SQL command to import the stuff en masse, and then delete the temporary table. True? What crazy command is this? And you wouldn't just loop over the stuff and INSERT each row/item?
This technique can be extremely performant (yes, that's becoming a word, deal with it grammar Nazis). I can bulk insert tens of thousands of rows per second into a single-socket, quad-core SQL Server. If we had a gigabit network, I wouldn't be surprised if I could get it up to hundreds of thousands of rows per second. In contrast, inserting rows one by one is much slower.

Victor
Jun 18, 2004
Doing a bulk insert into SQL Server with the bcp utility -- loading into a temporary table, then transforming it and inserting into the permanent table -- is still much faster than doing one-by-one inserts into the permanent table. At least, that's the way it is with SQL Server. If you don't need to do any keying (transforming a string such as "Mammography" to the ID of the row in some "dictionary" table that contains that code), then yes, you can insert directly into the permanent table. However, if you need to do transforms, and the permanent table has indexes on the columns needing transforms, it will most definitely be faster to use a temporary table. Sometimes one must use a temporary table: what if the permanent table has a column for the ID of the code but not the code itself? (E.g. procedure_fk, which points to a Procedure record (id: 1, description: "Mammography"). Yes, I've been looking at a lot of hospital data lately.)

SQL Server has an analog to your Oracle example: OPENROWSET.
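Roughly like this (the staging table, column names, file path, and format file are all placeholders -- the columns actually come from the format file):
code:
insert  ImportStaging (col1, col2)
select  t.col1, t.col2
from    openrowset(bulk 'C:\data\import.dat',
                   formatfile = 'C:\data\import.fmt') as t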

Victor
Jun 18, 2004
There's also the issue of needing to verify data integrity/quality before you shove it in a production table.

Victor
Jun 18, 2004
INFORMATION_SCHEMA.TABLES might exist.

Victor
Jun 18, 2004
Did you try
code:
select * from INFORMATION_SCHEMA.TABLES
?

Victor
Jun 18, 2004
Scarboy, I'd suggest doing a subquery, unless you found the performance prohibitive. Are you using MySQL? If so, you can use the ability to select non-aggregated fields that weren't grouped by to your advantage (note the use of coalesce):
code:
select   r.*,
         coalesce(count(a.report_id), 0) as times_downloaded
from     Reports r
left join Activity a on 
         a.report_id = r.id
     and a.created_on > r.updated_at
group by r.id
order by r.updated_at desc,
         r.created_on desc
Strictly speaking, everything in r should be grouped by. However, grouping by the id (assuming it is unique) effectively does that.

Victor
Jun 18, 2004
I'll probably make a thread on it one of these days, but for now: Using LINQ in SQLCLR.
