Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Post
  • Reply
chocojosh
Jun 9, 2007

D00D.
So essentially I want to do:
1) EXEC 'Some 40 line select statement that dynamically picks the table'
2) Insert the result of (1) into a temp table
3) sp_executesql 'Parameterized search query into the table created in (2)'.

Suggestions for step 2? Any links on how to make temporary tables in SQL Server 2005, and anything I should know?

(More details, but not important)

I've started a stored proc that starts with a fairly complicated select statement to retrieve a bunch of information. The problem is that I'm using the same stored proc for different tables and I use EXEC with the table name at the top (Yes, I know this is horrible, but I don't want to have to write a new stored proc for each table).

After I get all this data, I need to filter it by a bunch of search conditions, based on the user's input and passed as parameters to the stored proc. For this I want to use sp_executesql (because it prevents SQL injection, as I learned when I used exec and entered ' in my search condition). sp_executesql will not accept dynamic table names or field names to be entered.

Adbot
ADBOT LOVES YOU

chocojosh
Jun 9, 2007

D00D.
Thanks for the link.

I'm wondering if there's any "gotchas" for using temp tables or table variables (and what's the difference between the two).


I'm not sure what you mean by

"It is generally easier and more efficient to have the Database do filtering than doing it your self. Unless you are filtering by non-indexed rows on a very large table."

chocojosh
Jun 9, 2007

D00D.

Victor posted:

Correct; all it allows is for parametrization and outputting scalar results to variables in the calling code.

Indexes cannot be created on table variables. You cannot access table variables in sp_executesql statements that were declared in the calling code. Table variables are local to the calling code, while temporary tables are local to the current connection (#) or all connections (##).

I want to know why Microsoft hasn't worked on passing tables around, or at least views that can be dynamically selected from without dynamic sql.

Alright, thank you! I'm going to use a table variable because I don't want it to be accessed out of the calling code or to anyone else in the connection. I found this Good overview of temp tables and table variables that also makes this choice clearer: http://www.odetocode.com/Articles/365.aspx

I think that support for selecting dynamically from views or tables would be great, and would be a huge help for me. The topic is discussed a bit here: http://www.sommarskog.se/dynamic_sql.html#Dyn_table -- I've been wondering HOW to implement his pre-processor idea.


Xae: Essentially I need to first get all my data, and then filter it by doing a second select statement with dynamic where conditions (i.e. I build up a where clause for each field the user can possibly search by). By filtering I meant a second select statement that can remove rows from the temp table; not some kind of filtering in my programming language :) I honestly wish there was an easier solution, but not if I want to use sp_executesql which I need to use (this makes the choco very sad).

chocojosh
Jun 9, 2007

D00D.

Xae posted:

Is there a reason you can't to it all in one statement? Is this a two step process, where a user selects one group then refines? Whenever possible I try to avoid Selecting the same data twice.

This is due to the limitation of sp_executesql.

I am choosing my table dynamically, based on an input parameter (I *know* you shouldn't do this due to lack of query caching). To do this requires using the exec statement which can make you a victim to sql injection.

sp_executesql protects you against sql injection; but, it does not allow you to choose your table name dynamically (it will only allow you to pass in parameters for your where clause).

I use exec to do the dynamic table lookup thingie, and then do an sp_executesql to be able to do a dynamic search (I use dynamic searching conditions because the search conditions vary based on the number of characters inputted, and doing a different statement for each combination would force me to have a lot of code duplication).

chocojosh
Jun 9, 2007

D00D.

Victor posted:

As long as you're smart and do replace(varchar_variable, '''', ''''''), you're safe from SQL injection. This isn't to say that parametrization is bad, just that it isn't a must.

You can do dynamic SQL with sp_executesql just like you can with exec(sql). sp_executesql only allows scalar parameters, which may be what you were getting at.

replace(varchar_variable, '''', '''''')


Is that the *ONLY* thing I need to do to avoid SQL injection? Can you provide me with a link (not that I don't trust you, but I'd rather not leave security holes in my software due to my sloppyness)? If so, I may just use one long exec string instead of the table variable.

chocojosh
Jun 9, 2007

D00D.
Bringing the discussion back to SQL Server 2005 for a quick moment.

There is one add-on that would help me save HOURS of work. I'd love it if I have some text highlighted, that one of my right click options would be to "go to definition" for a stored proc, user defined function (and even table, although I'd prefer just a select *)

For example, right now I want to see a stored procedure called up_extractfirstvalue because I am not sure how it is being used in my stored procedure. If I could simply highlight up_extractfirstvalue, right click, and select "Go To definition" instead of finding it in our list of thousands of stored procs, it would be so helpful!

chocojosh
Jun 9, 2007

D00D.

Alex007 posted:

In SQL Server management studio, just map CTRL-F1 to "sp_helptext".

Go to Tools > Options > Environment > Keyboard.

Then, all you have to do is higlight a storedproc / table / function, and press CTRL-F1.

Edit:



Great tip!

Would be better if it opened a new window and allow me to edit the query also (although I didn't ask for that initially, I know). :)

chocojosh
Jun 9, 2007

D00D.

Golbez posted:

The following search:

code:
 SELECT ... WHERE MATCH (text fields) AGAINST ('"test range"' IN BOOLEAN MODE) 
will return all results with that exact phrase. So, what if I want to find 'test ranges'? Is there a way to smarten up the mysql literal string search, or do I have to hack in plurals? Note that this gives me 65 results, but if I take it out of boolean mode - so that it's looking for each word individually - I get thousands of results, because a lot of records have either test or range. Any ideas?

I think you want the between clause.

http://www.w3schools.com/sql/sql_between.asp

chocojosh
Jun 9, 2007

D00D.

imBen posted:

Call me crazy but I don't think that's at all what he's looking for. He's asking how to stem words, not look for the items that fall in the middle of them.

My bad. I got confused by his example of "test range" and "test ranges". For some reason I thought he wanted to search a *test range* (i.e. between two values).

chocojosh
Jun 9, 2007

D00D.

kalleboo posted:

Ångström, or god forbid 甘い贅沢 will break your system ;)

I don't know about other DB systems, but MySQL lets you do

SELECT DISTINCT manufacturer
FROM planes
WHERE pilot_id='5'
ORDER BY manufacturer IS NULL, manufacturer

If you can't do that, how about the union clause (yes, I realize that this is probably not the most efficient)?

SELECT DISTINCT manufacturer
FROM planes
WHERE pilot_id='5' AND
manufacturer IS NOT NULL
ORDER BY manufacturer
UNION ALL
SELECT DISTINCT manufacturer
FROM planes
WHERE pilot_id='5' AND
manufacturer IS NOT NULL
ORDER BY manufacturer

chocojosh
Jun 9, 2007

D00D.

Jethro posted:

I think you meant to make one of those "IS NOT NULL"s an "IS NULL". Also, I think most DBs only let you have one ORDER BY clause which comes after all of the SELECT statments and orders the entire query.

Entirely correct; this is what happens when I post from memory (I don't use SQL every day at my job either).

chocojosh
Jun 9, 2007

D00D.

Walked posted:

Can someone help me understand:

http://msdn2.microsoft.com/en-us/library/ms186734.aspx

I am not getting what OVER does in this statement - I think I understand the remainder of how the paging works, but I'm somewhat new with SQL and I dont really "get" the OVER clause.

:confused:

edit: Or feel free to suggest me a paging solution for SQL Server 2005 - I'm trying to avoid using the ASP.NET PagedDataSource :downs:

Regarding paging, for .NET 1.1 (what we use at my company) we have some code to return only the approrpiate records because from what I've been told .NET needs to call the stored procedure each time the page index is changed (thus if you have 1 million records and you're displaying 50, you have to get the entire million each time you change a page). I've also been told that this has been remedied in .NET 2.0, but I honestly don't know.

Anyways, the basic idea of our paging solution is to pass in the page number as a parameter to the stored procedure and then use that page number parameter with a predefined page size and the ROW_NUM to determine which page to return.

code:

DECLARE @FirstRec int, @LastRec int
SELECT @FirstRec = (@CurrentPage - 1) * @PageSize
SELECT @LastRec = (@CurrentPage * @PageSize + 1)

SELECT STUFF
FROM TABLE
WHERE ROW_NUM > @FirstRec AND < @LastRec
--I may have messed up the > and < signs.. perhaps > <= instead..
I recommend googling "custom paging asp.net" with the type of ASP.NET control you will be using (i.e. pageddatasource).

The website http://dotnetjunkies.com/Tutorial/EA868776-D71E-448A-BC23-B64B871F967F.dcik is decent except for two modifications I would make to their code:

1) Don't use the Identity column, use the ROW_NUM. It is possible to have gaps in the Identity column after you delete records and then you will get pages that are of different sizes (i.e. page 2 will return records 11-20
2) The code for C# code behind, instead of doing all that stuff about enabling and disabling the next and previous buttons, set the virtualcount property (from the total number of returned records) and the current page index, and the datasource should take care of figuring out what buttons to display.

chocojosh
Jun 9, 2007

D00D.
Is ON DELETE CASCADE evil?

For some reason at my company we don't use ON DELETE CASCADE. I've been told it's to prevent accidental loss of data. On the other hand, there is minor frustration when the database schema gets updated and any stored procedures that deletes the referenced table of the newly added foreign key needs to be modified.

Opinions of experienced database designers/developers?

chocojosh
Jun 9, 2007

D00D.
Yes, but you can add a 25 character salt. This way, even if your user enters a password of 8 characters, the actual password is 33 characters. That's going to take much more time to crack than an 8 character password.

Also, adding a salt makes it much harder for someone to figure out what type of hashing you're using. There are certain popular passwords (e.g. "password"). If someone on your system uses that password and it is unsalted, it is very easy to tell which hashing scheme is used (just analyze a table of different hashing schemes for the value "password").

chocojosh
Jun 9, 2007

D00D.

noonches posted:

Also, I've actually never seen pipes used for string concatenation in mysql before, news to me.

I believe that's in oracle. I recall seeing | used for concatenation in oracle for my school courses.

chocojosh
Jun 9, 2007

D00D.

LightI3ulb posted:

My boss wants a table in our MySQL database that stores customer contracts in pdf format. Is this workable?

I found this, http://www.perlmonks.org/?node_id=150255 , which applies directly to me (LAMP environment [Perl]), but I'm still not sold that it's a good idea.

What I believe you want (although I have never coded it myself) is a binary large object (BLOB). I scanned the article you posted and it was talking about mySQL 4.0.1, which is outdated. I would suggest to try using just a regular BLOB data column and check the performance before going for some fancy solution.

Hopefully this should get you started: http://www.google.ca/search?hl=en&client=firefox-a&rls=com.ubuntu%3Aen-US%3Aofficial&hs=IM1&q=blob+pdf+file+mysql&btnG=Search&meta=

chocojosh fucked around with this message at 22:25 on Mar 28, 2008

chocojosh
Jun 9, 2007

D00D.
I have a question about database design. I'm not sure if it should be here but I would prefer to avoid making a separate thread.

Essentially, if you have two database tables (entities) who share a relationship with cardinality 1 to N, where N is a positive number, what do we do?

For example, I built a website for a trip I am organizing. Users can make an account and enter which hotel they want, which bus they want, and if they want certain extras. They pay in person at our student club's office, and then I enter the payment information into the database also.

I know that there is going to be a maximum of four people in any hotel room (and obviously a minimum of one). Thus, I made a table Room with fields Person1, Person2, Person3, Person4 where those four fields store the username of the appropriate user.

I can see how this is bad design and very inflexible. However, the system is meant to be updated once a year (as we offer the trip once a year to our members) and I figured it was more important to get the site done then worry about "future-proofing". However, I was wondering what better solutions there are (make a joining table called UserRoom that would have a trigger on insert to make sure no more than 4 people are in a given room?)

chocojosh
Jun 9, 2007

D00D.

GroceryBagHead posted:

Join table is the way to go and let your application handle validation of how many people you want to cram per room (maybe it will be 6 next year)


I think for next year I'm going to have to update the design with a better table schema.. I've had to write a few reports two nights ago and the code was quite bad: (User U JOIN Room R ON U.UserName = R.Person1 OR U.UserName = R.Person2 OR U.UserName = R.Person3 OR U.Username = R.Person4).

The current table schema has met my needs for this year though, so at least I got something working on my first try :)

Zombywuf posted:

Just say 'no' to triggers.

Can you give me some reading material as to why? At my job the DBAs do not use triggers (I was told we have one in the entire application). I'd really like to know more about any dangers/problems with triggers.

chocojosh
Jun 9, 2007

D00D.

Aredna posted:

Second: Does anyone have experience automating SQL query parameters from an e-mail? The e-mail is in a set format every day and is just plain text so a number of tools could be used to parse out the data. The problem is that I have no idea how to go taking the e-mail from when I receive it in Outlook and getting it to the SQL server to run.

I'd recommend making a C# console application. The C# application would query your mail server, get the e-mail body text, parse it, and then make the appropriate queries in the database.

I have never actually accessed e-mail from C#, but I am pretty sure that there must be some way to do it. You could always ask on the .NET megaquestion thread. Accessing sql server from C# is fairly trivial (where I work we use the wonderful SqlHelper class to wrap low-level details -- see the article here: http://aspnet.4guysfromrolla.com/articles/062503-1.aspx).

Since the application will compile and result in an exe, it can be set to run every day through windows scheduled task manager.

chocojosh
Jun 9, 2007

D00D.

Jethro posted:

You could use sp_executesql instead of EXECUTE, but I don't think it really gains you anything in this case. Either way, this is definitely a case of needing dynamic SQL, assuming there isn't a totally different way of doing what you want to do.

I may be mistaken, but I believe that sp_executesql only allows for parameterization on the WHERE clause. That is, you cannot use sp_executesql to dynamically select different fields/tables (SELECT and FROM clauses), but it can be used for a dynamic where clause. This would be because sp_executesql does make an execution plan that it will re-use assuming you pass the exact same string to sp_executesql (note that you can use different parameter values as the string passed to sp_executesql takes a list of parameters).


Talking about dynamic sql, I found this paper was really helpful when I first had to work with it: http://www.sommarskog.se/dynamic_sql.html

chocojosh
Jun 9, 2007

D00D.

Jethro posted:

That was my impression as well, which is why I didn't think that TT would get much out of using sp_executesql instead of EXECUTE.

Except that I don't think TT CAN use sp_executesql -- it should be an error.

Note that sp_executesql is drat awesome for dynamic where conditions (think many different search criteria that don't need to apply).

chocojosh
Jun 9, 2007

D00D.

yaoi prophet posted:

Do not ever, under any circumstances, commit or rollback in your procedure. Doing so ruins the whole transaction model of Oracle, which is based on the idea that an entire set of operations can be transparently undone at the client level. If individual procedures decide to commit/rollback at their own time, it literally breaks the integrity of the client's transaction.

...

Edit: Personally, I think the keywords COMMIT and ROLLBACK should not even be allowed in pl/sql code. They just don't belong. If you really have the need (like with a one-off script, as is sometimes needed), you should have to run it through EXECUTE IMMEDIATE, just to drive home how gross it is to be doing it.

Does this apply to C# development (with SQL Server)? Our stored procedures are full of CREATE/COMMIT/ROLLBACK trans.

chocojosh
Jun 9, 2007

D00D.

Code Jockey posted:

Okay gentlemen, I need some clarification.

My boss is under the assumption that grouping a fuckton of queries in a single SqlCommand execution classifies as a "transaction". Like so:

"insert into blah values(something, something);insert into blah values(something2, something2);insert into..." and so on.

Now, I'm trying to convince him to let me rewrite our transaction wrapper [which basically, I'm not even joking, concats a string of queries together like that] into an actual, real transaction system. My question is, let's say we execute the above pile of queries, and detect an exception, and execute a rollback. My understanding is that the rollback would only rollback the last executed query in this case, so if it fails at the fifth query in the list, the other four would still stay, if it failed on the second, the first one would stay, etc. This is causing serious loving problems and needs to change - I just want to make sure that I have my facts straight, since he's the type to be dead set in his ways unless I can point out exact reasons why things aren't written right.

It should be noted that my boss doesn't have a CS degree, and really... really needs to not be near SQL, obviously.

For reference, we're running MS SQL 2005 and the pages/libraries running queries are in a mix of C# and VB.NET 2005.

Why don't you just make a small test and show him some results? :)

It shouldn't take too long. Either throw an error (raiserror) or deliberately make one of the statements break (e.g. delete a record that is being referenced by a foreign key and on cascade is not set).

chocojosh
Jun 9, 2007

D00D.

Code Jockey posted:

I totally agree. I remember when I first encountered this when I was rewriting a page which created customer records, and I saw like 4 inserts jammed into one line. He told me "Yeah, that basically makes a transaction..."

It was a good preview of things to come. I won't even start into the internal, catch-all, horribly bloated "data access layer" he wrote, which has the fun habit of a) eating exceptions and not returning them, and b) returning null if there were zero results returned from a query or an exception was thrown.

Just let that last line sink in for a second.

Needless to say, I only use it for really, really, basic dumb stuff like dropdown list population.

e. I'll probably hit that Coding WTF thread later, I've got stories.

Please do, I'd love to see them :)

chocojosh
Jun 9, 2007

D00D.

TiMBuS posted:

So yeah, for a situation like this, would I just dump everything in one table, or would I make (potentially)thousands of tables? Or, maybe there is some kind of alternative data structure that I haven't thought of? You tell me.
Your answer will probably affect every database I ever make in the future here.

I'm sure some of the very talented DBAs can answer you better than I can. I'm going to take a stab at it though anyways.

First normal form states that all database entries should be atomic. Note that atomic means that it conceptually cannot be broken down further. There should be only --one-- piece of information Strings are considered atomic even though you can extract a substring. You can also combine several related fields as long as there is only

Another approach would be to make N different fields if you assume you can have up to a fixed N of values (e.g. assume you can have up to 10 forum threads and make 10 fields Thread!, Thread2, ... Thread10). This is a quick solution for something small/dirty, but I would not recommend it as a good practice (I used this approach for my last project which was a website that will only need to be updated once a year and I was running late on it).

Why should you avoid composite/non-atomic values? I'm not too sure. I would think it's because the relational model is meant to handle joins much better than it is in handling substrings. It also becomes difficult to enforce referential integrity if the values are non-atomic.

I don't think it's a problem to have 1000s of tables if your problem domain is so big that you'll actually need one thousand. I know at my company our web application (I'd estimate 0.5 M - 1 M LOC) has a few hundred database tables, and it's never been a problem.

chocojosh
Jun 9, 2007

D00D.

Rebus posted:

=> SELECT count(*) AS number FROM hotel WHERE number >= 3 GROUP BY city;
ERROR: Attribute "number" not found


:(

Edit:

Ahhh thanks!!! This fixed it:

SELECT city, count(*) AS number FROM hotel group by city;

You need to add the HAVING clause

I recommend you read over the W3C SQL tutorial: http://www.w3schools.com/sql/default.asp

It's very brief and won't give much detail, but it's perfect as a short introduction or to point you in the right direction if you have an idea of what you want.

chocojosh
Jun 9, 2007

D00D.

Grigori Rasputin posted:

My bad, that's supposed to be "SELECT NAME FROM [table] where ID=x", I have it the other way around which makes no sense.

I'm using a .NET datasource control to run the stored procedure for use in an ASP.NET page. I have the param direction set to out in the control, but the out param ends up null everytime. I had it working for integers with an explicit RETURN, but those apparently don't work with strings.

Google found me this: http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=97795

Where I work we use the SqlHelper class (from Microsoft Application Blocks). While several years old, it's a great way to get what you need from Sql Server in much less code. See the article: http://aspnet.4guysfromrolla.com/articles/070203-1.aspx

With SqlHelper, to retrieve a string bet would be to use SqlHelper.FillDataSet() method in C# and in Sql Server to "Select" the string you want to output.

*Waiting for victor to say how SqlHelper is horrible and inefficient :)

chocojosh
Jun 9, 2007

D00D.

epswing posted:

-- does exactly what i want, but i get the feeling there's a better way
select s.name as size, sum(case when p.sid is null then 0 else 1 end) as total
from size s left join (select sid from person) p on s.id=p.sid
group by s.id
order by s.id;

Hmm, I haven't tested this but shouldn't you just be able to:

select s.name as size, count(*) as total
from size s left join person p on s.id=p.sid
group by s.id
order by s.id;

http://www.java2s.com/Code/SQL/Select-Clause/COUNTandGROUPBY.htm

chocojosh
Jun 9, 2007

D00D.

epswing posted:

I'm almost positive I've tried exactly that, no dice.

code:
|size | total|
|-----+------|
|S    | 2    |
|M    | 3    |
|L    | 1    |
Incorrect total for L. It works if the true total is at least 1 for each size.

http://www.w3schools.com/sql/func_count.asp This says that it will return the count of NON-Null values.

As there is ONE record containing the value L after the left join, the COUNT(*) for L is 1. However, the p.id field should be NULL (you can confirm this by doing just the left join and taking out the where conditions). Thus, replacing the count(*) with count(p.id) should fix it... in theory. This is untested :)

chocojosh fucked around with this message at 19:52 on May 26, 2008

chocojosh
Jun 9, 2007

D00D.

Casimirus posted:

Is there any way to access external databases dynamically from within a stored procedure?

I'm using MSSQL 2005, and I figured out that I can access external databases if I know their name ahead of time, like
code:
SELECT * from master.sys.databases
What I can't figure out is how to take a database name passed to the stored procedure and then access things like
code:
SELECT * from [@otherdatabase].atable
I know that it's not best practice or whatever, but this is for something to be run manually. I guess I could build my statement in a string and call that system stored procedure that executes SQL, but that would be a nightmare. If I do have to go that route though, will that gently caress up a transaction?

I"ve done a lot of work with sp_executesql. sp_executesql is miles above EXEC (checks for sql injection AND can create/store an execution plan!) and is worth the extra hassles in using it.

I don't see why sp_executesql would gently caress up a transaction. I might suggest though that you don't do a transaction in it, but that you use @@ERROR (Sql server 2000) or TRY/CATCH (Sql server 2005) in the caller of the sp_executesql to handle your transaction.

code:
BEGIN TRY

BEGIN TRANS
  //BUILD SP_EXECUTE SQL string
   sp_executesql <string>,
       @param1 = @value1,
       @param2 = @value2,
       ...

COMMIT TRANS
END TRY

BEGIN CATCH
   ROLLBACK TRANS
END CATCH

chocojosh
Jun 9, 2007

D00D.
At my work we have a C#/ASP.NET application that all use the same database on one development server.

When I go to use the profiler to see the parameters of a stored procedure that is called ("New Trace"), all the developers' stored procs are shown. It becomes annoying sometimes to figure out which

There's an NTUserName column that lists the user who is calling the database command. Is there any way to have it work through the application based on the user's windows login? I.e. If I'm logged in to windows as jk, run the C#/ASP.NET application off localhost and then call a stored procedure through a database connection, is there any way to specify the NTUserName column that will appear in the trace?

chocojosh
Jun 9, 2007

D00D.

Alex007 posted:

On my servers, the NTUSername isn't always filled, but I prefer to filter by SPID anyway. run sp_who2 to list the connections, decide which one you want to trace (you can see usernames and machine names here) and enter that SPID in the trace filter.

sp_who2 returns *10* different results for my user. I think that it's making one different spid for each "New Query" tab I have opened in Sql server studio. Also, it seems that if I log out and log back into my application I have a different spid.

Is there a way to filter the results of sp_who2. Something similar to

SELECT * FROM "exec sp_who2" where
Login = mylogin
and hostname = myhostname

Also, I just realized something -- our web.configs are setup to use a standard username/password to access the database, even though they setup windows authentication for when I go into sql server 2005. Could this be a good reason why my ntusername isn't showing up? Then again, why can't it use the hostname field returned by sp_who2? :(

chocojosh fucked around with this message at 16:19 on Jun 4, 2008

chocojosh
Jun 9, 2007

D00D.
How would you go about importing a CSV file into the database with some custom logic and be able to report errors (ideally by sending an e-mail)? We're using SQL Server 2005.

Essentially each row in the CSV file should have an ID that matches a record in a database table. If there is a match, then there is an insert done in a separate table using the values from the CSV file. If there is no match, then we may need to report an error with the information from the CSV file. If there are rows in the DB table that are not in the CSV file, then we may need to do an update or report an error depending on the values in the table.

My supervisor suggested taking a look into Sql Server Integration Services, which I know absolutely nothing about. Any recommendations for where to learn about SSIS or if it can do what I need it to do? Any suggestions on other tools to look into.

This would be running from a windows service that would execute the script once a day automatically.

chocojosh
Jun 9, 2007

D00D.

Ardhanari posted:

Look into BULK INSERT.

Bulk insert doesn't seem to support custom rules.

I'm guessing the best thing to do would be to first check the database and ensure that our input is valid. If there are any errors, report them first, and keep the valid data and use bulk insert on the valid data?

chocojosh
Jun 9, 2007

D00D.

xenilk posted:

From a previous code I made a query (that varies alot and as you can see it can create a lot of inner join.

GALLERY_ID and CATEGORY_ID are indexed aswell
code:
SELECT   DISTINCT (`GALLERY`.`GALLERY_ID`)
FROM     `GALLERY`
         INNER JOIN `GALLERYCATEGORY` AS `A`
           ON `A`.`GALLERY_ID` = `GALLERY`.`GALLERY_ID`
         INNER JOIN `GALLERYCATEGORY` AS `CAT38`
           ON `CAT38`.`GALLERY_ID` = `GALLERY`.`GALLERY_ID`
         INNER JOIN `GALLERYCATEGORY` AS `CAT52`
           ON `CAT52`.`GALLERY_ID` = `GALLERY`.`GALLERY_ID`
         INNER JOIN `GALLERYCATEGORY` AS `CAT53`
           ON `CAT53`.`GALLERY_ID` = `GALLERY`.`GALLERY_ID`
         INNER JOIN `GALLERYCATEGORY` AS `CAT74`
           ON `CAT74`.`GALLERY_ID` = `GALLERY`.`GALLERY_ID`
         INNER JOIN `GALLERYCATEGORY` AS `CAT96`
           ON `CAT96`.`GALLERY_ID` = `GALLERY`.`GALLERY_ID`
WHERE    (`A`.`CATEGORY_ID` = 82)
         AND (`CAT38`.`CATEGORY_ID` != 38
              AND `CAT52`.`CATEGORY_ID` != 52
              AND `CAT53`.`CATEGORY_ID` != 53
              AND `CAT74`.`CATEGORY_ID` != 74
              AND `CAT96`.`CATEGORY_ID` != 96)

ORDER BY RAND()
LIMIT    1;
I'm thinking that the Inner Join makes it very painful for the SQL server since it has to query the table over and over again for the same field. Is there any more efficient way to do this?

Thanks!

Why are you joining the same table 6 times?

Couldn't this work:

code:
SELECT (DISTINCT G.Gallery_ID)
FROM Gallery G
INNER JOIN GalleryCategory GC ON G.Gallery_ID = GC.Gallery_ID
WHERE GC.Category_ID NOT IN (38,52,53,74,82,96)

chocojosh
Jun 9, 2007

D00D.
Mezz: Thanks for the clarification, I misread the = for != in the first WHERE statement (and this was a great example why when you ask for help you should also explain *what* you want to do instead of your solution :))



Regarding SSIS: my supervisor insists that I use SSIS for this little tool I'm working on. I think they're hinting that I test out a new technology for the company (I've shown interest before in learning new stuff).

Any books you'd recommend? I've been given permission from the CTO to go pick up a book and have it expensed by the company.

Jethro: Thanks for the BOL page. I'd rather have a real book to start off with (I tend to prefer real books over online books for learning stuff. I prefer using online books/articles when I'm already comfortable with the tool/language and need to do a very specific task).

chocojosh
Jun 9, 2007

D00D.

Jethro posted:

Just for further clarification, this means that, unless you specifically know that you want to discard duplicates you should use UNION ALL, since UNION forces the RDBMS to check for duplicates, even if there aren't any.

And for even more clarification:

UNION ALL is O(n) and UNION is O(n ^ 2) (n = number of records in both sets).

Why? Because UNION ALL simply takes both sets and pushes into a new table, an O(n) operation. UNION is O(n ^ 2) because for each record it goes to add, it needs to check all the previous records for a match (which takes 1 + 2 + 3 + ... + n - 1 = (n - 1)(n) / 2 operations).

chocojosh
Jun 9, 2007

D00D.

Factor Mystic posted:

In this case, is 'SELECT DISTINCT ...' the proper way to prevent duplicate rows in the result?

Yes. http://www.w3schools.com/sql/sql_select.asp


I just did a simple test to reassure myself that DISTINCT applies to all columns in the select clause, which is does.

code:

create table #Temp (
  col1 varchar(10),
  col2 varchar(10))

insert into #Temp values ('0', '2')
insert into #Temp values ('1', '2')
insert into #Temp values ('1', '2')
insert into #Temp values ('1', '2')
insert into #Temp values ('1', '3')
insert into #Temp values ('1', '3')
insert into #Temp values ('2', '2')

select * from #Temp

select distinct col1, col2 from #Temp

Drop table #Temp
The select * returns:

code:
0	2
1	2
1	2
1	2
1	3
1	3
2	2
The select distinct returns:

code:
0	2
1	2
1	3
2	2

chocojosh
Jun 9, 2007

D00D.

No Safe Word posted:

I've been working with SSIS for about a year now and the two Wrox books I used to get started: Professional SQL Server 2005 Integration Services and the Expert counterpart were both pretty good.

Also, reading through Jamie Thomson's blog entries are good too: http://blogs.conchango.com/jamiethomson/

Awesome. I've spent a few hours playing with SSIS and the Professional Wrox book came in from amazon on Friday. I'm starting to understand how it all fits together.

The main thing I have left to do now is to figure out how to get the errors from my Data Flow task (recordset destination, datareader destination, variable) to an Execute SQL task. I need to be able to call a stored procedure for each erroneous row (perhaps also call one stored proc for the entire batch of rows). I imagine in the Data Flow I just set output destinations and then I go to the control flow and link my Data Flow task to multiple Execute SQL tasks?

Zombywuf: As a coder I much prefer using plain old SQL/.NET to handle this kind of work, although I can see how there is a market for it in suitably large systems.

Adbot
ADBOT LOVES YOU

chocojosh
Jun 9, 2007

D00D.

No Safe Word posted:

Use the OLE DB Command. Getting stuff outside the Data Flow is (as far as I know) impossible*, the control flow has no "data" between its tasks.


* - unless you use package variables which are scoped to the entire package, but obviously you can only do so much with that

I may not be able to use an OLE DB command for what I need. This is what I have:

Right now in my dataflow I'm joining a CSV file and a database table (to import daily information that we get from an external source).

What I'm doing now is that if there are values that exist in one source and not the other (obtained from a FULL-OUTER JOIN and checking for NULL), I want to send an e-mail for each value (record/row) that is not in both lists (there's a few other rules, but that's the gist of it).

  • 1
  • 2
  • 3
  • 4
  • 5
  • Post
  • Reply