chocojosh
Jun 9, 2007

D00D.

Jethro posted:

You might be able to use an OLE DB Command for that, but I think your best bet would be to store that information in a different table and then run a SQL task (or whatever) afterwards. What you'd want to do either way is use a multi-cast to create a second copy of the data, then have one copy go to the destination tables as normal, while the other copy goes into a conditional split, which you use to discard the "good" rows. The "bad" rows then either go through your OLE DB Command, or they go to another table for use outside of the Data Flow. If you don't want your "bad" rows to go into the destination tables, then you don't need the multi-cast, just have the conditional split send the good rows on to the destination instead of discarding them.

First, I'm a moron for not realizing that the OLE DB Command is in the Data Flow and not in the Control Flow. It *is* exactly what I needed.

In this instance I *don't* need the multicast -- currently some people are inputting the data manually every day and there is a CSV file already created.

Thanks for your help! Did you find it a bit intimidating at first to learn how to use SSIS? I'm having some trouble because I have to think: *ok, I know how to do a join in SQL, but how do I do it here? --scan the list of tools-- Hmm.. this merge join seems interesting.. let's google it.. oh, that actually is the right thing.. ok.. let's figure out this dialog box now...*. :)

chocojosh
Jun 9, 2007

D00D.

MrHyde posted:

I have a list in TSQL ('1','2','3') and I want to select them as a table so I can compare them, so I'd like to say something like
code:
SELECT * FROM ('1','2','3')
and get the result as a table in the form

code:
+-----------+
|  COLUMN:  |
+-----------+
| 1         |
| 2         |
| 3         |
+-----------+
Is there an easy way to do this? For some reason I'm totally unable to figure this out.

Dirty/quick way is to use a temp table. I've seen stuff using recursive CTEs but I've never had to use recursive CTEs before.

SET NOCOUNT ON
CREATE TABLE #Temp (col1 int)
DECLARE @Index int
DECLARE @MaxValue int
SET @Index = 1
SET @MaxValue = 10
WHILE @Index <= @MaxValue
BEGIN
    INSERT INTO #Temp VALUES (@Index)
    SET @Index = @Index + 1
END
SELECT * FROM #Temp
DROP TABLE #Temp
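For what it's worth, if the goal is just to turn a short literal list into a table, you can skip the loop entirely. A sketch, assuming SQL Server (the VALUES row constructor needs 2008+; the recursive CTE works on 2005+):

```sql
-- Row constructor: a literal list as a derived table
SELECT v.col1
FROM (VALUES ('1'), ('2'), ('3')) AS v(col1);

-- Recursive CTE alternative for generating the numbers 1..10
WITH Numbers AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1 FROM Numbers WHERE n < 10
)
SELECT n FROM Numbers;
```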

chocojosh
Jun 9, 2007

D00D.

MrHyde posted:

Thanks for the help. I wasn't really clear. I don't want a list of integers, I just want a list of comma separated values I have, a better example would be ('firstVal','2ndval','3rdval')

I guess the actual problem is I have a list of values from an outside source and I have a table. I want to figure out which values are in the list, but not in the table.

If the "outside source" happens to be a file you may be able to use bulk insert to load the data into a (temporary) table.

Then you have two tables, your temp containing the values from the list and your actual DB table. At this point you can do an outer join or a subquery with NOT EXISTS to compare the tables.
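Sketching both comparisons, with hypothetical names #ListValues (the bulk-inserted list) and RealTable (your actual table):

```sql
-- Values in the list but missing from the table, via NOT EXISTS
SELECT l.val
FROM #ListValues l
WHERE NOT EXISTS (SELECT 1 FROM RealTable r WHERE r.val = l.val);

-- Same result with an outer join and an IS NULL filter
SELECT l.val
FROM #ListValues l
LEFT OUTER JOIN RealTable r ON r.val = l.val
WHERE r.val IS NULL;
```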

chocojosh
Jun 9, 2007

D00D.
In SSIS what is the best way to connect two different data flow tasks. Basic googling said I cannot use a recordset destination/source to share data and that I should instead use a raw file. I'd prefer not to use a raw file as I don't see why I should have to write to disk when I just want to store a bit of data between data flow tasks.

Edit: Another simple SSIS question:

I have a CSV file that I import. On each line of the CSV file I store the same value in the last field (auto-generated from a program). I want to store that value into a variable so that I can use it elsewhere that may not have access to the CSV file. How can I do that? Would I need to first open the file with an execute script task?

chocojosh fucked around with this message at 15:52 on Jul 9, 2008

chocojosh
Jun 9, 2007

D00D.

No Safe Word posted:

The first question is "why does this have to be two DFTs?", though I'm sure you considered that.

The crappy but pretty reliable way is to use global temp tables and setting your connection's RetainSameConnection property (under the Expressions) to True. Little muss, little fuss (except that you have to create the table in order to use it as an OLE DB src if you don't want to write a query to query it, and that you have to drop the table at the end of the job).

The "standard" way is raw files, but that involves getting file space somewhere.

Another way, kind of like the first one, is a permanent "staging" table, which is basically like the global temp table except it's a real table that you keep between runs. Obviously you can truncate it as needed to keep the space usage down. I'm a fan of this one where it's feasible.

Why is the first way crappy, if I may ask? I think I'd prefer to avoid creating a permanent table in our DB (over 500 tables for the entire application), and I'd rather not create a raw file (which may be irrational of me).


I have my SSIS package divided into three tasks.

Task 1: Import and filter errors (each error will have an e-mail sent so the appropriate person can fix it manually)
Task 2: Insert records (two out of three cases)
Task 3: Insert the final set of records (these records depend on data from task 2).

I admit that I could combine task 1 and task 2, but my supervisor found my first task a little complex (we're both new to SSIS, and it had ~20 elements, mainly error handling) and suggested that I separate it. While it may not be the SSIS way, I do prefer smaller functions/tasks that are modular. If I'm doing something horribly wrong please let me know.


BTW, another silly quick question -- is there a way to easily REMOVE columns from the flow? I have a bunch of derived columns each creating the same column (text of an e-mail) then I have a union all to combine all of these derived columns. After the derived column is created I *don't* want any of the other columns in the union all (as I've already created the error e-mail text).


Many, many thanks for your help in figuring out how to actually *do* things.

chocojosh
Jun 9, 2007

D00D.

No Safe Word posted:

Because global temp tables are icky to me. Temp tables in general are icky, global ones even more so.

This all sounds fine and yeah the global temp tables or real/actual staging tables are what I'd do.

1) The derived column can replace a given column but I prefer never to do that because:
2) You can choose which columns you want to bring together in the Union. Right-click a row for a given data column and just choose "Delete" and it won't be passed through the Union. Even if you didn't already have one, I sometimes put in a one-input Union to do nothing but eliminate unnecessary columns, so I don't get 6-10 warning messages in my logging db every time I process a row because not all the columns are used.

Supervisor didn't like any of the three solutions that much for passing data between flows (flat out refused a global temp table and is hesitant for similar reasons as me to use temp raw file). We may just end up shoving it all in one task (the error handling is, as always, the biggest part of the code ironically). It's a huge help though to be able to tell him "here are your options, pick one".. even if we don't like any of them!

Thanks for the union tip. I can't believe I missed that (or why they couldn't put a drat button called delete as they do elsewhere).
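For reference, the global-temp-table option discussed above looks roughly like this (table and column names are hypothetical); with the connection's RetainSameConnection property set to True, the ##table survives between the tasks and data flows that share that connection:

```sql
-- Execute SQL task at the start of the package
CREATE TABLE ##PackageStaging (
    RecordId   int           NOT NULL,
    ErrorEmail nvarchar(max) NULL
);

-- ...data flow 1 writes to ##PackageStaging, data flow 2 reads from it...

-- Execute SQL task at the end of the package
DROP TABLE ##PackageStaging;
```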

chocojosh
Jun 9, 2007

D00D.
My learning of SSIS goes from "Hmm.. this tool is pretty neat and I love the concept but they need to get the interface improved and a few bugs fixed" down to "why the hell can't I do something so basic?".


I have a small stored procedure with a few custom RAISERROR calls, and an error can also be thrown when attempting the insert (a constraint being violated). I want to get the error message so that I can e-mail a detailed description to the business analysts (not developers), who can use the UI to make a quick fix on production.

Attempt 1: Use an OLE DB Command and set the error to redirect row
Result: The row gets redirected but I can't access the error description. I can access an integer error code that I don't need.
I found this later on (after attempt 3) but it doesn't give detailed error messages. For a unique constraint failing it said something along the lines of "Integrity constraint failed".

Attempt 2: Use an OLE DB Command and set the error to fail component
Result: It would fail the component completely, which we don't want (we want to insert as much data as possible and send error e-mails for the rows that fail). Also, if I add an OnError handler to send an e-mail, I get nice e-mails about how a worker thread failed; a great thing to send off to a business analyst.

Alright.. I walked away and took a break for 20 minutes and then thought of another trick.. let's use a stored procedure output parameter.

Attempt 3: Use an OLE DB Command and use an error message output parameter with a conditional split
Result: Couldn't read the output parameter. I've read online reports that it does and doesn't work. I gave up after an hour and a half of playing with my stored proc and SSIS.

Attempt 4: Use google and hope like hell I can find something. I end up finding a blog post that describes a way to get the output parameter from a script task.
Result: I had to do some fiddling with the connections but it *worked* (for some reason my Script Task requires an ADO.NET connection, whereas my OLE DB sources/destinations need an OLE DB connection to the exact same database).

Final solution:
However, I don't like using output parameters for error conditions -- that's why we have exception handling. Consequently, I decided to use an exception. Modifying the script just slightly I catch the exception in the script task and then set it to the read/write variable of the script task instead of using the output parameter.

In summary I've spent about 1.5 days just trying to answer the problem of "How can I send a nice error message to my business analyst/future developers so they know exactly what to fix". What a major pain in the rear end.

chocojosh
Jun 9, 2007

D00D.

No Safe Word posted:

Not much to add except, "yeah, that sounds about right". :smith:

It's not my tool of choice, I just happen to be pretty well versed in it these days. Tomorrow's my last day with SSIS for a while though (new project at work!), huzzah.

Well, thank you! I feel much better knowing that I'm not incompetent for spending a day and a half to figure out such a small thing (and luckily my boss is very laid back).

chocojosh
Jun 9, 2007

D00D.

zim posted:

I have no hair left to pull out...

I have a table called table...

Do you expect us to do your homework?
http://www.w3schools.com/sql/sql_groupby.asp

chocojosh
Jun 9, 2007

D00D.

MoNsTeR posted:

We use this as an interview question. There are at least 5 available techniques, but somewhere between 50% and 75% of interviewees fail to provide even one working solution.

To give this post more content, the 5 I know of are:
- MINUS
- NOT IN
- NOT EXISTS
- NOT ANY (Oracle only? Bizarre syntax that I never use)
- null outer join

What is a NULL outer join? I thought it would be a "LEFT OUTER JOIN" (you could use a RIGHT OUTER JOIN, but in practice I've only seen LEFT).

I do like the subquery answers -- a good trick would be to ask for TWO ways to do it :argh:
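For concreteness, here are the two subquery forms plus the null outer join, sketched against hypothetical tables A and B:

```sql
-- Rows of A with no match in B, three ways:

SELECT a.id FROM A a
WHERE a.id NOT IN (SELECT b.id FROM B b);   -- beware: NULLs in B.id break this

SELECT a.id FROM A a
WHERE NOT EXISTS (SELECT 1 FROM B b WHERE b.id = a.id);

SELECT a.id FROM A a
LEFT OUTER JOIN B b ON b.id = a.id
WHERE b.id IS NULL;                          -- the "null outer join"
```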

chocojosh
Jun 9, 2007

D00D.

Safety Shaun posted:

How do I make this query:

SELECT `ATT_IP`,
`ATT_TIMESTAMP`,
COUNT(* )
FROM ATTEMPTS
GROUP BY `ATT_IP`
ORDER BY `ATTEMPTS`.`ATT_TIMESTAMP` DESC

Spit out the most recent timestamp associated with each group of ATT_IP?

Shouldn't that give you a syntax error, since you're neither grouping by ATT_TIMESTAMP nor using it in an aggregate? (MySQL traditionally permits it, but the value it returns is nondeterministic.) I think you want:

code:
SELECT   `ATT_IP`,
         MAX(`ATT_TIMESTAMP`) AS `LATEST`,
         COUNT(*)
FROM     ATTEMPTS
GROUP BY `ATT_IP`
ORDER BY `LATEST` DESC
Here's some sample code I whipped up quickly

DECLARE @Temp TABLE (
A_IP nvarchar(100),
A_TimeStamp int
)

insert into @Temp(A_IP, A_TimeStamp) VALUES ('1', 1)
insert into @Temp(A_IP, A_TimeStamp) VALUES ('1', 2)
insert into @Temp(A_IP, A_TimeStamp) VALUES ('1', 3)
insert into @Temp(A_IP, A_TimeStamp) VALUES ('2', 10)
insert into @Temp(A_IP, A_TimeStamp) VALUES ('3', 5)
insert into @Temp(A_IP, A_TimeStamp) VALUES ('4', -1)
insert into @Temp(A_IP, A_TimeStamp) VALUES ('2', 12)

select A_IP, MAX(A_TimeStamp) As 'Most recent', count(*) As 'count'
from @Temp
group by A_IP
order by A_IP

chocojosh
Jun 9, 2007

D00D.

Begby posted:

It seems to me that the employee ID should never get reused for a new employee. i.e., if Bob quits then his ID gets retired and John gets a new ID. The bob record would still stick around but be marked as no longer employed or something. This way all the history would still be linked to Bob.

That makes the most sense to me unless I am misunderstanding your table structure/question.

Begby, I think the problem is as follows:

You have employee Sue (ID 1) who is supervised by Bob (ID 2). Thus in the supervises table you have a record (1, 2) -- Employee 1 is supervised by Employee 2.

When Bob is replaced by John (ID 3), if you update the record (1, 2) to become the record (1, 3), then you've lost the fact that Employee 1 was formerly supervised by Employee 2.


I like Victor's approach; I have just two points to raise about it:

1) Why do we only have one datetime for supervision? I guess the end date of supervision would be either the start date of a new supervisor record or when the employee quits? What if an employee gets promoted to be president and thus has no supervisor?

2) "supervision_hierarchy_historical_id in Table A, and probably in Table B." I think that may be backwards? What I'm understanding is that supervision hierarchy historical id is a surrogate key in table B and may be in Table A (if you often do queries on "list employees and their current supervisors").
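A sketch of the history-table idea with explicit start and end dates (the names here are mine, not Victor's): a NULL end_date marks the current supervisor, which also covers the promoted-to-president case by closing the last row without opening a new one.

```sql
CREATE TABLE supervision_history (
    supervision_history_id int IDENTITY PRIMARY KEY,
    employee_id   int      NOT NULL,
    supervisor_id int      NOT NULL,
    start_date    datetime NOT NULL,
    end_date      datetime NULL   -- NULL = still the current supervisor
);

-- Current supervisor of employee 1
SELECT supervisor_id
FROM supervision_history
WHERE employee_id = 1 AND end_date IS NULL;
```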

chocojosh
Jun 9, 2007

D00D.

MoNsTeR posted:


More generally, I've yet to see a really elegant solution to problems like this that are induced by change-over-time type processes.

I have no clue how far the research has advanced, but wanted to point this out:

http://en.wikipedia.org/wiki/Database_normalization#Sixth_normal_form

chocojosh
Jun 9, 2007

D00D.
I had a small query to do today and I was thinking through it and wanted the opinion of the much more experienced DBAs here.

I have a table Language, with columns Name, Language, Value. It provides a way to support internationalization for a bilingual website I'm doing, inspired by .NET's resource files and its GetString call.

I needed to provide a dump of the text for a translator, so I ended up doing the following:

code:
SELECT EL.Name As 'Key', EL.Value as 'English Text', FL.Value as 'French Text' 
FROM Language EL INNER JOIN Language FL ON EL.Name = FL.Name 
WHERE EL.Language = 'EN' AND FL.Language = 'FR';
This provides a table with (Key, English Text, French Text), so that the translator can take my English text, give me the correct French text, and then I can redump it to the database with the key.

I decided to use a self-join because I wanted to show the data from two records of the table in one record of the output. However, this query definitely does not scale well, as each new language requires another self-join. I'm wondering whether I'm using self-joins in the appropriate way (to merge multiple records of a table into one record) and whether there's a better way to do this type of query.
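One alternative that avoids adding a join per language is conditional aggregation; an untested sketch against the same table:

```sql
-- One row per key; each language becomes a column via CASE inside MAX
SELECT   Name AS 'Key',
         MAX(CASE WHEN Language = 'EN' THEN Value END) AS 'English Text',
         MAX(CASE WHEN Language = 'FR' THEN Value END) AS 'French Text'
FROM     Language
GROUP BY Name;
```

Adding a language is then one more CASE expression rather than another self-join.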

chocojosh
Jun 9, 2007

D00D.

hexadecimal posted:

I sort by different criteria, but index is a primary key right? that is only applicable to one column in a table?

http://www.google.ca/search?q=database+index&ie=utf-8&oe=utf-8&aq=t&rls=com.ubuntu:en-US:unofficial&client=firefox-a

Index (database) posted:

A database index is a data structure that improves the speed of operations on a database table. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random look ups and efficient access of ordered records.

A primary key uniquely identifies a record. Normally the DBMS indexes it automatically (because you are likely to filter your table records by primary key), although I recall reading that older versions of MySQL did not do this by default (I have not checked recent versions).

Indexes are used to make lookups faster. Think SELECT * FROM posts WHERE userid = 'hexadecimal'. From what I understand, the simplest index is a hash table on the appropriate column, so that instead of taking O(log n) (sorted data) or O(n) (unsorted data) to find the records with the specified value, the lookup is O(1); the more common B-tree index gives O(log n) lookups but also supports range scans. DBMSs do offer more advanced indexes, but I've never needed to use them.

This was covered in my university undergraduate database course and I don't go to a fancy university. It also could've been answered really easily by google. I'm seriously wondering how you're in a master's program.

chocojosh
Jun 9, 2007

D00D.
Do you have any tool that you recommend for drawing database diagrams for team discussions/design review?

My university recommends using ERDs (this is being used to help my team on an undergrad project course) but I find they're a pain to create and maintain. I much prefer having all of the attributes inside the same box as the table, like when you create a database diagram.

I would prefer a tool that runs on Linux, although I do have access to Windows. I have also heard that some tools will auto-generate SQL statements, which would be pretty neat but definitely not something I need.

chocojosh
Jun 9, 2007

D00D.

Azerban posted:

So I've got a schema that looks like this;

Student (sID: integer, sName: string, major: string, year: string)
Class (cName: string, meets_at: time, roomno: string, fID: integer)
Faculty (fID: integer, fName: string, deptID: integer)

with a relationship;

Enrolled (sID: integer, cName: string)

And I need to write a bunch of check constraints, and one in particular is kicking my rear end. It needs to make sure faculty in different departments can never teach in the same room; ie each room is specific to each department. I have no idea where to start, the logic is evading me. Anybody able to point me in the right direction?

If I understand your post, we need to avoid having two Class records (record1, record2), with their teachers' Faculty records (record3, record4), such that
record1.roomno = record2.roomno
AND record3.deptID != record4.deptID
AND record1.fID = record3.fID
AND record2.fID = record4.fID

Normally when I need to simultaneously access two records in the same table it's a hint for me to look at self-joins.
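Note that a standard CHECK constraint can't look at other rows, so in practice this usually means a trigger or an application-level check; either way, the self-join that detects a violation would look something like:

```sql
-- Rooms used by faculty from two different departments (should return no rows)
SELECT c1.roomno
FROM Class c1
JOIN Faculty f1 ON f1.fID = c1.fID
JOIN Class c2   ON c2.roomno = c1.roomno
JOIN Faculty f2 ON f2.fID = c2.fID
WHERE f1.deptID <> f2.deptID;
```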

chocojosh
Jun 9, 2007

D00D.

Ledneh posted:

"Why don't we use a DB lookup to determine the correct stored procedure instead of hardcoding it, so we can extend the program without having to change or recompile it?"

My understanding is that you want to be able to dynamically change the SQL used in a DB lookup.

If you have a fixed set of values then you can use a lookup scheme as you're thinking of. Of course, each time you want a new query you're going to have to add it to the database. In my opinion this solution is relatively simple, but there isn't much to gain beyond avoiding recompiles.


If, however, you want to dynamically build SQL for different situations, that isn't so hard either. You need to figure out how to represent your SQL in a format that doesn't require compilation and can be read in dynamically (say, an external XML file). You could either type the SQL directly into the XML file, or use some other representation that can be converted to SQL. Then you read in the SQL and execute it (look up prepared statements). This has more to gain, in my opinion, because you can create completely new queries without having to go into the DB and make stored procedures.
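A minimal sketch of the lookup-table approach, in T-SQL (the table and query names are hypothetical): queries live in a table and are executed with sp_executesql, so adding a behavior is an INSERT rather than a recompile.

```sql
CREATE TABLE QueryCatalog (
    query_name nvarchar(100) PRIMARY KEY,
    query_text nvarchar(max) NOT NULL
);

INSERT INTO QueryCatalog VALUES ('active_users',
    N'SELECT * FROM Users WHERE active = 1');

-- Look up and run a query by name
DECLARE @sql nvarchar(max);
SELECT @sql = query_text FROM QueryCatalog WHERE query_name = 'active_users';
EXEC sp_executesql @sql;
```

The usual caveat applies: only execute text from a table you control, since this is effectively dynamic SQL.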
