kumba
Nov 8, 2003

I posted my food for USPOL Thanksgiving!

enjoy the ride

Lipstick Apathy

Norton the First posted:

My knowledge of SQL only extends through the first dozen exercises on CodeSignal, can anyone lend me a hand?

I have a table called QSR of investment return data for a number of funds. It looks like this when I import it into Excel:



I want to break this row in two, so that the columns after "datadate" just say "ret_Q," "ret_1Y," "ret_3Y," with one row representing income returns and the other row representing total returns. What would that involve?

code:
SELECT NAME, 
       datadate, 
       ret_income_q  AS ret_Q, 
       ret_income_1y AS ret_1Y, 
       ret_income_3y AS ret_3Y, 
       1             AS 'orderByMe' 
FROM   QSR 
UNION ALL 
SELECT NAME, 
       datadate, 
       ret_total_q  AS ret_Q, 
       ret_total_1y AS ret_1Y, 
       ret_total_3y AS ret_3Y, 
       2            AS 'orderByMe' 
FROM   QSR 
ORDER  BY NAME, 
          datadate, 
          orderByMe 
If your data structure is such that you can guarantee ordering by id will always result in income on top and total on bottom, you can order by id instead and get rid of the 'orderByMe' parts

Norton the First
Dec 4, 2018

by Fluffdaddy

kumba posted:

code:
SELECT NAME, 
       datadate, 
       ret_income_q  AS ret_Q, 
       ret_income_1y AS ret_1Y, 
       ret_income_3y AS ret_3Y, 
       1             AS 'orderByMe' 
FROM   QSR 
UNION ALL 
SELECT NAME, 
       datadate, 
       ret_total_q  AS ret_Q, 
       ret_total_1y AS ret_1Y, 
       ret_total_3y AS ret_3Y, 
       2            AS 'orderByMe' 
FROM   QSR 
ORDER  BY NAME, 
          datadate, 
          orderByMe 
If your data structure is such that you can guarantee ordering by id will always result in income on top and total on bottom, you can order by id instead and get rid of the 'orderByMe' parts

Thank you!

eta: a lot! This is perfect.

Norton the First fucked around with this message at 21:24 on Jan 30, 2019

Mycroft Holmes
Mar 26, 2010

by Azathoth
it's that time again. I'm trying to select the first monday six months from a variable. I've got it down to find the first sunday six months from a date. How do i get it to monday?
code:
select ename, hiredate, to_char(round(add_months(hiredate, 6), 'day'), 'DAY, "the" ddspth "of" MONTH, YYYY') as "REVIEW"
from emp

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

Mycroft Holmes posted:

it's that time again. I'm trying to select the first monday six months from a variable. I've got it down to find the first sunday six months from a date. How do i get it to monday?
code:
select ename, hiredate, to_char(round(add_months(hiredate, 6), 'day'), 'DAY, "the" ddspth "of" MONTH, YYYY') as "REVIEW"
from emp

Not sure about the dialect you're using (Oracle?) but I would do this in pseudosql:

code:

select whatever
from emp
join (values (1,2,3,4,5,6,7)) as days(n)
on get_day_of_week(hiredate + six months + n days) = 'monday'

Use 0..6 instead of 1..7 if you want to include the day exactly six months after, obviously.
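
If it is Oracle, NEXT_DAY can also get there directly; a rough sketch against the emp table from the question (the 'MONDAY' day name assumes an English-language session):
code:
-- NEXT_DAY(d, 'MONDAY') returns the first Monday strictly after d, so
-- subtract a day first to keep a date that already falls on a Monday.
SELECT ename,
       hiredate,
       TO_CHAR(NEXT_DAY(ADD_MONTHS(hiredate, 6) - 1, 'MONDAY'),
               'DAY, "the" ddspth "of" MONTH, YYYY') AS "REVIEW"
FROM   emp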

Mycroft Holmes
Mar 26, 2010

by Azathoth
n/m fixed it

Mycroft Holmes fucked around with this message at 20:44 on Feb 3, 2019

Kuule hain nussivan
Nov 27, 2008

Yay, more fun with data that has silly formatting and needs to be made slightly less silly.

I have an nvarchar column where the contents are always an id, which is possibly followed by a comma and some sub-identifiers. The sub identifiers are always either a single number or a range with a hyphen (so either 2 or 3-4). Is there a nice and simple way of breaking the single row into x number of duplicate rows according to sub-identifier.

For example, if I have 1, 2-5 then this would be broken down into...

1 | 2 | All other columns
1 | 3 | All other columns
1 | 4 | All other columns
1 | 5 | All other columns

The material is very small, so performance can be absolutely horrendous.

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

Create a subquery or temp table that holds the numbers from 1 to max_whatever_you_need. You can do it with VALUES if it's just a few, if it's ginormous you can generate them with a recursive query, google it.

Then join that against your table with a join condition that looks like (numbers.N >= extract_minimum_value_from_column(table.id) AND numbers.N <= extract_maximum_value_from_column(table.id)).

How you extract those numbers depends on what string splitting functions your DB offers, but a basic CASE WHEN id LIKE something THEN SUBSTRING(id, x, y) .. can probably do the job.
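
To make that concrete, a rough T-SQL-flavored sketch; the table and column names (dbo.SillyData, SubIds) are invented, and the parsing assumes the value always looks like '<id>, <n>' or '<id>, <lo>-<hi>':
code:
-- numbers: 1..1000 via ROW_NUMBER over a convenient system view.
-- parsed:  pulls out the sub-identifier portion after the comma.
-- The final join expands a range like '3-4' into one row per value.
;WITH numbers AS (
    SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM   sys.all_objects
), parsed AS (
    SELECT t.*,
           LTRIM(SUBSTRING(t.SubIds, CHARINDEX(',', t.SubIds) + 1, 100)) AS subpart
    FROM   dbo.SillyData AS t
)
SELECT p.*,
       numbers.n AS sub_id
FROM   parsed AS p
JOIN   numbers
  ON   numbers.n BETWEEN
           CAST(CASE WHEN CHARINDEX('-', p.subpart) > 0
                     THEN LEFT(p.subpart, CHARINDEX('-', p.subpart) - 1)
                     ELSE p.subpart END AS INT)
       AND
           CAST(CASE WHEN CHARINDEX('-', p.subpart) > 0
                     THEN SUBSTRING(p.subpart, CHARINDEX('-', p.subpart) + 1, 100)
                     ELSE p.subpart END AS INT)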

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

I did a similar thing waaaaay back in this thread with the XML syntax I think. Posting here so I can click on my name and find it and edit it in.

EDIT-Hmm, not so helpful after all:
code:
SELECT 
    CAST('<Node>' + REPLACE((SELECT SUBSTRING(CarString, CHARINDEX('[',CarString) + 1, 
       ((CHARINDEX(']',CarString)- 1) - CHARINDEX('[',CarString)))),' ','</Node><Node>') + 
       '</Node>' AS xml).query('Node[1]/text()') As Year,
    CAST(blah blah looong line) As Manufacturer,
    CAST(blah blah looong line) As Model,
    CAST(blah blah looong line) As Make
FROM Cars
I was getting elements from a text list of Carname,Carmanu,Carmodel,Carmake with no guarantee any of them other than Carname would exist. This is probably more harmful than helpful.

Scaramouche fucked around with this message at 18:41 on Feb 11, 2019

hummingbird hoedown
Sep 23, 2004


IS THAT A STUPID NEWBIE AVATAR? FUCK NO, YOU'RE GETTING A PENTAR

SKILCRAFT KREW Reppin' Quality Blind Made Products
I figured I'd ask in here before I post in a SH/SC forum at large.

I've run into what seems to be a very common problem in MS Access. I'm creating a combo box that references other tables and only their primary key number is showing up instead of the text value for that record.

I have separate tables for cities, states, and countries.
I have a table of personnel.
I have a table of offices that use the city, state and country tables to give them locations.
I have a table of event categories.
I have a table of events that use a combo box tied to the event categories.
I have a table of event schedules because single events sometimes have specific blocks of time where sections of the event occur (like event setup, the event itself, and event teardown) and also a specific location because some events occur in more than one place. The office responsible for that piece of the event is also chosen on this table.

I want to build a table where I can assign personnel to event schedules, so I've used lookup wizard to make combo boxes where I can pick an event and pick a person to assign to that event. I've set up the event personnel table to have a combobox where the event schedule shows up so you know you're choosing the correct part of the main event to assign someone to, but the box always shows the event primary key number and the city, state, and country primary key number instead of the text they're associated with. The office and start date and end date work correctly.

What might be going on here?

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

I've got an old, old (think 12 years) MSSQL Server table that needs to be freshened up a bit by converting all the VARCHAR to NVARCHAR, because weirdo people that use weirdo languages insist on doing business with us and aren't happy when their names are recorded as ?? ? ??? ??. The tables aren't that big, maybe 2-3 million rows, but they are in production and are constantly being hit with inserts/updates. I'm assuming ALTER TABLE is a no-go here due to the lock/execution time, and the only other solution is CREATE TABLE -> INSERT with the new columns?

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
https://docs.microsoft.com/en-us/sql/t-sql/statements/alter-table-transact-sql?view=sql-server-2017#locks-and-alter-table says that adding non null columns with a default is (caveats) a fast operation. This permits you to migrate piecemeal at the expense of slightly higher latencies per call during your migration. Plan appropriate tests and metrics so you can determine when it becomes viable to switch over entirely to the new column.
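
For illustration, the piecemeal route would start with something like this (table and column names are hypothetical; the fast-add behaviour is the one described in the linked doc, caveats and edition differences included):
code:
-- Add a parallel NVARCHAR column with a default, backfill it in batches,
-- then swap reads/writes over to it once the data has been migrated.
ALTER TABLE dbo.SomeTable
    ADD WeirdoNameN NVARCHAR(100) NOT NULL
    CONSTRAINT DF_SomeTable_WeirdoNameN DEFAULT N'';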

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

PhantomOfTheCopier posted:

https://docs.microsoft.com/en-us/sql/t-sql/statements/alter-table-transact-sql?view=sql-server-2017#locks-and-alter-table says that adding non null columns with a default is (caveats) a fast operation. This permits you to migrate piecemeal at the expense of slightly higher latencies per call during your migration. Plan appropriate tests and metrics so you can determine when it becomes viable to switch over entirely to the new column.

Hmm, maybe this won't be so bad. No indexes are affected, no defaults need to be changed or inserted, and there are no triggers or constraints involved. It's basically looking like just ALTER COLUMN WeirdoName NVARCHAR(100). It looks like they've cleaned up this process considerably in MSSQL 2012 onward. Luckily the table is denormalized pretty good and the "big" table only references these bad old VARCHAR guys by ID. Thanks for the link!
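
Spelled out, that in-place change is roughly the following (table name assumed):
code:
-- Restate the existing nullability explicitly; ALTER COLUMN does not
-- assume the old setting if you leave it off.
ALTER TABLE dbo.SomeTable
    ALTER COLUMN WeirdoName NVARCHAR(100) NOT NULL;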

tmesis
Jan 18, 2007

sup holmes
Megamarm
I have a query generated by application code that, thanks to a never-ending buffet of "UNION SELECT these three values", exceeds six megabytes in length. The query fails with a 1064 syntax error and returns "Memory exhausted near <some part of the query somewhere in the middle of the 6 MB mess>". I'm assuming that this is a limit of the query parser, but i'm struggling to find any documentation on the parser's limits so i can go back to the development team and tell them to rewrite it better. All i can find regarding this error is people trying to do masses of inserts and having their problem easily solved by breaking up their inserts.

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Is that the MySQL error? It really varies depending on the platform. If MySQL you can get the size from console I believe:
code:
SHOW VARIABLES LIKE 'max_allowed_packet';
I think you can edit a cnf file as per below to change it:
https://stackoverflow.com/questions/16335011/what-is-maximum-query-size-for-mysql
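
It can also be bumped on a running server without a restart (the change only applies to new connections and won't survive a restart unless the cnf is updated too):
code:
-- Value is in bytes; 67108864 = 64 MB. Requires admin/SUPER privileges.
SET GLOBAL max_allowed_packet = 67108864;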

tmesis
Jan 18, 2007

sup holmes
Megamarm
Yeah that is the error returned. max_allowed_packet is set to 1G for us, which i think is the max allowed size? My theory is that because the query is so long, the query parser is just giving up half way through and trying to run the query at that point, which is malformed since it stopped somewhere in the middle.

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE
In postgres, what's the performance of an update that matches no rows like? Is it slower than a SELECT with the equivalent WHERE clause? EXPLAIN ANALYZE is claiming there's an update operation even though no rows are getting touched.

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
Include EXPLAIN (ANALYZE, VERBOSE).

Much depends on concurrent database traffic and the type of update being performed. For example, "how many rows are changed if the update has to calculate a value for a field, but all fields already have the target value?" There was a bug in CTEs related to locking and updates.

Hard to say without info and my crystal ball is out for repair.

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

PhantomOfTheCopier posted:

Include EXPLAIN (ANALYZE, VERBOSE).

Much depends on concurrent database traffic and the type of update being performed. For example, "how many rows are changed if the update has to calculate a value for a field, but all fields already have the target value?" There was a bug in CTEs related to locking and updates.

Hard to say without info and my crystal ball is out for repair.

Thanks for the suggestion, I'll try it out when I'm back at work.

Basically I want to see if I can replace some dumb legacy code that very frequently does something like

1. SELECT * FROM some_table WHERE unique_id = <param>
2. application level check if some_table_row.last_active was more than a minute ago
3. if the check in step 2 is true, UPDATE some_table SET last_active = NOW() WHERE unique_id = <param>

with:
UPDATE some_table SET last_active = NOW() WHERE unique_id = <param> AND last_active < now_minus_1_minute

The entire thing is silly and I'd rather just replace it with some other mechanism entirely, but for now I gotta live with it. The problem with the first variant is mainly that it quite frequently gets triggered in several places simultaneously, so there's like three or four updates to the same row in rapid succession. This isn't a problem for the logic (I don't care that it's not transaction safe, for example - it doesn't matter which operation ends up being the last to commit), but I was wondering if I could reduce the number of writes to the table by using the second variant, without making the performance in the case when the last_active condition is false significantly worse.

EXPLAIN ANALYZE said that the second variant would use an index scan on the unique id only, regardless of the last_active condition, which makes sense to me, but it also makes me suspect it might lock the row to check if the last_active condition is false.

With all the above written out, I guess the real answer is I should just go benchmark it.

TheFluff fucked around with this message at 19:01 on Mar 9, 2019

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
Let's compare...

TheFluff posted:

legacy code that very frequently does something like

1. SELECT *
2. application level check
3. UPDATE

with:
UPDATE
In step 2, you've lost. There is no reasonable circumstance where (send all data over network + perform simple check + send more data back) is going to be close to (do it as soon as the row has been read from disk/cache). Exceptions are often made for very complex update conditions, such as lengthy mathematical algorithms, or when the database chosen (ie not Pg) is so dumb it can't handle "simple checks" without running out of memory.

If your explain shows the same index use as #1, you can't lose.

My guess is that you have no joint index on (id,timestamp), and pg isn't going to bother using a bare index on (timestamp) separately from the (id) one, because that requires constructing a full Cartesian product, which is considerably worse than just checking each (id) from the index scan. You can check with an EXPLAIN on a SELECT with an equivalent WHERE.

Doing the select * in #1 is rather derpy, because that means sending over a tonne of traffic just to check one field.

If there's something in the app logging the touches, just do an UPDATE RETURNING.
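
A minimal sketch of that single conditional statement with RETURNING, reusing the names from the description above ($1 stands in for the bound parameter):
code:
-- Postgres: the row only comes back if the update actually fired, so the
-- application can tell whether last_active was refreshed without a prior SELECT.
UPDATE some_table
SET    last_active = NOW()
WHERE  unique_id = $1
  AND  last_active < NOW() - INTERVAL '1 minute'
RETURNING unique_id, last_active;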


PS. If there's an index on (timestamp), it might be more efficient to fetch all 'expired' rows where the id is as given. (That is, reverse the order of the logic.) This very much depends on your data distribution and what percent of the id are touched routinely. The query planner presumably has statistics showing that the id is best first based on available indexes, but there are ways to force order if needed.

PhantomOfTheCopier fucked around with this message at 22:53 on Mar 9, 2019

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


I’m guessing that the more highly filtered update will still just do an Index Seek on the UK - it’s going to know that that value is highly selective and will only return a single row. If that index is a secondary nonclustered index that does not include the time stamp it will also need to do a Key Lookup, which you’d need to do anyway to get the data for the app level check. Then some scalar comparison for filtering which would be new (moving the check from the app to the db) but which would be far less expensive than sending the data over the wire and conditionally doing a second request to update.

Like Phantom said, if the app needs to know if an update was done, just return the rowcount affected by the update: 1 means it was, 0 means it wasn’t.

The only thing I could think of that might be bad in switching your code is escalated locking from the update statement under high concurrency. If all of a sudden you’re taking out 10x more (probably row-level) locks because you’re issuing updates for each request rather than only when necessary, you could have some problems. Not sure if your db will take out a lock or not in this case. If it does, the impact could be reduced with dirty reads or a different transaction isolation level.

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE
Thanks for the input, both of you.

I don't think Postgres has clustered indexes in the way MSSQL does, really. There is a clustering option, but AFAIK it's a one-time operation which reorders the table layout on disk once but does not maintain that order for rows updated or inserted after that. Primary keys are not clustered by default. As far as I can recall the execution plan, there was only one operation listed, and that was the index scan for the unique index. I figured what was happening was that it knows there can only be one matching row since it's a unique index, so it goes and gets that and then it doesn't need to do any other operations to check the other condition. The last_active column isn't indexed. In this case I suspect adding an index on (id, timestamp) wouldn't really be beneficial given how frequent the updates are.

The locking contention is the main thing I'd be worried about and I haven't really found any way to get any insight into that, other than benchmarking at semi-realistic scale (or #yolo deploy to production). I found some articles about counting the number of locks but they seemed to be mainly concerned with transactions and I couldn't make any sense of the results when I tried it. Still, I'm pretty confident that it's very unlikely the change would make anything worse.

TheFluff fucked around with this message at 05:35 on Mar 10, 2019

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
I admit to missing that the id was unique in the table of interest; we'll call it assuming the worst or blame it on this cold. UNIQUE doesn't lock the table, and it's not going to lock the row for longer than step 3.

You can do a two transaction demo, first selecting a non-expired id 'for update', which will lock it, then an update on that id with the now<timestamp conditional. Likewise do the experiment in the other order. You should be able to see which actions block the others.
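
Sketched out, that experiment looks something like this across two psql sessions (same hypothetical table and column names as above; 42 is an arbitrary id):
code:
-- Session 1: lock a row whose last_active is recent, without changing it.
BEGIN;
SELECT * FROM some_table WHERE unique_id = 42 FOR UPDATE;

-- Session 2: run the conditional update against the same id and observe
-- whether it returns immediately or waits for session 1 to finish.
UPDATE some_table
SET    last_active = NOW()
WHERE  unique_id = 42
  AND  last_active < NOW() - INTERVAL '1 minute';

-- Session 1: release the lock, then repeat with the order of the two reversed.
COMMIT;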

PhantomOfTheCopier fucked around with this message at 16:19 on Mar 10, 2019

abelwingnut
Dec 23, 2002


dropping in to say just how loving much i hate importing and exporting between different db engines and all the fun with encodings and datatypes this brings. currently trying to export data from sql server into an acceptable format for postgres and its copy function. apparently whatever utf-8 sql server spits out is not acceptable for postgres, but the regular latin works? and now i'm having such fun with NULL in an integer column. and this is all after learning that i can't import data into a postgres rds instance from an s3 bucket, yet you can into a mysql rds instance, which just seems amazingly dumb. oh, and this is all because my company refuses to just allow me to connect sql server to the aws rds instance and use database migration service to link them up

such joy!

abelwingnut fucked around with this message at 00:28 on Mar 11, 2019

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

Abel Wingnut posted:

dropping in to say just how loving much i hate importing and exporting between different db engines and all the fun with encodings and datatypes this brings. currently trying to export data from sql server into an acceptable format for postgres and its copy function. apparently whatever utf-8 sql server spits out is not acceptable for postgres, but the regular latin works? and now i'm having such fun with NULL in an integer column. and this is all after learning that i can't import data into a postgres rds instance from an s3 bucket, yet you can into a mysql rds instance, which just seems amazingly dumb. oh, and this is all because my company refuses to just allow me to connect sql server to the aws rds instance and use database migration service to link them up

such joy!

I didn't know SQL Server could export UTF-8 at all (I would've figured it was all UTF-16), but apparently that was added quite a long time ago. Asking a really dumb question here though (and I realize you're here to vent and didn't ask for help, so no offense intended): could it be just the byte order marker that SQL Server probably puts in the output file that's spooking Postgres?

abelwingnut
Dec 23, 2002


yea, i just discovered checking unicode defaults to utf-16, not utf-8. so that is certainly part of the problem. reworking that all now and going to see if that helps.

but yes, i do believe the byte order marker is spooking postgres. i get the '0xFF' error, even. not really sure how to avoid sql server spitting that out

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
Stack Overflow suggests the PG ODBC driver with Server Management Studio SSIS export. :munch:


https://wiki.postgresql.org/wiki/Converting_from_other_Databases_to_PostgreSQL#Microsoft_SQL_Server

https://github.com/tds-fdw/tds_fdw (but you probably can't run this against RDS)

PhantomOfTheCopier fucked around with this message at 02:56 on Mar 11, 2019

downout
Jul 6, 2009

Abel Wingnut posted:

dropping in to say just how loving much i hate importing and exporting between different db engines and all the fun with encodings and datatypes this brings. currently trying to export data from sql server into an acceptable format for postgres and its copy function. apparently whatever utf-8 sql server spits out is not acceptable for postgres, but the regular latin works? and now i'm having such fun with NULL in an integer column. and this is all after learning that i can't import data into a postgres rds instance from an s3 bucket, yet you can into a mysql rds instance, which just seems amazingly dumb. oh, and this is all because my company refuses to just allow me to connect sql server to the aws rds instance and use database migration service to link them up

such joy!

If it's any solace getting data into sql server is just as much of a nightmare. I've only ever got one method to work using a file, and I have to relearn it every 2 or 3 years.

GI_Clutch
Aug 22, 2000

by Fluffdaddy
Dinosaur Gum

downout posted:

If it's any solace getting data into sql server is just as much of a nightmare. I've only ever got one method to work using a file, and I have to relearn it every 2 or 3 years.

On one of my projects we get fixed-length-field flat files of case/demographic data from mainframe systems that are almost thirty years old. Trying to import them into SQL Server would result in errors. According to the mainframe guys, "Yeah, sometimes there are weird characters in there." We had to get things working quickly, so we built a preprocessor that converts each line to ASCII; if the converted line contains a question mark (which the non-ASCII characters were converted to), we skip it. Tada, the file loaded. We tried including the converted lines, but the conversion changes the number of characters per line, causing it to fail because the fields are the wrong size.

Maybe there are some simple changes I could have made to the DTSX package to make the file load just fine (changing the code page for the flat file?), but I was under pressure to get it done ASAP, so yeah. The data is just used for convenience purposes for the users of the system and we haven't heard a single issue of "Why isn't X showing up?", so we haven't revisited it. We have so many other things going on anyway.

mr_package
Jun 13, 2000
I have an internal tool being used as part of a dev/build system, and one of the requirements is to support branch/tag/trunk with different versions of the data. So if someone changes "Exit" to "Quit" on trunk, previous tags should still build with "Exit". Is this a solved problem in database world? To me it seems smarter to just use a simple XML or other plain text file for the data, directly in the repo. Then the VCS will track everything: diff/history/merge/etc.; we already have tools to do this, and we get it for free this way.

Doing this in the database seems like it's going to bloat the dataset hugely. I can kind of see a workaround: use the current set (about 4k rows) for a root dictionary and then only track the changes in each branch. But again, this is what VCS is for. Also, with a db I need to find a way to sync the data for workflow: when someone creates a branch they shouldn't really have to go to this web app and make a new branch just for the strings. But what about merges? If someone wants to make a new string or some changes on a feature branch then merge back to trunk, I'm supposed to code support for this into the app? Perhaps it's easy, it's kind of outside of my realm so I don't know yet, just seems weird when this is all solved problems in the VCS world-- why am I redoing it in a database/webapp?

Anyway using a database for this tool makes sense in general. The data is relational, and using a db allows us to easily link UI/code references to shared strings/groups, e.g. we can have button_exit and tooltip_quit point to the same string. We can export / import for localization. All that stuff. But with versioning requirement not sure RDBMS is the best solution here. My gut says we'd be better served with a tool (webapp, commandline, whatever) that people can run locally, point at the branch they are working on, make any changes they need, and commit the changes. Then the actual build scripts could read that file and export the strings into the correct format/platform (win/mac/ios/etc.) and language. Right now with the current db, it's living on its own as the central 'source of truth' and users need to export into the VCS when they are done. Versioning isn't supported today: all the work is going on trunk.

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


Guess I’m not understanding what you’re asking. Are you trying to figure out how to version the data within a database table or set of tables so that the proper data can be associated with your code repo?

mr_package
Jun 13, 2000
Naively I'm gonna say both but the schema isn't going to change, just some rows added or modified, between the versions of the data.

Best I can explain it is to think of someone pulling a branch for a new feature. It has some new UI elements with buttons like "open" "close" "new" etc. So they add in new_feature_open new_feature_close new_feature_new and either add the corresponding "Open" "Close" "New" strings, or optionally group/link them to the existing strings for open/close/new that were already added because they show in other parts of the UI.

So the idea here is if someone builds a previous (shipped) tag, none of those strings/ids exist, nothing breaks, and no one complains that they build an old tag and windows say "close" when they used to say "exit", that kind of thing.

Then when the dev merges their feature back to trunk we need to update the database too, those new strings need to be "merged" into the "trunk" version of the database. So I have to version the data and keep it in sync with any branches/tags created over the course of development. In the current tool, before making a build, they would need to export the data/strings into the various files required by the build system and commit the changed files to VCS and so right now those are the files that are tracked. And I've basically been tasked with tacking on versioning to this system (previous dev working on it left before it was finished) and I'm looking at this thing not sure current approach really supports it in a good way.

My gut says use single huge monolithic XML file for all the data and then provide a tool that can read/write that file; build scripts read that file and export to whatever files the build system needs at build time.

But maybe I'm getting worked up over nothing and current approach is fine?

downout
Jul 6, 2009

mr_package posted:

Naively I'm gonna say both but the schema isn't going to change, just some rows added or modified, between the versions of the data.

Best I can explain it is to think of someone pulling a branch for a new feature. It has some new UI elements with buttons like "open" "close" "new" etc. So they add in new_feature_open new_feature_close new_feature_new and either add the corresponding "Open" "Close" "New" strings, or optionally group/link them to the existing strings for open/close/new that were already added because they show in other parts of the UI.

So the idea here is if someone builds a previous (shipped) tag, none of those strings/ids exist, nothing breaks, and no one complains that they build an old tag and windows say "close" when they used to say "exit", that kind of thing.

Then when the dev merges their feature back to trunk we need to update the database too, those new strings need to be "merged" into the "trunk" version of the database. So I have to version the data and keep it in sync with any branches/tags created over the course of development. In the current tool, before making a build, they would need to export the data/strings into the various files required by the build system and commit the changed files to VCS and so right now those are the files that are tracked. And I've basically been tasked with tacking on versioning to this system (previous dev working on it left before it was finished) and I'm looking at this thing not sure current approach really supports it in a good way.

My gut says use single huge monolithic XML file for all the data and then provide a tool that can read/write that file; build scripts read that file and export to whatever files the build system needs at build time.

But maybe I'm getting worked up over nothing and current approach is fine?

I would agree the data needs to be stored in the file system in some way, so that it can be maintained in the repo. Instead of XML you could evaluate using a database system that keeps data in files. That would come with the benefit of having a GUI interface for managing the data, and it should be possible to find something that would easily import/export data with database connections/statements.

I've encountered similar situations, but what was used to maintain it wasn't optimal. I think we had a starting database, and then a series of scripts that loaded data to the starting state. Then if data was modified in a later commit, either the load scripts were updated or additional scripts were made to update the data. It was a lovely solution, but it was what was in place and there wasn't time to make a better solution.

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
I had to read that a few times. It's not clear why there's a database, nor why there should be one. It also sounds like you're locked into "building RCS in a database", instead of thinking about how to model your data first, then getting code branching and tagging from that.

So then, there are dozens of ways to do it depending on the problems you're trying to solve and the features you need. The 'flat' approach is to just include a revision number with the rows, then a build target can be bound to those. If you prefer tags, rows could be tagged, or a separate table could track tagged rows.

"Database audit tables" are usually used to track changes within a table. Triggers on insert/update/delete add audit entries (to an audit table) before each action. They could likewise create a row revision table (meaning the main table is always head/trunk/tip/main).

You could also fake it with RCS and just point rows at files. :clint:



PS: There's a filesystem and revision system listed on the PostgreSQL foreign data wrapper page.

PhantomOfTheCopier fucked around with this message at 23:59 on Mar 18, 2019

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


I agree with pretty much everything Phantom said but also want to underscore that I see absolutely no reason that what amounts to app configuration belongs in a database. Seems like a lot of extra trouble for no gain. What’s the benefit of having this stuff in a database?

Either way, if it’s necessary and you have no control, yeah just recreate the concept of a VCS in a database. Or dump the data to flat files and VCS those.

mr_package
Jun 13, 2000
I think the main driver was to collect all the application strings in one place, because every platform was using its own language file(s). Some are XML, some are just ini-style, etc., and someone decided that instead of developers maintaining these files manually we should collect everything into a database, along with metadata such as date, groups, etc. Then the tool generates the per-platform language files from the database.

Having a modified date in the db means we can export only new strings to provide to translators instead of dumping the whole file.

Having groups allows us to re-use a common string ("Close", "Quit") in various parts of the UI without having to re-translate them: new features can re-use existing strings where possible. So "new_feature_window_close" just gets pointed to the existing string id which is "Close" for en-US but whatever it needs to be for the other languages.

This would also hopefully lead to alignment between platforms, so each platform is using the same string for same place on UI, tooltips, menu items, etc. so we don't have for example a settings menu called "Connection" on Windows and "Network" on Mac.

You are right, this is effectively app configuration. I am just trying to take a step back and look at this thing and see what would be the best solution: easy to maintain, not brittle, not buggy, and usable enough that people don't step around it and go back to just manually maintaining all the languages separately. I have carte blanche to rebuild from scratch if that's what I report needs to happen; I just thought that if I could learn about the 'best practice' way of doing this in a database, I could salvage enough of what's been written to make that approach an acceptable one.

But when I look at this it seems like I could bring all these features together just by including a little bit of metadata in XML (or whatever) and get all these features, plus everything version control brings to the table for free.

For me the real aha moment was when I was trying to model in my head how I would support merging from a feature branch back into trunk; as soon as I realized that would be a necessary feature, I started thinking a database is probably the wrong approach for this to begin with. Not that it can't be done, but why would you invest all that effort? Unless it's easy to do. My core question was really "Is there an easy/standard way of doing this that I should be doing and just don't know about?" Seems like the answer is not really.

Thanks for all the replies, I know it was a lot to read!

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Pretty straightforward MSSQL question, but I'm just paranoid about running this on production. I'm updating a foreign key on another table. To simplify, the statement is (objects changed to protect the innocent, but the relationship is the same):
UPDATE Customers SET SalesPersonID=xxxx WHERE CustomerID=yyyy

Where SalesPersonID is a foreign key in Customers (primary key of SalesPersons) and CustomerID is a primary key of Customers

In the FK definition these are the settings:
Check Existing Data on Creation: Yes
Enforce Foreign Key Constraint: Yes

INSERT And UPDATE Spec:
Delete Rule: No Action
Update Rule: No Action

I'm hoping to run a couple hundred thousand of these; I'm not too concerned about the seek since it's PrimaryKey, but I'm wondering about the FK constraint; will it lock the Customers table while looking up if SalesPersonID exists in the other table? My gut says with the spec above it'll be just like updating a non-indexed/triggered column but now I'm all paranoid.

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


If your foreign key table (SalesPersons) has a clustered primary key (the standard way of setting it up), then updating the FK in Customers will acquire a shared lock on SalesPersons to validate the FK. If you’re doing a large update as a single transaction this could lead to deadlocks and/or timeouts.

Easiest way to avoid this is to just batch your transaction using a cursor loop. I often do this when doing any large updates and iterate through per 1k or 5k or however many rows - this lets other queries sneak in between your batches which avoids the problem entirely. But that isn’t helpful if you need the operation to be atomic and fully occur before the next transaction.

Another option you have, per the Internet, is to declare a non-clustered index on the PK column of SalesPersons:

quote:

Define the primary key on the parent table as a non-clustered index. Once you do that, the issue won’t occur since the look-up will occur against the non-clustered index and that won’t be in a locked state since the PK column is not being modified.

From what I understand from that quote, giving SQL a non-clustered alternative lets it check the FK without locking the whole row via the clustered index.

That help you at all?

Oh and if you want an example of a batched loop update I’m happy to post some template code from work tomorrow.
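
For reference, a generic sketch of what such a batched loop can look like (not Ruggan's actual template; it assumes the new assignments are staged in a hypothetical #NewAssignments temp table, and the batch size is arbitrary):
code:
-- #NewAssignments(CustomerID, SalesPersonID) is a staging table holding the
-- couple hundred thousand pairs to apply.
DECLARE @BatchSize INT = 5000;

WHILE 1 = 1
BEGIN
    UPDATE TOP (@BatchSize) c
    SET    c.SalesPersonID = n.SalesPersonID
    FROM   Customers AS c
    JOIN   #NewAssignments AS n
      ON   n.CustomerID = c.CustomerID
    WHERE  c.SalesPersonID <> n.SalesPersonID
        OR c.SalesPersonID IS NULL;      -- skip rows already updated

    IF @@ROWCOUNT = 0 BREAK;             -- nothing left to do
END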

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Ruggan posted:

If your foreign key table (SalesPersons) has a clustered primary key (the standard way of setting it up), then updating the FK in Customers will acquire a shared lock on SalesPersons to validate the FK. If you’re doing a large update as a single transaction this could lead to deadlocks and/or timeouts.

Easiest way to avoid this is to just batch your transaction using a cursor loop. I often do this when doing any large updates and iterate through per 1k or 5k or however many rows - this lets other queries sneak in between your batches which avoids the problem entirely. But that isn’t helpful if you need the operation to be atomic and fully occur before the next transaction.

Another option you have, per the Internet, is to declare a non-clustered index on the PK column of SalesPersons:


From what I understand from that quote, giving SQL a non-clustered alternative lets it check the FK without locking the whole row via the clustered index.

That help you at all?

Oh and if you want an example of a batched loop update I’m happy to post some template code from work tomorrow.

It's okay actually; I took in what you said and then decided to run it against the staging instance, which is only two weeks behind production. "SalesPersonID" is a clustered index as you had theorized, but an important thing I hadn't realized is the size of SalesPersons, which is tiny (barely 1 million rows). Entire run time for 800,000 Customers rows was less than a second on staging, so I went ahead and did it on production, where performance was comparable. I think I lucked out on this given the relative size of the FK table, but it looks like I was right to have a concern; it just may not have been warranted in this case.

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


Nike approach all the way. My man.


abelwingnut
Dec 23, 2002


just came across memSQL. anyone used it?
