Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Post
  • Reply
prisoner of waffles
May 8, 2007

Ah! well a-day! what evil looks
Had I from old and young!
Instead of the cross, the fishmech
About my neck was hung.

hailthefish posted:

ORM reduction

2-column foreign key of (gender_id, company_id) from table people to table company_gender_codes should allow your application to enforce its gender invariant

Adbot
ADBOT LOVES YOU

Jabor
Jul 16, 2010

#1 Loser at SpaceChem

CRIP EATIN BREAD posted:

the idea is you don't need a join at all, ever.

the thing you described (make a column with a synthetic value, use it to look up data in another table if you need that data) is literally a join

you're just pointlessly doing it in the client instead of in the database

Shaggar
Apr 26, 2006

CRIP EATIN BREAD posted:

you can use a lookup table like you suggest (join tables) but honestly you'll probably be fine just using a single character as your gender column and then using a separate table with no foreign key that identifies all the gender settings for a specific company, which will be a tuple like (company_id, code, description). obviously using a join table is "more correct" but constantly doing joins for some things is pointless. especially if they have changes, like say a company had A, B and C available, but then decided C isn't an option anymore. So now you'll also have to implement a "deleted" column and it all ends up being a lot of work when you could just use a column, and performance will probably be better avoiding the joins.

if the performance hit of a join on a gender lookup table is an issue you probably need to rethink your database

cinci zoo sniper
Mar 15, 2013




someone kill me, our dwh architect did finally see a blog post about big data so he now is convinced that we need big data solutions for subterabyte scale tabular data because whatever concoction he and our dba came up with can’t work through 10^8 rows of numerics and datetimes

kitten emergency
Jan 13, 2008

get meow this wack-ass crystal prison
do your clients have a regulatory requirement for gender because if they dont then gender is varchar(255) or w/e

CRIP EATIN BREAD
Jun 24, 2002

Hey stop worrying bout my acting bitch, and worry about your WACK ass music. In the mean time... Eat a hot bowl of Dicks! Ice T



Soiled Meat
yeah my point is with humans things get super complicated real quick in situations where normalization doesn't always make sense or is feasible.

there's a lot of things you try ot treat as "flags" or whatever and it turns out that they really aren't flags and just are really a tag.

uncurable mlady posted:

do your clients have a regulatory requirement for gender because if they dont then gender is varchar(255) or w/e

i'm going to assume yes here only because they are collecting it in the first place at all.

cinci zoo sniper
Mar 15, 2013




uncurable mlady posted:

do your clients have a regulatory requirement for gender because if they dont then gender is varchar(255) or w/e

imo even if you are collecting not for a regulatory requirements then varchar is bad idea if you ever plan to analyse anything, you cant just give users variable inputs

bob dobbs is dead
Oct 8, 2017

I love peeps
Nap Ghost

cinci zoo sniper posted:

imo even if you are collecting not for a regulatory requirements then varchar is bad idea if you ever plan to analyse anything, you cant just give users variable inputs

mefi did it iirc

https://metatalk.metafilter.com/20050/Gender-Go-nuts-Somebody-did#838160

about 80% fall into easy categories but lol for the rest

bob dobbs is dead fucked around with this message at 16:21 on Nov 12, 2018

gonadic io
Feb 16, 2011

>>=

cinci zoo sniper posted:

someone kill me, our dwh architect did finally see a blog post about big data so he now is convinced that we need big data solutions for subterabyte scale tabular data because whatever concoction he and our dba came up with can’t work through 10^8 rows of numerics and datetimes

https://mobile.twitter.com/garybernhardt/status/600783770925420546

Finster Dexter
Oct 20, 2014

Beyond is Finster's mad vision of Earth transformed.

cinci zoo sniper posted:

someone kill me, our dwh architect did finally see a blog post about big data so he now is convinced that we need big data solutions for subterabyte scale tabular data because whatever concoction he and our dba came up with can’t work through 10^8 rows of numerics and datetimes

lol if they can't handle such a small amount of rows (srsly 10^8 should not be a big deal for a properly managed rdbms) can't wait to see how their "big data" system performs.

cinci zoo sniper
Mar 15, 2013




Finster Dexter posted:

lol if they can't handle such a small amount of rows (srsly 10^8 should not be a big deal for a properly managed rdbms) can't wait to see how their "big data" system performs.

i really dont mind waiting before seeing how his ""big data"" (its the guy who still cant ssh to copy-paste oneliners i wrote for him) performs, although im kinda safe here since technical implementation of "big data" would fall onto my shoulders so for now my strategy is to repeat "no, we do not need cassandra for this" every two weeks. im sure even i could figure out 10^8 rows with some indexes and maybe a few initial, expensive dtype conversion/normalisation/restructurisation operations to fix some warcrimes (like column that is literally varchar with 4 possible values - "first" through "fourth", with logically consistent with linguistics ordering wrt the entity tagging with those)


normally yes but our dwh has 16gb ram lol

bob dobbs is dead posted:

mefi did it iirc

https://metatalk.metafilter.com/20050/Gender-Go-nuts-Somebody-did#838160

about 80% fall into easy categories but lol for the rest

i do this very frequently and yeah the other 20% is what you also need to do for business forecasting, e.g. "lol for the rest" is the actual work to be done. and then you have to deal with like up to half a dozen misspellings of a word *after* you santisie for whitespace and punctuation

CRIP EATIN BREAD
Jun 24, 2002

Hey stop worrying bout my acting bitch, and worry about your WACK ass music. In the mean time... Eat a hot bowl of Dicks! Ice T



Soiled Meat

cinci zoo sniper posted:

i do this very frequently and yeah the other 20% is what you also need to do for business forecasting, e.g. "lol for the rest" is the actual work to be done. and then you have to deal with like up to half a dozen misspellings of a word *after* you santisie for whitespace and punctuation

you can also try reducing things with a phonetic approximation to get even closer.

cinci zoo sniper
Mar 15, 2013




the weirdest part is that the guy previously worked on a dwh that was >10tb

cinci zoo sniper
Mar 15, 2013




CRIP EATIN BREAD posted:

you can also try reducing things with a phonetic approximation to get even closer.

yeah i can do things like levenstein distance filtering to algorithmically combat minor misspellings but in the end of the day there will be good old manual labour to go through variations like a set - for instance other, etc, misc, miscellaneous, etcetera, n/a, not listed, not applicable - representing the same logical category. last case of this we had in production (partially varchar ticket resolution field for 7 years of history for customer support on 10^7 user scale) saw one of our guys go through the final 3000 variants of the field values that we couldn’t further reduce to summarise them to less than 20 resolution statuses

Finster Dexter
Oct 20, 2014

Beyond is Finster's mad vision of Earth transformed.

cinci zoo sniper posted:

the weirdest part is that the guy previously worked on a dwh that was >10tb

I guess "worked on" means occasionally logged in to aws?

cinci zoo sniper
Mar 15, 2013




Finster Dexter posted:

I guess "worked on" means occasionally logged in to aws?

weirdly enough no, t and l parts of etl. which is probably less weird in the hindsight since his main problem is e and the “metal” of anything that is not oracle

gonadic io
Feb 16, 2011

>>=

CRIP EATIN BREAD posted:

you can also try reducing things with a phonetic approximation to get even closer.

Cool vid about a phonetic profanity filter https://m.youtube.com/watch?v=bJ5ppf0po3k

CRIP EATIN BREAD
Jun 24, 2002

Hey stop worrying bout my acting bitch, and worry about your WACK ass music. In the mean time... Eat a hot bowl of Dicks! Ice T



Soiled Meat
I implemented a TTS for transcribing video to auto caption stuff for $AGENCY and I used Kaldi as the base with one of their pre trained models and it contains a LOT of racial slurs and generally bad terms. I’ve brought this up many times that they need to identify words they don’t want to get autotranslated (it will happen by accident etc) and they keep putting it off and now it’s gone live and I wonder how long before the software I’ve helped write will be on the news.

CRIP EATIN BREAD
Jun 24, 2002

Hey stop worrying bout my acting bitch, and worry about your WACK ass music. In the mean time... Eat a hot bowl of Dicks! Ice T



Soiled Meat
they don’t have the time to pick words not to use but they did have plenty of time ensuring our in-dash dispatching and navigation system for drivers is blind accessible and 508 compliant

Shaggar
Apr 26, 2006
accessibility is probably more important than bad words

CRIP EATIN BREAD
Jun 24, 2002

Hey stop worrying bout my acting bitch, and worry about your WACK ass music. In the mean time... Eat a hot bowl of Dicks! Ice T



Soiled Meat

Shaggar posted:

accessibility is probably more important than bad words

read what I wrote. they wanted a system that was made for people driving cars to be usable by the blind.

Shaggar
Apr 26, 2006
please don't be so ableist

floatman
Mar 17, 2009

Powerful Two-Hander posted:

in all seriousness though, you should use a join because if you don't and just have a single column someone is gonna be posting in this thread in future going "how the do I unfuck this single column that has some mysterious and arbitrary meaning"

Let me tell you a secret friend,
/* Leans in real close, in a quiet voice,
"It already IS a hosed up single column that has mysterious and arbitrary meaning."

I do appreciate the point of view for genders represented as IDs, therefore every organisation can specify their own genders and labels etc.
Unfortunately,
1) there's already information stored for existing customers as gender in a single column that has mysterious and arbitrary meaning,
2) the assignment of meaning is hardcoded everywhere in the code base (including a hilarious SQL statement that assumes IF GENDER NOT "F" THEN "M")
3) and the real talk thing is, there's going to be a need to aggregate customers by genders over organisations. So assuming I let organisations specify id 1 = male for org A, id 123 = male for org B, id XYZ = male for org N, I still need to have some master mapping that states that "all these codes are considered male across all organisations". As the software is brutally shoved down the users throat, I'm thinking might as well just remove the choice from them to stop them from doing special snowflake things.

Like, I get the ideological purity as genders represented as IDs, allowing for many genders per organisation but the reality of living in this cruel world with shitfuck sins of our forefathers architecture is pushing me to a solution where "M" is male.
The gender as id solution is just niceties but it doesn't seem to bring anything to the table but more work for me.

floatman
Mar 17, 2009
Oh, and efficiencies in join? Who cares!
Codebase is so poo poo anyway we're spending a fortune on raw computing power to give the illusion that the system is just slow rather than unusable.

Shaggar
Apr 26, 2006

floatman posted:

Let me tell you a secret friend,
/* Leans in real close, in a quiet voice,
"It already IS a hosed up single column that has mysterious and arbitrary meaning."

I do appreciate the point of view for genders represented as IDs, therefore every organisation can specify their own genders and labels etc.
Unfortunately,
1) there's already information stored for existing customers as gender in a single column that has mysterious and arbitrary meaning,
2) the assignment of meaning is hardcoded everywhere in the code base (including a hilarious SQL statement that assumes IF GENDER NOT "F" THEN "M")
3) and the real talk thing is, there's going to be a need to aggregate customers by genders over organisations. So assuming I let organisations specify id 1 = male for org A, id 123 = male for org B, id XYZ = male for org N, I still need to have some master mapping that states that "all these codes are considered male across all organisations". As the software is brutally shoved down the users throat, I'm thinking might as well just remove the choice from them to stop them from doing special snowflake things.

Like, I get the ideological purity as genders represented as IDs, allowing for many genders per organisation but the reality of living in this cruel world with shitfuck sins of our forefathers architecture is pushing me to a solution where "M" is male.
The gender as id solution is just niceties but it doesn't seem to bring anything to the table but more work for me.

your real issue is the hardcoded logic like if not f then m. switching your current system to a coded table with ids would be relatively simple, you just build a new table and insert the unique values from the current column and replace them with their newly generated ids. then you can add filtering in the UI for which client sees which ids.

if at the end of the day the reality is you still need an M F or U to make a decision then the exercise is largely pointless. You can tell them instead you're going to use the hl7 administrative gender codes which are dumb but give you cover.

The MUMPSorceress
Jan 6, 2012


^SHTPSTS

Gary’s Answer

floatman posted:

Let me tell you a secret friend,
/* Leans in real close, in a quiet voice,
"It already IS a hosed up single column that has mysterious and arbitrary meaning."

I do appreciate the point of view for genders represented as IDs, therefore every organisation can specify their own genders and labels etc.
Unfortunately,
1) there's already information stored for existing customers as gender in a single column that has mysterious and arbitrary meaning,
2) the assignment of meaning is hardcoded everywhere in the code base (including a hilarious SQL statement that assumes IF GENDER NOT "F" THEN "M")
3) and the real talk thing is, there's going to be a need to aggregate customers by genders over organisations. So assuming I let organisations specify id 1 = male for org A, id 123 = male for org B, id XYZ = male for org N, I still need to have some master mapping that states that "all these codes are considered male across all organisations". As the software is brutally shoved down the users throat, I'm thinking might as well just remove the choice from them to stop them from doing special snowflake things.

Like, I get the ideological purity as genders represented as IDs, allowing for many genders per organisation but the reality of living in this cruel world with shitfuck sins of our forefathers architecture is pushing me to a solution where "M" is male.
The gender as id solution is just niceties but it doesn't seem to bring anything to the table but more work for me.

do you work at epic? you're describing the exact situation I spent years trying to remedy

CRIP EATIN BREAD
Jun 24, 2002

Hey stop worrying bout my acting bitch, and worry about your WACK ass music. In the mean time... Eat a hot bowl of Dicks! Ice T



Soiled Meat
implement a DSL using rows in a table as instructions so you can dynamically change the rules at any given time

prisoner of waffles
May 8, 2007

Ah! well a-day! what evil looks
Had I from old and young!
Instead of the cross, the fishmech
About my neck was hung.
gender invariants should be maintained at the application level.

change my mind

redleader
Aug 18, 2005

Engage according to operational parameters
represent gender as a set of floats in the range [0, 1] quantifying whatever aspects of gender are relevant to your application

make them nullable

and store it in a temporal table

CPColin
Sep 9, 2003

Big ol' smile.
Experts Exchange for a while had a class called FemaleCache because it was simpler to just hold in memory the entire set of member ID's who set their gender to Female in their profile than it would have been to query the database for this information. The class hung around for several years beyond the point when anything was reading from it.

pokeyman
Nov 26, 2006

That elephant ate my entire platoon.

CPColin posted:

Experts Exchange for a while had a class called FemaleCache because it was simpler to just hold in memory the entire set of member ID's who set their gender to Female in their profile than it would have been to query the database for this information. The class hung around for several years beyond the point when anything was reading from it.

:sigh:

redleader
Aug 18, 2005

Engage according to operational parameters

CPColin posted:

Experts Exchange for a while had a class called FemaleCache because it was simpler to just hold in memory the entire set of member ID's who set their gender to Female in their profile than it would have been to query the database for this information. The class hung around for several years beyond the point when anything was reading from it.

did that, uh, help? at all?

cinci zoo sniper
Mar 15, 2013




redleader posted:

represent gender as a set of floats in the range [0, 1] quantifying whatever aspects of gender are relevant to your application

make them nullable

and store it in a temporal table

*sighs drawing 0.69 katana*

floatman
Mar 17, 2009
Ultimately, each organisation's mapping of gender codes to labels will be sent from one system to another utilising a http endpoint that wraps comma separated string values in json to be further wrapped in an XML

I wish I was joking.

DELETE CASCADE
Oct 25, 2017

i haven't washed my penis since i jerked it to a phtotograph of george w. bush in 2003

CPColin posted:

Experts Exchange for a while had a class called FemaleCache because it was simpler to just hold in memory the entire set of member ID's who set their gender to Female in their profile than it would have been to query the database for this information. The class hung around for several years beyond the point when anything was reading from it.

Was this cache primarily used to fulfill search requests for Females to harass

my homie dhall
Dec 9, 2010

honey, oh please, it's just a machine
caches full of women

cinci zoo sniper
Mar 15, 2013




cjs: i asked for extra space to do pg_dumpall on dwh server and our genious sysops gave me a mount with 500 iops hard limit because we are so out of ssds on our hardware that id need to wait 40 days to get what i asked for, 500gb. guess the average iowait cpu utilisation for the last hour

akadajet
Sep 14, 2003

Ploft-shell crab posted:

caches full of women

lol

Workaday Wizard
Oct 23, 2009

by Pragmatica
let’s talk about http/3. good luck with nat lol

Adbot
ADBOT LOVES YOU

cinci zoo sniper
Mar 15, 2013




Shinku ABOOKEN posted:

let’s talk about http/3. good luck with nat lol

tfw u nat but it still sucks

  • 1
  • 2
  • 3
  • 4
  • 5
  • Post
  • Reply