Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Post
  • Reply
cinci zoo sniper
Mar 14, 2013



in some edge cases we had to break down the “other” further. if you deal with abstract CRUDFactoryViewController to make your rust team’s micro service talk to your php infrastructure, all of this doesn’t matter. if you need to received arbitrary real world data from disparate sources, you’ll take every null you can and engineer some of your own on top

Adbot
ADBOT LOVES YOU

DELETE CASCADE
Oct 25, 2017

i haven't washed my penis since i jerked it to a phtotograph of george w. bush in 2003


sure but those are just different possibilities for values in the lookup table, not an actual null which behaves more like a non-value at the language level

cinci zoo sniper
Mar 14, 2013



my sql in prod story: day 2 of my first tech job i decided to connect to our data warehouse (centralised, classical enterprise style, solo cluster dwh powering business-critical operational apps, with near-real time streaming micro batch data integration). my task was to write a query to summarise global state of affairs in my department’s garden, as a training exercise - basically select * from * type of affair.

there was no docs detailing recommended sql querying workstation environment setup. threw quick software procurement ticket for dbeaver. got it, went into settings to setup connection (imagine postgres). im usually the kind of person to read every setting at least once, so i quickly found out something called “automatically commit transactions”, a boolean flag. committing stuff sounded scary, since what if i write one of those bad queries that wreck the database? so i disabled it, of course, and went to write my cartesian join on every fact table in the dwh :v:

time between query execution start and a dwh architect standing next to my desk (entire it was in the same building) was below 5 minutes

DELETE CASCADE
Oct 25, 2017

i haven't washed my penis since i jerked it to a phtotograph of george w. bush in 2003


we use bigquery as a data warehouse and i regularly write giant cross joins and nothing bad happens

cinci zoo sniper
Mar 14, 2013



DELETE CASCADE posted:

sure but those are just different possibilities for values in the lookup table, not an actual null which behaves more like a non-value at the language level

not necessarily, it matters for querying ergonomics too

edit: removed bad example

cinci zoo sniper fucked around with this message at 22:53 on Dec 3, 2021

cinci zoo sniper
Mar 14, 2013



DELETE CASCADE posted:

we use bigquery as a data warehouse and i regularly write giant cross joins and nothing bad happens

that query, similarly to regular postgres, placed transaction lock on vast swathes of database objects, setting the data integration process ablaze and temporarily killing end user applications reliant on low data latency

DELETE CASCADE
Oct 25, 2017

i haven't washed my penis since i jerked it to a phtotograph of george w. bush in 2003


why'd you point end user applications to the data warehouse? isn't it meant for analytics, not transaction processing?

cinci zoo sniper
Mar 14, 2013



i think i may have written a bad example from phone, lemme actually do something more coherent instead

DELETE CASCADE posted:

why'd you point end user applications to the data warehouse? isn't it meant for analytics, not transaction processing?

it is meant for analytics indeed, part of users here were people who had to look at real time analytics for operational reasons

DELETE CASCADE
Oct 25, 2017

i haven't washed my penis since i jerked it to a phtotograph of george w. bush in 2003


oic, that'll do it then

MrQueasy
Nov 15, 2005

Quit shakin' me, kid!

cinci zoo sniper posted:

i think i may have written a bad example from phone, lemme actually do something more coherent instead

it is meant for analytics indeed, part of users here were people who had to look at real time analytics for operational reasons

They're supposed to be pointed at specific Data Marts being hydrated by the Data Lake!

cinci zoo sniper
Mar 14, 2013



MrQueasy posted:

They're supposed to be pointed at specific Data Marts being hydrated by the Data Lake!

ill lock you up in the lakehouse if you continue to assault me with Gartner Magic Fighting Words™

cinci zoo sniper
Mar 14, 2013



actually coherent example about querying ergonomics and null http://sqlfiddle.com/#!17/c3496/5/0

MrQueasy
Nov 15, 2005

Quit shakin' me, kid!

cinci zoo sniper posted:

ill lock you up in the lakehouse if you continue to assault me with Gartner Magic Fighting Words™

Is this a threat or a promise?

Carthag Tuek
Oct 15, 2005

Tider skal komme,
tider skal henrulle,
slægt skal følge slægters gang




last couple weeks ive been working on the "age" logic in transcriptions of historical source material & it's getting a bit annoying cause all these:
- nobody has transcribed this field yet, its still undefined
- nothing was written here, its an explicitly empty field
- they wrote the number zero
- they wrote a "nil" sign that means nothing, but isnt the number zero
- they wrote a sign nobody knows

theyre all significantly different

theres display mode for users & transcription mode for volunteers, etc... it's important that the volunteers can see that they typed in 0 for the age, because otherwise they think the value got lost. but also it's important that empty and nil count as zero when a user runs a search. but not when its an untouched field or an unknown sign. etc

i love what im working on, but i would love it a lot more if i didnt have to balance it over 4 code bases in 3 languages

cinci zoo sniper
Mar 14, 2013



MrQueasy posted:

Is this a threat or a promise?

Lakehouse on Delta Lake and Databricks

cinci zoo sniper
Mar 14, 2013



they have like umpteenth blog posts and "whitepapers" with verbal diarrhoea about "data lakehouse". like, just look at this nothingburger

MrQueasy
Nov 15, 2005

Quit shakin' me, kid!

cinci zoo sniper posted:

they have like umpteenth blog posts and "whitepapers" with verbal diarrhoea about "data lakehouse". like, just look at this nothingburger



Oh, I am well aware!

But still... some time at a lakehouse... the lake iced over... hot chocolate spiked with a bit of rum...

cinci zoo sniper
Mar 14, 2013



spinning some yarn, heh, as you look out the window

MononcQc
May 29, 2007

"I believe I did, Bob."



advent of code in awk day 3 trip report:

I had forgotten that Awk has every variable in a global scope by default, and that to prevent scope from leaking in functions, people use a bunch of whitespace and declare function arguments for all local variables not to bleed. For example, the following:
code:
function f(a,b,   i,j,c) {
   for (i=0; i<a; i++) {
       for (j=0, j<b, j++) {
           c += i*j
       }
   }
   return c
}
allows i, j, and c not to be globally scoped. As long as you call the function without these 3 arguments, they'll be initialized to zero values. If you forget to use one of these, you may just corrupt random poo poo in your program.

Also: awk does not support returning arrays as values from functions, only having them as arguments. It also does not support assigning arrays as values to a variable. But apparently everything in the function arguments sort of works, so if you want to copy a 2-dimensional array you can do:

code:
func copy2d(arr,out,    i,j) {   # GNU awk supports 'func' as shorthand for 'function'
    delete out
    for (i in arr) { for (j in arr[i]) { out[i][j]=arr[i][j] } }
}
...
{ copy2d(original_array, new_array) }
and the 'out' array will be emptied and filled with the contents from the 'arr' variable fine.

This may not have been my best idea for a project.

mystes
May 31, 2006



MononcQc posted:

advent of code in awk day 3 trip report:

I had forgotten that Awk has every variable in a global scope by default, and that to prevent scope from leaking in functions, people use a bunch of whitespace and declare function arguments for all local variables not to bleed. For example, the following:
code:
function f(a,b,   i,j,c) {
   for (i=0; i<a; i++) {
       for (j=0, j<b, j++) {
           c += i*j
       }
   }
   return c
}
allows i, j, and c not to be globally scoped. As long as you call the function without these 3 arguments, they'll be initialized to zero values. If you forget to use one of these, you may just corrupt random poo poo in your program.
What the gently caress why would anyone use that, it's even worse than I would have imagined

Carthag Tuek
Oct 15, 2005

Tider skal komme,
tider skal henrulle,
slægt skal følge slægters gang




calll me when theress a dake heltalouse

MononcQc
May 29, 2007

"I believe I did, Bob."



mystes posted:

What the gently caress why would anyone use that, it's even worse than I would have imagined

you mostly hope you don't have to, the moment your awk thing requires functions you hosed up and you're better off using another language.

But awk is very much real good at doing command-line poo poo that would require a half dozen calls to cut or sed all at once, mostly around log parsing or dealing with semi-regular file output.

For example, Kafka has commands with output that may look like:

code:
...
	Topic: topic-name	Partition: 0	Leader: 1276	Replicas: 1276,1273,1278	Isr: 1276,1273,1278	Offline:
	Topic: topic-name	Partition: 12	Leader: 1279	Replicas: 1279,1276,1272	Isr: 1276,1272,1279	Offline:
	Topic: topic-name	Partition: 17	Leader: 1279	Replicas: 1279,1272,1277	Isr: 1272,1277,1279	Offline:
	Topic: topic-name	Partition: 22	Leader: 1278	Replicas: 1278,1276,1273	Isr: 1278,1276,1273	Offline:
	Topic: topic-name	Partition: 25	Leader: 1279	Replicas: 1279,1272,1276	Isr: 1272,1276,1279	Offline:
	Topic: topic-name	Partition: 26	Leader: 1272	Replicas: 1272,1276,1279	Isr: 1272,1276,1279	Offline:
...
So I wanted to build a list of each Replica, and how many partitions it leads and how many it has in total with or without being a leader:
code:
$ kafka-topics --bootstrap-server localhost:9092 --describe --topic topic-name  | \   # output the whole list
  awk 'NR > 1 {print $8}' | \       # show the list of replicas
  awk -F ',' '                      # split on ',' for all variables ($1, $2, $3)
    { a[$1]+=1; b[$2]+=1; c[$3]+=1;  # account per rank, where a leads, b fails over, c follows
      all[$1]=1; all[$2]=1; all[$3]=1}   # just track which brokers exist by marking them in the array
    END {
       for (i in all) {
         # display each broker's rank
          print "broker:", i, "\t leading:", a[i], "\t non-leading:", b[i]+c[i], "\ttotal:", a[i]+b[i]+c[i] 
       }
   }'
Which then outputs:
code:
broker: 1262 	 leading: 6 	 non-leading: 6 	total: 12
broker: 1263 	 leading: 7 	 non-leading: 17 	total: 24
broker: 1264 	 leading: 7 	 non-leading: 13 	total: 20
broker: 1266 	 leading: 6 	 non-leading: 15 	total: 21
broker: 1267 	 leading: 5 	 non-leading: 17 	total: 22
broker: 1268 	 leading: 8 	 non-leading: 11 	total: 19
broker: 1269 	 leading: 2 	 non-leading: 3 	total: 5
and suddenly reveals that broker 1269 is a freaking bum and needs to get more partitions assigned to it, also what is going on with 1262?

Now once this is seen to be useful, we turn that into a more readable script or permanent metrics, but the ability to do that poo poo live on a server is pretty neat for anything that's read-once debugging aid.

MononcQc
May 29, 2007

"I believe I did, Bob."



if you want to see what hell looks like, you can take a look at people doing network programming in awk: https://hub.packtpub.com/network-programming-gawk/

I have no idea why would anyone would do that, but then again, I decided to somehow spend more time there for fun during these holidays so what the hell.

MrQueasy
Nov 15, 2005

Quit shakin' me, kid!

not mine, but some dude is doing AOC2021 in the Shakespeare programming language...

https://github.com/SansCipher/Advent-of-Code-2021

A sample of day 2, part 1.
code:
The Descent To Despair, a play in one act.

Romeo, a young man of Verona.
Tybalt, also a young man of Verona, but much smarter and more dashing than Romeo.
Mistress Page, a woman who loves gossip.
Mistress Ford, another woman who shares the same hobby as Mistress Page.
The Ghost, who keeps track of how much despair the Romeo is in.

                                        ACT I: The Descent To Despair.

                                   Scene I: The preparations of things to come.

[Enter Romeo and the Ghost]

Romeo:
  You are nothing.

The Ghost:
  Listen to your heart!

[Exeunt]

[Enter Mistress Page and Mistress Ford]

Mistress Ford:
  Listen to your heart!

Mistress Page:
  Listen to your heart!

[Exeunt]

[Enter Romeo and Tybalt]

Wheany
Mar 17, 2006

Spinyahahahahahahahahahahahaha!


Doctor Rope

we get coordinate values in all kinds of wacky formats and with latitude and longitude in whatever order. latitude ~60, longitude ~25 = finland. latitude ~25, longitude ~60 = arabian peninsula.

we use the coordinates for showing map markers and i have added cleanup methods that flip the coordinate values if they look wrong in that specific way, then there is a conversion for "ETRS-TM35FIN" coordinates and finally if there are values that fall outside of the window of "finland and a bit of the surrounding area", they are just set to null

pointsofdata
Apr 25, 2011



cinci zoo sniper posted:

they have like umpteenth blog posts and "whitepapers" with verbal diarrhoea about "data lakehouse". like, just look at this nothingburger



data swamp

redleader
Aug 18, 2005

Engage according to operational parameters

sql would be nicer with a better type system. each key column could be its own unique type, making it an error if you try to join e.g. the order.id column to a customer id foreign key column. lose a bunch (maybe all) implicit conversions. make it easier to define and use custom types for columns. add units, so you can't add a value in dollars to a column that stores cents - or more advanced, make units part of the values, so you can't add a us cents value to a canadian cents value (and e.g. SUM() will error if your where clause includes multiple units). sqls should be actively working to design and implement features to improve data integrity, since that's like the #1 concern for a real database

and reorder the parts of a sql statement so we can have non lovely autocomplete

better ways of reusing common code would also be nice

cinci zoo sniper
Mar 14, 2013



that would be nice, but probably is never going to happen

redleader
Aug 18, 2005

Engage according to operational parameters

oh it will 100% never happen, but it's nice to dream and theorycraft

Carthag Tuek
Oct 15, 2005

Tider skal komme,
tider skal henrulle,
slægt skal følge slægters gang




also the code re use thing, thats not happening either :/

Powerful Two-Hander
Mar 9, 2004

Mods please change my name to "Tooter Skeleton" TIA.



idk it's possible to an extent but held back by two things imo: terrible naming conventions in the db itself because people just choose whatever and don't notice existing objects that doo the same thing (getFoo, getallFoo, viewFoo, viewFooWithoutBar), and clunky linking in to calling code that makes it hard to see what existing data access methods (i.e. procs) you have because you just bang in the proc name into dapper and you're done.

And the attempts to fix this with entity framework by putting an abstraction layer in between were total dogshit

pointsofdata
Apr 25, 2011



I've been reading about dbt and it has some cool ideas but this page felt very topsy turvy land to me:
https://docs.getdbt.com/tutorial/using-jinja

{%- set payment_methods = dbt_utils.get_column_values(
table=ref('raw_payments'),
column='payment_method'
) -%}

select
order_id,
{%- for payment_method in payment_methods %}
sum(case when payment_method = '{{payment_method}}' then amount end) as {{payment_method}}_amount
{%- if not loop.last %},{% endif -%}
{% endfor %}
from {{ ref('raw_payments') }}
group by 1


Maybe it's just because it's a dumb example to show the functionality buy I really don't buy that the end result is better than the "repetitive" original.

also wouldn't a SQL pivot work just fine?

Antigravitas
Dec 8, 2019

I accept your surrender

Jinja2 is loving painful tbh.


Unrelated, since I'm watching a 9 hour video of someone reading the coalition contract I wrote a short, terrible, terrible piece of code.


Python code:
file = '/tmp/koalitionsvertrag.txt'
words = set()
with open(file, mode='r') as f:
    while (line := f.readline().strip()):
        for word in line.split(' '):
            words.add(word.replace('.', ''))
for idx, word in enumerate(sorted(words, key=len, reverse=True)):
    if idx >= 30:
        break
    print(f'{idx+1:2}: {word}')

 1: Kapitalanleger-Musterverfahrensgesetz
 2: Science-Entrepreneurship-Initiativen
 3: Schwerbehindertenvertrauenspersonen
 4: Nachhaltigkeitsberichterstattungen
 5: Innenentwicklungsmaßnahmengebietes
 6: Finanzkonteninformationsaustausch
 7: Betriebsrätemodernisierungsgesetz
 8: Waldbrandbekämpfungsmöglichkeiten
 9: Interoperabilitätsverpflichtungen
10: Wirtschaftlichkeitsuntersuchungen
11: Gleichstrom-Übertragungsleitungen
12: Brennstoffemissionshandelsgesetz
13: Barrierefreiheitsstärkungsgesetz
14: Gesundheitssicherstellungsgesetz
15: Schwangerschaftskonfliktberatung
16: Sicherheitsgesetzgebungsvorhaben
17: Behindertengleichstellungsgesetz
18: Unternehmens-Basisdatenregister
19: Anwartschaftsüberführungsgesetz
20: Erneuerbare-Energien-Richtlinie
21: Flottenmodernisierungsprogramms
22: Wohnungseigentumsgemeinschaften
23: Wissenschaftszeitvertragsgesetz
24: Planungssicherstellungsgesetzes
25: Wirtschaftlichkeitsuntersuchung
26: Gemeinnützigkeitsrechtsprechung
27: Internationalisierungsstrategie
28: Konjunkturbereinigungsverfahren
29: Kompetenzfeststellungsverfahren
30: Fachkräfteeinwanderungsgesetzes

pointsofdata
Apr 25, 2011



All the words I tried Google translate on came out exactly as boring as you might expect

Surprise T Rex
Apr 9, 2008



Dinosaur Gum

sql prod story:

Guy was doing some testing locally, updates a row on some table to point to an image stored on his machine. Dunno exactly what led to it happening between testing and running it live but long story short all the paths on the customer’s live DB are suddenly changed to point to the same path.

Pretty sure the boss chewed him out, which he could probably have avoided since we all used the same admin account per customer, except that the path he changed it to was like “C:/Users/JohnSmith/…”.

cinci zoo sniper
Mar 14, 2013



pointsofdata posted:

I've been reading about dbt and it has some cool ideas but this page felt very topsy turvy land to me:
https://docs.getdbt.com/tutorial/using-jinja

{%- set payment_methods = dbt_utils.get_column_values(
table=ref('raw_payments'),
column='payment_method'
) -%}

select
order_id,
{%- for payment_method in payment_methods %}
sum(case when payment_method = '{{payment_method}}' then amount end) as {{payment_method}}_amount
{%- if not loop.last %},{% endif -%}
{% endfor %}
from {{ ref('raw_payments') }}
group by 1


Maybe it's just because it's a dumb example to show the functionality buy I really don't buy that the end result is better than the "repetitive" original.

also wouldn't a SQL pivot work just fine?

dbt is fairly good, ive used it quite often in prod. unfortunately it’s also relatively underdeveloped, and will never actually get finished imo, because devs are focused on monetised it as a “cloud service”

while jinja sucks, and i wish we had something better to template sql like that in, the example imo is ideologically fine - they’ve just picked something way too short. ive handled similar individual queries measuring hundreds-thousands loc, where writing it this way removes impressive amounts of boilerplate, and also helps with not having to remember what all the options for each field were

my personal dbt highlight, which is in my experience underrated and, thus, under-utilised is tests. https://docs.getdbt.com/docs/building-a-dbt-project/tests

Sapozhnik
Jan 2, 2005



Nap Ghost

god in heaven a stringbashing template system for sql queries? :psypop:

actually no i am not in the least bit surprised that somebody would build something that idiotic, this is the sort of thing you build when you have never heard of a query builder

mystes
May 31, 2006



Sapozhnik posted:

god in heaven a stringbashing template system for sql queries? :psypop:

actually no i am not in the least bit surprised that somebody would build something that idiotic, this is the sort of thing you build when you have never heard of a query builder
Huh? ORMs have various limitations so lots of people prefer to write SQL directly.

It's unsafe to directly substitute strings into SQL but it's fine if you generate prepared statements based on the templates.

Adbot
ADBOT LOVES YOU

cinci zoo sniper
Mar 14, 2013



it's worth noting that the use case for dbt and friends is DDL manipulations, you only do DML poo poo in them if you need to manually define an enum or something for your DDL to work

  • 1
  • 2
  • 3
  • 4
  • 5
  • Post
  • Reply