in some edge cases we had to break down the “other” further. if you deal with abstract CRUDFactoryViewController to make your rust team’s micro service talk to your php infrastructure, all of this doesn’t matter. if you need to receive arbitrary real world data from disparate sources, you’ll take every null you can and engineer some of your own on top
|
|
# ? Dec 3, 2021 23:15 |
|
|
|
sure but those are just different possibilities for values in the lookup table, not an actual null which behaves more like a non-value at the language level
|
# ? Dec 3, 2021 23:23 |
my sql in prod story: day 2 of my first tech job i decided to connect to our data warehouse (centralised, classical enterprise style, solo cluster dwh powering business-critical operational apps, with near-real time streaming micro batch data integration). my task was to write a query to summarise global state of affairs in my department’s garden, as a training exercise - basically select * from * type of affair. there were no docs detailing the recommended sql querying workstation environment setup. threw in a quick software procurement ticket for dbeaver. got it, went into settings to set up the connection (imagine postgres). im usually the kind of person to read every setting at least once, so i quickly found something called “automatically commit transactions”, a boolean flag. committing stuff sounded scary, since what if i write one of those bad queries that wreck the database? so i disabled it, of course, and went to write my cartesian join on every fact table in the dwh.

time between query execution start and a dwh architect standing next to my desk (entire IT was in the same building) was below 5 minutes
|
|
# ? Dec 3, 2021 23:24 |
|
we use bigquery as a data warehouse and i regularly write giant cross joins and nothing bad happens
|
# ? Dec 3, 2021 23:29 |
DELETE CASCADE posted:
sure but those are just different possibilities for values in the lookup table, not an actual null which behaves more like a non-value at the language level

not necessarily, it matters for querying ergonomics too

edit: removed bad example

cinci zoo sniper fucked around with this message at 23:53 on Dec 3, 2021 |
|
# ? Dec 3, 2021 23:32 |
DELETE CASCADE posted:
we use bigquery as a data warehouse and i regularly write giant cross joins and nothing bad happens

that query, similarly to regular postgres, placed transaction lock on vast swathes of database objects, setting the data integration process ablaze and temporarily killing end user applications reliant on low data latency
|
|
# ? Dec 3, 2021 23:35 |
|
why'd you point end user applications to the data warehouse? isn't it meant for analytics, not transaction processing?
|
# ? Dec 3, 2021 23:36 |
i think i may have written a bad example from phone, lemme actually do something more coherent instead

DELETE CASCADE posted:
why'd you point end user applications to the data warehouse? isn't it meant for analytics, not transaction processing?

it is meant for analytics indeed, part of users here were people who had to look at real time analytics for operational reasons
|
|
# ? Dec 3, 2021 23:39 |
|
oic, that'll do it then
|
# ? Dec 3, 2021 23:44 |
|
cinci zoo sniper posted:
i think i may have written a bad example from phone, lemme actually do something more coherent instead

They're supposed to be pointed at specific Data Marts being hydrated by the Data Lake!
|
# ? Dec 3, 2021 23:44 |
MrQueasy posted:
They're supposed to be pointed at specific Data Marts being hydrated by the Data Lake!

ill lock you up in the lakehouse if you continue to assault me with Gartner Magic Fighting Words™
|
|
# ? Dec 3, 2021 23:46 |
actually coherent example about querying ergonomics and null http://sqlfiddle.com/#!17/c3496/5/0
|
|
# ? Dec 4, 2021 00:00 |
|
cinci zoo sniper posted:
ill lock you up in the lakehouse if you continue to assault me with Gartner Magic Fighting Words™

Is this a threat or a promise?
|
# ? Dec 4, 2021 00:01 |
|
last couple weeks ive been working on the "age" logic in transcriptions of historical source material & it's getting a bit annoying cause all these:
- nobody has transcribed this field yet, its still undefined
- nothing was written here, its an explicitly empty field
- they wrote the number zero
- they wrote a "nil" sign that means nothing, but isnt the number zero
- they wrote a sign nobody knows

theyre all significantly different

theres display mode for users & transcription mode for volunteers, etc... it's important that the volunteers can see that they typed in 0 for the age, because otherwise they think the value got lost. but also it's important that empty and nil count as zero when a user runs a search. but not when its an untouched field or an unknown sign. etc

i love what im working on, but i would love it a lot more if i didnt have to balance it over 4 code bases in 3 languages
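a rough python sketch of how those five cases could be kept apart with an explicit state tag instead of overloading null. the names `AgeState`, `Age`, `search_value` and `transcription_label` are made up for illustration and are not from the poster's code bases; the only rules taken from the post are that a typed 0 stays visible to volunteers, and that empty and nil count as zero in search while untouched and unknown do not.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AgeState(Enum):
    UNTRANSCRIBED = "untranscribed"  # nobody has transcribed the field yet
    EMPTY = "empty"                  # the source explicitly left the field blank
    NUMBER = "number"                # a number was written, possibly 0
    NIL_SIGN = "nil_sign"            # a "nil" mark that is not the digit zero
    UNKNOWN_SIGN = "unknown_sign"    # a mark nobody can identify


@dataclass(frozen=True)
class Age:
    state: AgeState
    value: Optional[int] = None  # only meaningful when state is NUMBER

    def search_value(self) -> Optional[int]:
        """Value used by user search: empty and nil count as 0."""
        if self.state is AgeState.NUMBER:
            return self.value
        if self.state in (AgeState.EMPTY, AgeState.NIL_SIGN):
            return 0
        # untranscribed fields and unknown signs never match a numeric search
        return None

    def transcription_label(self) -> str:
        """Text shown to volunteers, so a typed 0 is visibly a 0."""
        if self.state is AgeState.NUMBER:
            return str(self.value)
        return f"[{self.state.value}]"
```

the point of the tag is that display mode, transcription mode and search each get their own method over the same value, rather than each code base guessing what a bare null or 0 was supposed to mean.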
|
# ? Dec 4, 2021 00:03 |
MrQueasy posted:
Is this a threat or a promise?

Lakehouse on Delta Lake and Databricks
|
|
# ? Dec 4, 2021 00:05 |
they have like umpteen blog posts and "whitepapers" with verbal diarrhoea about "data lakehouse". like, just look at this nothingburger
|
|
# ? Dec 4, 2021 00:06 |
|
cinci zoo sniper posted:
they have like umpteen blog posts and "whitepapers" with verbal diarrhoea about "data lakehouse". like, just look at this nothingburger

Oh, I am well aware! But still... some time at a lakehouse... the lake iced over... hot chocolate spiked with a bit of rum...
|
# ? Dec 4, 2021 00:08 |
spinning some yarn, heh, as you look out the window
|
|
# ? Dec 4, 2021 00:09 |
|
advent of code in awk day 3 trip report: I had forgotten that Awk has every variable in a global scope by default, and that to prevent scope from leaking in functions, people use a bunch of whitespace and declare function arguments for all local variables not to bleed. For example, the following: code:
Also: awk does not support returning arrays as values from functions, only having them as arguments. It also does not support assigning arrays as values to a variable. But apparently everything in the function arguments sort of works, so if you want to copy a 2-dimensional array you can do: code:
This may not have been my best idea for a project.
|
# ? Dec 4, 2021 02:35 |
|
MononcQc posted:advent of code in awk day 3 trip report:
|
# ? Dec 4, 2021 02:47 |
|
calll me when theress a dake heltalouse
|
# ? Dec 4, 2021 03:07 |
|
mystes posted:
What the gently caress why would anyone use that, it's even worse than I would have imagined

you mostly hope you don't have to, the moment your awk thing requires functions you hosed up and you're better off using another language.

But awk is very much real good at doing command-line poo poo that would require a half dozen calls to cut or sed all at once, mostly around log parsing or dealing with semi-regular file output. For example, Kafka has commands with output that may look like:
code:
code:
code:
Now once this is seen to be useful, we turn that into a more readable script or permanent metrics, but the ability to do that poo poo live on a server is pretty neat for anything that's read-once debugging aid.
|
# ? Dec 4, 2021 04:23 |
|
if you want to see what hell looks like, you can take a look at people doing network programming in awk: https://hub.packtpub.com/network-programming-gawk/ I have no idea why anyone would do that, but then again, I decided to somehow spend more time there for fun during these holidays so what the hell.
|
# ? Dec 4, 2021 04:26 |
|
not mine, but some dude is doing AOC2021 in the Shakespeare programming language... https://github.com/SansCipher/Advent-of-Code-2021 A sample of day 2, part 1. code:
|
# ? Dec 4, 2021 05:40 |
|
we get coordinate values in all kinds of wacky formats and with latitude and longitude in whatever order. latitude ~60, longitude ~25 = finland. latitude ~25, longitude ~60 = arabian peninsula. we use the coordinates for showing map markers and i have added cleanup methods that flip the coordinate values if they look wrong in that specific way, then there is a conversion for "ETRS-TM35FIN" coordinates and finally if there are values that fall outside of the window of "finland and a bit of the surrounding area", they are just set to null
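the flip-then-null part of that cleanup could look roughly like this in python. the bounding box numbers and the names `in_window` and `clean_coordinates` are assumptions picked for the example, not the poster's actual values, and the ETRS-TM35FIN conversion step is left out entirely.

```python
from typing import Optional, Tuple

# Rough box for "finland and a bit of the surrounding area".
# Illustrative assumption only, not the window the poster uses.
LAT_MIN, LAT_MAX = 58.0, 71.0
LON_MIN, LON_MAX = 18.0, 33.0


def in_window(lat: float, lon: float) -> bool:
    """True if the point falls inside the assumed bounding box."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


def clean_coordinates(lat: float, lon: float) -> Optional[Tuple[float, float]]:
    """Return (lat, lon), flipped if that is what makes it plausible, else None."""
    if in_window(lat, lon):
        return (lat, lon)
    # latitude ~25, longitude ~60 is probably a swapped finnish coordinate
    if in_window(lon, lat):
        return (lon, lat)
    # outside the window in either order: treat as unusable
    return None
```

this only works because the window is not symmetric: a point in finland never looks valid when swapped, so the flip is unambiguous. for a region that straddles lat == lon the same trick would silently corrupt good data.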
|
# ? Dec 4, 2021 08:31 |
|
cinci zoo sniper posted:
they have like umpteen blog posts and "whitepapers" with verbal diarrhoea about "data lakehouse". like, just look at this nothingburger

data swamp
|
# ? Dec 4, 2021 10:17 |
|
sql would be nicer with a better type system. each key column could be its own unique type, making it an error if you try to join e.g. the order.id column to a customer id foreign key column. lose a bunch (maybe all) implicit conversions. make it easier to define and use custom types for columns. add units, so you can't add a value in dollars to a column that stores cents - or more advanced, make units part of the values, so you can't add a us cents value to a canadian cents value (and e.g. SUM() will error if your where clause includes multiple units).

sqls should be actively working to design and implement features to improve data integrity, since that's like the #1 concern for a real database

and reorder the parts of a sql statement so we can have non lovely autocomplete

better ways of reusing common code would also be nice
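since no sql does this, here is the "units are part of the value" idea sketched in python instead. `Money` and `total` are hypothetical names for the example; `total` stands in for a SUM() that errors when the rows it sees carry more than one unit.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Money:
    amount: int  # minor units, e.g. cents
    unit: str    # e.g. "USD_cents" or "CAD_cents"

    def __add__(self, other: "Money") -> "Money":
        if not isinstance(other, Money):
            return NotImplemented
        if self.unit != other.unit:
            # the check the post wishes the database would do for you
            raise TypeError(f"cannot add {other.unit} to {self.unit}")
        return Money(self.amount + other.amount, self.unit)


def total(values: Iterable[Money]) -> Money:
    """SUM()-like aggregate that refuses to mix units."""
    it = iter(values)
    try:
        acc = next(it)
    except StopIteration:
        raise ValueError("total() of no rows has no unit") from None
    for v in it:
        acc = acc + v  # raises TypeError on a unit mismatch
    return acc
```

the empty case is the awkward bit: a sum over zero rows has no unit to report, which is presumably one of the things a real design would have to decide on.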
|
# ? Dec 4, 2021 11:06 |
that would be nice, but probably is never going to happen
|
|
# ? Dec 4, 2021 11:13 |
|
oh it will 100% never happen, but it's nice to dream and theorycraft
|
# ? Dec 4, 2021 11:39 |
|
also the code re use thing, thats not happening either :/
|
# ? Dec 4, 2021 11:50 |
|
idk it's possible to an extent but held back by two things imo: terrible naming conventions in the db itself because people just choose whatever and don't notice existing objects that do the same thing (getFoo, getallFoo, viewFoo, viewFooWithoutBar), and clunky linking in to calling code that makes it hard to see what existing data access methods (i.e. procs) you have because you just bang in the proc name into dapper and you're done. And the attempts to fix this with entity framework by putting an abstraction layer in between were total dogshit
|
# ? Dec 4, 2021 13:51 |
|
I've been reading about dbt and it has some cool ideas but this page felt very topsy turvy land to me: https://docs.getdbt.com/tutorial/using-jinja
code:
{%- set payment_methods = dbt_utils.get_column_values(
    table=ref('raw_payments'),
    column='payment_method'
) -%}

select
order_id,
{%- for payment_method in payment_methods %}
sum(case when payment_method = '{{payment_method}}' then amount end) as {{payment_method}}_amount
{%- if not loop.last %},{% endif -%}
{% endfor %}
from {{ ref('raw_payments') }}
group by 1

Maybe it's just because it's a dumb example to show the functionality but I really don't buy that the end result is better than the "repetitive" original.

also wouldn't a SQL pivot work just fine?
|
# ? Dec 4, 2021 15:12 |
|
Jinja2 is loving painful tbh. Unrelated, since I'm watching a 9 hour video of someone reading the coalition contract I wrote a short, terrible, terrible piece of code. Python code:
|
# ? Dec 4, 2021 15:20 |
|
All the words I tried Google translate on came out exactly as boring as you might expect
|
# ? Dec 4, 2021 16:01 |
|
sql prod story: Guy was doing some testing locally, updates a row on some table to point to an image stored on his machine. Dunno exactly what led to it happening between testing and running it live but long story short all the paths on the customer’s live DB are suddenly changed to point to the same path. Pretty sure the boss chewed him out, which he could probably have avoided since we all used the same admin account per customer, except that the path he changed it to was like “C:/Users/JohnSmith/…”.
|
# ? Dec 4, 2021 16:12 |
pointsofdata posted:
I've been reading about dbt and it has some cool ideas but this page felt very topsy turvy land to me:

dbt is fairly good, ive used it quite often in prod. unfortunately it’s also relatively underdeveloped, and will never actually get finished imo, because devs are focused on monetising it as a “cloud service”

while jinja sucks, and i wish we had something better to template sql like that in, the example imo is ideologically fine - they’ve just picked something way too short. ive handled similar individual queries measuring hundreds to thousands of loc, where writing it this way removes impressive amounts of boilerplate, and also helps with not having to remember what all the options for each field were

my personal dbt highlight, which is in my experience underrated and, thus, under-utilised is tests. https://docs.getdbt.com/docs/building-a-dbt-project/tests
|
|
# ? Dec 4, 2021 16:17 |
|
god in heaven a stringbashing template system for sql queries? actually no i am not in the least bit surprised that somebody would build something that idiotic, this is the sort of thing you build when you have never heard of a query builder
|
# ? Dec 4, 2021 16:45 |
|
Sapozhnik posted:
god in heaven a stringbashing template system for sql queries?

It's unsafe to directly substitute strings into SQL but it's fine if you generate prepared statements based on the templates.
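a minimal sketch of that distinction using python's stdlib sqlite3, reusing the payment-method pivot from the dbt tutorial quoted earlier in the thread. this is not how dbt works internally; `build_pivot` and the identifier regex are assumptions made up for the example. values go through `?` placeholders, and the one thing that has to be templated (the column alias, which cannot be a bound parameter) is checked against a strict identifier pattern first.

```python
import re
import sqlite3

# Only plain identifiers may be spliced into the SQL text.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_pivot(methods):
    """Return (sql, params): one sum column per payment method."""
    cols, params = [], []
    for m in methods:
        if not IDENT.fullmatch(m):
            raise ValueError(f"not a safe identifier: {m!r}")
        # the compared value is bound; only the vetted alias is templated
        cols.append(
            f"sum(case when payment_method = ? then amount end) as {m}_amount"
        )
        params.append(m)
    sql = (
        "select order_id, " + ", ".join(cols)
        + " from raw_payments group by order_id order by order_id"
    )
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table raw_payments (order_id int, payment_method text, amount int)"
)
conn.executemany(
    "insert into raw_payments values (?, ?, ?)",
    [(1, "credit_card", 1000), (1, "coupon", 200), (2, "credit_card", 500)],
)

sql, params = build_pivot(["credit_card", "coupon"])
rows = conn.execute(sql, params).fetchall()
```

the generated statement has the same shape the jinja loop produces, but nothing user-controlled reaches the sql text without either being bound or passing the identifier check.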
|
# ? Dec 4, 2021 17:03 |
it's worth noting that the use case for dbt and friends is DDL manipulations, you only do DML poo poo in them if you need to manually define an enum or something for your DDL to work
|
|
# ? Dec 4, 2021 18:17 |
|
|
|
jinja2 is good but its for templating documents, not sql queries
|
# ? Dec 4, 2021 19:05 |