|
MALE SHOEGAZE posted:agh whhhhhhyy lmao They're using jruby, i.e. lovely ruby running on top of lovely jvm. It's like getting on a big fat diesel bus that takes a half-hour to get to your destination that is 5 mins away on foot. My first evar terrible programming job was managing an ETL process using perl scripts. I was importing multi-GB tag-delimited (NOT xml) text files into MS SQL Server 2000. I still love perl even though it's p-lang af because I barely knew programming but could still grind through 4 or 5 GB of text in mere seconds and I felt powerful.
|
# ? Aug 7, 2018 18:40 |
|
|
# ? Oct 6, 2024 01:34 |
|
Finster Dexter posted:i.e. lovely ruby
|
# ? Aug 7, 2018 18:43 |
ctps: i did a small script today to simulate some stuff in python. it was a perfect use case for data classes but intel has not updated their distribution to 3.7 yet
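for anyone who hasn't touched them yet, here is a minimal sketch of what a data class buys you in a small simulation script. The post doesn't say what was being simulated, so `Particle`, its fields, and `step` are all hypothetical. `dataclasses` is in the standard library from Python 3.7; on 3.6 there is a backport package of the same name on PyPI.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Particle:
    """Hypothetical simulation record; the fields are illustrative only."""
    position: float
    velocity: float
    # Mutable defaults need default_factory so instances don't share one list.
    history: List[float] = field(default_factory=list)

    def step(self, dt: float) -> None:
        # Remember the old position, then advance by velocity * dt.
        self.history.append(self.position)
        self.position += self.velocity * dt


p = Particle(position=0.0, velocity=2.0)
p.step(0.5)
print(p)  # __init__, __repr__ and __eq__ are generated by the decorator
```

the point is that the decorator writes the constructor, repr, and equality for you, which is most of the boilerplate in a throwaway simulation.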
|
|
# ? Aug 7, 2018 18:47 |
|
thanks to whoever recommended the emulator101 tutorial it's great. should probably thank the person who wrote it. hope they like bitcoins.
|
# ? Aug 7, 2018 19:00 |
|
MALE SHOEGAZE posted:thanks to whoever recommended the emulator101 tutorial it's great. should probably thank the person who wrote it. hope they like bitcoins. oh no
|
# ? Aug 7, 2018 19:16 |
|
MALE SHOEGAZE posted:thanks to whoever recommended the emulator101 tutorial it's great. should probably thank the person who wrote it. hope they like bitcoins. yeah ive been building out an emulator because of that link. its been a fun distraction from work.
|
# ? Aug 7, 2018 19:26 |
|
HoboMan posted:oh no
|
# ? Aug 7, 2018 19:37 |
|
ctps: chriiiist wasted a week trying to figure out what's going wrong and it's all because the assembly containing the return type for a method had been loaded twice? chriiiiiist
|
# ? Aug 7, 2018 19:44 |
|
are there any libraries/frameworks/tools/etc for doing etl? surely it must be a solved problem by now. i ask because at work we have a bunch (idk, 100+?) of integrations that just do endless variants of 'slurp data in, mangle data, fart data out'. of course, each one is usually its own bespoke little thing, handcrafted with varying degrees of quality, failure handling, logging, retrying logic, and just generally giving-a-poo poo. this is surprisingly a bit of a problem for our ops team, who spend a lot of time chasing these lovely imports and exports
|
# ? Aug 7, 2018 21:30 |
|
redleader posted:are there any libraries/frameworks/tools/etc for doing etl?
yes, tons of them
redleader posted:surely it must be a solved problem by now.
no. no? given how multifarious and poorly-defined the problem domain of ETL is (to say nothing of how different people might understand it in different ways and have different expectations) "solved problem" seems hopelessly optimistic. "lots of tools that will get you between 40% and 70% of the way there, depending on use case" seems realistic to me. I've used https://github.com/pentaho/pentaho-kettle at a job before and It Was Okay.
|
# ? Aug 7, 2018 21:38 |
prisoner of waffles posted:yes, tons of them
we have our enterprise dwh built entirely on the pentaho stack (kettle, spoon, data integration, some other buzzwords - im not the developer of the project), and its quite decent. some things take time, there were 1 or 2 bugs along the road, but nothing unsolvable yet.
we do not have, however, any meaningful ingests that are not rdbms or, proxy-ishly, document blobs stored in rdbms.
our dwh development lead misses some bespoke oracle tooling features every now and then, but hes a former oracle consultant so thats expected.
the dwh and company are small, but the previous company also had all its etl managed in the pentaho stack, and that was billions in revenue, 50 countries, finance, with enough numbers and a little bit more
|
|
# ? Aug 7, 2018 21:59 |
i can ask our guy at work tomorrow if he knows any alternatives hed dare to try
|
|
# ? Aug 7, 2018 22:00 |
|
redleader posted:are there any libraries/frameworks/tools/etc for doing etl? surely it must be a solved problem by now. SSIS
|
# ? Aug 7, 2018 22:03 |
|
redleader posted:are there any libraries/frameworks/tools/etc for doing etl? surely it must be a solved problem by now. dbt is great for stuff that fits in a single db. spark is good for everything else
|
# ? Aug 7, 2018 22:48 |
|
ctps: the input I need to run in this pipeline is full of garbage data that our lovely India-based editors manually hosed up. Consequently my day is "run, crash, remove offending data, repeat".
|
# ? Aug 7, 2018 23:17 |
|
just wrap the line parser in a try that continues on catch imo
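a rough sketch of what that looks like, assuming a line-oriented input and a parser that raises `ValueError` on bad lines. `parse_lenient` and `parse_pair` are made-up names and the tab-separated name/count format is only an example. Note that this silently drops the bad rows apart from a log line, so it only fits pipelines where that is acceptable.

```python
import logging
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")
log = logging.getLogger("etl")


def parse_lenient(lines: Iterable[str],
                  parse_line: Callable[[str], T]) -> Iterator[T]:
    """Yield parsed rows, logging and skipping any line the parser rejects."""
    for lineno, line in enumerate(lines, start=1):
        try:
            yield parse_line(line)
        except ValueError as exc:
            # Bad line: note it and carry on with the next one.
            log.warning("skipping line %d: %s", lineno, exc)
            continue


def parse_pair(line: str) -> Tuple[str, int]:
    """Hypothetical parser for 'name<TAB>count' lines."""
    # Both a wrong field count and a non-integer count raise ValueError.
    name, count = line.rstrip("\n").split("\t")
    return name, int(count)


rows = list(parse_lenient(["a\t1", "garbage", "b\t2"], parse_pair))
print(rows)
```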
|
# ? Aug 7, 2018 23:43 |
|
Corla Plankun posted:just wrap the line parser in a try that continues on catch imo
It's more complex than that unfortunately. There's *layers of abstraction* involved and we have to have the pipeline fail when data is invalid so it can be cleaned. We can't quietly drop bad data. A pain when doing tests but necessary for production.
|
# ? Aug 7, 2018 23:46 |
|
ok this is kind of the wrong thread but can someone tell me what a particular component of a service principal name (spn) is for? I'm setting up a vendor app using kerberos for authentication and it says to use setspn as follows: code:
and I can't figure out what the gently caress servicename is supposed to be. MSDN docs have a host name or cname there instead but say it's optional, and some other examples even have an account name. does it matter what is actually used there? The vendor docs say "for a service called search_service, use searchservice as their servicename" which makes no sense
edit: fucks sake ms what kind of description is this:
quote: serviceclass/host:port servicename
OK so it is actually an account name?
Powerful Two-Hander fucked around with this message at 00:10 on Aug 8, 2018 |
# ? Aug 7, 2018 23:59 |
|
Powerful Two-Hander posted:ok this is kind of the wrong thread but can someone tell me what a particular component of a service principal name (spn) is for? sounds like it's just the unique identifier. eg, if you wanted to delete the SP you'd use delete with the servicename you set there. but i have no idea, it just sounds like entity ID from saml which is also vague and frustrating. i'm basically only chiming in to express displeasure in saml.
|
# ? Aug 8, 2018 00:01 |
|
the service name is the account providing the service afaik. So if your service is running as domain\serviceuser use that. entity id is your issuer id. it basically identifies the authority for the saml token and is used in establishing from where the token originated.
|
# ? Aug 8, 2018 00:13 |
|
MALE SHOEGAZE posted:sounds like it's just the unique identifier. eg, if you wanted to delete the SP you'd use delete with the servicename you set there. i think you're right tbh because that would make sense in the wider context but then msdn docs say its optional unless it's a "non standard" spn and point to this page https://docs.microsoft.com/en-gb/windows/desktop/AD/name-formats-for-unique-spns which has somehow hosed the formatting so it removed the references to the syntax... quote:The SPNs for each replica have the same "" and "" components, where "" identifies more specifically the features provided by the service. Only the "" and optional "" components would vary from SPN to SPN. top tier trolling by the author
|
# ? Aug 8, 2018 00:15 |
|
Powerful Two-Hander posted:i think you're right tbh because that would make sense in the wider context but then msdn docs say its optional unless it's a "non standard" spn and point to this page https://docs.microsoft.com/en-gb/windows/desktop/AD/name-formats-for-unique-spns which has somehow hosed the formatting so it removed the references to the syntax...
|
# ? Aug 8, 2018 00:17 |
|
I think your documentation from the vendor is wrong since setting a servicename would be setspn -s http/fqdn/servicename accountname afaict
|
# ? Aug 8, 2018 00:20 |
|
account name is required cause that account is used to handle the kerb encryption.
|
# ? Aug 8, 2018 00:21 |
|
gonadic io posted:goddamn logstash literally has a literal 12 minute startup time see if filebeat + ingest pipelines will do what you need, its possible you could cut out logstash entirely
|
# ? Aug 8, 2018 00:22 |
|
love 2 not escape angle brackets in html pages
|
# ? Aug 8, 2018 00:24 |
|
Shaggar posted:account name is required cause that account is used to handle the kerb encryption. yeah that's what i thought myself as i couldn't see how the SPN would be associated to the account otherwise can't believe I didn't think of the vendor docs being wrong though... they probably never expected anyone to read them lmao
|
# ? Aug 8, 2018 00:27 |
|
from what I gather you only need service name if you're going to be running that service on multiple hosts. ex:
http/node1.mycoolwebsite.com/www.mycoolwebsite.com
http/node2.mycoolwebsite.com/www.mycoolwebsite.com
instead of
http/www.mycoolwebsite.com
which would I guess only work for one host. What I'm not clear on is how the host comes into play if you're using the same accounts on 2 nodes. If I use domain\mycoolwebsiteuser for the service on 2 nodes can they both use the http/www.mycoolwebsite.com spn?
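to keep the pieces straight, here is a small sketch that assembles SPN strings in the `serviceclass/host:port/servicename` layout being discussed, with port and service name optional, and keeps the account as a separate argument to `setspn -S`. `format_spn` and `setspn_args` are hypothetical helpers, not part of any library, and nothing here actually runs `setspn`; it only builds the argument list.

```python
from typing import List, Optional


def format_spn(service_class: str, host: str,
               port: Optional[int] = None,
               service_name: Optional[str] = None) -> str:
    """Build 'serviceclass/host[:port][/servicename]'."""
    spn = f"{service_class}/{host}"
    if port is not None:
        spn += f":{port}"
    if service_name is not None:
        spn += f"/{service_name}"
    return spn


def setspn_args(spn: str, account: str) -> List[str]:
    """Argument list for registering the SPN on an account.

    The account is NOT part of the SPN string; it is the second
    argument to setspn. -S checks for duplicates before adding.
    """
    return ["setspn", "-S", spn, account]


# The examples from the post: one SPN per node, shared service name.
for node in ("node1", "node2"):
    spn = format_spn("http", f"{node}.mycoolwebsite.com",
                     service_name="www.mycoolwebsite.com")
    print(" ".join(setspn_args(spn, r"domain\mycoolwebsiteuser")))
```

whether two nodes can share one SPN on the same account is exactly the open question in the thread; the sketch doesn't answer it, it just shows where each component sits.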
|
# ? Aug 8, 2018 00:32 |
|
I think you can associate multiple SPNs to an account but they have to be unique... but then again I've seen other information saying that doing this can cause problems because the auth request might use the wrong spn for the host or something. Like you try to logon to host 2 but the ticket is under spn1 which isn't valid for that host. or something.
|
# ? Aug 8, 2018 00:42 |
|
ctps: I love slowly rolling my own k8s in bash in a circleci build script code:
|
# ? Aug 8, 2018 14:29 |
|
jit bull transpile posted:It's more complex than that unfortunately. There's *layers of abstraction * involved and we have to have the pipeline fail when data is invalid so it can be cleaned. We can't quietly drop bad data. A pain when doing tests but necessary for production. Can you add a step before pipeline failure to count all the garbage data and generate some kind of report? then give it to whomever and tell them you can't accept their data unless it passes this validation
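a sketch of that kind of pre-flight step, assuming line-oriented input: scan everything first, collect every bad line with a reason, build a report, and only then fail. Nothing gets dropped quietly, and the whole list of problems comes out in one run instead of one crash at a time. All names here (`BadLine`, `find_bad_lines`, `report`, `validate_or_fail`, `parse_record`) and the tab-separated format are hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable, List, NamedTuple, Tuple


class BadLine(NamedTuple):
    lineno: int
    reason: str
    text: str


class ValidationFailed(Exception):
    """Raised after the full scan if any line was invalid."""


def parse_record(line: str) -> Tuple[str, int]:
    """Hypothetical parser for 'name<TAB>count' lines."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 2:
        raise ValueError("wrong field count")
    name, count = fields
    try:
        return name, int(count)
    except ValueError:
        raise ValueError("count is not an integer") from None


def find_bad_lines(lines: Iterable[str],
                   parse_line: Callable[[str], object]) -> List[BadLine]:
    """Run the parser over every line and collect the failures."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        try:
            parse_line(line)
        except ValueError as exc:
            bad.append(BadLine(lineno, str(exc), line))
    return bad


def report(bad: List[BadLine]) -> str:
    """Summary counts by reason, then one entry per offending line."""
    counts = Counter(b.reason for b in bad)
    out = [f"{len(bad)} bad line(s)"]
    out += [f"  {n} x {reason}" for reason, n in counts.most_common()]
    out += [f"  line {b.lineno}: {b.reason}: {b.text!r}" for b in bad]
    return "\n".join(out)


def validate_or_fail(lines: Iterable[str],
                     parse_line: Callable[[str], object]) -> None:
    """Still fails the pipeline on bad data, but with the full report."""
    bad = find_bad_lines(lines, parse_line)
    if bad:
        raise ValidationFailed(report(bad))


sample = ["a\t1", "garbage", "b\tx", "c\t3", "nope"]
try:
    validate_or_fail(sample, parse_record)
except ValidationFailed as failure:
    print(failure)  # this text is what you'd hand back to the data owners
```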
|
# ? Aug 8, 2018 14:37 |
|
Illusive gently caress Man posted:Can you add a step before pipeline failure to count all the garbage data and generate some kind of report? then give it to whomever and tell them you can't accept their data unless it passes this validation This is something I'm working on right now but we still gotta run in the mean time.
|
# ? Aug 8, 2018 17:15 |
|
jit bull transpile posted:This is something I'm working on right now but we still gotta run in the mean time. *shaggars in to room* WHY AREN'T YOU USING Microsoft (r) SQL Server (tm) Integration Services (tm)?
|
# ? Aug 8, 2018 21:39 |
|
i'm pretty sure it's literally illegal for most apple employees to use microsoft software on their work computers (it being an explicitly disallowed use of the machine and therefore a violation of CFAA).
|
# ? Aug 8, 2018 22:21 |
|
there are other apple employees who have posted about using Microsoft stuff (namely c#) in this thread or past variants. SSIS is extremely good. depending on circumstance it might even be worth it to get a sql server license just for SSIS. It cant (for the most part) turn bad data into good data but it can help protect your good data from bad data.
|
# ? Aug 8, 2018 23:10 |
|
you're right. apple misses out on a lot of really good software by not standardizing on windows. now that they're basically just the iphone company there's an opportunity to sell off the faltering mac business and transition over to a superior pc platform. i'm certain that microsoft would offer very reasonable prices for an enterprise agreement covering windows, office, sql server, dynamics, and visual studio.
|
# ? Aug 9, 2018 02:12 |
|
my first lovely programmer job was being a temp at a place optimizing sql queries to speed up crystal reports that were embedded in ASP 1.0 pages, which used COM objects for database access. after 8 months the other lead devs left and I was the lead.
|
# ? Aug 9, 2018 03:08 |
|
CRIP EATIN BREAD posted:my first lovely programmer job was being a temp at a place optimizing sql queries to speed up crystal reports that were embedded in ASP 1.0 pages, which used COM objects for database access. extremely literal
|
# ? Aug 9, 2018 03:56 |
|
It's a Hadoop pipeline anyway. It's Java and Linux all the way
|
# ? Aug 9, 2018 04:17 |
|
materialized views: cool and also good
|
# ? Aug 9, 2018 04:55 |