Finster Dexter
Oct 20, 2014

Beyond is Finster's mad vision of Earth transformed.

MALE SHOEGAZE posted:

agh whhhhhhyy

there are better ways!

lmao They're using jruby, i.e. lovely ruby running on top of lovely jvm. It's like getting on a big fat diesel bus that takes a half-hour to get to your destination that is 5 mins away on foot.

My first evar terrible programming job was managing an ETL process using perl scripts. I was importing multi-GB tag-delimited (NOT xml) text files into MS SQL Server 2000. I still love perl even though it's p-lang af because I barely knew programming but could still grind through 4 or 5 GB of text in mere seconds and I felt powerful.


Ellie Crabcakes
Jan 31, 2008

Stop emailing my boyfriend Gay Crungus

Finster Dexter posted:

i.e. lovely ruby
bit redundant there

cinci zoo sniper
Mar 14, 2013




c tp s: i wrote a small script today to simulate some stuff in python. it was a perfect use case for data classes but intel has not updated their python distribution to 3.7 yet :(
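For reference, this is roughly the kind of simulation state data classes make painless. It's a minimal sketch that assumes Python 3.7+, because the `dataclasses` module landed in 3.7 (a backport with the same name is on PyPI for 3.6). `Particle`, `World` and `step` are hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field, replace
from typing import List

@dataclass(frozen=True)
class Particle:
    # hypothetical simulation state: 1-D position and velocity
    x: float
    v: float

def step(p: Particle, dt: float) -> Particle:
    # frozen dataclasses are immutable, so return an updated copy
    return replace(p, x=p.x + p.v * dt)

@dataclass
class World:
    # default_factory avoids sharing one mutable list between instances
    particles: List[Particle] = field(default_factory=list)
    t: float = 0.0

    def advance(self, dt: float) -> None:
        self.particles = [step(p, dt) for p in self.particles]
        self.t += dt

w = World([Particle(0.0, 1.0), Particle(10.0, -2.0)])
w.advance(0.5)
print(w)  # generated __repr__ shows every field
```

You get `__init__`, `__repr__` and `__eq__` for free, which is most of the boilerplate in this kind of script.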

DONT THREAD ON ME
Oct 1, 2002

by Nyc_Tattoo
Floss Finder
thanks to whoever recommended the emulator101 tutorial it's great. should probably thank the person who wrote it. hope they like bitcoins.

HoboMan
Nov 4, 2010

MALE SHOEGAZE posted:

thanks to whoever recommended the emulator101 tutorial it's great. should probably thank the person who wrote it. hope they like bitcoins.

oh no

necrotic
Aug 1, 2005
I owe my brother big time for this!

MALE SHOEGAZE posted:

thanks to whoever recommended the emulator101 tutorial it's great. should probably thank the person who wrote it. hope they like bitcoins.

yeah ive been building out an emulator because of that link. its been a fun distraction from work.

Ellie Crabcakes
Jan 31, 2008

Stop emailing my boyfriend Gay Crungus


prisoner of waffles
May 8, 2007

Ah! well a-day! what evil looks
Had I from old and young!
Instead of the cross, the fishmech
About my neck was hung.
ctps: chriiiist, wasted a week trying to figure out what was going wrong and it's all because the assembly containing the return type for a method had been loaded twice? chriiiiiist

redleader
Aug 18, 2005

Engage according to operational parameters
are there any libraries/frameworks/tools/etc for doing etl? surely it must be a solved problem by now.

i ask because at work we have a bunch (idk, 100+?) of integrations that just do endless variants of 'slurp data in, mangle data, fart data out'. of course, each one is usually its own bespoke little thing, handcrafted with varying degrees of quality, failure handling, logging, retrying logic, and just generally giving-a-poo poo. this is surprisingly a bit of a problem for our ops team, who spend a lot of time chasing these lovely imports and exports
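One common way to tame a pile of bespoke integrations is to move the retry, logging and failure handling into a single shared runner. Each integration then only supplies its extract, transform and load steps. Below is a stdlib-only sketch that assumes synchronous jobs. `Job`, `run_job`, `with_retry` and the retry parameters are hypothetical names and do not come from any particular framework.

```python
import logging
import time
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List

log = logging.getLogger("etl")

@dataclass
class Job:
    # hypothetical interface: each integration only supplies these three steps
    name: str
    extract: Callable[[], Iterable[Any]]
    transform: Callable[[Any], Any]
    load: Callable[[List[Any]], None]

def with_retry(fn: Callable[[], Any], attempts: int = 3, delay: float = 1.0) -> Any:
    """Call fn, retrying on any exception with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise  # out of retries: surface the real error to the caller
            log.warning("attempt %d/%d failed, retrying", attempt, attempts, exc_info=True)
            time.sleep(delay * 2 ** (attempt - 1))

def run_job(job: Job, attempts: int = 3, delay: float = 1.0) -> int:
    """Run one job with uniform logging and retries; returns rows loaded."""
    log.info("%s: starting", job.name)
    # extract and load touch the outside world, so they get retried;
    # transform is assumed deterministic, so a failure there is a real bug
    rows = with_retry(lambda: list(job.extract()), attempts, delay)
    out = [job.transform(r) for r in rows]
    with_retry(lambda: job.load(out), attempts, delay)
    log.info("%s: loaded %d rows", job.name, len(out))
    return len(out)
```

The design choice here is that ops only has to learn one log format and one failure mode. Individual jobs stop reinventing retries with varying degrees of giving-a-poo poo.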

prisoner of waffles
May 8, 2007

Ah! well a-day! what evil looks
Had I from old and young!
Instead of the cross, the fishmech
About my neck was hung.

redleader posted:

are there any libraries/frameworks/tools/etc for doing etl?

yes, tons of them

redleader posted:

surely it must be a solved problem by now.

no. no? given how multifarious and poorly-defined the problem domain of ETL is (to say nothing of how different people might understand it in different ways and have different expectations) "solved problem" seems hopelessly optimistic.

"lots of tools that will get you between 40% and 70% of the way there, depending on use case" seems realistic to me.

I've used https://github.com/pentaho/pentaho-kettle at a job before and It Was Okay.

cinci zoo sniper
Mar 14, 2013




prisoner of waffles posted:

yes, tons of them


no. no? given how multifarious and poorly-defined the problem domain of ETL is (to say nothing of how different people might understand it in different ways and have different expectations) "solved problem" seems hopelessly optimistic.

"lots of tools that will get you between 40% and 70% of the way there, depending on use case" seems realistic to me.

I've used https://github.com/pentaho/pentaho-kettle at a job before and It Was Okay.

we have our enterprise dwh built entirely on the pentaho stack (kettle, spoon, data integration, some other buzzwords; i'm not the developer of the project), and it's quite decent. some things take time, and there were 1 or 2 bugs along the way, but nothing unsolvable yet. we don't, however, have any meaningful ingests that aren't rdbms or, proxy-ishly, document blobs stored in rdbms

our dwh development lead misses some bespoke oracle tooling features every now and then, but he's a former oracle consultant so that's expected

the dwh and the company are small, but the previous company also had all its etl managed in the pentaho stack, and that was a finance company with billions in revenue across 50 countries, with enough numbers and a little bit more

cinci zoo sniper
Mar 14, 2013




i can ask our guy at work tomorrow if he knows any alternatives he'd dare to try

Shaggar
Apr 26, 2006

redleader posted:

are there any libraries/frameworks/tools/etc for doing etl? surely it must be a solved problem by now.

i ask because at work we have a bunch (idk, 100+?) of integrations that just do endless variants of 'slurp data in, mangle data, fart data out'. of course, each one is usually its own bespoke little thing, handcrafted with varying degrees of quality, failure handling, logging, retrying logic, and just generally giving-a-poo poo. this is surprisingly a bit of a problem for our ops team, who spend a lot of time chasing these lovely imports and exports

SSIS

the talent deficit
Dec 20, 2003

self-deprecation is a very british trait, and problems can arise when the british attempt to do so with a foreign culture





redleader posted:

are there any libraries/frameworks/tools/etc for doing etl? surely it must be a solved problem by now.

i ask because at work we have a bunch (idk, 100+?) of integrations that just do endless variants of 'slurp data in, mangle data, fart data out'. of course, each one is usually its own bespoke little thing, handcrafted with varying degrees of quality, failure handling, logging, retrying logic, and just generally giving-a-poo poo. this is surprisingly a bit of a problem for our ops team, who spend a lot of time chasing these lovely imports and exports

dbt is great for stuff that fits in a single db. spark is good for everything else

The MUMPSorceress
Jan 6, 2012


^SHTPSTS

Gary’s Answer
C tp s: the input I need to run through this pipeline is full of garbage data that our lovely India-based editors manually hosed up. Consequently my day is "run, crash, remove offending data, repeat".

Corla Plankun
May 8, 2007

improve the lives of everyone
just wrap the line parser in a try that continues on catch imo
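Here is the try-and-continue approach sketched in Python. It assumes a hypothetical tab-separated `id<TAB>amount` line format. It catches only `ValueError`, which is what a wrong field count or a non-numeric amount raises here. Skipping means bad data gets dropped, so the sketch also counts the skipped lines to keep that visible.

```python
from typing import Iterable, List, Tuple

def parse_line(line: str) -> Tuple[str, float]:
    # hypothetical record format: "<id>\t<amount>"
    rec_id, amount = line.rstrip("\n").split("\t")  # ValueError on wrong field count
    return rec_id, float(amount)                      # ValueError if not numeric

def parse_lenient(lines: Iterable[str]) -> Tuple[List[Tuple[str, float]], int]:
    """Parse every line, skipping (but counting) any that fail to parse."""
    good: List[Tuple[str, float]] = []
    skipped = 0
    for line in lines:
        try:
            good.append(parse_line(line))
        except ValueError:
            skipped += 1  # drop this line and move on to the next one
    return good, skipped

rows, skipped = parse_lenient(["a\t1.5\n", "garbage\n", "b\tnope\n", "c\t2\n"])
print(rows, skipped)
```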

The MUMPSorceress
Jan 6, 2012


^SHTPSTS

Gary’s Answer

Corla Plankun posted:

just wrap the line parser in a try that continues on catch imo

It's more complex than that, unfortunately. There are *layers of abstraction* involved, and we have to have the pipeline fail when data is invalid so it can be cleaned. We can't quietly drop bad data. It's a pain when doing tests but necessary for production.

Powerful Two-Hander
Mar 9, 2004

Mods please change my name to "Tooter Skeleton" TIA.


ok this is kind of the wrong thread but can someone tell me what a particular component of a service principal name (spn) is for?

I'm setting up a vendor app using kerberos for authentication and it says to use setspn as follows:


code:

Setspn -S http/{fqdn of host}@domain.com {servicename} 

and I can't figure out what the gently caress servicename is supposed to be. Msdn docs have a host name or cname there instead but say it's optional, and some other examples even use an account name.

does it matter what is actually used there? The vendor docs say "for a service called search_service, use searchservice as their servicename" which makes no sense

edit: for fuck's sake ms, what kind of description is this:

quote:

serviceclass/host:port servicename

serviceclass and host are required, but port and service name are optional. The colon between host and port is only required when a port is present.

For example, to register the FIMService on the standard port (meaning you don't have to specify the port number) on a computer named FIMSVR in a domain named contoso.com that is using a service account named FIMService, use the following command:

setspn -s FIMService/FIMSVR.contoso.com CONTOSO\FIMService

OK so it is actually an account name?

Powerful Two-Hander fucked around with this message at 23:10 on Aug 7, 2018

DONT THREAD ON ME
Oct 1, 2002

by Nyc_Tattoo
Floss Finder

Powerful Two-Hander posted:

ok this is kind of the wrong thread but can someone tell me what a particular component of a service principal name (spn) is for?

I'm setting up a vendor app using kerberos for authentication and it says to use setspn as follows:


code:
Setspn -S http/{fqdn of host}@domain.com {servicename} 
and I can't figure out what the gently caress servicename is supposed to be. Msdn docs have a host name or cname there instead but say it's optional, and some other examples even use an account name.

does it matter what is actually used there?

sounds like it's just the unique identifier. eg, if you wanted to delete the SP you'd use delete with the servicename you set there.

but i have no idea, it just sounds like entity ID from saml which is also vague and frustrating. i'm basically only chiming in to express displeasure in saml.

Shaggar
Apr 26, 2006
the service name is the account providing the service afaik. So if your service is running as domain\serviceuser use that.


entity id is your issuer id. it basically identifies the authority for the saml token and is used in establishing from where the token originated.
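To make the entity ID concrete: in a SAML 2.0 assertion, the issuer's entity ID is carried in the `<saml:Issuer>` element. The stdlib sketch below pulls it out. It assumes an already-decoded assertion, and the XML is a trimmed, made-up example. A real service provider must verify the signature before trusting anything in the assertion.

```python
import xml.etree.ElementTree as ET

SAML_NS = {"saml": "urn:oasis:names:tc:SAML:2.0:assertion"}

# trimmed, hypothetical assertion; real ones are signed and much larger
ASSERTION = """<saml:Assertion xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion"
    ID="_abc123" Version="2.0" IssueInstant="2018-08-07T23:10:00Z">
  <saml:Issuer>https://idp.example.com/metadata</saml:Issuer>
  <saml:Subject><saml:NameID>alice@example.com</saml:NameID></saml:Subject>
</saml:Assertion>"""

def issuer_of(xml_text: str) -> str:
    """Return the entity ID of whoever issued the assertion."""
    root = ET.fromstring(xml_text)
    issuer = root.find("saml:Issuer", SAML_NS)  # direct child of <Assertion>
    if issuer is None or not (issuer.text or "").strip():
        raise ValueError("assertion has no Issuer")
    return issuer.text.strip()

print(issuer_of(ASSERTION))
```

The relying party compares this value against the entity ID it has configured for the identity provider. That comparison is the "from where the token originated" check.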

Powerful Two-Hander
Mar 9, 2004

Mods please change my name to "Tooter Skeleton" TIA.


MALE SHOEGAZE posted:

sounds like it's just the unique identifier. eg, if you wanted to delete the SP you'd use delete with the servicename you set there.

but i have no idea, it just sounds like entity ID from saml which is also vague and frustrating. i'm basically only chiming in to express displeasure in saml.

i think you're right tbh because that would make sense in the wider context, but then the msdn docs say it's optional unless it's a "non standard" spn and point to this page https://docs.microsoft.com/en-gb/windows/desktop/AD/name-formats-for-unique-spns which has somehow hosed the formatting so it removed the references to the syntax...


quote:

The SPNs for each replica have the same "" and "" components, where "" identifies more specifically the features provided by the service. Only the "" and optional "" components would vary from SPN to SPN.

top tier trolling by the author

anthonypants
May 6, 2007

by Nyc_Tattoo
Dinosaur Gum

Powerful Two-Hander posted:

i think you're right tbh because that would make sense in the wider context, but then the msdn docs say it's optional unless it's a "non standard" spn and point to this page https://docs.microsoft.com/en-gb/windows/desktop/AD/name-formats-for-unique-spns which has somehow hosed the formatting so it removed the references to the syntax...


top tier trolling by the author
lmao that is insanely bad

Shaggar
Apr 26, 2006
I think your documentation from the vendor is wrong, since setting a servicename would be setspn -s http/fqdn/servicename accountname afaict

Shaggar
Apr 26, 2006
account name is required cause that account is used to handle the kerb encryption.

Arcsech
Aug 5, 2008

gonadic io posted:

goddamn logstash literally has a literal 12 minute startup time

Apparently it's written in jruby and the plugins are written in Ruby itself which get evaled a bunch

see if filebeat + ingest pipelines will do what you need, it's possible you could cut out logstash entirely
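For reference, an Elasticsearch ingest pipeline is just a JSON body you `PUT` to `_ingest/pipeline/<id>`, and Filebeat's Elasticsearch output can be configured to send events through it. The sketch below builds such a body in Python. It assumes plain-text log lines shaped like `<ISO timestamp> <LEVEL> <message>`, and the pipeline contents and the id `app-logs` are hypothetical.

```python
import json

def app_log_pipeline() -> dict:
    """Hypothetical pipeline: grok the raw line apart, then parse the timestamp."""
    return {
        "description": "parse app log lines (illustrative)",
        "processors": [
            {"grok": {
                "field": "message",
                "patterns": ["%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:msg}"],
            }},
            # date writes to @timestamp by default
            {"date": {"field": "ts", "formats": ["ISO8601"]}},
            # drop the intermediate field once it has been parsed
            {"remove": {"field": "ts"}},
        ],
    }

# body for: PUT _ingest/pipeline/app-logs
body = json.dumps(app_log_pipeline(), indent=2)
print(body)
```

All of this runs inside Elasticsearch's ingest nodes, so there's no separate JVM process to spin up per deploy.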

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
love 2 not escape angle brackets in html pages
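That's likely what happened to the docs quoted above. Placeholders like `<service class>` were emitted as raw HTML, so the browser parsed them as unknown tags and rendered no text. The Python illustration below shows the effect and the stdlib `html.escape` fix. The placeholder names are assumed from the standard SPN format (`serviceclass/host:port/servicename`); they were not recovered from the page itself.

```python
import html
import re

# placeholder text as it was presumably written in the page source (assumption)
raw = 'The SPNs for each replica have the same "<service class>" and "<service name>" components'

# roughly what a browser shows when the brackets are not escaped:
# each "<...>" run is parsed as a tag and contributes no visible text
shown = re.sub(r"<[^>]*>", "", raw)

# escaping turns < and > into entities, so the placeholders survive
escaped = html.escape(raw, quote=False)

print(shown)
print(escaped)
```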

Powerful Two-Hander
Mar 9, 2004

Mods please change my name to "Tooter Skeleton" TIA.


Shaggar posted:

account name is required cause that account is used to handle the kerb encryption.

yeah that's what i thought myself as i couldn't see how the SPN would be associated to the account otherwise

can't believe I didn't think of the vendor docs being wrong though... they probably never expected anyone to read them lmao

Shaggar
Apr 26, 2006
from what I gather you only need service name if you're going to be running that service on multiple hosts ex:

http/node1.mycoolwebsite.com/www.mycoolwebsite.com
http/node2.mycoolwebsite.com/www.mycoolwebsite.com

instead of
http/www.mycoolwebsite.com

Which would I guess only work for one host. What I'm not clear on is how the host comes into play if you're using the same account on 2 nodes. If I use domain\mycoolwebsiteuser for the service on 2 nodes, can they both use the http/www.mycoolwebsite.com spn?
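The SPN layout being discussed is `serviceclass/host[:port][/servicename]`, and the SPN is registered against the account the service runs as. The small Python helper below assembles that string and the matching `setspn -S <spn> <account>` argument list, which makes the optional pieces explicit. The hostnames and `domain\mycoolwebsiteuser` are just the examples from this thread, and the helper only builds the command without running it.

```python
from typing import List, Optional

def build_spn(service_class: str, host: str,
              port: Optional[int] = None, service_name: Optional[str] = None) -> str:
    """Assemble serviceclass/host[:port][/servicename]."""
    spn = f"{service_class}/{host}"
    if port is not None:
        spn += f":{port}"  # the colon only appears when a port is given
    if service_name:
        spn += f"/{service_name}"
    return spn

def setspn_args(spn: str, account: str) -> List[str]:
    # -S checks for existing duplicates before adding; the account is
    # whose key Kerberos uses for tickets issued to this SPN
    return ["setspn", "-S", spn, account]

# one SPN per node, each carrying the shared public name as the service name
for node in ("node1.mycoolwebsite.com", "node2.mycoolwebsite.com"):
    spn = build_spn("http", node, service_name="www.mycoolwebsite.com")
    print(" ".join(setspn_args(spn, r"domain\mycoolwebsiteuser")))
```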

Powerful Two-Hander
Mar 9, 2004

Mods please change my name to "Tooter Skeleton" TIA.


I think you can associate multiple SPNs to an account but they have to be unique... but then again I've seen other information saying that doing this can cause problems because the auth request might use the wrong spn for the host or something. Like you try to logon to host 2 but the ticket is under spn1 which isn't valid for that host.
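The uniqueness rule is the part that bites. The KDC looks up the SPN to decide which account's key to use for the ticket. If the same SPN is registered on two accounts, that lookup becomes ambiguous, and this commonly shows up as Kerberos auth failures. One account holding several SPNs is fine. The sketch below runs a duplicate check over a list of (SPN, account) registrations; in practice, `setspn -X` does this kind of search against the domain. The registrations are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

def duplicate_spns(registrations: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Return each SPN that is registered on more than one account."""
    owners: Dict[str, Set[str]] = defaultdict(set)
    for spn, account in registrations:
        owners[spn].add(account)  # a set, so re-registering on the same account is harmless
    return {spn: accts for spn, accts in owners.items() if len(accts) > 1}

regs = [
    ("http/www.mycoolwebsite.com", r"domain\websvc"),
    ("http/node1.mycoolwebsite.com", r"domain\websvc"),  # several SPNs, one account: fine
    ("http/www.mycoolwebsite.com", r"domain\oldsvc"),    # same SPN, second account: conflict
]
for spn, accounts in duplicate_spns(regs).items():
    print(f"{spn} is registered on {sorted(accounts)}")
```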

or something.

gonadic io
Feb 16, 2011

>>=
ctps: I love slowly rolling my own k8s in bash in a circleci build script

code:
      - add_ssh_keys
      # Everything inside the double quotes runs on the remote host: log in, pull, stop old, start new.
      # \$ defers expansion to the remote shell; $SSH_ADDRESS is meant to expand locally.
      - run:
          name: SSH to server and deploy
          command: |
            ssh -o "StrictHostKeyChecking=no" \
                   "$SSH_ADDRESS" \
                   "cat json.key | docker login -u _json_key --password-stdin https://eu.gcr.io ; \
                    docker pull eu.gcr.io/dom-5-status/dom-5-status; \
                    RUNNING_CONTAINERS=\$(docker ps -q); \
                    if [ -n \"\$RUNNING_CONTAINERS\" ]; then docker stop \$RUNNING_CONTAINERS; fi; \
                    docker run -it -d \
                      --restart unless-stopped \
                      --volume /app/resources:/usr/src/myapp/resources  \
                      eu.gcr.io/dom-5-status/dom-5-status"

Illusive Fuck Man
Jul 5, 2004
RIP John McCain feel better xoxo 💋 🙏
Taco Defender

jit bull transpile posted:

It's more complex than that, unfortunately. There are *layers of abstraction* involved, and we have to have the pipeline fail when data is invalid so it can be cleaned. We can't quietly drop bad data. It's a pain when doing tests but necessary for production.

Can you add a step before pipeline failure to count all the garbage data and generate some kind of report? then give it to whomever and tell them you can't accept their data unless it passes this validation
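A sketch of that idea: validate every record up front and collect every problem instead of stopping at the first one. Only then fail the run, with a report someone can act on. It's in Python, with a hypothetical `validate` rule set and record shape. In a real pipeline the report would probably go to a file or a ticket rather than just the exception.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Problem:
    line_no: int
    field: str
    reason: str

class InvalidInput(Exception):
    """Raised once per batch, carrying every problem found."""
    def __init__(self, problems: List[Problem]):
        self.problems = problems
        super().__init__(f"{len(problems)} invalid field(s); first: {problems[0]}")

def validate(record: Dict[str, str], line_no: int) -> List[Problem]:
    # hypothetical rules for illustration
    problems = []
    if not record.get("id"):
        problems.append(Problem(line_no, "id", "missing"))
    try:
        float(record.get("amount", ""))
    except ValueError:
        problems.append(Problem(line_no, "amount", f"not a number: {record.get('amount')!r}"))
    return problems

def check_all(records: List[Dict[str, str]]) -> None:
    """Return if the batch is clean, else raise InvalidInput listing every problem."""
    problems = [p for i, r in enumerate(records, start=1) for p in validate(r, i)]
    if problems:
        raise InvalidInput(problems)

def report(problems: List[Problem]) -> str:
    # one line per problem plus a per-field tally, for whoever supplied the data
    tally: Dict[str, int] = {}
    for p in problems:
        tally[p.field] = tally.get(p.field, 0) + 1
    lines = [f"line {p.line_no}: {p.field} {p.reason}" for p in problems]
    lines.append("totals: " + ", ".join(f"{k}={v}" for k, v in sorted(tally.items())))
    return "\n".join(lines)
```

The pipeline still fails on bad data, as required. The difference is that one run now tells the editors about everything that's wrong, so the loop stops being "run, crash, fix one thing, repeat".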

The MUMPSorceress
Jan 6, 2012


^SHTPSTS

Gary’s Answer

Illusive gently caress Man posted:

Can you add a step before pipeline failure to count all the garbage data and generate some kind of report? then give it to whomever and tell them you can't accept their data unless it passes this validation

This is something I'm working on right now, but we still gotta run in the meantime.

redleader
Aug 18, 2005

Engage according to operational parameters

jit bull transpile posted:

This is something I'm working on right now, but we still gotta run in the meantime.

*shaggars in to room* WHY AREN'T YOU USING Microsoft (r) SQL Server (tm) Integration Services (tm)?

Fiedler
Jun 29, 2002

I, for one, welcome our new mouse overlords.
i'm pretty sure it's literally illegal for most apple employees to use microsoft software on their work computers (it being an explicitly disallowed use of the machine and therefore a violation of CFAA).

Shaggar
Apr 26, 2006
there are other apple employees who have posted about using Microsoft stuff (namely c#) in this thread or past variants.

SSIS is extremely good. depending on circumstance it might even be worth it to get a sql server license just for SSIS.

It can't (for the most part) turn bad data into good data, but it can help protect your good data from bad data.
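The "protect your good data" part is roughly what SSIS does when a component's error output is set to redirect rows. The Python sketch below follows that pattern. Rows that fail a check go to a reject list with the reason attached, so they neither crash the load nor vanish. `check_amount` and the row shape are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Row = Dict[str, str]

def split_rows(rows: Iterable[Row],
               check: Callable[[Row], Optional[str]]) -> Tuple[List[Row], List[Tuple[Row, str]]]:
    """Route each row to `good`, or to `rejected` together with the reason check() gave."""
    good: List[Row] = []
    rejected: List[Tuple[Row, str]] = []
    for row in rows:
        reason = check(row)
        if reason is None:
            good.append(row)
        else:
            rejected.append((row, reason))
    return good, rejected

def check_amount(row: Row) -> Optional[str]:
    # hypothetical rule: amount must parse as a number; None means "ok"
    try:
        float(row["amount"])
    except (KeyError, ValueError):
        return "bad or missing amount"
    return None

good, rejected = split_rows([{"amount": "1"}, {"amount": "x"}, {}], check_amount)
print(good, rejected)
```

Only `good` gets loaded. `rejected` goes to an error table or file for someone to fix, so bad data stays visible without blocking everything else.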

Fiedler
Jun 29, 2002

I, for one, welcome our new mouse overlords.
you're right. apple misses out on a lot of really good software by not standardizing on windows. now that they're basically just the iphone company there's an opportunity to sell off the faltering mac business and transition over to a superior pc platform. i'm certain that microsoft would offer very reasonable prices for an enterprise agreement covering windows, office, sql server, dynamics, and visual studio.

CRIP EATIN BREAD
Jun 24, 2002

Hey stop worrying bout my acting bitch, and worry about your WACK ass music. In the mean time... Eat a hot bowl of Dicks! Ice T



Soiled Meat
my first lovely programmer job was being a temp at a place optimizing sql queries to speed up crystal reports that were embedded in ASP 1.0 pages, which used COM objects for database access.

after 8 months the other lead devs left and I was the lead.

Mao Zedong Thot
Oct 16, 2008


CRIP EATIN BREAD posted:

my first lovely programmer job was being a temp at a place optimizing sql queries to speed up crystal reports that were embedded in ASP 1.0 pages, which used COM objects for database access.

after 8 months the other lead devs left and I was the lead.

extremely literal :same:

The MUMPSorceress
Jan 6, 2012


^SHTPSTS

Gary’s Answer
It's a Hadoop pipeline anyway. It's Java and Linux all the way


DELETE CASCADE
Oct 25, 2017

i haven't washed my penis since i jerked it to a phtotograph of george w. bush in 2003
materialized views: cool and also good
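SQLite has no materialized views, but the stdlib `sqlite3` module can sketch the idea. You store a query's result as a table and refresh it on your own schedule, instead of recomputing the aggregation on every read. In Postgres this is `CREATE MATERIALIZED VIEW` plus `REFRESH MATERIALIZED VIEW`. The `orders` table and `daily_totals` name are hypothetical.

```python
import sqlite3

# isolation_level=None: autocommit, so the refresh can manage its own transaction
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE orders (day TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("2018-08-07", 10.0), ("2018-08-07", 5.0), ("2018-08-08", 2.5)])

def refresh_daily_totals(conn: sqlite3.Connection) -> None:
    """Rebuild the 'materialized' table from scratch inside one transaction."""
    conn.execute("BEGIN")
    try:
        conn.execute("DROP TABLE IF EXISTS daily_totals")
        conn.execute("""CREATE TABLE daily_totals AS
                        SELECT day, SUM(amount) AS total, COUNT(*) AS n
                        FROM orders GROUP BY day""")
        conn.execute("COMMIT")  # readers see the old or new table, never half of one
    except Exception:
        conn.execute("ROLLBACK")
        raise

refresh_daily_totals(conn)
# reads hit the precomputed table, not the GROUP BY
print(conn.execute("SELECT day, total, n FROM daily_totals ORDER BY day").fetchall())
```

The trade-off is staleness. Anything inserted into `orders` after a refresh won't show up in `daily_totals` until the next refresh.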
