Corla Plankun
May 8, 2007

improve the lives of everyone
if you have workflow infrastructure why not just deploy them all on their own workflows? it sounds like something a tiny little k8s worker could handle and you could scale however you wanted

Corla Plankun
May 8, 2007

improve the lives of everyone
ime optimizing run time inside a python script is a waste. either write it in a faster lang if it matters, or deploy twenty baby worker instances with different params if it doesn't
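
A rough sketch of the "twenty baby workers" idea, assuming the work sets really are independent: every instance runs the same script and only its shard number differs. The `--index`/`--total` flags, `select_shard`, and the placeholder work list are all made up for illustration.

```python
import argparse


def select_shard(items, index, total):
    """Return the items this worker owns: every `total`-th item, starting at `index`."""
    if not 0 <= index < total:
        raise ValueError("index must be in range(total)")
    return items[index::total]


def main(argv=None):
    parser = argparse.ArgumentParser(description="Process one shard of the work.")
    parser.add_argument("--index", type=int, default=0, help="this worker's shard number")
    parser.add_argument("--total", type=int, default=1, help="how many workers exist")
    args = parser.parse_args(argv)

    work_sets = list(range(10))  # hypothetical stand-in for the real list of work sets
    for item in select_shard(work_sets, args.index, args.total):
        print(f"worker {args.index} handling work set {item}")


if __name__ == "__main__":
    main()
```

Each instance would then be launched with its own `--index`, and a crash in one shard leaves the others untouched.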

Corla Plankun
May 8, 2007

improve the lives of everyone
a nice thing about doing lots of workers is when one of the steps crashes, which it will eventually because of pythons crap typing, the rest won't even notice

12 rats tied together
Sep 7, 2006

theres a lot of low hanging fruit you can knock out quickly but yeah you quickly hit diminishing returns trying to make normal python (esp. cpython) go fast in a single process.

op you should rewrite it in cython

MrQueasy
Nov 15, 2005

Probiot-ICK

shoeberto posted:

I'm not really sure what it's doing, it seems to get blocked randomly when there are no resources being used, either on the DB or disk IO. I'm still just playing with it though. We've used this pattern with PHP code before without much trouble so part of me expected it to just work.

It takes 7 hours to fully run, it's going to be run routinely, and each work set really is independent. It's not 100% necessary but it would be nice. I'm not going to sink a ton of time into it though.

I've done a less sophisticated version of this previously with wrapping Python scripts, but I was hoping to do it from within the code itself rather than just wrapping. We have some workflow infrastructure built on this project now so doing Bash-y type process management isn't really a "clean" solution (even if it would get the job done).

Could you leverage something like AWS Lambda or Batch/ECS to split up the work more?

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Corla Plankun posted:

a nice thing about doing lots of workers is when one of the steps crashes, which it will eventually because of pythons crap typing, the rest won't even notice

drat i felt this

Sapozhnik
Jan 2, 2005

Nap Ghost
Imagine writing new software in python. lol. also lmao.

psiox
Oct 15, 2001

Babylon 5 Street Team

Sapozhnik posted:

Imagine writing new software in python. lol. also lmao.

fart simpson
Jul 2, 2005

DEATH TO AMERICA
:xickos:

Sapozhnik posted:

Imagine writing new software in python. lol. also lmao.

MrQueasy
Nov 15, 2005

Probiot-ICK

Sapozhnik posted:

Imagine writing new software in python. lol. also lmao.

If you don't write it in python, you have to write it in Java.

Java 6.

pokeyman
Nov 26, 2006

That elephant ate my entire platoon.

MrQueasy posted:

If you don't write it in python, you have to write it in Java.

Java 6.

can I have mypy?

if no, java 6 sounds fine

MrQueasy
Nov 15, 2005

Probiot-ICK

pokeyman posted:

can I have mypy?

if no, java 6 sounds fine

No, only jython.

FlapYoJacks
Feb 12, 2009

pokeyman posted:

can I have mypy?

if no, java 6 sounds fine

Mypy and black make Python perfectly acceptable.

Soricidus
Oct 21, 2010
freedom-hating statist shill

MrQueasy posted:

No, only jython.

oh well in that case it’s fine, jython is the one python implementation that is actually capable of running two threads at once

Workaday Wizard
Oct 23, 2009

by Pragmatica
do it in erlang for maximum parallelization. (or just use separate OS processes that's what they're there for lol).

MrQueasy
Nov 15, 2005

Probiot-ICK

Soricidus posted:

oh well in that case it’s fine, jython is the one python implementation that is actually capable of running two threads at once

god drat it... all I've got left to break your enthusiasm is that you have to use Dr. Java as your ide.

champagne posting
Apr 5, 2006

YOU ARE A BRAIN
IN A BUNKER

Soricidus posted:

oh well in that case it’s fine, jython is the one python implementation that is actually capable of running two threads at once

at this stage you might as well write your program in a jvm language

fart simpson
Jul 2, 2005

DEATH TO AMERICA
:xickos:

Boiled Water posted:

at this stage you might as well write your program in a jvm language

jython is a jvm language tho

Kazinsal
Dec 13, 2011



there's a dialect of python that's literally a hobby project to provide a python-like language on a hobby unix-like OS that has better multiprogramming in the base language than actual python does.

python multiprogramming is so bad it makes mass vfork/exec look like a modern paradigm

Antigravitas
Dec 8, 2019

Die Rettung fuer die Landwirte:
I just plug a function that does the work and an iterator that spits out jobs for the function into a Pool from python's multiprocessing module, and I haven't needed more than that so far.

Multithreading though…just don't. Don't do Python multithreading.
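
The Pool pattern described above can be sketched roughly like this. `make_jobs`, `do_work`, and `run_all` are hypothetical names, and the squaring is just a stand-in for real work; only `Pool` and `imap_unordered` come from the standard library.

```python
from multiprocessing import Pool


def make_jobs(count):
    """Iterator that spits out jobs; here a job is just an integer id."""
    for job_id in range(count):
        yield job_id


def do_work(job_id):
    """Stand-in for the real work. Defined at module level so it can be pickled."""
    return job_id, job_id * job_id


def run_all(count, processes=4):
    """Feed the job iterator through a pool and collect results keyed by job id."""
    with Pool(processes=processes) as pool:
        # imap_unordered hands back results as workers finish, in no fixed order
        return dict(pool.imap_unordered(do_work, make_jobs(count)))


if __name__ == "__main__":
    print(run_all(10))
```

The `if __name__ == "__main__":` guard matters on platforms where worker processes re-import the main module.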

Destroyenator
Dec 27, 2004

Don't ask me lady, I live in beer
also python multiprocessing just doesn’t work on lambdas because of the environment constraints so even if you have modern libraries that work with multiprocessing you get a fun surprise when you deploy

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

disclaimer: i know nothing about python

if python's parallelism is so poo poo, how do python webapps even work (django/flask/etc.)? do they spin up a new interpreter process for every request?

CarForumPoster
Jun 26, 2013

⚡POWER⚡

NihilCredo posted:

disclaimer: i know nothing about python

if python's parallelism is so poo poo, how do python webapps even work (django/flask/etc.)? do they spin up a new interpreter process for every request?

flask uses threading, most people use gunicorn to deploy a flask app in prod. here’s how gunicorn works to parallelize:

https://docs.gunicorn.org/en/stable/design.html

Python is cool and good

Destroyenator posted:

also python multiprocessing just doesn’t work on lambdas because of the environment constraints so even if you have modern libraries that work with multiprocessing you get a fun surprise when you deploy

you mean because lambdas only get one vCPU or some other limitation? you get more than one depending on the memory you allocate to the lambda iirc

CarForumPoster fucked around with this message at 10:51 on Jun 22, 2021

mystes
May 31, 2006

Python is garbage and the gil is terrible.

Destroyenator
Dec 27, 2004

Don't ask me lady, I live in beer

CarForumPoster posted:

you mean because lambdas only get one vCPU or some other limitation? you get more than one depending on the memory you allocate to the lambda iirc

lambdas don’t have access to some shared memory device /dev/shm and multiprocessing needs that to function
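
As far as I know the breakage is narrower than all of multiprocessing: `Pool` and `Queue` depend on semaphores backed by /dev/shm, while plain `Process` and `Pipe` do not. A minimal sketch under that assumption, not tested on Lambda here, with hypothetical function names:

```python
from multiprocessing import Pipe, Process


def sum_of_squares(values):
    """Stand-in for the real per-chunk work."""
    return sum(v * v for v in values)


def worker(conn, values):
    """Child process: compute one chunk and send the result back over the pipe."""
    conn.send(sum_of_squares(values))
    conn.close()


def parallel_sum_of_squares(chunks):
    """Run one process per chunk using only Process and Pipe (no Pool, no Queue)."""
    jobs = []
    for chunk in chunks:
        parent_conn, child_conn = Pipe()
        proc = Process(target=worker, args=(child_conn, chunk))
        proc.start()
        child_conn.close()  # the parent only needs its own end of the pipe
        jobs.append((proc, parent_conn))

    total = 0
    for proc, parent_conn in jobs:
        total += parent_conn.recv()  # read before join so a child never blocks on send
        proc.join()
    return total


if __name__ == "__main__":
    print(parallel_sum_of_squares([[1, 2], [3, 4], [5, 6]]))
```

This spawns one process per chunk, so it only makes sense for a handful of chunks sized to the vCPUs the function actually gets.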

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Destroyenator posted:

lambdas don’t have access to some shared memory device /dev/shm and multiprocessing needs that to function

oh weird. haven’t stepped on that landmine yet so thanks. step functions are what I’ve used to idiot proof concurrency with lambdas, working good so far

my homie dhall
Dec 9, 2010

honey, oh please, it's just a machine

NihilCredo posted:

if python's parallelism is so poo poo, how do python webapps even work (django/flask/etc.)? do they spin up a new interpreter process for every request?

pretty much, yeah, the optimization is they fork ahead of time lol
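
For the gunicorn case linked earlier in the thread, "fork ahead of time" mostly comes down to a few settings. A sketch of a config file, where the bind address is an assumption and the worker count follows the rule of thumb from the gunicorn docs:

```python
# gunicorn.conf.py: gunicorn reads its settings from module-level names in this file
import multiprocessing

bind = "127.0.0.1:8000"  # assumed address, change as needed
workers = multiprocessing.cpu_count() * 2 + 1  # (2 x cores) + 1 rule of thumb
worker_class = "sync"  # default worker type: each process handles one request at a time
preload_app = True  # import the app in the master process before forking workers
```

It would be started with something like `gunicorn -c gunicorn.conf.py app:app`, where `app:app` is a hypothetical module and WSGI callable.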

fritz
Jul 26, 2003

Soricidus posted:

that doesn’t take long

it’s python, it will.

my homie dhall
Dec 9, 2010

honey, oh please, it's just a machine

Sapozhnik posted:

Imagine writing new software in python. lol. also lmao.

a real problem is that python is just too drat accessible. there's an interpreter on every platform and its so easy to just spew out across everything. and pretty much anyone with any programming background at all can manage to bang out whatever they need to do with it, even if it's not "good python"

gonadic io
Feb 16, 2011

>>=
see also: javascript

CarForumPoster
Jun 26, 2013

⚡POWER⚡

my homie dhall posted:

a real problem is that python is just too drat accessible. there's an interpreter on every platform and its so easy to just spew out across everything. and pretty much anyone with any programming background at all can manage to bang out whatever they need to do with it, even if it's not "good python"

yes this is problem

software is about the code, guys

its about linting

its about types

dont forget execution time

mystes
May 31, 2006

CarForumPoster posted:

yes this is problem

software is about the code, guys

its about linting

its about types

dont forget execution time
I'll bring the guillotine.

my homie dhall
Dec 9, 2010

honey, oh please, it's just a machine

CarForumPoster posted:

yes this is problem

software is about the code, guys

its about linting

its about types

dont forget execution time

what I'm trying to say is that everyone "knows enough to be dangerous" when it comes to python. the existence of code inherently has negative value and python makes it very easy to produce

Shaggar
Apr 26, 2006
use c#, op

shoeberto
Jun 13, 2020

which way to the MACHINES?

Corla Plankun posted:

ime optimizing run time inside a python script is a waste. either write it in a faster lang if it matters, or deploy twenty baby worker instances with different params if it doesn't
It's not really that it matters, it's just batch processing, but it'd be nice to save time. I'm going to look into how to split the work out a bit though. I've heard people talking about Dask for stuff like this so I'm going to take a look at it.

MrQueasy posted:

Could you leverage something like AWS Lambda or Batch/ECS to split up the work more?
Maybe - I need to research more. We've always had relatively dumb processing pipelines (just monolithic scripts + a database, usually) so I'd need to adapt. But it seems worthwhile.

Sapozhnik posted:

Imagine writing new software in python. lol. also lmao.
Imagine re-implementing Pandas and Scipy in another language. It's batch data processing, not a CRUD app.

shoeberto
Jun 13, 2020

which way to the MACHINES?

my homie dhall posted:

a real problem is that python is just too drat accessible. there's an interpreter on every platform and its so easy to just spew out across everything. and pretty much anyone with any programming background at all can manage to bang out whatever they need to do with it, even if it's not "good python"

We have very specific needs that Python suits very well, and try to do solid development on top of it, but yeah, I've seen this. I had to hire someone to help us with our ETL stuff and the application process was a shitshow.

I know that people have Feelings about homework for job applications, but it was 100% necessary for us to filter out the majority of "data scientists" that don't understand what a data structure is.

Shaggar
Apr 26, 2006
using anything other than ssis for etl is insane.

shoeberto
Jun 13, 2020

which way to the MACHINES?

Shaggar posted:

using anything other than ssis for etl is insane.

Does it work with Postgres or Hadoop clusters?

Powerful Two-Hander
Mar 10, 2004

Mods please change my name to "Tooter Skeleton" TIA.


we have all sorts of etl farming going on in Informatica instances and I swear at least 50% of it must be worthless file transforms and shifting from a to b

Shaggar
Apr 26, 2006
ssis will work with any db, you just need an appropriate driver installed.

for example: a good use of ssis would be to get all of your poo poo out of postgres and into sql server so you dont need to deal with postgres anymore.
