exe cummings
Jan 22, 2005

lol we just replaced our JMS-driven event bus with a kafka/zookeeper solution. the entire enterprise publisher/subscription config process is still an email to the one guy who knows how it works.
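
(a sketch of what the self-serve version of that pub/sub config can look like with the python confluent-kafka client — broker address, topic, and group id are all invented:)

code:
from confluent_kafka import Consumer, Producer

BROKERS = "kafka-1:9092"      # placeholder broker address
TOPIC = "enterprise-events"   # placeholder topic

# publish side
producer = Producer({"bootstrap.servers": BROKERS})
producer.produce(TOPIC, key="order-123", value=b'{"event": "created"}')
producer.flush()  # block until the broker acks delivery

# subscribe side: consumer groups are the self-serve replacement for
# emailing the one guy who knows how it works
consumer = Consumer({
    "bootstrap.servers": BROKERS,
    "group.id": "billing-service",   # each subscriber picks its own group
    "auto.offset.reset": "earliest",
})
consumer.subscribe([TOPIC])
msg = consumer.poll(timeout=1.0)
if msg is not None and msg.error() is None:
    print(msg.value())
consumer.close()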

animist
Aug 28, 2018

yard salad posted:

lol we just replaced our JMS-driven event bus with a kafka/zookeeper solution. the entire enterprise publisher/subscription config process is still an email to the one guy who knows how it works.

you send a message and eventually you get a response. sounds like any other distributed system to me imo

distortion park
Apr 25, 2011


elasticsearch changed their data model between major versions so you can't have multiple types per index anymore. making breaking changes between major versions is fair enough, but it's still frustrating when you have to go change your logging code just for this.
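
(the migration is basically moving the type into the index name — a sketch with the python elasticsearch client, index names invented:)

code:
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# pre-6.x you could cram several types into one index:
#   es.index(index="logs", doc_type="app", body={...})
#   es.index(index="logs", doc_type="access", body={...})
# now it's one type per index, so the type moves into the index name
# (document= is the elasticsearch-py 8.x kwarg; older clients use body=)
es.index(index="logs-app", document={"msg": "started", "level": "info"})
es.index(index="logs-access", document={"path": "/health", "status": 200})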

animist
Aug 28, 2018
just use grep

distortion park
Apr 25, 2011


animist posted:

just use grep

there's a reason we always log to a local file as well as ELK

MrMoo
Sep 14, 2000

I've updated a thing to use Prometheus-cpp to export metrics to DataDog. It's a bit poo poo but just about works (tm).

I'd like the CivetWeb HTTP engine replaced with a Boost.Beast wrapper sitting on an IO context. I'd also like it if absolutely everything didn't run in the constructor, so it's easier to create a non-centralised metric deployment, like id Software's idCVar approach to configuration.

I have no idea what DataDog does, I just see it running in ECS.
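
(MrMoo's thing is C++, but for reference the same exporter shape in python is a few lines of prometheus_client — port and metric name are arbitrary:)

code:
import random
import time

from prometheus_client import Counter, start_http_server

# expose /metrics on :8000 for prometheus (or a datadog agent) to scrape
start_http_server(8000)

requests_total = Counter("myapp_requests_total", "requests handled", ["status"])

while True:
    requests_total.labels(status=random.choice(["200", "500"])).inc()
    time.sleep(1)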

pram
Jun 10, 2001
datadog is a metrics/alerting service you pay $$$ for. it's nice

Progressive JPEG
Feb 19, 2003

datadog gets very sad if you have lots of tags/cardinality

pram
Jun 10, 2001
nothing more money won't fix

Blinkz0rz
May 27, 2001

Progressive JPEG posted:

datadog gets very sad if you have lots of tags/cardinality

they had a bug for a long time where their unique identifier was the hostname, which is fine except we reuse ip addresses (and consequently hostnames) as instances come up and down, so we'd get all sorts of misattributed events and metrics until we figured out what was going on

Captain Foo
May 11, 2004

Blinkz0rz posted:

they had a bug for a long time where their unique identifier was the hostname

lol what the gently caress

Blinkz0rz
May 27, 2001

i think there were other conditions that caused it to become the identifier, but it was a big old pain in the butt let me tell you

Shaggar
Apr 26, 2006
it's amazing how many junior devs make the mistake of using name as a unique id instead of treating it as the display field it is.
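
(the shape he means, sketched — a surrogate key carries identity and the name is display-only:)

code:
import uuid
from dataclasses import dataclass, field

@dataclass
class Host:
    name: str  # display only; repeats when IPs/hostnames get recycled
    id: str = field(default_factory=lambda: uuid.uuid4().hex)  # actual identity

hosts = {}
h = Host(name="web-01")
hosts[h.id] = h              # key on the generated id...
assert h.name not in hosts   # ...never on the name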

distortion park
Apr 25, 2011


We still write local text logs for everything as backup because all the modern solutions either crash and lose data occasionally or have broken auth
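
(in python logging terms the belt-and-suspenders setup is just two handlers on one logger — a sketch; the socket handler stands in for whatever actually feeds your aggregator, and the path/host/port are placeholders:)

code:
import logging
import logging.handlers

log = logging.getLogger("app")
log.setLevel(logging.INFO)

# handler 1: local file, still there when the pipeline isn't
log.addHandler(logging.FileHandler("app.log"))

# handler 2: ships records to the aggregator (placeholder host/port)
log.addHandler(logging.handlers.SocketHandler("logstash.internal", 5959))

log.info("processed order %s", "1234")  # lands in both sinks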

distortion park
Apr 25, 2011


Kafka in particular crashes more frequently than most of the services which write to it

pram
Jun 10, 2001
yes kafka is loving garbage software

animist
Aug 28, 2018
is there a good way to identify bottlenecks in a bunch of short-lived cloud spot instances? prometheus has pushgateway which seems ok but i haven't tried to actually set it up yet
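
(pushgateway is roughly this much ceremony with the python prometheus_client — gateway host and job name are made up, and do_work() is a stand-in for the real workload:)

code:
import time

from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

def do_work():
    time.sleep(2)  # stand-in for the actual batch job

registry = CollectorRegistry()
duration = Gauge("job_duration_seconds", "time the spot instance spent working",
                 registry=registry)
with duration.time():  # sets the gauge to the elapsed time on exit
    do_work()

# one push right before the instance goes away (placeholder address)
push_to_gateway("pushgateway.internal:9091", job="spot-batch", registry=registry)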

CRIP EATIN BREAD
Jun 24, 2002

you want tracing, not just metrics.

psiox
Oct 15, 2001

Babylon 5 Street Team
re: tracing, this was a post i enjoyed about stack overflow's monitoring setup

https://nickcraver.com/blog/2018/11/29/stack-overflow-how-we-do-monitoring/

CRIP EATIN BREAD
Jun 24, 2002

we use the opentracing api for everything and jaeger as our backend, which feeds into elasticsearch

it was cool when someone used a constant sampler (samples every trace) and we were suddenly producing 10GB of traces each day in our test environment.

pro-tip: use a probabilistic sampler or something that says "sample 5% of traces" instead.
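
(with the python jaeger_client that pro-tip is one config stanza — service name and agent host are placeholders:)

code:
from jaeger_client import Config

config = Config(
    config={
        # "probabilistic" keeps ~5% of traces; "const" with param=1 is
        # the sample-everything setting that fills the disk
        "sampler": {"type": "probabilistic", "param": 0.05},
        "local_agent": {"reporting_host": "jaeger-agent.internal"},
    },
    service_name="my-service",
    validate=True,
)
tracer = config.initialize_tracer()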

psiox
Oct 15, 2001

Babylon 5 Street Team
i know there's the #monitoringlove hashtag but this is ridiculous!!



anyway that sounds pretty cool, crip, first i'd heard of people taking jaeger seriously but i haven't had to deal with monitoring backends in a while so hey

Silver Alicorn
Mar 30, 2008

wow that's a big iguana

animist
Aug 28, 2018

CRIP EATIN BREAD posted:

you want tracing, not just metrics.

hm, yeah probably. well i don't need anything right away anyway

cowboy beepboop
Feb 24, 2001

i want to set up graylog or elk again but i hate elasticsearch

kitten emergency
Jan 13, 2008

CRIP EATIN BREAD posted:

we use the opentracing api for everything and jaeger as our backend, which feeds into elasticsearch

it was cool when someone used a constant sampler (samples every trace) and we were suddenly producing 10GB of traces each day in our test environment.

pro-tip: use a probabilistic sampler or something that says "sample 5% of traces" instead.

alternately consider a fine saas tracing product that performs tail sampling

Guy Axlerod
Dec 29, 2008

CRIP EATIN BREAD posted:

we use the opentracing api for everything and jaeger as our backend, which feeds into elasticsearch

it was cool when someone used a constant sampler (samples every trace) and we were suddenly producing 10GB of traces each day in our test environment.

pro-tip: use a probabilistic sampler or something that says "sample 5% of traces" instead.

Yeah, I set up tracing in our app, and the dev team was like "Can we have a 100% sample rate in staging?" and then the same dev team: "We want to do some load tests in staging."

CRIP EATIN BREAD
Jun 24, 2002

a good idea is to make it configurable at run-time so you can adjust it in cases like that

Guy Axlerod
Dec 29, 2008
Yeah, it can be set by env var, but I don't trust them to not gently caress up.

We do have some stuff set to 100%, and some stuff set to 1/1000000, while most is at 10% or so.

CRIP EATIN BREAD
Jun 24, 2002

no i mean that it can be adjusted at runtime. like you have some sort of external way of updating the values, that way you can just send a command to reduce the sample rate
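
(the shape of that, hand-rolled as a sketch rather than any particular tracing lib's api:)

code:
import random
import threading

class AdjustableSampler:
    """probabilistic sampler whose rate can be changed while running."""

    def __init__(self, rate: float):
        self._rate = rate
        self._lock = threading.Lock()

    def set_rate(self, rate: float) -> None:
        # call this from your control channel (admin endpoint, config
        # watcher, SIGHUP handler, whatever you have)
        with self._lock:
            self._rate = rate

    def sampled(self) -> bool:
        with self._lock:
            return random.random() < self._rate

sampler = AdjustableSampler(0.10)
sampler.set_rate(0.001)  # load test incoming? turn it down, no redeploy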

distortion park
Apr 25, 2011


Working out why a log message isn't making it from my app into a kibana dashboard is one of my least favourite things.

animist
Aug 28, 2018
it's because you've lived a life of sin

Dr. Kayak Paddle
May 10, 2006

Anyone done anything with Sysmon 10.0?
Looks like it can pull DNS queries now. Currently doing that with splunk/streamstats from the DCs.

cowboy beepboop
Feb 24, 2001

vector.dev looks nice

CRIP EATIN BREAD
Jun 24, 2002


my stepdads beer posted:

vector.dev looks nice

it looks neat, but they've got these performance comparisons up and they aren't even close to feature complete yet.

cowboy beepboop
Feb 24, 2001

yeah I gave it a go yesterday, very early days for it.

Hed
Mar 31, 2004

So if you're sending logs to a managed provider like Datadog, but some fraction of the logs contain transactions that you want to hang on to for billing questions, do you just have a cron job pull out the info from Datadog and put it into your own holdings?

I see most providers center around 15-day retention, so I feel like I'm thinking about it wrong if I want to hold on to some things for 30-90 days.

Guy Axlerod
Dec 29, 2008
I used Papertrail at one point, they had an option to save a copy of everything you sent them to s3. Maybe datadog has something similar?

If you're using this for billing, it seems like you need something more transactional. What if the log never makes it to dd?

the talent deficit
Dec 20, 2003

Hed posted:

So if you're sending logs to a managed provider like Datadog, but some fraction of the logs contain transactions that you want to hang on to for billing questions, do you just have a cron job pull out the info from Datadog and put it into your own holdings?

I see most providers center around 15-day retention, so I feel like I'm thinking about it wrong if I want to hold on to some things for 30-90 days.

don't rely on logs for core business functions

Corla Plankun
May 8, 2007

Hed posted:

So if you're sending logs to a managed provider like Datadog, but some fraction of the logs contain transactions that you want to hang on to for billing questions, do you just have a cron job pull out the info from Datadog and put it into your own holdings?

I see most providers center around 15-day retention, so I feel like I'm thinking about it wrong if I want to hold on to some things for 30-90 days.

definitely don't rely on logs for core business functions, but furthermore definitely don't have datadog in the loop for anything important. We used it at my last job and ddog was great for general stuff (looking to see if a thing was done restarting, counting the lower bound of errors, etc) but it would routinely have outages/delays/lost logs, to the point that if something was missing in datadog we just assumed it was fine
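
(the usual alternative, sketched: the billing event goes to your own transactional store first, and the log line is a courtesy copy — sqlite stands in for a real database here:)

code:
import json
import logging
import sqlite3

log = logging.getLogger("billing")

db = sqlite3.connect("billing.db")
db.execute("CREATE TABLE IF NOT EXISTS billing_events (id TEXT PRIMARY KEY, body TEXT)")

def record_transaction(event_id: str, event: dict) -> None:
    # durable write first, inside a transaction; the log line afterwards
    # is for humans and dashboards, never the source of truth
    with db:
        db.execute("INSERT INTO billing_events VALUES (?, ?)",
                   (event_id, json.dumps(event)))
    log.info("billed %s", event_id)

record_transaction("txn-0001", {"amount_cents": 995, "sku": "registration"})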

Hed
Mar 31, 2004

Ok I hear you all on that, but then if the log is essentially the audit trail of what happened, I'm struggling to figure out how to do it differently without it looking like a log...

Are people running local logstash then forwarding to a log MSP, and then running extraction jobs locally for business records? Or are you saying I should add a callback to an ERP or something for every transaction?

I appreciate the info.
