exe cummings
Jan 22, 2005

lol we just replaced our JMS-driven event bus with a kafka/zookeeper solution. the entire enterprise publisher/subscription config process is still an email to the one guy who knows how it works.
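
(a sketch of what the self-serve version of that pub/sub config can look like with the python confluent-kafka client — broker address, topic, and group id are all invented:)

code:
from confluent_kafka import Consumer, Producer

BROKERS = "kafka-1:9092"      # placeholder broker address
TOPIC = "enterprise-events"   # placeholder topic

# publish side
producer = Producer({"bootstrap.servers": BROKERS})
producer.produce(TOPIC, key="order-123", value=b'{"event": "created"}')
producer.flush()  # block until the broker acks delivery

# subscribe side: consumer groups are the self-serve replacement for
# emailing the one guy who knows how it works
consumer = Consumer({
    "bootstrap.servers": BROKERS,
    "group.id": "billing-service",   # each subscriber picks its own group
    "auto.offset.reset": "earliest",
})
consumer.subscribe([TOPIC])
msg = consumer.poll(timeout=1.0)
if msg is not None and msg.error() is None:
    print(msg.value())
consumer.close()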

animist
Aug 28, 2018

yard salad posted:

lol we just replaced our JMS-driven event bus with a kafka/zookeeper solution. the entire enterprise publisher/subscription config process is still an email to the one guy who knows how it works.

you send a message and eventually you get a response. sounds like any other distributed system to me imo

distortion park
Apr 25, 2011


elasticsearch changed their data model between major versions so you can't have multiple types per index anymore. making breaking changes between major versions is fair enough, but it's still frustrating when you have to go change your logging code just for this.
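
(the migration is basically moving the type into the index name — a sketch with the python elasticsearch client, index names invented:)

code:
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# pre-6.x you could cram several types into one index:
#   es.index(index="logs", doc_type="app", body={...})
#   es.index(index="logs", doc_type="access", body={...})
# now it's one type per index, so the type moves into the index name
# (document= is the elasticsearch-py 8.x kwarg; older clients use body=)
es.index(index="logs-app", document={"msg": "started", "level": "info"})
es.index(index="logs-access", document={"path": "/health", "status": 200})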

animist
Aug 28, 2018
just use grep

distortion park
Apr 25, 2011


animist posted:

just use grep

there's a reason we always log to a local file as well as ELK

MrMoo
Sep 14, 2000

I've updated a thing to use Prometheus-cpp to export metrics to DataDog. It's a bit poo poo but just about works (tm).

I'd like the CivetWeb HTTP engine replaced with a Boost.Beast wrapper sitting on an IO context. I'd also like it if absolutely everything didn't run in the constructor, so it's easier to create a non-centralised metric deployment, like id Software's idCVar approach to configuration.

I have no idea what DataDog does, I just see it running in ECS.
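
(MrMoo's thing is C++, but for reference the same exporter shape in python is a few lines of prometheus_client — port and metric name are arbitrary:)

code:
import random
import time

from prometheus_client import Counter, start_http_server

# expose /metrics on :8000 for prometheus (or a datadog agent) to scrape
start_http_server(8000)

requests_total = Counter("myapp_requests_total", "requests handled", ["status"])

while True:
    requests_total.labels(status=random.choice(["200", "500"])).inc()
    time.sleep(1)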

pram
Jun 10, 2001
datadog is a metrics/alerting service you pay $$$ for. it's nice

Progressive JPEG
Feb 19, 2003

datadog gets very sad if you have lots of tags/cardinality

pram
Jun 10, 2001
nothing more money won't fix

Blinkz0rz
May 27, 2001

Progressive JPEG posted:

datadog gets very sad if you have lots of tags/cardinality

they had a bug for a long time where their unique identifier was the hostname, which is fine except we reuse ip addresses (and consequently hostnames) as instances come up and down, so we'd get all sorts of misattributed events and metrics until we figured out what was going on

Captain Foo
May 11, 2004

Blinkz0rz posted:

they had a bug for a long time where their unique identifier was the hostname

lol what the gently caress

Blinkz0rz
May 27, 2001

i think there were other conditions that caused it to become the identifier, but it was a big old pain in the butt let me tell you

Shaggar
Apr 26, 2006
it's amazing how many junior devs make the mistake of using name as a unique id instead of treating it as the display field it is.
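
(the shape he means, sketched — a surrogate key carries identity and the name is display-only:)

code:
import uuid
from dataclasses import dataclass, field

@dataclass
class Host:
    name: str  # display only; repeats when IPs/hostnames get recycled
    id: str = field(default_factory=lambda: uuid.uuid4().hex)  # actual identity

hosts = {}
h = Host(name="web-01")
hosts[h.id] = h              # key on the generated id...
assert h.name not in hosts   # ...never on the name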

distortion park
Apr 25, 2011


We still write local text logs for everything as backup because all the modern solutions either crash and lose data occasionally or have broken auth
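
(in python logging terms the belt-and-suspenders setup is just two handlers on one logger — a sketch; the socket handler stands in for whatever actually feeds your aggregator, and the path/host/port are placeholders:)

code:
import logging
import logging.handlers

log = logging.getLogger("app")
log.setLevel(logging.INFO)

# handler 1: local file, still there when the pipeline isn't
log.addHandler(logging.FileHandler("app.log"))

# handler 2: ships records to the aggregator (placeholder host/port)
log.addHandler(logging.handlers.SocketHandler("logstash.internal", 5959))

log.info("processed order %s", "1234")  # lands in both sinks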

distortion park
Apr 25, 2011


Kafka in particular crashes more frequently than most of the services which write to it

pram
Jun 10, 2001
yes kafka is loving garbage software

animist
Aug 28, 2018
is there a good way to identify bottlenecks in a bunch of short-lived cloud spot instances? prometheus has pushgateway which seems ok but i haven't tried to actually set it up yet
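
(pushgateway is roughly this much ceremony with the python prometheus_client — gateway host and job name are made up, and do_work() is a stand-in for the real workload:)

code:
import time

from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

def do_work():
    time.sleep(2)  # stand-in for the actual batch job

registry = CollectorRegistry()
duration = Gauge("job_duration_seconds", "time the spot instance spent working",
                 registry=registry)
with duration.time():  # sets the gauge to the elapsed time on exit
    do_work()

# one push right before the instance goes away (placeholder address)
push_to_gateway("pushgateway.internal:9091", job="spot-batch", registry=registry)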

CRIP EATIN BREAD
Jun 24, 2002

you want tracing, not just metrics.

psiox
Oct 15, 2001

Babylon 5 Street Team
re: tracing, this was a post i enjoyed about stack overflow's monitoring setup

https://nickcraver.com/blog/2018/11/29/stack-overflow-how-we-do-monitoring/

CRIP EATIN BREAD
Jun 24, 2002

we use the opentracing api for everything and jaeger as our backend, which feeds into elasticsearch

it was cool when someone used a constant sampler (samples every trace) and we were suddenly producing 10GB of traces each day in our test environment.

pro-tip: use a probabilistic sampler or something that says "sample 5% of traces" instead.
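
(with the python jaeger_client that pro-tip is one config stanza — service name and agent host are placeholders:)

code:
from jaeger_client import Config

config = Config(
    config={
        # "probabilistic" keeps ~5% of traces; "const" with param=1 is
        # the sample-everything setting that fills the disk
        "sampler": {"type": "probabilistic", "param": 0.05},
        "local_agent": {"reporting_host": "jaeger-agent.internal"},
    },
    service_name="my-service",
    validate=True,
)
tracer = config.initialize_tracer()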

psiox
Oct 15, 2001

Babylon 5 Street Team
i know there's the #monitoringlove hashtag but this is ridiculous!!



anyway that sounds pretty cool, crip, first i'd heard of people taking jaeger seriously but i haven't had to deal with monitoring backends in a while so hey

Silver Alicorn
Mar 30, 2008

wow that's a big iguana

animist
Aug 28, 2018

CRIP EATIN BREAD posted:

you want tracing, not just metrics.

hm, yeah probably. well i don't need anything right away anyway

cowboy beepboop
Feb 24, 2001

i want to set up graylog or elk again but i hate elasticsearch

kitten emergency
Jan 13, 2008

CRIP EATIN BREAD posted:

we use the opentracing api for everything and jaeger as our backend, which feeds into elasticsearch

it was cool when someone used a constant sampler (samples every trace) and we were suddenly producing 10GB of traces each day in our test environment.

pro-tip: use a probabilistic sampler or something that says "sample 5% of traces" instead.

alternately consider a fine saas tracing product that performs tail sampling

Guy Axlerod
Dec 29, 2008

CRIP EATIN BREAD posted:

we use the opentracing api for everything and jaeger as our backend, which feeds into elasticsearch

it was cool when someone used a constant sampler (samples every trace) and we were suddenly producing 10GB of traces each day in our test environment.

pro-tip: use a probabilistic sampler or something that says "sample 5% of traces" instead.

Yeah, I set up tracing in our app, and the dev team was like "Can we have a 100% sample rate in staging?" and then the same dev team: "We want to do some load tests in staging."

CRIP EATIN BREAD
Jun 24, 2002

a good idea is to make it configurable at run-time so you can adjust it in cases like that

Guy Axlerod
Dec 29, 2008
Yeah, it can be set by env var, but I don't trust them to not gently caress up.

We do have some stuff set to 100%, and some stuff set to 1/1000000, while most is at 10% or so.

CRIP EATIN BREAD
Jun 24, 2002

no i mean that it can be adjusted at runtime. like you have some sort of external way of updating the values, that way you can just send a command to reduce the sample rate
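
(the shape of that, hand-rolled as a sketch rather than any particular tracing lib's api:)

code:
import random
import threading

class AdjustableSampler:
    """probabilistic sampler whose rate can be changed while running."""

    def __init__(self, rate: float):
        self._rate = rate
        self._lock = threading.Lock()

    def set_rate(self, rate: float) -> None:
        # call this from your control channel (admin endpoint, config
        # watcher, SIGHUP handler, whatever you have)
        with self._lock:
            self._rate = rate

    def sampled(self) -> bool:
        with self._lock:
            return random.random() < self._rate

sampler = AdjustableSampler(0.10)
sampler.set_rate(0.001)  # load test incoming? turn it down, no redeploy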

distortion park
Apr 25, 2011


Working out why a log message isn't making it from my app into a kibana dashboard is one of my least favourite things.

animist
Aug 28, 2018
it's because you've lived a life of sin

Dr. Kayak Paddle
May 10, 2006

Anyone done anything with Sysmon 10.0?
Looks like it can pull DNS queries now. Currently doing that with splunk/streamstats from the DCs.

cowboy beepboop
Feb 24, 2001

vector.dev looks nice

CRIP EATIN BREAD
Jun 24, 2002


my stepdads beer posted:

vector.dev looks nice

it looks neat, but they've got these performance comparisons up and they aren't even close to feature complete yet.

cowboy beepboop
Feb 24, 2001

yeah I gave it a go yesterday, very early days for it.

Hed
Mar 31, 2004

So if you're sending logs to a managed provider like Datadog, but some fraction of the logs contain transactions that you want to hang on to for billing questions, do you just have a cron job pull out the info from Datadog and put it into your own holdings?

I see most providers center around 15-day retention, so I feel like I'm thinking about it wrong if I want to hold on to some things for 30-90 days.

Guy Axlerod
Dec 29, 2008
I used Papertrail at one point, they had an option to save a copy of everything you sent them to s3. Maybe datadog has something similar?

If you're using this for billing, it seems like you need something more transactional. What if the log never makes it to dd?

the talent deficit
Dec 20, 2003

Hed posted:

So if you're sending logs to a managed provider like Datadog, but some fraction of the logs contain transactions that you want to hang on to for billing questions, do you just have a cron job pull out the info from Datadog and put it into your own holdings?

I see most providers center around 15-day retention, so I feel like I'm thinking about it wrong if I want to hold on to some things for 30-90 days.

don't rely on logs for core business functions

Corla Plankun
May 8, 2007

Hed posted:

So if you're sending logs to a managed provider like Datadog, but some fraction of the logs contain transactions that you want to hang on to for billing questions, do you just have a cron job pull out the info from Datadog and put it into your own holdings?

I see most providers center around 15-day retention, so I feel like I'm thinking about it wrong if I want to hold on to some things for 30-90 days.

definitely don't rely on logs for core business functions, but furthermore definitely don't have datadog in the loop for anything important. We used it at my last job and ddog was great for general stuff (looking to see if a thing was done restarting, counting the lower bound of errors, etc) but it would routinely have outages/delays/lost logs, to the point that if something was missing in datadog we just assumed it was fine
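
(the usual alternative, sketched: the billing event goes to your own transactional store first, and the log line is a courtesy copy — sqlite stands in for a real database here:)

code:
import json
import logging
import sqlite3

log = logging.getLogger("billing")

db = sqlite3.connect("billing.db")
db.execute("CREATE TABLE IF NOT EXISTS billing_events (id TEXT PRIMARY KEY, body TEXT)")

def record_transaction(event_id: str, event: dict) -> None:
    # durable write first, inside a transaction; the log line afterwards
    # is for humans and dashboards, never the source of truth
    with db:
        db.execute("INSERT INTO billing_events VALUES (?, ?)",
                   (event_id, json.dumps(event)))
    log.info("billed %s", event_id)

record_transaction("txn-0001", {"amount_cents": 995, "sku": "registration"})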

Hed
Mar 31, 2004

Ok I hear you all on that, but then if the log is essentially the audit trail of what happened, I'm struggling to figure out how to do it differently without it looking like a log...

Are people running local logstash then forwarding to a log MSP, and then running extraction jobs locally for business records? Or are you saying I should add a callback to an ERP or something for every transaction?

I appreciate the info.
