MononcQc
May 29, 2007

Dont Touch ME posted:

Reversing unicode is trivial. Allocate the size of the string, iterate through the data, blast it into the new buffer in reverse order. When you hit a codepoint, parse it, inc/dec your markers as appropriate.

you probably want to do it on a grapheme cluster basis rather than codepoints to avoid messing up combining marks, but you’ll also need to deal with issues of defining what it means to reverse text when it contains subgroups with LTR and RTL sequences in them.
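
(a quick sketch of the difference in Python, assuming the third-party regex module since the stdlib re can't match \X; this still punts entirely on the LTR/RTL question)

code:

import regex  # third-party: pip install regex

s = "ne\u0301e"  # "née" written with a combining acute accent

# reversing code points detaches the combining mark from its base letter
print(s[::-1])                      # the accent lands on the wrong 'e'

# reversing grapheme clusters keeps each user-perceived character intact
clusters = regex.findall(r"\X", s)
print("".join(reversed(clusters)))  # 'eén'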


HappyHippo
Nov 19, 2003
Do you have an Air Miles Card?
actual use case of reversed strings:

https://en.wikipedia.org/wiki/GADDAG

Dont Touch ME
Apr 1, 2018

Soricidus posted:

isn’t that a pretty straightforward classification problem? accuracy would be proportional to the length of the string I guess, but we’ve come a long way since bush hid the facts

It ends up being more of a mental problem that taxes critical analysis and design skills. You need to be able to look at dozens of encodings, pick out what makes them unique, and figure out the best way to contrast them against each other. It's not the hardest thing in the world, it just requires you to actually think and design a solution. Unicode is easy, but it can be quite hard to distinguish CJK encodings.
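
(for the curious, the usual "best guess" approach is a statistical classifier over byte frequencies; a minimal sketch with the chardet library, which is just my pick of detector, and as noted the guesses get shaky on short strings)

code:

import chardet  # third-party: pip install chardet

samples = {
    "utf-8":     "こんにちは世界".encode("utf-8"),
    "shift_jis": "こんにちは世界".encode("shift_jis"),
    "gb2312":    "你好世界".encode("gb2312"),
}

for expected, raw in samples.items():
    guess = chardet.detect(raw)  # returns {'encoding': ..., 'confidence': ...}
    print(expected, "->", guess["encoding"], guess["confidence"])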

MononcQc posted:

you probably want to do it on a grapheme cluster basis rather than codepoints to avoid messing up combining marks, but you’ll also need to deal with issues of defining what it means to reverse text when it contains subgroups with LTR and RTL sequences in them.

If we're running into the problems of semantics, we're already in trouble because to me a string is any linearly ordered, atomic data. A true reversed string reverses the order of the bits, bytes or words, depending on the domain. I would happily explain this nuance using the Intel programming manual as a citation before being told that they'll call me back.

jesus WEP
Oct 17, 2004


the real world application of reversing a string (or of fizzing a buzz) isn’t the point, it’s a 10 minute check to see whether there’s any point doing the rest of the interview

Bored Online
May 25, 2009

We don't need Rome telling us what to do.
you call the reverse method bing bong problem solved

Carthag Tuek
Oct 15, 2005

Tider skal komme,
tider skal henrulle,
slægt skal følge slægters gang



can you reverse a string?

yeah but I'm not gonna.

pokeyman
Nov 26, 2006

That elephant ate my entire platoon.

jesus WEP posted:

the real world application of reversing a string (or of fizzing a buzz) isn’t the point, it’s a 10 minute check to see whether there’s any point doing the rest of the interview

"we're not looking for a programmer who thinks about their work, just take the ticket and stamp out a solution and shut the hell up"

pokeyman
Nov 26, 2006

That elephant ate my entire platoon.

HappyHippo posted:

actual use case of reversed strings:

https://en.wikipedia.org/wiki/GADDAG

uh well technically that's not a string, you're reversing a list of tiles from a board game :goonsay:

jk that's a good example

now I wonder what scrabble looks like in, say, korean

shoeberto
Jun 13, 2020

which way to the MACHINES?

Soricidus posted:

isn’t that a pretty straightforward classification problem? accuracy would be proportional to the length of the string I guess, but we’ve come a long way since bush hid the facts

Turns out a lot of standard Unix tools choke on this problem because of sampling strategies when checking larger documents. Efficient generalized heuristics are hard. (I think I had a 4GB vendor file that mostly appeared ASCII-encoded until you actually tried to load it row-by-row into Postgres)
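
(a toy reproduction of that failure mode, no relation to the actual vendor file: a prefix-only sniff calls it ASCII, the full decode dies near the end)

code:

data = b"id,name\n" + b"1,plain ascii\n" * 100_000 + b"100001,caf\xe9\n"  # one stray latin-1 byte

data[:65536].decode("ascii")   # what a sampling tool sees: succeeds, "looks like ASCII"

try:
    data.decode("utf-8")       # what the row-by-row load eventually hits
except UnicodeDecodeError as err:
    print(err)                 # invalid continuation byte near the end of the file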

alexandriao
Jul 20, 2019


Dont Touch ME posted:

You want a non-trivial challenge? Write an algo to make a best guess of a string's encoding. Is it Big5-HKSCS or GB2312?

Is this not the exact reason c2wiki died

Soricidus posted:

isn’t that a pretty straightforward classification problem? accuracy would be proportional to the length of the string I guess, but we’ve come a long way since bush hid the facts

lmbo

alexandriao fucked around with this message at 17:05 on Mar 31, 2021

Share Bear
Apr 27, 2004

side promoted (not sure what to call this), same level but different department that's more laid back :toot:

plangs: still good and are my friend, for my use case, which is real programming that pays my rent

PIZZA.BAT
Nov 12, 2016


:cheers:


lol

https://www.nchannel.com/blog/csv-file-based-integration-vs-api/

Valeyard
Mar 30, 2012


Grimey Drawer

there's obviously a lot to unpack here, but my choice is...

Since CSVs are plain-text files, it's easier for a web developer or other members of your team to create, view, and validate the data as a spreadsheet.

...
...
...

Carthag Tuek
Oct 15, 2005

Tider skal komme,
tider skal henrulle,
slægt skal følge slægters gang



oh wow. there are so many things wrong with that that I don't know where to start

manual data validation via opening csv files in excel

tbh i don't think I can think up a more certain way to gently caress up your data than that

Sapozhnik
Jan 2, 2005

Nap Ghost
All the poo poo programming languages and data formats invented at the dawn of computing are going to reverberate a thousand years into the future with as much force as historical events like the Battle of Hastings

pretty hosed up if you think about it

FlapYoJacks
Feb 12, 2009
The best IPC method was created 30 years ago. A raw socket and serialized structs. :colbert:
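
(a bare-bones sketch of that in Python, with socketpair standing in for whatever transport you'd actually use and a made-up message layout)

code:

import socket
import struct

# made-up wire format: message id (u32), temperature (f64), flags (u16), big-endian
WIRE = struct.Struct(">IdH")

parent, child = socket.socketpair()          # any connected socket works the same way

parent.sendall(WIRE.pack(42, 98.6, 0b101))   # serialize the struct, shove it down the socket

raw = child.recv(WIRE.size)                  # toy example; a real reader loops until WIRE.size bytes arrive
print(WIRE.unpack(raw))                      # (42, 98.6, 5)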

Carthag Tuek
Oct 15, 2005

Tider skal komme,
tider skal henrulle,
slægt skal følge slægters gang



Sapozhnik posted:

All the poo poo programming languages and data formats invented at the dawn of computing are going to reverberate a thousand years into the future with as much force as historical events like the Battle of Hastings

pretty hosed up if you think about it

lmao even if we stop reverberating data, we are definitely not gonna have a thousand years

abraham linksys
Sep 6, 2010

:darksouls:
http://digital-preservation.github.io/csv-schema/csv-schema-1.0.html

:shepface:

leper khan
Dec 28, 2010
Honest to god thinks Half Life 2 is a bad game. But at least he likes Monster Hunter.

DoomTrainPhD posted:

The best IPC method was created 30 years ago. A raw socket and serialized structs. :colbert:

The number of people I've worked with that don't know the difference between a raw socket and a websocket is staggering.

Powerful Two-Hander
Mar 10, 2004

Mods please change my name to "Tooter Skeleton" TIA.


oh my god the current argument on the clusterfuck AI project is about the data we'll eventually get back because the business users keep saying "but it's a csv why is it so hard?" do we have any idea what data will be in the csv? No.

FlapYoJacks
Feb 12, 2009

leper khan posted:

The number of people I've worked with that don't know the difference between a raw socket and a websocket is staggering.

Well of course they didn’t get it :buddy:

Carthag Tuek
Oct 15, 2005

Tider skal komme,
tider skal henrulle,
slægt skal følge slægters gang




if i ever get that kind of data i will destroy my job and become destitute

PIZZA.BAT
Nov 12, 2016


:cheers:



i love all these attempts at creating new schemas because it always follows the crab-cycle of xml and eventually just turns into a sub-par xsd

bob dobbs is dead
Oct 8, 2017

I love peeps
Nap Ghost
edn was intended to be a lisp from the start and that's why it just stayed a lisp. it's just quoted clojure lol

Agile Vector
May 21, 2007

scrum bored



PIZZA.BAT posted:

the crab-cycle of xml

:aaaaa:

Carthag Tuek
Oct 15, 2005

Tider skal komme,
tider skal henrulle,
slægt skal følge slægters gang




mods pizza.bat pls

Powerful Two-Hander
Mar 10, 2004

Mods please change my name to "Tooter Skeleton" TIA.


PIZZA.BAT posted:

the crab-cycle of xml

this marks up the crab

distortion park
Apr 25, 2011


Carthag Tuek posted:

oh wow. there are so many things wrong with that that I don't know where to start

manual data validation via opening csv files in excel

tbh i don't think I can think up a more certain way to gently caress up your data than that

my previous place had special automation tasks that:
- saved attachments to emails sent to specific inboxes to various fileshares, with the eventual location based on some rules and the inbox/subject line/attachment name
- converted excel files (some of the attachments were excel files) into csvs
- imported the csvs into our databases

it worked about as well as you would expect. the particularly sad thing was when you realised that some of the people on the other end hadn't automated their side of things, so it went from being a frustrating story about inefficient integrations to one of a poor junior analyst hand curating excel files daily, and presumably getting shouted at every other week when they hosed it up.

mystes
May 31, 2006

quote:

11/07/18 Jillian Hufford, Marketing Analyst
Why CSV File-based Integration Can Be Better than API-based Integration

quote:

Jillian joined nChannel as their Marketing Analyst. Using both her writing and analytic skills, she assists the Marketing and Sales teams. Jillian performs competitor market research, provides analysis of key sales metrics, and writes informative posts on multichannel commerce trends. She holds a BA in Marketing from Otterbein University.

To be fair there are probably a lot of people without the title of "Marketing Analyst" who also hold this dumb opinion.

Anyway, even ignoring not having a schema, CSV files completely suck because there's no actual standard, and just trying to get them into and out of Excel you're immediately going to run into annoying problems with number formats and trying to get Excel to recognize utf8 and things like that (and that's assuming you're even doing the quoting in the way Excel accepts).

I can't even imagine having some business critical process relying on automatically reading csv files, especially ones created by humans in random different programs.
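
(fwiw the least-bad combination I know of when Excel is on the other end is a BOM plus quote-everything, roughly like below; it keeps the columns intact but does nothing about Excel's type guessing)

code:

import csv

rows = [
    ["id", "amount", "note"],
    ["0001", "1,000.50", "naïve café"],
]

# utf-8-sig writes a BOM so Excel picks up the encoding; QUOTE_ALL plus newline=""
# keeps embedded commas and newlines from shredding the columns
with open("out.csv", "w", newline="", encoding="utf-8-sig") as f:
    csv.writer(f, quoting=csv.QUOTE_ALL).writerows(rows)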

mystes fucked around with this message at 15:05 on Apr 1, 2021

shoeberto
Jun 13, 2020

which way to the MACHINES?

mystes posted:

I can't even imagine having some business critical process relying on automatically reading csv files, especially ones created by humans in random different programs.

This has been a significant part of my job for the past 6 or so years. It gets old. Real old.

Share Bear
Apr 27, 2004

mystes posted:


I can't even imagine having some business critical process relying on automatically reading csv files, especially ones created by humans in random different programs.

i also do a fair bit of csv/excel wrangling aka "ETL"

you gotta establish a spec, say "use a lib that implements the RFC", and hope that whoever is shipping data to you actually does

people love csvs and excel

HamAdams
Jun 29, 2018

yospos
My last job had a significant number of small ETL processes written in PHP. It got to the point where nobody knew what was out there or where it lived until something would stop working because somebody kicked the power strip powering the PC that's been under their desk since they started. Their motto had always been "we'll accept any file from the customer"

abraham linksys
Sep 6, 2010

:darksouls:
is it just me or are all "observability platforms" completely incoherent? i have a little web app i'm trying to set up some real low stakes monitoring for - like, "holler at me if there's a wild spike in requests" - and surveying the tooling landscape is an incomprehensible mix of buzzwords

i have sentry set up for exception alerting and honestly that's probably 90% of what i need so maybe i shouldn't worry beyond that. digitalocean's vps monitoring will also yell at me if i am running out of disk space/memory or cpu usage is super high for a while

12 rats tied together
Sep 7, 2006

they are all bullshit, yeah

if all you want to know about is requests per time, that's reasonably classifiable as a metric, so you can cut a lot of it out by looking for metrics tooling instead of "observability". prometheus is the latest emerging standard here and to give them some credit it's quite easy to instrument your code with it, assuming that your webserver doesn't already have a prometheus exporter for requests/sec

observability in theory includes logs, events, metrics, and usually some kind of tooling for reading them all in a useful way to find correlations and poo poo. in practice, every monitoring-adjacent vendor slaps the term on their website for SEO
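
(to make "quite easy to instrument" concrete: the official prometheus_client is roughly this much code, handle_request being a made-up stand-in for your real handler)

code:

from prometheus_client import Counter, Histogram, start_http_server
import random, time

REQUESTS = Counter("app_requests_total", "Total requests handled", ["route"])
LATENCY = Histogram("app_request_seconds", "Request latency in seconds")

def handle_request(route):
    with LATENCY.time():                  # records elapsed time when the block exits
        REQUESTS.labels(route=route).inc()
        time.sleep(random.random() / 10)  # pretend work

if __name__ == "__main__":
    start_http_server(8000)               # exposes /metrics for prometheus (or the grafana agent) to scrape
    while True:
        handle_request("/hello")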

Powerful Two-Hander
Mar 10, 2004

Mods please change my name to "Tooter Skeleton" TIA.


pointsofdata posted:

my previous place had special automation tasks that:
- saved attachments to emails sent to specific inboxes to various fileshares, with the eventual location based on some rules and the inbox/subject line/attachment name
- converted excel files (some of the attachments were excel files) into csvs
- imported the csvs into our databases

it worked about as well as you would expect. the particularly sad thing was when you realised that some of the people on the other end hadn't automated their side of things, so it went from being a frustrating story about inefficient integrations to one of a poor junior analyst hand curating excel files daily, and presumably getting shouted at every other week when they hosed it up.

oh god people keep on suggesting using magic mailboxes for storing/identifying documents or to "automate a process" instead of actually doing something proper and I keep having to shoot it down.

usually it goes hand in hand with "we can use chat bots!"

Corla Plankun
May 8, 2007

improve the lives of everyone
in their defense, magic mailboxes sound really cool and fun if you don't know anything about the implementation

Powerful Two-Hander
Mar 10, 2004

Mods please change my name to "Tooter Skeleton" TIA.


I think the biggest problem with excel is that it doesn't force you to define column types when you actually create a sheet, and now that it's too late to go back and force that behaviour they've built this whole type inference structure (which is, to be fair, less annoying than the SSIS one that only ever looked at the top 1000 rows so would continuously trip you up when doing flat file imports), so now nobody ever defines anything and excel guesses at it, and excel steals the csv file association then turbofucks any file you open with it until you just end up quoting every piece of data to force excel to treat it as text

Honorable mention is whoever coded RapidSql which i am forced to use and has a "feature" where it will include comma separators in integers for thousands so if you c&p you can get "1,000" back which excel then thinks is a piece of text.
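
(the defensive parse for that particular flavor of copy-paste garbage, assuming US-style comma grouping and not a locale where comma is the decimal point)

code:

def parse_int(cell):
    # strip the quotes and thousands separators that tools like RapidSql/Excel sneak in
    return int(cell.strip().strip('"').replace(",", ""))

print(parse_int('"1,000"'))  # 1000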

abraham linksys
Sep 6, 2010

:darksouls:

12 rats tied together posted:

if all you want to know about is requests per time, that's reasonably classifiable as a metric, so you can cut a lot of it out by looking for metrics tooling instead of "observability". prometheus is the latest emerging standard here and to give them some credit it's quite easy to instrument your code with it, assuming that your webserver doesn't already have a prometheus exporter for requests/sec

i'd always avoided prometheus since when i originally looked into it, it seemed like it required self-hosting - like, even if you used a "cloud" variant you're still supposed to have an on-prem prometheus database that you send stuff to the cloud from - but it looks like grafana labs recently(?) introduced a thing called the "grafana agent" which is "prometheus without the database since you're just sending all your poo poo to the cloud"

gonna i guess try their grafana cloud thing since it has a free tier. it's pretty incoherent since it's just a bunch of open source poo poo slammed together, but i think if i just stick with metrics and ignore the other stuff it'll be ok. the web framework i use (javalin) has a micrometer plugin and i can set up micrometer to export to prometheus which gets read by the agent, i think

they do also have logs (loki) and traces (tempo, which seems very new) which i might give a go. traces are kinda nice for me because while i only have one service, i do make a lot of requests to external apis, and it'd be nice to keep an eye on how slow those are, though i guess metrics could cover that just as well. and logs would be nice because right now i'm still sshing into my box to look at logs lol

Shaggar
Apr 26, 2006

Powerful Two-Hander posted:

I think the biggest problem with excel is that it doesn't force you to define column types when you actually create a sheet, and now that it's too late to go back and force that behaviour they've built this whole type inference structure (which is, to be fair, less annoying than the SSIS one that only ever looked at the top 1000 rows so would continuously trip you up when doing flat file imports), so now nobody ever defines anything and excel guesses at it, and excel steals the csv file association then turbofucks any file you open with it until you just end up quoting every piece of data to force excel to treat it as text

Honorable mention is whoever coded RapidSql which i am forced to use and has a "feature" where it will include comma separators in integers for thousands so if you c&p you can get "1,000" back which excel then thinks is a piece of text.

you can customize how the SSIS column suggestion thing works and also have it process way more rows during discovery. However, if you're sure you know what the columns are it's best to set the types manually.


Powerful Two-Hander
Mar 10, 2004

Mods please change my name to "Tooter Skeleton" TIA.


Shaggar posted:

you can customize how the SSIS column suggestion thing works and have it process way more rows during discovery as well. However, if you're sure you know what the columns are it's best to set the types manually.

yeah tbf last time I used it was like 10 years ago, I mainly liked the soothing way the row counts would move across the flow, that was nice

not so nice when it choked on a column length later so maybe the India team were on to something when they just set everything to varchar(max)
