|
Dont Touch ME posted:
Reversing unicode is trivial. Allocate the size of the string, iterate through the data, blast it into the new buffer in reverse order. When you hit a codepoint, parse it, inc/dec your markers as appropriate.

you probably want to do it on a grapheme cluster basis rather than codepoints to avoid messing up combining marks, but you’ll also need to deal with issues of defining what it means to reverse text when it contains subgroups with LTR and RTL sequences in them.
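fwiw a minimal sketch of the grapheme-ish version in python, using stdlib unicodedata to keep combining marks glued to their base character. this only approximates grapheme clusters (full UAX #29 segmentation needs a proper library) and punts entirely on the LTR/RTL question:

```python
import unicodedata

def reverse_keeping_marks(s: str) -> str:
    # group each base character with its trailing combining marks,
    # then reverse the groups instead of the raw codepoints
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))

print(reverse_keeping_marks("cafe\u0301"))  # the accent stays on the e
```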
|
# ? Mar 31, 2021 11:52 |
|
|
actual use case of reversed strings: https://en.wikipedia.org/wiki/GADDAG
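roughly, a GADDAG stores every rotation of a word as reversed-prefix + separator + suffix, so you can extend left and right from any tile already on the board. quick sketch (the separator character is arbitrary):

```python
def gaddag_entries(word: str, sep: str = "+") -> list[str]:
    # for each split point i, store REV(prefix) + sep + suffix
    return [word[:i][::-1] + sep + word[i:] for i in range(1, len(word) + 1)]

print(gaddag_entries("care"))
# ['c+are', 'ac+re', 'rac+e', 'erac+']
```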
|
# ? Mar 31, 2021 14:30 |
|
Soricidus posted:
isn’t that a pretty straightforward classification problem? accuracy would be proportional to the length of the string I guess, but we’ve come on a long way since bush hid the facts

It ends up being more of a mental problem that taxes critical analysis and design skills. You need to be able to look at dozens of encodings, pick out what makes them unique, and figure out the best way to contrast them against each other. It's not the hardest thing in the world, it just requires you to actually think and design a solution. Unicode is easy, but it can be quite hard to distinguish CJK encodings.

MononcQc posted:
you probably want to do it on a grapheme cluster basis rather than codepoints to avoid messing up combining marks, but you’ll also need to deal with issues of defining what it means to reverse text when it contains subgroups with LTR and RTL sequences in them.

If we're running into the problems of semantics, we're already in trouble, because to me a string is any linearly ordered, atomic data. A true reversed string reverses the order of the bits, bytes or words, depending on the domain. I would happily explain this nuance using the Intel programming manual as a citation before being told that they'll call me back.
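the dumbest possible version of that best-guess algo, just to show why it's hard: elimination by trial decode. works fine for ascii-vs-utf8, falls over exactly where it matters, because tons of byte sequences are simultaneously valid Big5-HKSCS and GB2312 (real detectors score byte-pair frequencies instead of just checking validity):

```python
def guess_encoding(data: bytes, candidates=("ascii", "utf-8", "gb2312", "big5hkscs")):
    # return the first candidate that decodes without error --
    # a process of elimination, not an actual classifier
    for enc in candidates:
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None

print(guess_encoding("héllo".encode("utf-8")))  # utf-8
```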
|
# ? Mar 31, 2021 15:04 |
|
the real world application of reversing a string (or of fizzing a buzz) isn’t the point, it’s a 10 minute check to see whether there’s any point doing the rest of the interview
|
# ? Mar 31, 2021 15:08 |
|
you call the reverse method bing bong problem solved
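which in python is the entire interview, codepoint caveats upthread notwithstanding:

```python
s = "stressed"
print(s[::-1])  # desserts
```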
|
# ? Mar 31, 2021 15:27 |
|
can you reverse a string? yeah but I'm not gonna.
|
# ? Mar 31, 2021 16:13 |
|
jesus WEP posted:
the real world application of reversing a string (or of fizzing a buzz) isn’t the point, it’s a 10 minute check to see whether there’s any point doing the rest of the interview

"we're not looking for a programmer who thinks about their work, just take the ticket and stamp out a solution and shut the hell up"
|
# ? Mar 31, 2021 16:19 |
|
HappyHippo posted:
actual use case of reversed strings:

uh well technically that's not a string, you're reversing a list of tiles from a board game

jk that's a good example

now I wonder what scrabble looks like in, say, korean
|
# ? Mar 31, 2021 16:23 |
|
Soricidus posted:
isn’t that a pretty straightforward classification problem? accuracy would be proportional to the length of the string I guess, but we’ve come on a long way since bush hid the facts

Turns out a lot of standard Unix tools choke on this problem because of sampling strategies when checking larger documents. Efficient generalized heuristics are hard. (think I had like a 4gb vendor file that mostly appeared ASCII-encoded until you actually tried to load it row-by-row into Postgres)
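sketch of the fix if you ever hit the same thing: don't sample, scan. checking every byte of a 4gb file for non-ASCII is a few seconds of sequential read, which beats a failed overnight load (the chunked iteration is just to avoid holding the whole file in memory):

```python
def first_non_ascii(chunks):
    # scan every chunk rather than sampling the head of the file,
    # so a stray high byte gigabytes in is still caught up front;
    # returns the byte offset of the first non-ASCII byte, or None
    offset = 0
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            if byte > 0x7F:
                return offset + i
        offset += len(chunk)
    return None

# usage with a real file:
#   with open(path, "rb") as f:
#       pos = first_non_ascii(iter(lambda: f.read(1 << 20), b""))
print(first_non_ascii([b"abc", b"d\xc3\xa9e"]))  # 4
```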
|
# ? Mar 31, 2021 16:29 |
|
Dont Touch ME posted:
You want a non-trivial challenge? Write an algo to make a best guess of a string's encoding. Is it Big5-HKSCS or GB2312?

Is this not the exact reason c2wiki died

Soricidus posted:
isn’t that a pretty straightforward classification problem? accuracy would be proportional to the length of the string I guess, but we’ve come on a long way since bush hid the facts

lmbo

alexandriao fucked around with this message at 17:05 on Mar 31, 2021 |
# ? Mar 31, 2021 16:59 |
|
side promoted (not sure what to call this), same level but different department that's more laid back

plangs: still good and are my friend, for my use case, which is real programming that pays my rent
|
# ? Mar 31, 2021 17:33 |
|
lol https://www.nchannel.com/blog/csv-file-based-integration-vs-api/
|
# ? Mar 31, 2021 20:15 |
|
theres obviously a lot to unpack here, but my choice is..

quote:
Since CSVs are plain-text files, its easier for a web developer or other members of your team to create, view, and validate the data as a spreadsheet.

... ... ...
|
# ? Mar 31, 2021 20:34 |
|
oh wow. there are so many things wrong with that that I don't know where to start

manual data validation via opening csv files in excel

tbh i don't think I can think up a more certain way to gently caress up your data than that
|
# ? Mar 31, 2021 21:38 |
|
All the poo poo programming languages and data formats invented at the dawn of computing are going to reverberate a thousand years into the future with as much force as historical events like the Battle of Hastings

pretty hosed up if you think about it
|
# ? Mar 31, 2021 22:18 |
|
The best IPC method was created 30 years ago. A raw socket and serialized structs.
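it really is about ten lines. a sketch over a local socketpair, with a made-up wire format of a 4-byte id and a 2-byte value, both big-endian:

```python
import socket
import struct

fmt = struct.Struct("!IH")  # 4-byte unsigned id + 2-byte unsigned value, network byte order
a, b = socket.socketpair()

a.sendall(fmt.pack(42, 7))                    # serialize the struct, shove it down the socket
msg_id, value = fmt.unpack(b.recv(fmt.size))  # pull it out, deserialize

print(msg_id, value)  # 42 7
a.close()
b.close()
```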
|
# ? Mar 31, 2021 22:36 |
|
Sapozhnik posted:
All the poo poo programming languages and data formats invented at the dawn of computing are going to reverberate a thousand years into the future with as much force as historical events like the Battle of Hastings

lmao even if we stop reverberating data, we are definitely not gonna have a thousand years
|
# ? Mar 31, 2021 22:45 |
|
http://digital-preservation.github.io/csv-schema/csv-schema-1.0.html
|
# ? Mar 31, 2021 23:22 |
|
DoomTrainPhD posted:
The best IPC method was created 30 years ago. A raw socket and serialized structs.

The number of people I've worked with that don't know the difference between a raw socket and a websocket is staggering.
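for anyone in that group: a websocket is not a raw socket, it's TCP plus an HTTP Upgrade handshake plus per-message framing on every send. this is what a websocket client writes down the wire before any data moves (the key here is the sample value from RFC 6455, host is a placeholder). a raw socket sends none of it:

```python
# the opening handshake a websocket client emits; a raw TCP socket
# would just start sending your serialized structs immediately
handshake = (
    "GET /chat HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Upgrade: websocket\r\n"
    "Connection: Upgrade\r\n"
    "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==\r\n"
    "Sec-WebSocket-Version: 13\r\n"
    "\r\n"
)
print(handshake.split("\r\n")[0])  # GET /chat HTTP/1.1
```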
|
# ? Mar 31, 2021 23:59 |
|
oh my god the current argument on the clusterfuck AI project is about the data we'll eventually get back, because the business users keep saying "but it's a csv why is it so hard?"

do we have any idea what data will be in the csv? No.
|
# ? Apr 1, 2021 00:07 |
|
leper khan posted:
The number of people I've worked with that don't know the difference between a raw socket and a websocket is staggering.

Well of course they didn’t get it
|
# ? Apr 1, 2021 00:09 |
|
if i ever get that kind of data i will destroy my job and become destitute
|
# ? Apr 1, 2021 00:23 |
|
i love all these attempts at creating new schemas because it always follows the crab-cycle of xml and eventually just turns into a sub-par xsd
|
# ? Apr 1, 2021 02:17 |
|
edn was intended to be a lisp from the start and thats why they just stayed a lisp. its just quoted clojure lol
|
# ? Apr 1, 2021 02:18 |
|
PIZZA.BAT posted:the crab-cycle of xml
|
# ? Apr 1, 2021 02:29 |
|
mods pizza.bat pls
|
# ? Apr 1, 2021 02:31 |
|
PIZZA.BAT posted:
the crab-cycle of xml

this marks up the crab
|
# ? Apr 1, 2021 12:03 |
|
Carthag Tuek posted:
oh wow. there are so many things wrong with that that I don't know where to start

my previous place had a special automation task that:

- saved attachments to emails sent to specific inboxes to various fileshares, with the eventual location based on some rules and the inbox/subject line/attachment name
- converted excel files (some of the attachments were excel files) into csvs
- imported the csvs into our databases

it worked about as well as you would expect. the particularly sad thing was when you realised that some of the people on the other end hadn't automated their side of things, so it went from being a frustrating story about inefficient integrations to one of a poor junior analyst hand-curating excel files daily, and presumably getting shouted at every other week when they hosed it up.
|
# ? Apr 1, 2021 14:54 |
|
quote:
11/07/18 Jillian Hufford, Marketing Analyst

quote:
Jillian joined nChannel as their Marketing Analyst. Using both her writing and analytic skills, she assists the Marketing and Sales teams. Jillian performs competitor market research, provides analysis of key sales metrics, and writes informative posts on multichannel commerce trends. She holds a BA in Marketing from Otterbein University.

Anyway even ignoring not having a schema, CSV files completely suck because there's no actual standard, and just trying to get them into and out of Excel you're immediately going to run into annoying problems with number formats and trying to get Excel to recognize utf8 and things like that (and that's assuming you're even doing the quoting in the way excel accepts). I can't even imagine having some business critical process relying on automatically reading csv files, especially ones created by humans in random different programs.

mystes fucked around with this message at 15:05 on Apr 1, 2021 |
# ? Apr 1, 2021 15:00 |
|
mystes posted:
I can't even imagine having some business critical process relying on automatically reading csv files, especially ones created by humans in random different programs.

This has been a significant part of my job for the past 6 or so years. It gets old. Real old.
|
# ? Apr 1, 2021 15:39 |
|
mystes posted:
i also do a fair bit of csv/excel wrangling aka "ETL"

you gotta establish a spec and, say, use a lib that implements the rfc, and hope that whoever is shipping data to you does too

people love csvs and excel
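the "use a lib that implements the rfc" part, sketched with stdlib csv (its quoting behaviour is RFC 4180-flavoured). QUOTE_ALL means embedded commas and quotes survive the round trip, instead of exploding a naive split(","):

```python
import csv
import io

rows = [["id", "note"], ["1,000", 'she said "hi"']]

# write: quote every field so commas/quotes inside data can't break columns
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_ALL).writerows(rows)

# read it back with the same lib instead of line.split(",")
parsed = list(csv.reader(io.StringIO(buf.getvalue())))
print(parsed == rows)  # True
```

if excel is on the other end, encoding the output as utf-8-sig also gets you the BOM excel wants before it will treat the file as unicode.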
|
# ? Apr 1, 2021 15:45 |
|
My last job had a significant number of small ETL processes written in PHP. It got to the point where nobody knew what was out there or where it lived until something would stop working because somebody kicked the power strip powering the PC that's been under their desk since they started.

Their motto had always been "we'll accept any file from the customer"
|
# ? Apr 1, 2021 16:01 |
|
is it just me or are all "observability platforms" completely incoherent? i have a little web app i'm trying to set up some real low stakes monitoring for - like, "holler at me if there's a wild spike in requests" - and surveying the tooling landscape is an incomprehensible mix of buzzwords

i have sentry set up for exception alerting and honestly that's probably 90% of what i need so maybe i shouldn't worry beyond that. digitalocean's vps monitoring will also yell at me if i am running out of disk space/memory or cpu usage is super high for a while
|
# ? Apr 1, 2021 18:21 |
|
they are all bullshit, yeah

if all you want to know about is requests per time, that's reasonably classifiable as a metric, so you can cut a lot of it out by looking for metrics tooling instead of "observability". prometheus is the latest emerging standard here and to give them some credit its quite easy to instrument your code with it, assuming that your webserver doesn't already have a prometheus exporter for requests/sec

observability in theory includes logs, events, metrics, and usually some kind of tooling for reading them all in a useful way to find correlations and poo poo. in practice, every monitoring-adjacent vendor slaps the term on their website for SEO
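the prometheus side really is simple - an exporter is just an http endpoint serving plaintext in the exposition format, one "name value" line per metric. a hand-rolled sketch of the text that prometheus_client generates for you:

```python
def render_metrics(counters):
    # prometheus text exposition format: a "# TYPE" hint line,
    # then "metric_name value" for each counter
    lines = []
    for name, value in sorted(counters.items()):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"

print(render_metrics({"http_requests_total": 1027}), end="")
# TYPE http_requests_total counter
# http_requests_total 1027
```

serve that at /metrics, point a scraper at it, and alert on rate(http_requests_total[5m]) spiking.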
|
# ? Apr 1, 2021 18:28 |
|
pointsofdata posted:
my previous place had a special automation tasks that:

oh god people keep on suggesting using magic mailboxes for storing/identifying documents or to "automate a process" instead of actually doing something proper and I keep having to shoot it down. usually it goes hand in hand with "we can use chat bots!"
|
# ? Apr 1, 2021 18:30 |
|
in their defense, magic mailboxes sound really cool and fun if you don't know anything about the implementation
|
# ? Apr 1, 2021 18:37 |
|
I think the biggest problem with excel is that it doesn't force you to define column types when you actually create a sheet, and now that it's too late to go back and force that behaviour they've built this whole type inference structure (which is, to be fair, less annoying than the SSIS one that only ever looked at the top 1000 rows so would continuously trip you up when doing flat file imports)

so now nobody ever defines anything and excel guesses at it, and excel steals the csv file association then turbofucks any file you open with it, until you just end up quoting every piece of data to force excel to treat it as text

Honorable mention is whoever coded RapidSql, which i am forced to use and has a "feature" where it will include comma separators in integers for thousands, so if you c&p you can get "1,000" back which excel then thinks is a piece of text.
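the defensive hack for the RapidSql thing, if anyone else is stuck c&p-ing out of it: strip the separators before converting, since int("1,000") just throws:

```python
def parse_int_loose(s: str) -> int:
    # tolerate thousands separators that tools sneak into copied integers
    return int(s.replace(",", "").strip())

print(parse_int_loose("1,000"))  # 1000
```

(locale-dependent, obviously - this assumes comma-as-thousands, which is exactly backwards in half of europe.)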
|
# ? Apr 1, 2021 18:38 |
|
12 rats tied together posted:
if all you want to know about is requests per time, that's reasonably classifiable as a metric, so you can cut a lot of it out by looking for metrics tooling instead of "observability". prometheus is the latest emerging standard here and to give them some credit its quite easy to instrument your code with it, assuming that your webserver doesn't already have a prometheus exporter for requests/sec

i'd always avoided prometheus since when i originally looked into it, it seemed like it required self-hosting - like, even if you used a "cloud" variant you're still supposed to have an on-prem prometheus database that you send stuff to the cloud from - but it looks like grafana labs recently(?) introduced a thing called the "grafana agent" which is "prometheus without the database since you're just sending all your poo poo to the cloud"

gonna i guess try their grafana cloud thing since it has a free tier. it's pretty incoherent since it's just a bunch of open source poo poo slammed together, but i think if i just stick with metrics and ignore the other stuff it'll be ok. the web framework i use (javalin) has a micrometer plugin and i can set up micrometer to export to prometheus which gets read by the agent, i think

they do also have logs (loki) and traces (tempo, which seems very new) which i might give a go. traces are kinda nice for me because while i only have one service, i do make a lot of requests to external apis, and it'd be nice to keep an eye on how slow those are, though i guess metrics could cover that just as well. and logs would be nice because right now i'm still sshing into my box to look at logs lol
|
# ? Apr 1, 2021 18:40 |
|
Powerful Two-Hander posted:
I think the biggest problem with excel is that it doesn't force you to define column types when you actually create a sheet and now that it's too late to go back and force that behaviour they've built this whole type inference structure (which is, to be fair, less annoying than the SSIS one that only ever looked at the top 1000 rows so would continuously trip you up when doing flat file imports) so now nobody ever defines anything and excel guesses at it and excel steals the csv file association then turbofucks any file you open with it until your just end up quoting every piece of data to force excel to treat it as text

you can customize how the SSIS column suggestion thing works and also have it process way more rows during discovery. However, if you're sure you know what the columns are its best to set the types manually.
|
# ? Apr 1, 2021 18:42 |
|
|
Shaggar posted:
you can customize how the SSIS column suggestion thing works and have it process way more rows during discovery as well. However, if you're sure you know what the columns are its best to set the types manually.

yeah tbf last time I used it was like 10 years ago, I mainly liked the soothing way the row counts would move across the flow, that was nice

not so nice when it choked on a column length later so maybe the India team were on to something when they just set everything to varchar(max)
|
# ? Apr 1, 2021 18:44 |