feedmegin
Jul 30, 2008

Naar posted:

I don't think this is necessarily good advice. The last time I did assembly/C was in university and I have literally never needed it.

I do both of these (well, mostly reading assembly for debugging) all day, every workday. Not all of programming life is a website.


Naar
Aug 19, 2003

The Time of the Eye is now
Fun Shoe
Of course! My point was that saying, "You should definitely learn C and assembly," to a developer just starting out is not really great advice. "You should learn C and assembly if you are interested in them or you want to work in an area that uses them," is more like it, though a bit less pithy.

BurntCornMuffin
Jan 9, 2009


Naar posted:

Of course! My point was that saying, "You should definitely learn C and assembly," to a developer just starting out...

BurntCornMuffin posted:

...if you really want to have a deeper level understanding...as you get comfortable with Java...

I thought I applied enough "if you wants" to not give an incorrect impression, but I'm willing to clarify.

Get good at Java, then break into C if you want to pop the hood open and see what makes it tick.

Brain Candy
May 18, 2006

comedyblissoption posted:

I haven't used clojure, but I wrote some tiny programs in Common Lisp. I had the exact same type issues I had with javascript programs. This was at a time when I thought dynamically typed languages were okay. Lisps are probably bad in the year of our lord 2018 specifically because they are dynamically typed. Gradual typing isn't sufficient.

The argument for clojure is that there are still experimental things being added to haskell today; you don't have to wait for someone to mine out an appropriate formalism.

geeves
Sep 16, 2004

fantastic in plastic posted:

A few years ago the consulting company I worked for had a client whose CTO insisted we use Node and some weird cloud-based database for a greenfield project. A month or so after we started, the client fired the CTO.

Client shouldn't have hired a 23-year-old to be CTO.

TooMuchAbstraction posted:

I've never used Node, but I assume most of the backlash is from people that think that Javascript is the worst language in common use today, so why on earth would you use it anywhere you don't absolutely have to?

Also some people can't get over the whole Single Thread thing

geeves fucked around with this message at 14:03 on Apr 14, 2018

BurntCornMuffin
Jan 9, 2009


geeves posted:

Also some people can't get over the whole Single Thread thing

This is also one of the reasons I find it odd that R is used as much as it is for data science. I cannot fathom processing an appreciable amount of data in a reasonable amount of time with just one thread. Even the pissing tiny little bits my last client was processing took damn near an hour daily.

Jaded Burnout
Jul 10, 2004


Just wait til you find out about MatLab.

BurntCornMuffin
Jan 9, 2009


Jaded Burnout posted:

Just wait til you find out about MatLab.

I figured it was much worse when I found out that the app those R scripts were part of was meant to replace MatLab for that organization.

Space Whale
Nov 6, 2014
Can you get locked into contracting? If so, how the hell do you break out, besides shopping for full-time while contracting?

Edit:
I look for FTE, but I've had a bad run where people hem and haw, and when the bills start coming due a contract is waved in front of me, so I take it. Having felt the burn of "heh, you're disposable, but thanks for the help!" one time too many, I'm about ready to just say fuck it, or even settle for something shitty, just to get some goddamn tenure and know people for more than one holiday at a time.

Space Whale fucked around with this message at 00:18 on Apr 15, 2018

ultrafilter
Aug 23, 2007

It's okay if you have any questions.


BurntCornMuffin posted:

This is also one of the reasons I find it odd that R is used as much as it is for data science. I cannot fathom processing an appreciable amount of data in a reasonable amount of time with just one thread. Even the pissing tiny little bits my last client was processing took damn near an hour daily.

The performance of R code is very sensitive to how you write it. Someone who understands how R works can actually get pretty good performance out of it even on a single thread, and there are a decent number of packages that offer abstractions for parallel computing. Unfortunately there's a learning curve attached to both parts of that, and there aren't that many really good R programmers out there.

The reason we use it so much is that for a lot of statistical computing, there is no real alternative. Python has decent support for machine learning and it's starting to pick up some statistics functionality, but if you want to use a model that's even a little bit obscure, you're probably only going to find an R implementation.
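The "very sensitive to how you write it" point generalises beyond R. Here is a minimal sketch in Python/NumPy (chosen because the thread contains no R code, so treat it as an analogy rather than R advice) of the loop-versus-vectorised gap that separates slow interpreter-bound code from code backed by optimised native routines:

code:
import time

import numpy as np

values = np.random.rand(1_000_000)

# Naive element-by-element loop: the pattern that keeps interpreted code slow
start = time.perf_counter()
total_loop = 0.0
for v in values:
    total_loop += v * v
loop_seconds = time.perf_counter() - start

# Vectorised equivalent: a single call into optimised native code
start = time.perf_counter()
total_vec = float(np.dot(values, values))
vec_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.3f}s  vectorised: {vec_seconds:.5f}s")

On a typical machine the vectorised version is orders of magnitude faster; R's vectorised primitives and the parallel-computing packages mentioned above buy the same kind of win.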

Tezzeract
Dec 25, 2007

Think I took a wrong turn...

ultrafilter posted:

The performance of R code is very sensitive to how you write it. Someone who understands how R works can actually get pretty good performance out of it even on a single thread, and there are a decent number of packages that offer abstractions for parallel computing. Unfortunately there's a learning curve attached to both parts of that, and there aren't that many really good R programmers out there.

The reason we use it so much is that for a lot of statistical computing, there is no real alternative. Python has decent support for machine learning and it's starting to pick up some statistics functionality, but if you want to use a model that's even a little bit obscure, you're probably only going to find an R implementation.

This is pretty interesting - I used a lot of R in undergrad, but when it came to doing anything 'big data' related, I completely fell on my face and had to make some ridiculous simplifying assumptions to wing it... like trying to use ARIMA on millisecond-scale power data covering a few months.

Are there any resources on high-performance R? Or is this one of those cases where it's better to hack an implementation in TensorFlow and use the cloud to cut the problem down to size?

Horse Clocks
Dec 14, 2004


Space Whale posted:

Can you get locked into contracting? If so, how the hell do you break out, besides shopping for full-time while contracting?

Edit:
I look for FTE, but I've had a bad run where people hem and haw, and when the bills start coming due a contract is waved in front of me, so I take it. Having felt the burn of "heh, you're disposable, but thanks for the help!" one time too many, I'm about ready to just say fuck it, or even settle for something shitty, just to get some goddamn tenure and know people for more than one holiday at a time.

I feel the same way, but then I talk to my perm buddies and hear their woes of annual performance reviews, office politics, and mad upper management making choices that contractors will inevitably fix, and I think contracting rules!

Then 6 months later it's a new job, new team, same things to fix, and ugh, fuck.

Jaded Burnout
Jul 10, 2004


Space Whale posted:

Can you get locked into contracting? If so, how the hell do you break out, besides shopping for full-time while contracting?

Edit:
I look for FTE, but I've had a bad run where people hem and haw, and when the bills start coming due a contract is waved in front of me, so I take it. Having felt the burn of "heh, you're disposable, but thanks for the help!" one time too many, I'm about ready to just say fuck it, or even settle for something shitty, just to get some goddamn tenure and know people for more than one holiday at a time.

Maybe it's a differing job market but my clients tend to run for years and practically beg me to go FTE, and not because I'm particularly special.

In the UK Ruby scene contracting is what you do when there's so many unfilled FTE roles that the companies have no choice but to take on contractors.

Keetron
Sep 26, 2008

Check out my enormous testicles in my TFLC log!

Jaded Burnout posted:

Maybe it's a differing job market but my clients tend to run for years and practically beg me to go FTE, and not because I'm particularly special.

In the UK Ruby scene contracting is what you do when there's so many unfilled FTE roles that the companies have no choice but to take on contractors.

Same in the Dutch Java / fullstack domain.

Good Will Hrunting
Oct 8, 2012

I changed my mind.
I'm not sorry.
I'm planning on starting to interview again within the next 6 weeks. I really have no idea how to be polite when describing my current situation, or how to spin the fact that my skill set has hardly progressed over the course of a year in any sort of positive way. Help?

Sure, I've learned a lot on my own. I've done a ton on my own. But we use such unpopular tech and our method of accomplishing things is so out there (recall my rewriting Kafka post, shit like that) that I'm almost embarrassed to describe what I spent a year doing.

ultrafilter
Aug 23, 2007

It's okay if you have any questions.


Tezzeract posted:

This is pretty interesting - I used a lot of R in undergrad, but when it came to doing anything 'big data' related, I completely fell on my face and had to make some ridiculous simplifying assumptions to wing it... like trying to use ARIMA on millisecond-scale power data covering a few months.

Are there any resources on high-performance R? Or is this one of those cases where it's better to hack an implementation in TensorFlow and use the cloud to cut the problem down to size?

There are limits on what you can do on a single machine, so you will eventually have to use the cloud. R has interfaces for TensorFlow, Hadoop, Spark, and other distributed computing platforms, so that's not as big a jump as it would be otherwise.

The canonical reference on R performance is Hadley Wickham's Advanced R. That's a must-read for anyone who uses R for more than trivial data processing. There's also a book on high performance R that I'm not familiar with, but the Amazon reviews are pretty good, so it might be worth checking out.
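On the "use the cloud to cut the problem down to size" option, here is a hypothetical sketch of what that can look like with Spark, written with PySpark rather than the R interfaces since the thread contains no R code. The file path and column names (timestamp, power_kw) are invented for illustration; the idea is to aggregate millisecond-level readings down to something a single machine can model:

code:
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("power-downsample").getOrCreate()

# Months of millisecond-level readings, read from distributed storage
readings = (
    spark.read.csv("power_readings/*.csv", header=True)
    .withColumn("ts", F.to_timestamp("timestamp"))
    .withColumn("power_kw", F.col("power_kw").cast("double"))
)

# Aggregate to one row per minute before fitting anything like ARIMA locally
per_minute = (
    readings
    .withColumn("minute", F.date_trunc("minute", F.col("ts")))
    .groupBy("minute")
    .agg(F.avg("power_kw").alias("avg_kw"), F.max("power_kw").alias("peak_kw"))
)

per_minute.write.mode("overwrite").parquet("power_per_minute/")
spark.stop()

The output is small enough to pull back into R (or pandas) for the actual time-series modelling.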

Good Will Hrunting posted:

I'm planning on starting to interview again within the next 6 weeks. I really have no idea how to be polite when describing my current situation, or how to spin the fact that my skill set has hardly progressed over the course of a year in any sort of positive way. Help?

"my job doesn't challenge me" or "I'm not learning" are perfectly good reasons to be looking for a new job.

withoutclass
Nov 6, 2007

Resist the siren call of rhinocerosness

College Slice

Good Will Hrunting posted:

I'm planning on starting to interview again within the next 6 weeks. I really have no idea how to be polite when describing my current situation, or how to spin the fact that my skill set has hardly progressed over the course of a year in any sort of positive way. Help?

Sure, I've learned a lot on my own. I've done a ton on my own. But we use such unpopular tech and our method of accomplishing things is so out there (recall my rewriting Kafka post, shit like that) that I'm almost embarrassed to describe what I spent a year doing.

These are all good reasons to be looking for a job: poor management decisions (rewriting Kafka), no room to grow as a developer. Talk about (and show) what you've been learning on the side. It's not too hard to spin it without coming off as overly negative.

BurntCornMuffin
Jan 9, 2009


Good Will Hrunting posted:

I'm planning on starting to interview again within the next 6 weeks. I really have no idea how to be polite when describing my current situation, or how to spin the fact that my skill set has hardly progressed over the course of a year in any sort of positive way. Help?

Sure, I've learned a lot on my own. I've done a ton on my own. But we use such unpopular tech and our method of accomplishing things is so out there (recall my rewriting Kafka post, shit like that) that I'm almost embarrassed to describe what I spent a year doing.

Say what you did, and try to spin it toward how you benefitted the company. Bonus points if you can pin a dollar value to it. Speak to soft skills and off-the-job skill building if they really want to talk about that.

Good Will Hrunting
Oct 8, 2012

I changed my mind.
I'm not sorry.
What if my team essentially did nothing but bleed the company of probably $1.5m in cash over the span of 18 months while being poorly managed by our control-freak VP who was a blocker in pretty much every imaginable scenario, and we ended up chopping 1/2 of the features we promised for a V1 that was supposed to launch 6+ months ago but hasn't even gotten to a QA pre-launch phase yet?

ultrafilter
Aug 23, 2007

It's okay if you have any questions.


Try to put a positive spin on it.

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



If someone told me in an interview that they reimplemented Kafka at their boss's say-so, I'd take that as a signal that they know how to code, that their current place is run by crazy people, and that they're right to want to leave.

Good Will Hrunting
Oct 8, 2012

I changed my mind.
I'm not sorry.
I'd say the worst part is that things like that were so rarely seen to completion. I had to give serious pushback just to work on even a few features from start to finish, and sure, I have some stuff to show, but it's really disconcerting how little I can speak to in terms of a) ownership and b) business value.

TooMuchAbstraction
Oct 14, 2012

I spent four years making
Waves of Steel
Hell yes I'm going to turn my avatar into an ad for it.
Fun Shoe

Good Will Hrunting posted:

I'd say the worst part is that things like that were so rarely seen to completion. I had to give serious pushback just to work on even a few features from start to finish, and sure, I have some stuff to show, but it's really disconcerting how little I can speak to in terms of a) ownership and b) business value.

Nobody's going to ask to see your in-house implementation, not least because you have no legal way to show it to them. You can talk knowledgeably about the process even if you didn't finish it and its business impact was nil.

TheCog
Jul 30, 2012

I AM ZEPA AND I CLAIM THESE LANDS BY RIGHT OF CONQUEST

Good Will Hrunting posted:

What if my team essentially did nothing but bleed the company of probably $1.5m in cash over the span of 18 months while being poorly managed by our control-freak VP who was a blocker in pretty much every imaginable scenario, and we ended up chopping 1/2 of the features we promised for a V1 that was supposed to launch 6+ months ago but hasn't even gotten to a QA pre-launch phase yet?

"I didn't see eye to eye with the management decisions, and the pace of development was uncomfortably slow, which compelled me to look for better opportunities where I can better use my talents."

Good Will Hrunting
Oct 8, 2012

I changed my mind.
I'm not sorry.
These are all fair, thanks. It's not like I did nothing, it's just that I... yeah, didn't see eye to eye on architecture, pace, cuts to features, or mid-dev changes made out of lunacy, and I had absolutely no say (not even my TL did) in doing things a certain way, even after we started implementation. I just look at this section of my resume vs. the last one and grimace:

code:
\item Core contributor on team responsible for porting massive legacy architecture to new cloud-based data pipeline
\item Wrote performant Scala library for adding location, ISP, and user agent data to raw events and bid requests
\item Contributed to 5 Spark batch jobs for processing ~.5 TB of raw Avro data per day
\item Orchestrated Azkaban pipeline for executing and scheduling all Spark jobs on Google Cloud Platform
\item Wrote custom Spark metrics framework for reporting per-executor app-specific metrics via RPC to driver node
\item Integrated Prometheus PushGateway and Grafana for monitoring Spark job results and errors
Also please don't quote that and dox me in case my co-workers are reading, lol
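For what the last two resume bullets above look like in practice, here is a hedged sketch of pushing one batch-job metric to a Prometheus PushGateway, using the standard prometheus_client Python library rather than whatever Scala code the actual job used; the gateway address, job name, and metric are placeholders, not details from the thread:

code:
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()

# One gauge per thing you want Grafana to chart for this batch run
records_processed = Gauge(
    "spark_job_records_processed",
    "Records processed by the most recent batch run",
    registry=registry,
)
records_processed.set(1_234_567)  # in a real job this comes from the job's counters

# Push to the gateway; Prometheus scrapes the gateway, and Grafana reads Prometheus
push_to_gateway("pushgateway.example.com:9091", job="spark_batch", registry=registry)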

BurntCornMuffin
Jan 9, 2009


Good Will Hrunting posted:

These are all fair, thanks. It's not like I did nothing, it's just that I... yeah, didn't see eye to eye on architecture, pace, cuts to features, or mid-dev changes made out of lunacy, and I had absolutely no say (not even my TL did) in doing things a certain way, even after we started implementation. I just look at this section of my resume vs. the last one and grimace:

code:
\item Core contributor on team responsible for porting massive legacy architecture to new cloud-based data pipeline
\item Wrote performant Scala library for adding location, ISP, and user agent data to raw events and bid requests
\item Contributed to 5 Spark batch jobs for processing ~.5 TB of raw Avro data per day
\item Orchestrated Azkaban pipeline for executing and scheduling all Spark jobs on Google Cloud Platform
\item Wrote custom Spark metrics framework for reporting per-executor app-specific metrics via RPC to driver node
\item Integrated Prometheus PushGateway and Grafana for monitoring Spark job results and errors
Also please don't quote that and dox me in case my co-workers are reading, lol

Honestly, you may grimace, but that is a keyword goldmine. If nothing else, you can shotgun your resume and get a lot of hits from companies with cloud/big data capabilities. That is, if that's your thing.

Good Will Hrunting
Oct 8, 2012

I changed my mind.
I'm not sorry.

BurntCornMuffin posted:

Honestly, you may grimace, but that is a keyword goldmine. If nothing else, you can shotgun your resume and get a lot of hits from companies with cloud/big data capabilities. That is, if that's your thing.

I don't feel like I'm beyond maybe a high-junior, low-mid in the big data ecosystem at the moment. I haven't done much with the Spark Streaming APIs (or Kafka obviously :razz:) and we're stuck in Hadoop batch flows which is just baffling to me in 2018 given our use case is textbook streaming appropriate. I joined this company because we were going to build a streaming pipeline, and then this mess happened.

I'm going to look for mostly back-end stuff but probably more towards API/microservice work instead of big data. If I stumble across something where they're seeking someone with a very elementary understanding of Spark and Hadoop I'll give it a shot, but it's a pretty niche area.
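For contrast with the Hadoop batch flows being complained about above, here is a minimal sketch of the "textbook streaming" shape in Spark Structured Streaming (PySpark). The broker address, topic name, schema, and paths are all made up, and the Kafka source needs the spark-sql-kafka connector on the classpath:

code:
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("bid-stream").getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("bid_price", DoubleType()),
])

# Consume events continuously from Kafka instead of waiting for a nightly batch
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "bid_requests")
    .load()
)

# Kafka delivers bytes; decode the JSON payload into typed columns
events = (
    raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Land the stream as Parquet; checkpointing makes the job restartable
query = (
    events.writeStream
    .format("parquet")
    .option("path", "events/")
    .option("checkpointLocation", "checkpoints/bid-stream/")
    .start()
)
query.awaitTermination()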

Portland Sucks
Dec 21, 2004
༼ つ ◕_◕ ༽つ

Good Will Hrunting posted:

I don't feel like I'm beyond maybe a high-junior, low-mid in the big data ecosystem at the moment. I haven't done much with the Spark Streaming APIs (or Kafka obviously :razz:) and we're stuck in Hadoop batch flows which is just baffling to me in 2018 given our use case is textbook streaming appropriate. I joined this company because we were going to build a streaming pipeline, and then this mess happened.

I'm going to look for mostly back-end stuff but probably more towards API/microservice work instead of big data. If I stumble across something where they're seeking someone with a very elementary understanding of Spark and Hadoop I'll give it a shot, but it's a pretty niche area.

I'm really interested in getting into this stuff and I think I have some decent use cases for it at work, but I'm not 100% sure. Do you have any suggestions for good resources? Every time I try to do research on data engineering, I feel like I'm swimming in a sea of buzzwords with no definitions.

For some context: I work in manufacturing. We have lots of automated equipment that pumps out a ton of real-time data via PLCs to data historians, which aggregate it into periodic scorecards, and I'm being tasked with building a data warehouse to relate all of this data and with coming up with a plan for the next generation of our "data driven" manufacturing processes.

I inherited what is basically an ETL process composed of about 2000 Python 2.6 scripts and about 100 MSSQL tables with more trash in them than good data.

BurntCornMuffin
Jan 9, 2009


Good Will Hrunting posted:

I don't feel like I'm beyond maybe a high-junior, low-mid in the big data ecosystem at the moment. I haven't done much with the Spark Streaming APIs (or Kafka obviously :razz:) and we're stuck in Hadoop batch flows which is just baffling to me in 2018 given our use case is textbook streaming appropriate. I joined this company because we were going to build a streaming pipeline, and then this mess happened.

Neither do I, but I still get the red carpet treatment just for having it on my resume. The thing is, the vast majority of people really don't seem to know what they're doing in that space, so just touching Hadoop makes you very special.

That said, your only chance at streaming is if the organization is brand new or tapping into social media. A lot of the older players batch because the source is ancient and the very entrenched team maintaining it says it's too expensive to make it stream. Also, the people reading the outputs won't care anyway, because they're just going to paste a snapshot into Excel every week to do their own calculations. Also, the Excel paster's boss will be sure to personally come tell you that your stuff is broken every time they don't like the numbers.

BurntCornMuffin
Jan 9, 2009


Portland Sucks posted:

For some context: I work in manufacturing. We have lots of automated equipment that pumps out a ton of real-time data via PLCs to data historians, which aggregate it into periodic scorecards, and I'm being tasked with building a data warehouse to relate all of this data and with coming up with a plan for the next generation of our "data driven" manufacturing processes.

Try not to do it in-house; everyone I've seen try ended up constantly buying servers and running out of space. If you can convince them to invest in cloud (AWS, Azure), do that; the logistics savings will be immense, and you can build out your services from there.

Your data will grow exponentially.

Make sure a quality tracking and alerting process is an intrinsic part of your design, not an afterthought. If a consumer sees numbers they don't like, they will try to blame you first, and this will give you the evidence to say "no, you're just wrong" or "yes, because our source fucked up" quickly.

Offshore support as soon as you can, lest you and your devs accomplish nothing due to chasing the consumers' data problems.

Your customer may want to send commands to the PLC later; make sure you can add that to the API.

Every plan to store data must come with a plan to use the data, or else your project becomes an expensive dragon hoard.

Pentest and secure your shit. Don't be the guy whose project lands in the secfuck thread because evil wizards told your factory to make shit that explodes, thanks to an undetected vulnerability in the exhaust port.

No amount of Hadoop replication will save you from a bad rm that deletes all the duplicates. Have a recovery plan.

Good Will Hrunting
Oct 8, 2012

I changed my mind.
I'm not sorry.

The Coursera courses are quite good, to be honest. All my original knowledge is from them and from dicking around in Databricks at my last job, just processing massive amounts of user data like a filthy slimeball. At this job, I've basically learned (almost entirely on my own) how Spark works from a config standpoint, the ecosystem, what happens in the background, etc., via processing request data and location, ISP, and org data.

Smugworth
Apr 18, 2003


So many shoulda-woulda-couldas for my team, just replace Hadoop with Kafka. :sigh:

Good Will Hrunting
Oct 8, 2012

I changed my mind.
I'm not sorry.

Smugworth posted:

So many shoulda-woulda-couldas for my team, just replace Hadoop with Kafka. :sigh:

A 40% completed, home-spun version of Kafka.

Portland Sucks
Dec 21, 2004
༼ つ ◕_◕ ༽つ

BurntCornMuffin posted:

Make sure a quality tracking and alerting process is an intrinsic part of your design, not an afterthought. If a consumer sees numbers they don't like, they will try to blame you first, and this will give you the evidence to say "no, you're just wrong" or "yes, because our source fucked up" quickly.

Can you expand on these a little? It seems self-evident that this should be important, but other than validating that the incoming values are reasonable (i.e., not negative length measurements or air speed sensor measurements indicating the end of the world), what other types of tracking are standard when monitoring data streams where you don't have control over the input data?

BurntCornMuffin
Jan 9, 2009


Portland Sucks posted:

Can you expand on these a little? It seems self-evident that this should be important, but other than validating that the incoming values are reasonable (i.e., not negative length measurements or air speed sensor measurements indicating the end of the world), what other types of tracking are standard when monitoring data streams where you don't have control over the input data?

Check for sanity, presence of data when expected, and quantity of data. Ensure that, via logging or metadata, you can trace the life cycle of your data for diagnosis. Have some sort of dashboard that you can show your consumers, and ensure your support people get alerts, act on them within a reasonable SLA, and issue public statements for anything that would affect anyone in your orbit.

This is largely to build trust, as you are likely replacing an existing system that people are used to, and you will be the first people blamed if an issue is detected by a layperson. If it's not your fault, you will need to be prepared to prove that fact before you pass it off. Proactive notifications (especially if you can catch shit before a user does) and fast resolution help build trust in your stuff, which ultimately means less bullshit down the road.

Basically, don't be a scary black box, and try to catch and ideally fix shit before a user sees it.

Oh yeah, unrelated: catalogue and document your data offerings. Your present situation is straightforward enough for now that this may be frivolous, but if other data sources are added, you need to know what business value they offer and be able to let your consumers know what you have available. This goes hand in hand with "if you store it, have a plan to use it".
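A rough sketch of the "sanity, presence, quantity" checks described above, as a Python function run against each incoming batch; the column names, thresholds, and pandas usage are assumptions for illustration, not anything from the thread:

code:
import pandas as pd


def check_batch(df: pd.DataFrame, expected_min_rows: int = 1000) -> list:
    """Return human-readable problems found in one incoming batch of readings."""
    problems = []

    # Presence/quantity: did we receive roughly as much data as expected?
    if len(df) < expected_min_rows:
        problems.append(f"only {len(df)} rows received, expected >= {expected_min_rows}")

    # Sanity: values inside a physically plausible range
    out_of_range = df[(df["value"] < 0) | (df["value"] > 1e6)]
    if not out_of_range.empty:
        problems.append(f"{len(out_of_range)} rows with out-of-range values")

    # Timeliness: the newest reading should not be stale
    latest = pd.to_datetime(df["reading_time"], utc=True).max()
    lag = pd.Timestamp.now(tz="UTC") - latest
    if lag > pd.Timedelta(hours=1):
        problems.append(f"latest reading is {lag} old")

    return problems

Anything this returns would feed the dashboard and the support alerts, so the pipeline can point at the broken source before a consumer notices.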

Hughlander
May 11, 2005

Good Will Hrunting posted:

I don't feel like I'm beyond maybe a high-junior, low-mid in the big data ecosystem at the moment. I haven't done much with the Spark Streaming APIs (or Kafka obviously :razz:) and we're stuck in Hadoop batch flows which is just baffling to me in 2018 given our use case is textbook streaming appropriate. I joined this company because we were going to build a streaming pipeline, and then this mess happened.

I'm going to look for mostly back-end stuff but probably more towards API/microservice work instead of big data. If I stumble across something where they're seeking someone with a very elementary understanding of Spark and Hadoop I'll give it a shot, but it's a pretty niche area.

If you are interested in leaving NYC for Seattle PM me

Steve French
Sep 8, 2003

Hughlander posted:

If you are interested in leaving NYC for Seattle PM me

Same except SF instead of Seattle.

Good Will Hrunting
Oct 8, 2012

I changed my mind.
I'm not sorry.
Appreciate it, folks, but I'm not looking to relocate at the moment. Old family members living out their last years (including animals!) make it really hard to leave the East Coast. That said, there's absolutely no way I'm staying here for my whole life, so maybe if we're still posting on this dead forum a few years down the road...

I'm basically going to blast every API/back-end job I see at places that seem interesting, and maybe some data stuff at interesting companies as well. Data stuff in a field like med might be interesting, and substantially less scummy than ad-tech.

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



Good Will Hrunting posted:

Data stuff in a field like med might be interesting, and substantially less scummy than ad-tech.

Payday lending?


aBagorn
Aug 26, 2004
Just accepted an offer for a Lead role (50/50 split managing/coding). Decent pay rise, have 3 friends in the company already.

I'm excited

Good Will Hrunting posted:

Data stuff in a field like med might be interesting, and substantially less scummy than ad-tech.

Huh, wild that you mentioned this. The team I'm going to be building is going to be working with a substantial amount of med data. We'll basically be building our model (and then subsequently APIs and apps) for external users and some of the ML teams inside our division.
