echinopsis
Apr 13, 2004

by Fluffdaddy
don’t close the thread

Bored Online
May 25, 2009

We don't need Rome telling us what to do.

i only say this cause i already have. it seems cool but realistically i prob wont pursue any bug bounties

now i gotta pretend to learn a new lang for a week or two

carry on then
Jul 10, 2010

by VideoGames

(and can't post for 10 years!)

Bored Online posted:

burp and web app security is the current thing im half heartedly learning rn but ill get farther than the op at least

burp and web app security piano

Captain Foo
May 11, 2004

we vibin'
we slidin'
we breathin'
we dyin'

Bored Online posted:

i only say this cause i already have. it seems cool but realistically i prob wont pursue any bug bounties

now i gotta pretend to learn a new lang for a week or two

the p in p lang stands for pretend

Necronomoticon
Aug 14, 2007

Millennial 80's business guy

styls trill epic posted:

why wouldn't i do this in excel op. SWEET LIKE SOME HONEYBUNS

you can do a lot of this in Excel, if you get a very stupid add-in like Frontline Analytic Solver and cough up the $300-$1000 or so for the license. Even then, you'll have a lot of arbitrary limitations depending on the license you get and lol if an employer is gonna accommodate this instead of telling you to learn R or Python like everyone else. that said, you might have a better time with Anaconda/Jupyterlab, since it'll give you a more interactive approach and you'll get your tables back.

data transformations are important because a lot of models make certain assumptions about the data and perform better with some prep. So typically for any data set you want to:

1. Remove columns with near-zero variance. These columns have little predictive info and add a lot of garbage to your modeling.
2. Analyze columns for skewness and apply the Box-Cox transformation if the skewness stat is < -1 or > 1 (Box-Cox needs strictly positive values). This will determine the appropriate log or power transformation to reshape the data closer to a normal distribution, which is important for the validity of regression models. This also helps constrain the impact of outliers on your model.
3. Center and rescale the data. This means subtracting the mean from each value and dividing by the standard deviation. This makes each column have a mean of 0 and a standard deviation of 1, producing better predictive performance on models that use the variance. Such models are biased towards predictors with larger variances, so this helps normalize the data.
4. Correct for collinearity. Datasets with highly correlated columns introduce a lot of noise and instability to your model. Some models like decision trees are robust against this, but even then there are advancements, like random forest, that are sensitive to it. The simplest way to remove correlated columns is to visualize them in a correlation heatmap, order by some sort of hierarchical clustering, and look for hotspots of multicollinearity. Drop redundant columns, and repeat until you've got a cleaner heat map. Another is to compress your columns into uncorrelated components with PCA, but that makes everything more difficult to interpret and may take a while to wrap your head around. PLS and PLS-DA are methods for regression and classification, respectively, that use a similar strategy to correct for multicollinearity but the results are more tailored to what you're actually trying to predict.
5. Dimensionality reduction. This means just dropping columns and doing feature selection. Generally, fewer columns = less complexity = more consistent models = less noise and better interpretability.
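
to make steps 1-4 concrete, here's a rough stdlib-only Python sketch. It is not what caret does internally. The function names, the variance threshold, the 0.9 correlation cutoff, and the toy columns are all made up for illustration, and the Box-Cox helper takes lambda as an argument instead of estimating it from the data.

```python
import math
import statistics

def drop_low_variance(columns, threshold=1e-8):
    """Step 1: keep columns whose variance exceeds a (hypothetical) threshold."""
    return {name: vals for name, vals in columns.items()
            if statistics.pvariance(vals) > threshold}

def skewness(vals):
    """Step 2: population skewness (the third standardized moment)."""
    mean = statistics.fmean(vals)
    sd = statistics.pstdev(vals)
    if sd == 0:
        return 0.0
    return sum(((v - mean) / sd) ** 3 for v in vals) / len(vals)

def box_cox(vals, lam):
    """Step 2: Box-Cox transform for a given lambda (picking lambda is not shown)."""
    if any(v <= 0 for v in vals):
        raise ValueError("Box-Cox needs strictly positive values")
    if lam == 0:
        return [math.log(v) for v in vals]
    return [(v ** lam - 1) / lam for v in vals]

def center_scale(vals):
    """Step 3: subtract the mean and divide by the sample standard deviation."""
    mean = statistics.fmean(vals)
    sd = statistics.stdev(vals)
    return [(v - mean) / sd for v in vals]

def pearson(xs, ys):
    """Pearson correlation; assumes neither column is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def drop_correlated(columns, cutoff=0.9):
    """Step 4: greedily keep a column only if it is not highly
    correlated with any column that was already kept."""
    kept = {}
    for name, vals in columns.items():
        if all(abs(pearson(vals, other)) < cutoff for other in kept.values()):
            kept[name] = vals
    return kept

# Toy data: one constant column, two near-duplicate columns, one skewed column.
data = {
    "constant": [1.0, 1.0, 1.0, 1.0, 1.0],
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "income": [20.0, 22.0, 25.0, 30.0, 400.0],
}

usable = drop_low_variance(data)                # drops "constant"
skewed = [name for name, vals in usable.items() if abs(skewness(vals)) > 1]
decorrelated = drop_correlated(usable)          # drops "height_in"
scaled = {name: center_scale(vals) for name, vals in decorrelated.items()}
```

the variance filter runs first on purpose: a constant column has zero standard deviation, so both the correlation and the scaling would divide by zero if it got through.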

in R, all this can be done with one line of code: https://www.rdocumentation.org/packages/caret/versions/6.0-90/topics/preProcess
or better yet, the data preprocessing, model fitting, tuning, and cross validation can be done in one line of code: https://www.rdocumentation.org/packages/caret/versions/4.47/topics/train
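
for a feel of what that kind of one-liner is bundling together, here's a minimal stdlib-only Python sketch of k-fold cross validation where the centering/scaling is re-estimated on each training split. It's an illustration under assumptions, not caret's implementation: the function names are hypothetical, the model is a one-predictor least-squares line, the folds are contiguous rather than shuffled, and tuning is left out.

```python
import statistics

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def cross_validate(xs, ys, k=5):
    """Return the RMSE on each held-out fold."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        # Preprocessing statistics come from the training split only,
        # so nothing about the held-out fold leaks into the model.
        mean, sd = statistics.fmean(train_x), statistics.stdev(train_x)
        intercept, slope = fit_line([(x - mean) / sd for x in train_x], train_y)
        errors = [(intercept + slope * (xs[i] - mean) / sd - ys[i]) ** 2
                  for i in test_idx]
        scores.append((sum(errors) / len(errors)) ** 0.5)
    return scores

# Toy data that lies exactly on y = 2x + 1, so every fold's error is ~0.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
fold_rmse = cross_validate(xs, ys, k=5)
```

the part that's easy to get wrong by hand is the scaling: if you center and scale the whole data set before splitting, the held-out rows have already influenced the preprocessing.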

i like python more, but that's more because R programmers are loser academics, seldom interested in writing clean, readable code and the language itself is a patchwork of a bunch of different libraries with no shared conventions. Caret and Tidyverse rule tho.

Necronomoticon fucked around with this message at 02:42 on Feb 17, 2022

Necronomoticon
Aug 14, 2007

Millennial 80's business guy
i'm getting my MS in data science from some public school in the south. it's online. frankly, there's nothing I couldn't have learned with free materials and self-study, so you're doing it because you get the resume candy of a master's degree and I guess the financial equivalent of someone holding a gun to your head to read an extremely dry textbook. if it wasn't for that, I'd probably be pissing away time on learning a new language like scala or something.

the modern grad school experience is depressingly exploitive, whether it be a cash-cow professional program like mine, or a forced-poverty academic program. high cost, and high promises that it'll somehow help you succeed in the meritocratic hellscape of the USA. most of the people in my program are wildly incompetent and would've benefited more from a code academy subscription. although it's undeniably helped out my career a ton.

Necronomoticon fucked around with this message at 02:40 on Feb 17, 2022

MrQueasy
Nov 15, 2005

Probiot-ICK
bro, he’s gone

Necronomoticon
Aug 14, 2007

Millennial 80's business guy
no!!

jemand
Sep 19, 2018


i liked ur post anyways

Necronomoticon
Aug 14, 2007

Millennial 80's business guy
ty I'm still learning.

interesting to know that there's some demand in the ML Pipeline space tho. I should gently caress with MLflow or SageMaker more.

MrQueasy
Nov 15, 2005

Probiot-ICK

jemand posted:

i liked ur post anyways

shitposting aside, I did too, but our dear Original ThreadMaker has moved on to more different clout.

Necronomoticon posted:

ty I'm still learning.

interesting to know that there's some demand in the ML Pipeline space tho. I should gently caress with MLflow or SageMaker more.

Make sure you try to figure out the underlying stuff like feature engineering and whatever. The technology surrounding everything is good to know, but the only guarantee right now is that it will probably shift quite a bit in the next decade. As long as you can figure out how to find the core stuff no matter if they call it foo or bar or ThroatWarblerMangrove, you should be golden.

As far as jobs go... network where you can at least until you find your first job... then it's onward and upward from there! I'm just an algorithms guy, but there's 100+ lovely ML people out there for every good one, and there's still not enough people who know a regression from a hole in the ground.

Necronomoticon
Aug 14, 2007

Millennial 80's business guy
the class i’m doing right now is for Applied Predictive Modeling, and I’m getting quite a bit of mileage out of it. It’s split up into preprocessing, validation, regression performance metrics, linear regression models, non-linear regression, classifier performance metrics, linear classification, and non-linear classification. The book is pretty accessible and is focused on predictive performance, so it covers the strengths and weaknesses of various models and plots their performance across a range of tuning parameters for comparisons. my next class is going to be “heterogenous data fusion”, which is supposed to be about holistic data blending and security.

i started the MS program because i originally got my degree in econ, hated it, and spent quite a bit of time self-teaching when i realized that the extent of my technical/analytical skills computer touching would be relegated to Excel/VBA hell despite knowing Java. the trouble i had is that i had no experience for the types of jobs that I’d actually want without just starting over at the entry-level altogether. I somehow got an architect title at the company I’ve been at for a while, but having little actual professional development or data practice makes me nervous to approach the job market. An MS in Data Science/Engineering just seemed like a better use of time & money than a code bootcamp (which I’d already be too advanced for) or an MBA (lol). Better to launder self-teaching into a degree than to lean more on a path that I’m not even that interested in to begin with.

Necronomoticon fucked around with this message at 05:51 on Feb 17, 2022

echinopsis
Apr 13, 2004

by Fluffdaddy
indeed

Bored Online
May 25, 2009

We don't need Rome telling us what to do.
ended up buying a book on c++ for some reason

echinopsis
Apr 13, 2004

by Fluffdaddy
some reason? we all know why

Moo Cowabunga
Jun 15, 2009

[Office Worker.




rip op

MrQueasy
Nov 15, 2005

Probiot-ICK

He's chasing that clout on a farm upstate, now.

RIP OP.

El Mero Mero
Oct 13, 2001

ban all data scientists

echinopsis
Apr 13, 2004

by Fluffdaddy
I still want to do the odin project full stack course

but of course once the novelty wore off there was little hope for me

MrQueasy
Nov 15, 2005

Probiot-ICK

El Mero Mero posted:

ban all data scientists

Ban 'em all and let God apply the K-means clustering.

Insanite
Aug 30, 2005

echinopsis posted:

I still want to do the odin project full stack course

but of course once the novelty wore off there was little hope for me

i'm speedrunning through the odin courses because i need to do some webdev for work and i haven't touched the stuff seriously in about 6 years

feels good/bad

TOP as a thing seems really nicely put together, though

Bored Online
May 25, 2009

We don't need Rome telling us what to do.

El Mero Mero posted:

ban all data scientists

Insanite
Aug 30, 2005


should've subbed my stats classes for data science ones so that i could wear a white lab coat

echinopsis
Apr 13, 2004

by Fluffdaddy

Insanite posted:

i'm speedrunning through the odin courses because i need to do some webdev for work and i haven't touched the stuff seriously in about 6 years

feels good/bad

TOP as a thing seems really nicely put together, though

its sort of ingenious to basically steal the content from everywhere in the web, rather than re-invent the wheel, and put their energy into curating the content rather than the content itself
