|
don’t close the thread
|
|
Captain Foo posted: oh yeah??

i only say this cause i already have. it seems cool but realistically i prob wont pursue any bug bounties. now i gotta pretend to learn a new lang for a week or two
|
|
Bored Online posted: burp and web app security is the current thing im half heartedly learning rn but ill get farther than the op at least

burp and web app security piano
|
|
Bored Online posted: i only say this cause i already have. it seems cool but realistically i prob wont pursue any bug bounties

the p in p lang stands for pretend
|
|
styls trill epic posted: why wouldn't i do this in excel op. SWEET LIKE SOME HONEYBUNS

you can do a lot of this in Excel, if you get a very stupid add-in like Frontline Analytic Solver and cough up the $300-$1000 or so for the license. Even then, you'll have a lot of arbitrary limitations depending on the license you get, and lol if an employer is gonna accommodate this instead of telling you to learn R or Python like everyone else. That said, you might have a better time with Anaconda/JupyterLab, since it'll give you a more interactive approach and you'll get your tables back.

data transformations are important because a lot of models make certain assumptions about the data and perform better with some prep. So typically, for any data set, you want to:

1. Remove columns with near-zero variance. These columns have little predictive info and add a lot of garbage to your modeling.

2. Analyze columns for skewness and apply the Box-Cox transformation if the skewness stat is < -1 or > 1. This will determine the appropriate log or power transformation to reshape the data into a normal distribution, which is important for the validity of regression models. This also helps constrain the impact of outliers on your model.

3. Center and rescale the data. This means subtracting the mean from each value and dividing by the standard deviation, so each column ends up with a mean of 0 and a standard deviation of 1. That produces better predictive performance from models that use the variance; such models are biased towards predictors with larger variances, so this helps normalize the data.

4. Correct for collinearity. Datasets with highly correlated columns introduce a lot of noise and instability into your model. Some models, like decision trees, are robust against this, but even then there are advancements, like random forest, that are sensitive to it. The simplest way to remove correlated columns is to visualize them in a correlation heatmap, order it by some sort of hierarchical clustering, and look for hotspots of multicollinearity. Drop redundant columns, and repeat until you've got a cleaner heatmap. Another way is to compress your columns into uncorrelated components with PCA, but that makes everything more difficult to interpret and may take a while to wrap your head around. PLS and PLS-DA are methods for regression and classification, respectively, that use a similar strategy to correct for multicollinearity, but the results are more tailored to what you're actually trying to predict.

5. Dimensionality reduction. This means just dropping columns and doing feature selection. generally, fewer columns = less complexity = more consistent models = less noise and better interpretability.

in R, doing all of this can be done with one line of code (rough sketch below): https://www.rdocumentation.org/packages/caret/versions/6.0-90/topics/preProcess

or better yet, the data preprocessing, model fitting, tuning, and cross-validation can be done in one line of code: https://www.rdocumentation.org/packages/caret/versions/4.47/topics/train

i like python more, but that's more because R programmers are loser academics, seldom interested in writing clean, readable code, and the language itself is a patchwork of a bunch of different libraries with no shared conventions. Caret and Tidyverse rule tho.

Necronomoticon fucked around with this message at 01:42 on Feb 17, 2022
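to make those caret one-liners a little more concrete, here's a minimal sketch. it assumes a made-up data frame `df` with a numeric outcome column named `outcome` (both names are placeholders, not anything from the post above), and PLS is just a stand-in model choice:

```r
# minimal sketch of the preprocessing recipe above, using caret.
# `df` and `outcome` are placeholder names for illustration only.
library(caret)

predictors <- df[, setdiff(names(df), "outcome")]

# steps 1-4 in one call: drop near-zero-variance columns, Box-Cox-transform
# skewed columns, center/scale everything, and drop highly correlated columns
pp <- preProcess(predictors,
                 method = c("nzv", "BoxCox", "center", "scale", "corr"))
predictors_clean <- predict(pp, predictors)

# or let train() roll preprocessing, tuning, and cross-validation together;
# PLS regression is just an example model choice here
fit <- train(x = predictors, y = df$outcome,
             method = "pls",
             preProcess = c("nzv", "BoxCox", "center", "scale"),
             trControl = trainControl(method = "cv", number = 10),
             tuneLength = 10)
```

the nice part is that `predict(fit, newdata)` re-applies the same preprocessing to new data before predicting, so the whole recipe travels with the model.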
|
i'm getting my MS in data science from some public school in the south. it's online. frankly, there's nothing I couldn't have learned with free materials and self-study, so you're doing it because you get the resume candy of a master's degree and, I guess, the financial equivalent of someone holding a gun to your head to read an extremely dry textbook. if it wasn't for that, I'd probably be pissing away time on learning a new language like scala or something.

the modern grad school experience is depressingly exploitative, whether it be a cash-cow professional program like mine or a forced-poverty academic program: high cost, and high promises that it'll somehow help you succeed in the meritocratic hellscape of the USA. most of the people in my program are wildly incompetent and would've benefited more from a code academy subscription. although it's undeniably helped out my career a ton.

Necronomoticon fucked around with this message at 01:40 on Feb 17, 2022
|
bro, he’s gone
|
|
no!!
|
|
i liked ur post anyways
|
|
ty, I'm still learning. interesting to know that there's some demand in the ML pipeline space tho. I should gently caress with MLflow or SageMaker more.
|
|
jemand posted: i liked ur post anyways

shitposting aside, I did too, but our dear Original ThreadMaker has moved on to more different clout.

Necronomoticon posted: ty I'm still learning.

Make sure you try to figure out the underlying stuff like feature engineering and whatever. The technology surrounding everything is good to know, but the only guarantee right now is that it will probably shift quite a bit in the next decade. As long as you can figure out how to find the core stuff no matter if they call it foo or bar or ThroatWarblerMangrove, you should be golden.

As far as jobs go... network where you can at least until you find your first job... then it's onward and upward from there! I'm just an algorithms guy, but there's 100+ lovely ML people out there for every good one, and there's still not enough people who know a regression from a hole in the ground.
|
|
the class i’m doing right now is for Applied Predictive Modeling, and I’m getting quite a bit of mileage out of it. It’s split up into preprocessing, validation, regression performance metrics, linear regression models, non-linear regression, classifier performance metrics, linear classification, and non-linear classification. The book is pretty accessible and is focused on predictive performance, so it covers the strengths and weaknesses of various models and plots their performance across a range of tuning parameters for comparisons (rough sketch of that below).

my next class is going to be “heterogeneous data fusion”, which is supposed to be about holistic data blending and security.

i started the MS program because i originally got my degree in econ, hated it, and spent quite a bit of time self-teaching when i realized that the extent of my

Necronomoticon fucked around with this message at 04:51 on Feb 17, 2022
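a rough sketch of what those tuning-parameter comparison plots look like in caret, assuming the same made-up `df`/`outcome` data frame as earlier (k-nearest neighbors is just an arbitrary example model, not one the book singles out):

```r
# sketch: compare resampled RMSE across a tuning grid, in the style of the
# APM book's comparison plots. `df` and `outcome` are placeholder names.
library(caret)

set.seed(1)
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 5)

# tune a k-nearest-neighbors regression over a grid of k values
knn_fit <- train(outcome ~ ., data = df,
                 method = "knn",
                 preProcess = c("center", "scale"),
                 trControl = ctrl,
                 tuneGrid = data.frame(k = seq(1, 25, by = 2)))

plot(knn_fit)   # resampled RMSE vs. k, i.e. performance across tuning values
```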
|
indeed
|
|
ended up buying a book on c++ for some reason
|
|
some reason? we all know why
|
|
rip op
|
|
Moo Cowabunga posted: rip op

He's chasing that clout on a farm upstate, now. RIP OP.
|
|
ban all data scientists
|
|
I still want to do the odin project full stack course but of course once the novelty wore off there was little hope for me
|
|
El Mero Mero posted: ban all data scientists

Ban 'em all and let God apply the K-means clustering.
|
|
echinopsis posted: I still want to do the odin project full stack course

i'm speedrunning through the odin courses because i need to do some webdev for work and i haven't touched the stuff seriously in about 6 years

feels good/bad

TOP as a thing seems really nicely put together, though
|
|
El Mero Mero posted: ban all data scientists
|
|
should've subbed my stats classes for data science ones so that i could wear a white lab coat
|
|
|
Insanite posted: i'm speedrunning through the odin courses because i need to do some webdev for work and i haven't touched the stuff seriously in about 6 years

it's sort of ingenious how they basically steal the content from everywhere on the web, rather than re-inventing the wheel, and put their energy into curating the content rather than the content itself
|