redleader
Aug 18, 2005

Engage according to operational parameters

Suspicious Dish posted:

dumb serious question. why do we use nonlinear functions instead of linear ones if they're all dumb poo poo like ReLU which seems like it barely works. I guess second question: is there any theoretical basis for why ReLU works at all?

i thought the state of the art with why some activation functions work better than other functions in some cases is essentially :iiam:


Carthag Tuek
Oct 15, 2005
Probation
Can't post for 4 hours!
could be based on empirical observation & adjustment, i.e. tweak the nn input until you get the desired output

Nomnom Cookie
Aug 30, 2009



nonlinear sounds cooler

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
You'd think training on a linear activation function would be a whole lot easier to optimize for

ninepints
Sep 7, 2017
four and a half quarts

Suspicious Dish posted:

dumb serious question. why do we use nonlinear functions instead of linear ones if they're all dumb poo poo like ReLU which seems like it barely works. I guess second question: is there any theoretical basis for why ReLU works at all?

I know this one. the problem with linear activation functions is that they collapse the representative power of the whole network down to the representative power of a single layer. the final outputs become a linear combination of linear combinations...of linear combinations of the inputs, which can be simplified to a single linear combination of the inputs.
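quick sketch of the collapse, using plain python matrix multiplies (toy integer matrices so the equality is exact):

```python
def matmul(A, B):
    # naive matrix multiply over nested lists
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

x = [[1, 2]]                    # a 1x2 "input"
W1 = [[1, 2, 3], [4, 5, 6]]     # first linear layer, 2x3
W2 = [[1, 0], [0, 1], [1, 1]]   # second linear layer, 3x2

# two stacked linear layers...
deep = matmul(matmul(x, W1), W2)
# ...are exactly one linear layer with weights W1 @ W2
shallow = matmul(x, matmul(W1, W2))
assert deep == shallow
```

the asserted equality is just associativity of matrix multiplication, which is why no amount of extra linear layers buys you anything.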

on ReLU, I can say why it works better than e.g. a sigmoid function, but I don't have much of a math background so no difficult followup questions

the problem with the sigmoid is that these networks are trained using gradient descent, which depends on the existence of a gradient to descend. if a neuron's output barely changes as the weights are adjusted, it's hard to choose an adjustment that will move the output in the right direction. the sigmoid function flattens out for high-magnitude input values, so if, during training, a neuron's weights shift such that the value going into the (sigmoid) activation function is huge, the sigmoid "saturates" and kills the gradient, and that part of the network gets stuck

ReLU doesn't have that problem because there's always a gradient for positive inputs. neurons can still get stuck when the activation function receives large negative inputs, but in practice the weights in the rest of the network move around enough during training that stuck neurons usually get unstuck. people have also experimented with "leaky ReLU" aka f(x) = max(0.01*x, x) but last I heard it wasn't much of an improvement
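a minimal numeric sketch of the saturation point (hand-rolled derivatives, nothing framework-specific):

```python
import math

def sigmoid_grad(t):
    s = 1.0 / (1.0 + math.exp(-t))
    return s * (1.0 - s)            # derivative of the sigmoid

def relu_grad(t):
    return 1.0 if t > 0 else 0.0

def leaky_relu_grad(t, alpha=0.01):
    return 1.0 if t > 0 else alpha

# sigmoid gradient vanishes once the input gets big either way
print(sigmoid_grad(0))       # 0.25, the maximum
print(sigmoid_grad(10))      # ~4.5e-05: saturated, learning stalls
# ReLU keeps a full gradient for any positive input
print(relu_grad(10))         # 1.0
print(relu_grad(-10))        # 0.0: the "dead neuron" case
print(leaky_relu_grad(-10))  # 0.01: leaky variant keeps a trickle
```

at input 10 the sigmoid's gradient is already four orders of magnitude below its peak, which is the "stuck" case described above.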

Cybernetic Vermin
Apr 18, 2005

ninepints posted:

I know this one. the problem with linear activation functions is that they collapse the representative power of the whole network down to the representative power of a single layer. the final outputs become a linear combination of linear combinations...of linear combinations of the inputs, which can be simplified to a single linear combination of the inputs.

one can for intuition frame this as not having "decisions"; if a layer wants to differentiate between constant values x and y, while being reasonably robust to small errors, it can have parameters which place x and y into a part of the activation function where such fuzziness is truncated out; e.g. scale x so values in its neighborhood land in the flat 0 of ReLU, or put x and y on the bottom and top of a sigmoid for a cleaned up {x -> 0, y -> 1} mapping.

that is, the layer can take a value z=0.9*x+0.1*y, and simulate deciding that z is actually x by outputting z'=0.99999*x+0.00001*y (in the sigmoid case). with only linear activation functions you cannot achieve the same thing, since the output will just be some linear scaling of z, retaining both its x and y "parts".

the collapse to one layer is the more powerful argument, but it might be useful to think about how non-linearity lets the network tweak what information is retained.
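that "deciding" move is easy to see numerically; a toy sketch with a steepened sigmoid (made-up constants x = -1, y = 1):

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

x, y = -1.0, 1.0
z = 0.9 * x + 0.1 * y      # a fuzzy value that is "mostly x"

# scale before the sigmoid so the fuzz lands on the flat part
decided = sigmoid(10 * z)  # ~0.0003: cleanly snapped to the x -> 0 side
# a linear activation can only rescale z, keeping the x/y mix
linear = 0.5 * z           # -0.4: still 90% x, 10% y
```

crank the pre-sigmoid scale higher and `decided` gets arbitrarily close to an exact 0/1 decision, which no linear map can do.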

animist
Aug 28, 2018
there's also the universal approximation theorem, which states that single-layer (shallow) NNs can approximate any continuous function on compact chunks of R^n. that's been proved for sigmoid and ReLU. in practice we use deep networks instead; i think there are some results about how they're much more efficient in terms of representational power.

there's also some work on "verifying" ReLU-based networks, which relies on the simple structure of ReLU to prove geometric properties of the network. in practice these can only prove generalities, though, stuff like "if you make a small change to the input, the output can only change by this amount." they can't prove deeper specs about network correctness, because, if we could specify exactly what operation we wanted the network to do, we wouldn't need a neural network, now would we?
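the "output can only change by this amount" style of proof can be sketched by pushing intervals through the layers (roughly the idea behind interval bound propagation; toy weights here, not any real verifier's API):

```python
def affine_bounds(lo, hi, W, b):
    """push per-input intervals [lo, hi] through y = W x + b."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        # positive weights pull from the low end, negative from the high end
        l = bias + sum(w * (lo[i] if w >= 0 else hi[i]) for i, w in enumerate(row))
        h = bias + sum(w * (hi[i] if w >= 0 else lo[i]) for i, w in enumerate(row))
        out_lo.append(l)
        out_hi.append(h)
    return out_lo, out_hi

def relu_bounds(lo, hi):
    # ReLU is monotone, so intervals map straight through it
    return [max(0.0, l) for l in lo], [max(0.0, h) for h in hi]

W, b = [[1.0, -1.0], [2.0, 1.0]], [0.0, 0.0]
x, eps = [0.5, 0.5], 0.1
lo, hi = [v - eps for v in x], [v + eps for v in x]
lo, hi = relu_bounds(*affine_bounds(lo, hi, W, b))
# every input within eps of x provably lands inside [lo, hi]
```

the bounds get looser with every layer, which is one reason these tools only prove generalities on real-sized networks.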


one other way to think about the linear stuff is just to intuit that a linear operation followed by another linear operation is linear. so, stacking linear layers, you're really only training a single linear transformation.

you don't generally have that problem with nonlinear activations by definition; that property is pretty unusual. and the algorithms can optimize through pretty much whatever operations you want, so, the nonlinearity isn't a problem. ReLU is used because it works very very well in practice (and is cheap in hardware), idk why it works so well though.

ultrafilter
Aug 23, 2007

It's okay if you have any questions.


https://twitter.com/dril_gpt2/status/1208788034407292930

https://twitter.com/dril_gpt2/status/1208854462984577025

https://twitter.com/dril_gpt2/status/1208454587054735361

https://twitter.com/dril_gpt2/status/1208419178442641408

SpaceAceJase
Nov 8, 2008

and you
have proved
to be...

a real shitty poster,
and a real james
https://www.youtube.com/watch?v=1zZZjaYl4AA


lol

akadajet
Sep 14, 2003


hahaha. he made a lovely course that cost $200 saying he'd personally mentor at most 500 students, then opened multiple slack accounts and loaded them up.
https://medium.com/@gantlaborde/siraj-rival-no-thanks-fe23092ecd20

akadajet
Sep 14, 2003

oh my god...
https://www.youtube.com/watch?v=2GwzlT2M59A

ultrafilter
Aug 23, 2007

It's okay if you have any questions.


https://twitter.com/lindsey/status/1211698750759944193

lol

(Also read the thread the quoted tweet is in. Dude knows what's up.)

Feisty-Cadaver
Jun 1, 2000
The worms crawl in,
The worms crawl out.
lol

https://twitter.com/bschulz5/status/1212198171436310533?s=21

Arcteryx Anarchist
Sep 15, 2007

Fun Shoe

A NN classifier for fizz buzz but unironically

Carthag Tuek
Oct 15, 2005
Probation
Can't post for 4 hours!
this guy might be a dumbass


https://twitter.com/Bschulz5/status/1202631062410674176

redleader
Aug 18, 2005

Engage according to operational parameters
it's a pity that nlp and cv have fallen to the ml gremlins

Sagebrush
Feb 26, 2012


don't mind me, just informing the director of the cornell CS department about machine learning

ultrafilter
Aug 23, 2007

It's okay if you have any questions.


https://twitter.com/Bschulz5/status/1206577850234658817

That's really deep, man.

Nomnom Cookie
Aug 30, 2009



I love the idea of our entire universe being the waste heat from gods GPU while they’re playing 20-dimensional call of duty. it makes at least as much sense as any other explanation too

Agile Vector
May 21, 2007

scrum bored



Nomnom Cookie posted:

I love the idea of our entire universe being the waste heat from gods GPU while they’re playing 20-dimensional call of duty. it makes at least as much sense as any other explanation too

hell is amd cpu cooling

Cybernetic Vermin
Apr 18, 2005

redleader posted:

it's a pity that nlp and cv have fallen to the ml gremlins

a lot of ml research is dumb and bad (almost as dumb and bad as "tech visionaries" ideas of what ml can do), amounting to a phd student twiddling various parameters until a model seems to learn something, with no deeper analysis or insight. however, it is not right to view ml as having been bad for computer vision and natural language processing, not least because ml is pretty simple, so it is not a hard tool to apply.

i've historically been in nlp myself, and while it's been a weird decade ever since statistical n-gram methods broke the backs of the chomskyites, things are looking better and better now. it has become clear which ml bits are indispensable and possible to analyse (e.g. word embeddings), and the field is just getting a lot more high-level thanks to ml bits handling a lot of the nitty-gritty (e.g. making it a lot easier to make stuff robust against grammatical mistakes). it has chilled interest in formal grammars and automata solutions a lot, but there was so much theoretical navel-gazing there that i don't think that's bad (and that was my primary research area).

i am also currently pretty excited about a new research program from the research group i primarily affiliate with, where they are embracing the bias-soaking nature of ml to study gender bias in written text. that is, as a very first step, looking at the word embeddings for ostensibly non-gendered words in a given publication and seeing how orthogonal those feature vectors are to gendered vectors. a ton of tricky issues there (e.g. it matters a lot how the dimensionality reduction, i.e. the ml, works), but a bright new phd student (affiliated both with us at cs and the dept. of gender studies) is working on it, and i think it'll be extremely interesting research no matter the exact outcome.
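that very first step can be sketched in a few lines: project a word vector onto a gender direction and see how far from orthogonal it is. everything below is made-up 3-d vectors purely for illustration; real embeddings are learned (word2vec, GloVe, etc.) and have hundreds of dimensions:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

# hypothetical embeddings, for illustration only
he = [1.0, 0.2, 0.1]
she = [-1.0, 0.2, 0.1]
gender_axis = [a - b for a, b in zip(he, she)]  # the "he" - "she" direction

doctor = [0.6, 0.5, 0.3]   # made-up vector leaning toward "he"
kettle = [0.0, 0.4, -0.7]  # made-up, ostensibly non-gendered

print(cosine(doctor, gender_axis))  # well above 0: gendered association
print(cosine(kettle, gender_axis))  # 0: orthogonal, no measured bias
```

the tricky issues mentioned above live upstream of this: whether the learned dimensions, and hence the axis itself, mean what you think they mean.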

well, that's a long post. tl;dr: ml *in* research often good.

animist
Aug 28, 2018

Cybernetic Vermin posted:



well, that's a long post. tl;dr: ml *in* research often good.

it's definitely a handy hammer to have. not everything's a nail though.

also lol at the legions of prospective PhD students whose life aspiration is to just twiddle knobs until they can finally get all those overpaid truckers fired

animist
Aug 28, 2018
https://twitter.com/dril_gpt2/status/1212951044143104001

fart simpson
Jul 2, 2005

DEATH TO AMERICA
:xickos:

Cybernetic Vermin posted:

a lot of ml research is dumb and bad (almost as dumb and bad as "tech visionaries" ideas of what ml can do), amounting to a phd student twiddling various parameters until a model seems to learn something, with no deeper analysis or insight. however, it is not right to view ml as having been bad for computer vision and natural language processing, not least because ml is pretty simple, so it is not a hard tool to apply.

i've historically been in nlp myself, and while it's been a weird decade ever since statistical n-gram methods broke the backs of the chomskyites, things are looking better and better now. it has become clear which ml bits are indispensable and possible to analyse (e.g. word embeddings), and the field is just getting a lot more high-level thanks to ml bits handling a lot of the nitty-gritty (e.g. making it a lot easier to make stuff robust against grammatical mistakes). it has chilled interest in formal grammars and automata solutions a lot, but there was so much theoretical navel-gazing there that i don't think that's bad (and that was my primary research area).

i am also currently pretty excited about a new research program from the research group i primarily affiliate with, where they are embracing the bias-soaking nature of ml to study gender bias in written text. that is, as a very first step, looking at the word embeddings for ostensibly non-gendered words in a given publication and seeing how orthogonal those feature vectors are to gendered vectors. a ton of tricky issues there (e.g. it matters a lot how the dimensionality reduction, i.e. the ml, works), but a bright new phd student (affiliated both with us at cs and the dept. of gender studies) is working on it, and i think it'll be extremely interesting research no matter the exact outcome.

well, that's a long post. tl;dr: ml *in* research often good.

is nlp still basically only done in english

Cybernetic Vermin
Apr 18, 2005

fart simpson posted:

is nlp still basically only done in english

the available good *structured* data is mostly english (e.g. the penn treebank and the amr semantics bank), and of course the papers are written in english, so to some extent this will always remain the case.

when doing formal grammar/automata work one habitually invokes random languages for having tricky grammatical structures. like swiss german for having cross-serial dependencies (they exist in english only in contrived cases, like "the coffee, cake, and biscuit cost $2, $3 and $4, respectively", but are a normal grammatical feature in swiss german). that is mostly a matter of motivating the navel-gazing however, keeping alive research in "mildly context-sensitive" grammatical formalisms for ages without them ever really demonstrating any practical usefulness.

dumb statistical and ml models have luckily improved the situation a fair bit, since it doesn't matter nearly as much how much painstakingly cleaned and hand-annotated data you have. initially it was all n-gram work entirely devoid of grammar, but most research now mixes things a bit, with a bit of grammatical structure both induced by statistical models and fed to other statistical models, in a way that generalizes pretty easily to most languages. it also now seems obvious that this is the only way to do it, the idea that humans have some inherent grammar which is not hopelessly intermingled with general intelligence seeming naive. well, to me. this is plenty controversial.

carry on then
Jul 10, 2010

by VideoGames

(and can't post for 10 years!)

Sagebrush posted:

don't mind me, just informing the director of the cornell CS department about machine learning

"smartest guy in the room syndrome" is an epidemic among nerds

ultrafilter
Aug 23, 2007

It's okay if you have any questions.


https://twitter.com/joose_rajamaeki/status/1096397000520749056

fart simpson
Jul 2, 2005

DEATH TO AMERICA
:xickos:

i was looking around for some software to segment chinese text into words and it seems anything with more than 90% accuracy is like cutting edge university research algorithms

the most popular one people in china actually use is so bad that i, a non native speaker, can find errors in basically every sentence i throw it at

fart simpson fucked around with this message at 02:26 on Jan 4, 2020

Cybernetic Vermin
Apr 18, 2005

i don't know chinese, but i presume that word segmentation is a mostly artificial idea in it. that is, it does not exist explicitly on the page and is not central in the mind of the people communicating in chinese (who will compound concepts as they see useful), making the problem pretty ill-defined. a lot of nlp problems suffer badly from an (elitist) normative view of language, e.g. grammar checkers defined entirely by a certain kind of person going "well that's not *proper* english" over and over.

statistical methods are unlikely to do a better job segmenting text, rather they are used to scan through the text, extracting the relevant concepts and components (an abstract paraphrase in a sense), hopefully leaping over a bunch of hurdles like "incorrect" segmenting, compounding, typos, unexpected typography, etc. (in effect by looking at a larger context with a bit more "understanding"). ye olde nlp systems would just get this wrong in step 1 of a cascade of transformations, and then never recover.

in many ways this is precisely the kind of thing the thread (rightly) hates, in that it takes something that used to be about a strict syntactic understanding of something important, and then throws ml at it muddling parts into a pile of incomprehensible statistics. the difference imho is that here the strict understandable solution never existed, at least not in any human brain, and that the ml bits are well-defined in both extent and purpose.

fart simpson
Jul 2, 2005

DEATH TO AMERICA
:xickos:

idk maybe? i dont really know anything about linguistics but i do know you can give a chinese speaker a sentence and they can easily split it into component words. i mean it is more complicated in chinese because words can have sorta layered meanings in a way so maybe that means it's an artificial idea? but like, what seems to be the most popular tool that people in china use for this is a python library called jieba. i downloaded jieba and the very first sentence i threw into it was the first sentence from the chinese wikipedia article for "flower":

花是被子植物的繁殖器官
(flowers are the reproductive organs of angiosperms)

if you asked any chinese reader they'd come up with:
花 / 是 / 被子植物 / 的 / 繁殖 / 器官
flowers / are / angiosperm / (possessive marker) / reproductive / organs

jieba segmented this as the obviously nonsensical:
花是 / 被子植物 / 的 / 繁殖 / 器官
flowersare / angiosperm / (possessive marker) / reproductive / organs

actually google translate does word segmentation too, and it fails even worse than jieba on this sentence, although it gets the overall meaning correct, so i guess i see your point about the statistical methods thing

Cybernetic Vermin
Apr 18, 2005

should note that i am neither a linguist, a chinese speaker, nor all that successful a nlp researcher, so everything with a grain of salt.

it is pretty interesting though. from a quick google 是 is common enough in compounds, but extremely common in this copular verb form. despite cynicism about the problem in general i'd have expected one of these dictionary-driven things to manage this much, maybe still some flag/parameter set particularly poorly?
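for the curious, the dictionary-driven baseline is basically greedy longest-match; a toy forward-maximum-matching sketch (tiny hand-made vocab, nothing like jieba's actual machinery):

```python
def fmm_segment(text, vocab, max_len=4):
    """forward maximum matching: always take the longest vocab word."""
    words, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if n == 1 or chunk in vocab:  # fall back to single characters
                words.append(chunk)
                i += n
                break
    return words

vocab = {"花", "是", "被子植物", "的", "繁殖", "器官"}
fmm_segment("花是被子植物的繁殖器官", vocab)
# -> ['花', '是', '被子植物', '的', '繁殖', '器官']
```

note that if the vocab (or a frequency model on top of it) ever prefers a compound like 花是, this scheme makes exactly the mistake described above, which is the ill-definedness problem in miniature.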

fart simpson
Jul 2, 2005

DEATH TO AMERICA
:xickos:

yeah i was surprised too. i think there's probably some settings i can adjust because i found a javascript reimplementation of jieba that gets this sentence correct. but yeah i'm playing around with a dataset i found of 300k news articles written in chinese and just this usage of "是" as a verb makes up 1% of the entire body of text of the dataset. it's probably either the 1st or 2nd most commonly used verb in chinese.

google's segmenter got that word correct but totally butchered the segmentation of angiosperms into 3 separate words which would translate as like, "blanket seed plants" or something which i guess is kinda what angiosperms are anyway? especially because the word 被 can be a noun meaning blanket or a verb meaning to cover

suffix
Jul 27, 2013

Wheeee!
WASHINGTON (Reuters) - The Trump administration took measures on Friday to crimp exports of artificial intelligence software as part of a bid to keep sensitive technologies out of the hands of rival powers like China.

i'm glad theyre finally doing the responsible thing and making machine learning illegal

ultrafilter
Aug 23, 2007

It's okay if you have any questions.


https://twitter.com/VPrasadMDMPH/status/1212840987363442689

Pinterest Mom
Jun 9, 2009

is the reason "the outcome you're looking for (presence of cancer) is not the same as outcome you're training on (diagnosis of cancer among women tested)"

redleader
Aug 18, 2005

Engage according to operational parameters
is breast cancer such a popular target because the image sets are large and well-labelled?

echinopsis
Apr 13, 2004

by Fluffdaddy
perhaps it’s unique amongst screenings in that it uses radiographic data rather than values 🤷‍♂️

distortion park
Apr 25, 2011


Pinterest Mom posted:

is the reason "the outcome you're looking for (presence of cancer) is not the same as outcome you're training on (diagnosis of cancer among women tested)"

The outcome you're looking for is improved quality adjusted life years*, and the relationship between that and seeing a tumour in a screen is so complex that you have to measure that, not just if you can find a tumour.

*Sometimes just a cost reduction is ok too.

Main Paineframe
Oct 27, 2010

Pinterest Mom posted:

is the reason "the outcome you're looking for (presence of cancer) is not the same as outcome you're training on (diagnosis of cancer among women tested)"

according to the tweets, the outcome they're actually looking for is "distinguishing cancer that's aggressive enough to be dangerous but not so aggressive that it's incurable". meanwhile, the algorithms are just being trained on "presence of cancer"

apparently there's some concern these days that cancer screening may be driving overdiagnosis and overtreatment, because individual cancer cases vary wildly in behavior, ranging from "spreads rapidly and grows aggressively" to "just sits there growing a little and doesn't really do anything". the former extreme is often incurable even if treated early, while the latter extreme is fine even if it's left alone, and there's a sweet spot somewhere in the middle that gets by far the most benefit from treatment. so cancer researchers are now less interested in identifying the presence of cancer and more interested in finding a way to identify its aggressiveness level


ultrafilter
Aug 23, 2007

It's okay if you have any questions.


Yes, that's the major issue. And the only way you can get the data that you'd need to train a classifier is to identify potentially harmful cancers and not treat them, which flies in the face of how medicine is practiced.
