|
instead of continuing to fill up the idiot spare time projects thread with my literal nonsense, it was suggested i make a new thread about it, so here it is

inspired by sarah palin's most recent word salad, i decided to make a markov text bot to generate virtual sarah palin quotes. turns out this is an idea a million other people have had, because anyone who knows both palin and markov text bots sees the rather obvious connection. however, this is only step 1 in our voyage.

oh, before we get started, here's what a markov text generator is

https://en.wikipedia.org/wiki/Markov_chain posted:
A Markov chain (discrete-time Markov chain or DTMC), named after Andrey Markov, is a random process that undergoes transitions from one state to another on a state space. It must possess a property that is usually characterized as "memorylessness": the probability distribution of the next state depends only on the current state and not on the sequence of events that preceded it. This specific kind of "memorylessness" is called the Markov property. Markov chains have many applications as statistical models of real-world processes.

okay, so basically the procedure is this: you take a source text and break it down into n-grams. an n-gram is a string of n consecutive words. for n = 2, that means you take every pairing of sequential words, stick them in a table, and count how often they happen. using this n-gram table, you pick a random starting point. then you calculate the probability of the next n-gram based solely on the current n-gram, select the next n-gram based on an RNG and that table of probabilities, and keep doing that until you get tired.

all very simple, but i'm still lazy as a motherfuck so i'm just using the ngram package in R. that looks like this:

code:
quote:
"“pay-to-play.” Between bailouts for Wall Street cronies and stimulus projects or, as someone put it, this was all about Denali, mom, dad, ungulate eyeballs, slaying salmon on the floor of the world really works in order to accomplish after he's done turning back the waters and healing the planet? The answer is to challenge the status quo has got to call the devastation that a bill wouldn't be signed into law before we probably even got that first revolution.” We are the ones, right? You’re the ones who pay the bills in our enemies — proving peace through strength. In that respect, I applaud the president and his American dream endures. He knew the best of America are open, unfortunately though, some would want you to succeed too. And that we love. We’re here to stop that they inherited. Real reform never sits well with entrenched interests and power brokers."

so it's basically perfect

NEXT UP: FROM SHITPOSTER TO TWITPOSTER
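for anyone following along without R, here is a minimal sketch of the procedure described above (break the text into n-grams, count what follows each one, then random-walk the table). it is written in python and is not the ngram package or the code from this post; `build_table` and `generate` are names made up for illustration, and splitting on whitespace is an assumption about tokenizing.

```python
import random
from collections import Counter, defaultdict


def build_table(text, n=2):
    """Count which word follows each run of n consecutive words."""
    words = text.split()
    table = defaultdict(Counter)
    for i in range(len(words) - n):
        state = tuple(words[i:i + n])
        table[state][words[i + n]] += 1
    return table


def generate(table, length=30, rng=None):
    """Pick a random starting n-gram, then sample each next word
    using only the current n-gram (the Markov property)."""
    rng = rng or random.Random()
    state = rng.choice(list(table))
    out = list(state)
    while len(out) < length:
        followers = table.get(state)
        if not followers:
            # dead end: this n-gram only appears at the very end of the text
            break
        choices, counts = zip(*followers.items())
        out.append(rng.choices(choices, weights=counts)[0])
        state = tuple(out[-len(state):])
    return " ".join(out)


if __name__ == "__main__":
    source = "the cat sat on the mat and the cat ran off the mat"
    print(generate(build_table(source, n=2), length=12))
```

the counts double as the probability table: `random.choices` normalizes the weights itself, so there is no need to divide by the totals first.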
|
# ¿ Feb 13, 2016 23:33 |
|
|
okay it's fun to generate nonsense for your friends and relatives, but what you really want to do is spread your garbage far and wide. of course that means you need twitter, the web's primary outlet for meaningless garbage. luckily R has a p great package for accessing the twits, called twitteR.

first you need to register an app on twitter and get your keys though. that means:

1. register an account for your garbage bot
2. while logged into your garbage bot account, go to apps.twitter.com
3. click "create new app"
4. fill in some details here:
5. click "i agree" to the user agreement, which probably says that you're not going to do the things we're about to do. i don't know for sure because i never read it, i just like agreeing to stuff
6. go to the tab that says "keys and access tokens". you'll need to generate a token. then you need to copy down the gibberish after Consumer Key (API Key), Consumer Secret (API Secret), and then Access Token and Access Token Secret, lower down on the page.

now twitteR can talk to twitter

code:
code:
result: mostly tedious gibberish, but sometimes something entertaining comes out
https://twitter.com/markov_palin/status/690819879537131520
https://twitter.com/markov_palin/status/690818464269869056
https://twitter.com/markov_palin/status/690856828675104771
https://twitter.com/markov_palin/status/693176875078660096

and sometimes something chilling
https://twitter.com/markov_palin/status/690823441524604928

might as well make a markov trump while we're at it, it's basically just a matter of plugging in a new text file

result: markov trump is feeling romantic
https://twitter.com/markov_trump/status/693132349282787329

but not so romantic that he can't still be a brutal dictator
https://twitter.com/markov_trump/status/693011475791790081
https://twitter.com/markov_trump/status/692935871893471232

NEXT UP: ROOTING THROUGH THE TRASH

Trig Discipline fucked around with this message at 00:02 on Feb 14, 2016 |
# ¿ Feb 13, 2016 23:33 |
|
okay how can we get that extra dose of twitter realness?

answer: harvest twitter itself

turns out we can again do that super-easy with the twitteR package, just by using the userTimeline function

code:

https://twitter.com/markov_trump/status/691789588885544960

now let's get sarah in on the action
https://twitter.com/markov_palin/status/693288342524284928
https://twitter.com/markov_palin/status/693282630360432640
https://twitter.com/markov_palin/status/693244859478515712
https://twitter.com/markov_palin/status/693169316821233664

UP NEXT: KEEPING UP WITH CURRENT EVENTS

Trig Discipline fucked around with this message at 00:08 on Feb 14, 2016 |
# ¿ Feb 13, 2016 23:33 |
|
okay, so we're generating meaningless chaos, but we still want to keep up with the hottest trends and news items. we can use the tm text mining package and its various derivatives to see what's going on in the news

we're going to do this in the context of generating prophecies. we're going to mix Revelations, The Necronomicon, The Egyptian Book of the Dead, and Nostradamus with today's hot news and hashtags

code:
code:
code:
https://twitter.com/markov_thebeast/status/697993864607498241
https://twitter.com/markov_thebeast/status/698103558290280448
https://twitter.com/markov_thebeast/status/698043117576957953

NEXT UP: THE INTERNET BARFS UP ITS OWN rear end in a top hat

Trig Discipline fucked around with this message at 00:21 on Feb 14, 2016 |
# ¿ Feb 13, 2016 23:34 |
|
for our (current) final iteration, we're going to turn twitter into a literal echo chamber. in response to the above, cheese-cube posted the following

cheese-cube posted:
has anyone made a twitter bot that makes markov chains using the tweets of users who follow it? idk could either be terrible or funny.

which is a frickin' genius idea. turns out to be pretty easy to do, too! this one is named markov_polov

polov runs as two separate scripts. one just grabs all of the followers of the twitter account, then scrapes their tweets and does some regex stuff to get rid of special characters. it also strips URLs so it doesn't end up reposting whatever weird porn you guys are passing around on twitter. it waits five minutes between searches so that twitter doesn't boot it off. then, after it's done one pass of all of the users, it writes their tweets to a text file.

the second script just reads that text file every ten minutes, builds an ngram table, and spouts some bullshit

code:
https://twitter.com/markov_polov/status/698366265946038272
https://twitter.com/markov_polov/status/698489628798488577
https://twitter.com/markov_polov/status/698604570893660160
https://twitter.com/markov_polov/status/698651563330383873

particularly when it catches someone who doesn't know wtf is going on
https://twitter.com/Pleasure__Kevin/status/698621578817343489

bonus: it has already passed the australian turing test by becoming self-aware enough to complain about telstra
https://twitter.com/Telstra/status/698479305781698560

Trig Discipline fucked around with this message at 00:45 on Feb 14, 2016 |
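the cleanup step the scraper script does (strip URLs, drop special characters, collect what's left into one text) could look something like the sketch below. this is python, not the original R, and the exact patterns are guesses: which characters count as "special" is an assumption, and `clean_tweet` and `build_corpus` are hypothetical names.

```python
import re

# anything that looks like a link gets dropped entirely
URL_RE = re.compile(r"https?://\S+")
# assumption: keep letters, digits, whitespace and basic punctuation,
# plus @ and # so mentions and hashtags survive; drop everything else
JUNK_RE = re.compile(r"[^A-Za-z0-9\s.,!?'@#-]")


def clean_tweet(text):
    """Remove URLs and special characters, then collapse whitespace."""
    text = URL_RE.sub("", text)
    text = JUNK_RE.sub("", text)
    return " ".join(text.split())


def build_corpus(tweets):
    """Clean every tweet and join the non-empty ones, one per line."""
    cleaned = (clean_tweet(t) for t in tweets)
    return "\n".join(t for t in cleaned if t)


if __name__ == "__main__":
    print(build_corpus(["check this out http://t.co/abc123 lol", "http://t.co/zzz"]))
```

URLs are removed before the character filter on purpose: otherwise the filter would eat the `:` and `/` and leave link fragments in the corpus.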
# ¿ Feb 13, 2016 23:34 |
|
side note: markov trump's followers are mostly actual trump supporters now who seem to have no idea that it's a bot. loving amazing
|
# ¿ Feb 14, 2016 00:34 |
|
oh yeah, a few notes:

* once it hits an ngram from a statistically unusual sentence, it has a tendency to repeat the rest of the sentence verbatim. the longer the corpus gets (i.e., the more users in markov polov's case), the less this happens
* the ngram package in R is buggy as gently caress, so every one of these bots just dies and hard-crashes R at random intervals. since all of the ngram processing is done via C calls and since i am both (1) lazy and (2) a poo poo C programmer, i am just restarting the bots when they die instead of fixing the issue. i suppose i could just write my own ngram package for R, but see point (1)
* because of the way the twitscraper script works for markov polov, the twitscraping gets five minutes slower for each new user. if you follow the bot, it may be a few hours or even days before your tweets get incorporated

Trig Discipline fucked around with this message at 00:59 on Feb 14, 2016 |
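the first note (verbatim repeats fade as the corpus grows) comes down to how many n-grams have only one possible next word: whenever the chain lands on one of those, the next step is forced. a rough way to measure that, sketched in python under the same whitespace-tokenizing assumption as before; `forced_fraction` is a made-up name, not something from the ngram package.

```python
from collections import defaultdict


def forced_fraction(text, n=2):
    """Fraction of n-word states that have exactly one possible next word.

    Close to 1.0 means the generator mostly replays the source verbatim;
    lower values mean more real branching.
    """
    words = text.split()
    followers = defaultdict(set)
    for i in range(len(words) - n):
        followers[tuple(words[i:i + n])].add(words[i + n])
    if not followers:
        return 0.0
    forced = sum(1 for nxt in followers.values() if len(nxt) == 1)
    return forced / len(followers)


if __name__ == "__main__":
    print(forced_fraction("a b c d e"))          # no repeated pairs at all
    print(forced_fraction("a b c a b d a b e"))  # the pair "a b" branches
```

running this on the corpus before and after adding more users' tweets would show whether the extra text is actually adding branch points or just more one-off sentences.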
# ¿ Feb 14, 2016 00:55 |
|
oh i've also been thinking that i might wait until palin and trump got a hundred followers or so and then gradually start feeding other texts into them. i'm thinking a handmaid's tale fed a chapter at a time into palinbot would be fun. not sure about trump, though. the wife suggested a combination of yosemite sam quotes and mein kampf, but i don't think there's that much text for the former
|
# ¿ Feb 14, 2016 01:05 |
|
well i'll be damned
|
# ¿ Feb 14, 2016 01:07 |
|
PCjr sidecar posted:
mein kampf translated through the Simple English vocabulary

ooooh

definitely want to wait until he gets more followers though. i'm getting 2-5 new people a day
|
# ¿ Feb 14, 2016 01:11 |
|
markov polov seems to have decided to just rip on Pleasure Kevin today https://twitter.com/markov_polov/status/698669209794949121
|
# ¿ Feb 14, 2016 01:49 |
|
big scary monsters posted:
have you had anyone @ed by them try to respond to the palin/trump bots? anything good?

a lot of retweets, but no actual engagement. if and when that happens, i'm just going to run the babbler locally and keep pasting replies until they realize what's up
|
# ¿ Feb 14, 2016 02:23 |
|
craisins posted:
does it go through historical posts and add them to the markov bot? or only new posts from its followers?

all posts, up to 3200 posts for each user. it rescrapes on a regular basis, at intervals determined by how many friends it has
|
# ¿ Feb 14, 2016 02:33 |
|
O_O

well it seems like they mainly just wanted him to turn it off, and i would definitely do that if it came to that

as it is, it's just endorsing alternative medicine
https://twitter.com/markov_polov/status/698681810432053248
|
# ¿ Feb 14, 2016 02:39 |
|
cheese-cube posted:
https://www.youtube.com/watch?v=t-7mQhSZRgM&t=17s

it's a killer idea. should i credit your twitter handle instead of your forums handle?

also that video is magical
|
# ¿ Feb 14, 2016 03:52 |
|
drat yospos y'all some nasty tweeters https://twitter.com/markov_polov/status/698701970811392001
|
# ¿ Feb 14, 2016 03:56 |
|
i'm thinkin yeah
|
# ¿ Feb 14, 2016 04:14 |
|
yeah seriously how many of those are there ffs
|
# ¿ Feb 14, 2016 04:51 |
|
Trig Discipline posted:
drat yospos y'all some nasty tweeters

more
https://twitter.com/markov_polov/status/698719606412738561
|
# ¿ Feb 14, 2016 05:08 |
|
i was thinking about filtering those out but then https://twitter.com/markov_polov/status/698727164904996864
|
# ¿ Feb 14, 2016 05:40 |
|
by definition, yes
|
# ¿ Feb 14, 2016 05:47 |
|
okay the next time the scraper rolls around i'm going to implement the fishmech filter. stripping out any post containing @waze, #runescape, and #SoundHound. i'm keeping the goodreads quotes tho
|
# ¿ Feb 14, 2016 06:58 |
|
for posterity, here is the fishmech filter

code:
fishmech congratulations on being a literal living edge case, at this point i'm starting to suspect that you yourself are an elaborate script running on a server farm somewhere
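a filter like the one described (drop any post containing @waze, #runescape, or #SoundHound) is only a few lines. the sketch below is python rather than the original R, and matching case-insensitively is an assumption; `BLOCKED`, `keep_tweet`, and `fishmech_filter` are illustrative names.

```python
# the three markers named in the thread, lowercased for matching
BLOCKED = ("@waze", "#runescape", "#soundhound")


def keep_tweet(text):
    """True if the tweet contains none of the blocked markers."""
    lowered = text.lower()
    return not any(marker in lowered for marker in BLOCKED)


def fishmech_filter(tweets):
    """Drop whole tweets that mention a blocked marker; keep the rest in order."""
    return [t for t in tweets if keep_tweet(t)]


if __name__ == "__main__":
    sample = ["driving with @waze", "normal tweet", "#RuneScape level 99"]
    print(fishmech_filter(sample))
```

it throws away the whole tweet rather than just cutting the tag out, which matches "stripping out any post containing" and keeps auto-generated app posts from leaking half-sentences into the n-gram table.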
|
# ¿ Feb 14, 2016 07:16 |
|
oh wow poo poo just got real
|
# ¿ Feb 14, 2016 12:26 |
|
seriously wtf

in lighter news
https://twitter.com/markov_palin/status/698915319788638208
https://twitter.com/markov_palin/status/698907767227027456
|
# ¿ Feb 14, 2016 22:17 |
|
bump
|
# ¿ Feb 14, 2016 23:44 |
|
so is markov polov https://twitter.com/markov_polov/status/699030578201374721
|
# ¿ Feb 15, 2016 01:57 |
|
Sniep posted:
Markov Polov send's his Valentine's day sentiments

went a bit further than that
https://twitter.com/markov_polov/status/699055788690550784
https://twitter.com/markov_polov/status/699030578201374721
|
# ¿ Feb 15, 2016 05:49 |
|
BooLoo posted:
He got a bit over excited this morning.

don't engage, i think he's an mra
https://twitter.com/markov_polov/status/699176095124295680
|
# ¿ Feb 15, 2016 11:32 |
|
gonna be the first twitter bot to shoot up his high school
|
# ¿ Feb 15, 2016 11:37 |
|
cat thread yeah but i don't post in cjs much anymore since i started actually working. i am very much a creature of megathreads tho
|
# ¿ Feb 15, 2016 11:50 |
|
Necc0 posted:
oh you didn't write your own markov algo? shame on you, trig. all of my _ebooks bots are bespoke hand crafted garbage

oh hell no

did you miss the several times where i mentioned how lazy i am
|
# ¿ Feb 15, 2016 20:50 |
|
horse mans posted:
does this go back through follower's TLs for a more complete corpus? or just start including data from the time of following onward?

it scrapes everything they've tweeted, up to 3200 tweets per follower
|
# ¿ Feb 15, 2016 20:50 |
|
horse mans posted:
i wish i had a better method for selecting more coherent sentence-like things. i was looking into maybe some kind of language processing on my tweet archive and then somehow try and integrate common sentence structures into what it finds, like, even going so far as to perform word classification on all of my tweets, but i don't know. seems like a lot of effort.

mine is literally "ends with a ?, ., or !, begins with a capital letter, is less than 140 characters". again, i am lazy

side note: if anyone else starts doing this in R, you should know that the version of the ngram package on CRAN is hella unstable. the version from github isn't exactly rock-solid but it crashes about 1/10 as often so far
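the selection rule quoted above (ends with ?, ., or !; begins with a capital letter; less than 140 characters) translates almost word for word into code. a python sketch, not the R the bots actually run; `is_tweetable` and `first_tweetable` are hypothetical names, and trimming surrounding whitespace before checking is an added assumption.

```python
def is_tweetable(text, max_len=140):
    """Apply the three checks: length, leading capital, closing punctuation."""
    text = text.strip()
    return (
        0 < len(text) < max_len      # "less than 140 characters"
        and text[0].isupper()        # "begins with a capital letter"
        and text[-1] in ".?!"        # "ends with a ?, ., or !"
    )


def first_tweetable(candidates, max_len=140):
    """Return the first generated string that passes, or None if none do."""
    for candidate in candidates:
        if is_tweetable(candidate, max_len):
            return candidate.strip()
    return None


if __name__ == "__main__":
    babble = ["we are the ones", "We are the ones, right?"]
    print(first_tweetable(babble))
```

the length check comes first so an empty string never reaches the indexing, and generating a batch of candidates and taking the first pass is cheaper than trying to steer the chain toward a sentence boundary.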
|
# ¿ Feb 15, 2016 20:53 |
|
Mandelbulb posted:
scraped SA, guess the subforum

idk which subforum but it's magical
|
# ¿ Feb 16, 2016 01:36 |
|
https://twitter.com/markov_polov/status/699393314311639040
|
# ¿ Feb 16, 2016 01:47 |
|
in other news https://twitter.com/markov_trump/status/699399643155705856
|
# ¿ Feb 16, 2016 02:50 |
|
markov polov spitting...something?
https://twitter.com/markov_polov/status/699700759537930240
https://twitter.com/markov_polov/status/699710848994938880

the beast is trying to decide whether life of pablo makes tidal worth it
https://twitter.com/markov_thebeast/status/699706640136671232
|
# ¿ Feb 16, 2016 23:48 |
|
infernal machines posted:
nice, your markov bot is cyber-bullying already

yeah it's basically a horrible rear end in a top hat at this point. thanks goons

e: https://twitter.com/markov_palin/status/699766798783123456

Trig Discipline fucked around with this message at 02:31 on Feb 17, 2016 |
# ¿ Feb 17, 2016 02:17 |
|
|
cjs https://twitter.com/markov_polov/status/699919091360870400
|
# ¿ Feb 17, 2016 12:44 |