Roargasm
Oct 21, 2010

Hate to sound sleazy
But tease me
I don't want it if it's that easy
This is great, thank you for the writeup. My bot is Thomas Paine, but he talks about ISIS instead of Britain.

https://twitter.com/nvcommonsense/status/699457952424595456

Roargasm
Oct 21, 2010

https://twitter.com/markov_trump/status/699716969914720256

paine bot spitting fire all day too

Roargasm
Oct 21, 2010

Does the ngram package generate stats by explicitly pairing Apple, Banana, Carrot, Dill as {Apple, Banana}, {Carrot, Dill} when you specify (in, 2)? i.e. does removing the first word from a text file generate a totally different set of data?
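
To make the two possibilities in that question concrete, here is a rough Python sketch (not the ngram package itself, and the function names are made up for illustration). It compares a sliding window, which is the usual definition of n-grams, against the non-overlapping pairing described above. Which one the package actually uses is not confirmed here, so treat this only as a way to see what each scheme would imply.

```python
def sliding_ngrams(tokens, n):
    """Overlapping n-grams: the window advances one token at a time."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def chunked_ngrams(tokens, n):
    """Non-overlapping n-grams: the window advances n tokens at a time.

    An incomplete group at the end is dropped.
    """
    return [tuple(tokens[i:i + n]) for i in range(0, len(tokens) - n + 1, n)]


words = ["Apple", "Banana", "Carrot", "Dill"]

# Full text versus the same text with the first word removed.
print(sliding_ngrams(words, 2))
print(sliding_ngrams(words[1:], 2))
print(chunked_ngrams(words, 2))
print(chunked_ngrams(words[1:], 2))
```

With a sliding window, dropping the first word only loses the one pair that contained it. With non-overlapping chunks, every pair boundary shifts and the data really would come out different.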

Roargasm
Oct 21, 2010

Done with work and I'm writing an HTML scraper for azlyrics.com. What the gently caress is wrong with me?

Roargasm
Oct 21, 2010

lmao azlyrics autobans you if you make too many connections, but my poo poo works. PowerShell v3+

I was able to get 1.3 megs of lyrics.txt before they got me on my VPN

http://www.azlyrics.com
PowerShell code:
# One WebClient is reused for the artist page and every song page.
$webClient = New-Object System.Net.WebClient
$artistURL = Read-Host -Prompt "enter URL of lyrics to grab (like http://www.azlyrics.com/n/nickiminaj.html)"
$outFile = 'c:\files\rihanna.txt'

# Download the artist page and split it on whitespace so each song link is its own token.
$artistBase = $webClient.DownloadString($artistURL)

# Keep the tokens that look like h:"../lyrics/...", then turn them into absolute URLs.
$lyricURLs = $artistBase.Split() |
    Where-Object { $_ -match 'h:"\.\./lyrics' } |
    ForEach-Object { $_.Replace('h:"..', 'http://www.azlyrics.com').Replace('",', '') }

foreach ($URL in $lyricURLs) {
    try {
        $lyricBase = $webClient.DownloadString($URL)

        # The lyrics sit between the last 'Sorry about that. -->' comment and the last '<br><br>'.
        $lyricStart = $lyricBase.LastIndexOf('Sorry about that. -->')
        $lyricEnd = $lyricBase.LastIndexOf('<br><br>')
        $totalChars = $lyricEnd - $lyricStart
        $songLyrics = $lyricBase.Substring($lyricStart, $totalChars)

        # Strip the markup; the marker and each <br> become newlines.
        $clearedLyrics = $songLyrics.Replace('Sorry about that. -->', "`n").Replace('<br>', "`n")
        $clearedLyrics = $clearedLyrics.Replace('<div>', '').Replace('</div>', '').Replace('<i>', '').Replace('</i>', '')
        $clearedLyrics | Out-File -FilePath $outFile -Append
    }
    catch {
        # A missing marker or a failed download throws; skip that song and keep going.
        Write-Warning "Skipped ${URL}: $($_.Exception.Message)"
    }
}
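
For anyone who wants to poke at the extraction step without hitting the site, here is a rough Python sketch of the same idea: take the text between the last start marker and the last end marker, then strip the handful of tags. The function name and the sample page string are invented for illustration, and the marker strings are just the ones used in the script above; the real page layout may differ or change. Unlike the script, this version skips the marker itself and checks for missing markers up front instead of relying on an exception.

```python
def extract_lyrics(page, start_marker="Sorry about that. -->", end_marker="<br><br>"):
    """Return the text between the last start marker and the last end marker.

    Returns None when either marker is missing or they appear out of order.
    """
    start = page.rfind(start_marker)
    end = page.rfind(end_marker)
    if start == -1 or end == -1 or end <= start:
        return None

    # Skip past the marker so it does not end up in the output.
    body = page[start + len(start_marker):end]

    # Line breaks become newlines; the other tags are simply dropped.
    body = body.replace("<br>", "\n")
    for tag in ("<div>", "</div>", "<i>", "</i>"):
        body = body.replace(tag, "")
    return body.strip()


# Hypothetical page fragment, shaped like what the script above expects.
sample = "intro Sorry about that. -->first line<br><i>second line</i><br></div><br><br>footer"
print(extract_lyrics(sample))
```

Running the markers against a saved copy of one page first is a cheap way to check them before making any connections.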

Roargasm fucked around with this message at 04:32 on Feb 23, 2016
