  • Locked thread
KICK BAMA KICK
Mar 2, 2009

Or a collections.defaultdict if the case is more general than a Counter. Anytime you're asking "is there a way to do..." the answer is either in collections or itertools.
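
A minimal sketch of both options, for anyone skimming. The word list is made-up sample data, not something from the original question.

```python
from collections import Counter, defaultdict

# Hypothetical sample data, for illustration only.
words = ["spam", "eggs", "spam", "ham", "spam"]

# Counter covers the plain "how many of each" case.
counts = Counter(words)

# defaultdict covers the more general case, e.g. grouping items
# under a key without first checking whether the key exists.
by_length = defaultdict(list)
for word in words:
    by_length[len(word)].append(word)
```

A Counter also returns 0 for keys it has never seen, so lookups don't need a guard either.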

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Thanks for the Baseball Calendar scraping ideas. I'm going to try them out tonight and see what I can come up with.

onionradish posted:

All of the HTML on that site is a horrible mess. You're definitely going to want to use that second link, the one that gives you a list of tables.

I use LXML to parse rather than BeautifulSoup, so you'll need to translate the syntax of the below:

Python code:
from selenium import webdriver
import lxml.html

browser = webdriver.Firefox()
print("browser started")
browser.get('http://www.milb.com/milb/stats/stats.jsp?t=t_sch&cid=4124&sid=t4124&stn=true')
html_source = browser.page_source
browser.quit()

# Should save/load the HTML here so you can do iterative development
# on your parser without re-scraping

doc = lxml.html.fromstring(html_source)

# One table per month (cssselect() needs the cssselect package installed)
month_tables = doc.cssselect('.dataTableClass')

for month in month_tables:
    month_text = month.cssselect('tr.titleRow td')[0].text_content().strip()
    print(month_text)

    # The first cell of each data row holds the game details
    games = month.cssselect('tr td.dataCell:first-child')
    for game_details in games:
        print(game_details.text_content().strip())
You'll need to do some plain text parsing to break the game_details results into times, opponents, and home/away, but that's something that regex is actually good for.

This works like a champ, thanks.

Hughmoris fucked around with this message at 03:34 on Apr 8, 2015

Edison was a dick
Apr 3, 2010

direct current :roboluv: only

hooah posted:

How come?

Because then every key effectively already exists, with a default value of zero.
So you can then just increment the counts exactly as you wanted to in your question, rather than having to confuse matters with extra logic at counting time.
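
Roughly what that difference looks like, as a sketch. The tag list is hypothetical sample data, since the original question's data isn't shown here.

```python
from collections import defaultdict

# Hypothetical sample data, for illustration only.
tags = ["NN", "VB", "NN", "DT", "NN"]

# Plain dict: extra logic at counting time.
plain = {}
for tag in tags:
    if tag not in plain:
        plain[tag] = 0
    plain[tag] += 1

# defaultdict(int): a missing key starts at int() == 0, so just increment.
tag_counts = defaultdict(int)
for tag in tags:
    tag_counts[tag] += 1
```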

BabyFur Denny
Mar 18, 2003
I have a few scripts in python that I want to run regularly, so I scheduled a task in windows that runs a small python script that calls all the other scripts. Right now I am doing it like this:

code:
try:
    import xlssp
except:
    pass
Which is quite horrible. I intend to do some more logging, especially on the exception side, but apart from that, is there a proper way of doing it? Creating a new class for each of the small files with a function to call would be quite a lot of work I think.

JawnV6
Jul 4, 2004

So hot ...
Is there a reason you're not using subprocess?
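
Something along these lines, as a rough sketch rather than a finished launcher. The function name run_scripts and the logger name are made up for the example, and it assumes Python 3.5+ for subprocess.run.

```python
import logging
import subprocess
import sys

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("runner")  # hypothetical logger name


def run_scripts(script_paths):
    """Run each script in its own interpreter; return {path: exit code}."""
    results = {}
    for path in script_paths:
        # sys.executable is the Python that is running this launcher.
        proc = subprocess.run(
            [sys.executable, path],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
        if proc.returncode != 0:
            # A crashed script leaves its traceback on stderr.
            log.error("%s failed (exit %d):\n%s",
                      path, proc.returncode, proc.stderr)
        else:
            log.info("%s finished OK", path)
        results[path] = proc.returncode
    return results
```

You'd call it as run_scripts(["xlssp.py"]), assuming the module from the import above lives in a file of that name. One script blowing up doesn't stop the rest, and the failure gets logged instead of swallowed.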

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



onionradish posted:

gently caress, I can't tell whether responses are real or just cruel obfuscation and code golf anymore....

I'm going back to LXML.

E: Dominoes' joke is the only one I got...

I was being serious. The example on the Soupy page is what you get if you don't read the BS docs. It's like a bitcoiner's example of a credit card transaction that includes steps like 'fish wallet out of jorts' and 'remove card from wallet (ugh!)'. Obviously you don't have to use a generator but you also don't have to do convoluted poo poo with isinstance to get text out of BS like they're pretending you do.

SurgicalOntologist
Jun 17, 2004

And I honestly think that parsing web pages in a functional style using a pipe makes a lot of sense. For the record. And a functional style is the claimed goal of soupy but I'll take pipe over chained method calls, personally.

I find toolz to be a great library that brings a functional toolbox into Python without breaking conventions. If you want to see a functional programming library that legitimately prompts a "is that still Python?" reaction, here's one: https://github.com/kachayev/fn.py

Python code:
from fn import F, _
from fn.iters import filter, range

func = F() >> (filter, _ < 6) >> sum
assert func(range(10)) == 15

SurgicalOntologist fucked around with this message at 18:06 on Apr 8, 2015

Cingulate
Oct 23, 2012

by Fluffdaddy
I run iPython notebooks remotely that often take days to run. So far I've been occasionally logging into the remote server to check top and see if it's still crunching, but that's a bit daft isn't it? I'd like to be informed when the process finishes. I tried sending myself an email, but the default example requires storing passwords in plaintext. That's not really what I want for notebooks I'll share with other people often. I found a recommendation for the getpass module, but that's not really what I'm looking for either. Any ideas? It doesn't have to be email, I'd just like to get a notification somehow.

xpander
Sep 2, 2004

Cingulate posted:

I run iPython notebooks remotely that often take days to run. So far I've been occasionally logging into the remote server to check top and see if it's still crunching, but that's a bit daft isn't it? I'd like to be informed when the process finishes. I tried sending myself an email, but the default example requires storing passwords in plaintext. That's not really what I want for notebooks I'll share with other people often. I found a recommendation for the getpass module, but that's not really what I'm looking for either. Any ideas? It doesn't have to be email, I'd just like to get a notification somehow.

Most passwords for web applications are stored as environment vars these days, I believe. Once set, this is as simple as:

code:
import os
email_pass = os.environ['SMTP_PASSWORD']
This makes some assumptions about who has access to what on a shared system, but at least it's not sitting in a text file somewhere!

Hughmoris
Apr 21, 2007
Let's go to the abyss!

onionradish posted:

All of the HTML on that site is a horrible mess. You're definitely going to want to use that second link, the one that gives you a list of tables.

I use LXML to parse rather than BeautifulSoup, so you'll need to translate the syntax of the below:

Python code:
Extracting stuff from webpage...
You'll need to do some plain text parsing to break the game_details results into times, opponents, and home/away, but that's something that regex is actually good for.

Just to follow up, I've been toying with this exercise in Perl as well (cause I have no life). A nice fellow in the #perl IRC showed me how to utilize the network tool in Chrome to get my hands on the good stuff. Taking a quick peek behind the curtains on the team's website led me to find a JSON output of a TON of great information, easily parsed.

Website: http://www.milb.com/schedule/index.jsp?sid=t4124
Discovered JSON: http://www.milb.com/lookup/json/nam...special=%27Y%27

Pretty neat stuff!

duck monster
Dec 15, 2004

SurgicalOntologist posted:

And I honestly think that parsing web pages in a functional style using a pipe makes a lot of sense. For the record. And a functional style is the claimed goal of soupy but I'll take pipe over chained method calls, personally.

That might be so, but it's *completely* unidiomatic Python, and harms readability immensely.

The thing with Python is, if there's a clever way to do something, and an obvious way to do something, you go with the obvious way. That's the Python way.

Dominoes
Sep 20, 2007

Or have fun/get poo poo done!

Edison was a dick
Apr 3, 2010

direct current :roboluv: only

xpander posted:

Most passwords for web applications are stored as environment vars these days, I believe. Once set, this is as simple as:

code:
import os
email_pass = os.environ['SMTP_PASSWORD']
This makes some assumptions about who has access to what on a shared system, but at least it's not sitting in a text file somewhere!

Being set in an environment variable is only barely an improvement. A rogue program that gains sufficient permissions can still read it out of your process, but at least it's not passed on the command line, where any process can see it.

tbh, I'd be happier if people just used getpass.getpass(), even if python doesn't have an interface to locked memory, to prevent the password being accidentally written to disk when the process is swapped out.
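
A sketch of how the two approaches could be combined: use the environment variable if it is set, otherwise prompt. SMTP_PASSWORD is the name from the quoted post; the function name and its env/prompt parameters are hypothetical and only there to make it easy to try out.

```python
import getpass
import os


def get_smtp_password(env=None, prompt=getpass.getpass):
    """Return the SMTP password from the environment, else ask for it."""
    env = os.environ if env is None else env
    password = env.get("SMTP_PASSWORD")
    if password is None:
        # Typed input is not echoed, and nothing is saved in the notebook.
        password = prompt("SMTP password: ")
    return password
```

Either way the password never appears in the notebook source, which is the part that matters when sharing it.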

SurgicalOntologist
Jun 17, 2004

duck monster posted:

That might be so, but it's *completely* unidiomatic Python, and harms readability immensely.

The thing with Python is, if there's a clever way to do something, and an obvious way to do something, you go with the obvious way. That's the Python way.

Well, maybe my cleverness meter is busted, because to me pipe is about equally clever as an object that returns itself from all its methods. Of course, using a pipe where every step is an attrgetter or methodcaller is an awful example---it doesn't even get rid of the method-chaining cleverness, it just adds to it.

I find pipe, properly used, far less unidiomatic than, say, pandas.DataFrame columns being accessible as descriptors (at least I assume that's how they do it). Or, for another controversial opinion, currying is less clever than the Django descriptor magic.
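
For reference, a pipe needs very little machinery. This is a stdlib-only sketch, not the toolz implementation, though toolz.pipe takes its arguments in the same (data, *funcs) order. The helper below_six is made up for the example.

```python
from functools import reduce


def pipe(value, *funcs):
    """Pass value through each function in turn, left to right."""
    return reduce(lambda acc, func: func(acc), funcs, value)


def below_six(numbers):
    """Hypothetical helper: keep only the numbers smaller than 6."""
    return (n for n in numbers if n < 6)


# Same computation as the fn.py example earlier: sum the numbers below 6.
total = pipe(range(10), below_six, sum)
```

Each step is an ordinary function taking the previous result, so nothing has to return self.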

Dominoes
Sep 20, 2007

Hey dudes, trying to parse 2007+ Excel files, ie .XLSX. Using openpyxl. openpyxl doesn't like passworded files; how can I programmatically remove the pw?

I found This command line tool from MS, but it's Windows only.

Dominoes fucked around with this message at 17:42 on Apr 9, 2015

Dominoes
Sep 20, 2007

Dominoes posted:

Hey dudes, having an issue with Pypi. I'm following this tutorial. I registered and activated accounts on pypi and pypi test.
Resolved. It looks like the name 'quick' is protected on PyPi (Although not fully on its test site), despite being unused.

Renamed to 'brisk' and uploaded. Speaking of which, if anyone's interested in faster drop-in replacements for basic numerical functions, check it out here. Suggestions / addition requests encouraged.

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

Dominoes posted:

Hey dudes, trying to parse 2007+ Excel files, ie .XLSX. Using openpyxl. openpyxl doesn't like passworded files; how can I programmatically remove the pw?

I found This command line tool from MS, but it's Windows only.

Spin up a Windows VM?

Nippashish
Nov 2, 2005

Let me see you dance!

Dominoes posted:

Suggestions encouraged.

Batch versions that operate on all the rows of a matrix.

hooah
Feb 6, 2006
WTF?
I'm working through a Python book in preparation for TA-ing an intro programming course this summer, and just got to the "make your own classes" section. From what I've read so far, it seems that anyone using any class could add new attributes and methods as he or she pleases. Is this true? If so, why?! That seems like a horrible decision for the language!

Cingulate
Oct 23, 2012

by Fluffdaddy

Edison was a dick posted:

Being set in an environment variable is only barely an improvement. A rogue program that gains sufficient permissions can still read it out of your process, but at least it's not passed on the command line, where any process can see it.

tbh, I'd be happier if people just used getpass.getpass(), even if python doesn't have an interface to locked memory, to prevent the password being accidentally written to disk when the process is swapped out.
Thanks - while this doesn't really make me happy, it seems to be the preferred option.

My coworkers married to MATLAB actually use a script with hard-coded plaintext passwords, so ...

Cingulate
Oct 23, 2012

by Fluffdaddy

Dominoes posted:

Resolved. It looks like the name 'quick' is protected on PyPi (Although not fully on its test site), despite being unused.

Renamed to 'brisk' and uploaded. Speaking of which, if anyone's interested in faster drop-in replacements for basic numerical functions, check it out here. Suggestions / addition requests encouraged.
Extending OLS to multiple regression! Though as noted, it's not clear there is much to gain.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

hooah posted:

I'm working through a Python book in preparation for TA-ing an intro programming course this summer, and just got to the "make your own classes" section. From what I've read so far, it seems that anyone using any class could add new attributes and methods as he or she pleases. Is this true? If so, why?! That seems like a horrible decision for the language!

Yes. It's almost fundamental to the philosophy of python.

What, specifically, do you think is bad about it?

hooah
Feb 6, 2006
WTF?

Thermopyle posted:

Yes. It's almost fundamental to the philosophy of python.

What, specifically, do you think is bad about it?

I guess coming from a C++ background, you don't let programmers gently caress with your objects in unintended ways. A bit further on in that chapter the authors introduced the philosophy "we're all adults here", and I can see how allowing a programmer to add things to objects falls under that. It's just weird to me, and seems unsafe. But it's obviously worked for Python for however long.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

hooah posted:

I guess coming from a C++ background, you don't let programmers gently caress with your objects in unintended ways. A bit further on in that chapter the authors introduced the philosophy "we're all adults here", and I can see how allowing a programmer to add things to objects falls under that. It's just weird to me, and seems unsafe. But it's obviously worked for Python for however long.

Yeah, the approach taken by Python is just to say that if someone interferes with the internals of your classes/modules, or bypasses the interface you try to offer them, then fine, but they better know what they're doing, and if they screw something up that's their fault. This is of course different to the approach taken by a language like C++, but it's not necessarily better or worse, just different.

Python has conventions for you to indicate when something is to be regarded as internal; you put a leading underscore on methods and attributes that aren't intended to be part of the public interface.
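
A small sketch of both points. The Account class is invented for the example.

```python
class Account:
    def __init__(self, owner):
        self.owner = owner   # public attribute
        self._balance = 0    # leading underscore: internal, hands off

    def deposit(self, amount):
        self._balance += amount


acct = Account("hooah")

# Nothing stops a caller from adding an attribute to an instance...
acct.nickname = "WTF?"

# ...or a method to the class after the fact.
Account.balance = lambda self: self._balance

acct.deposit(10)
```

The underscore is only a signal: acct._balance is still readable from outside, it just tells the reader they are off the supported interface.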

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

hooah posted:

I guess coming from a C++ background, you don't let programmers gently caress with your objects in unintended ways. A bit further on in that chapter the authors introduced the philosophy "we're all adults here", and I can see how allowing a programmer to add things to objects falls under that. It's just weird to me, and seems unsafe. But it's obviously worked for Python for however long.

With great power comes great responsibility! Or something.

There's some programmers who just think Python and similar languages are just poo poo because of this type of stuff, and others who think it's great because of this type of stuff, and then there's reasonable programmers who see that stuff like we're talking about here is all tradeoffs and compromises.

good jovi
Dec 11, 2000

'm pro-dickgirl, and I VOTE!

hooah posted:

you don't let programmers gently caress with your objects in unintended ways

They're my objects now.

QuarkJets
Sep 8, 2008

hooah posted:

I guess coming from a C++ background, you don't let programmers gently caress with your objects in unintended ways. A bit further on in that chapter the authors introduced the philosophy "we're all adults here", and I can see how allowing a programmer to add things to objects falls under that. It's just weird to me, and seems unsafe. But it's obviously worked for Python for however long.

It's not like C++ actually stops you from modifying those variables, though. Someone using your class in all likelihood also has access to the source code, and nothing stops them from modifying it to make those variables public. All that "private" is doing is telling future developers "hey you shouldn't mess around with this variable".

Python replicates the spirit of private variables by mangling the names of variables that use __ as a prefix (such as "__foo"). When you find this kind of variable, that means "hey you shouldn't mess around with this variable", and if you try to access the variable directly from outside of the class then it simply won't work (because Python mangled the name). With enough work an intrepid developer could still modify the contents of a mangled variable directly, but that's true of C++ private variables, too.
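
A quick sketch of the mangling, with a made-up Widget class:

```python
class Widget:
    def __init__(self):
        self.__foo = 42  # actually stored as _Widget__foo

    def foo(self):
        return self.__foo  # inside the class the plain name works


w = Widget()

try:
    w.__foo  # outside the class there is no attribute by this name
    direct_access_failed = False
except AttributeError:
    direct_access_failed = True

# The mangled name is still reachable if you really insist.
mangled_value = w._Widget__foo
```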

hooah
Feb 6, 2006
WTF?
I'm using pickle to serialize some stuff so I don't have to process a bunch of files every time I run the program. Here's my code to un-pickle things:
Python code:
import pickle

input_file = open('pickleout.p', 'rb')
tag_counts = pickle.load(input_file)
tag_pair_counts = pickle.load(input_file)
entity_tags = pickle.load(input_file)
word_features = pickle.load(input_file)
total_tag_num = pickle.load(input_file)
PyCharm claims the arguments to pickle.load() are all unexpected arguments. What's it talking about?

fritz
Jul 26, 2003

QuarkJets posted:

It's not like C++ actually stops you from modifying those variables, though. Someone using your class in all likelihood also has access to the source code, and nothing stops them from modifying it to make those variables public. All that "private" is doing is telling future developers "hey you shouldn't mess around with this variable".


"#define private public"

KICK BAMA KICK
Mar 2, 2009

hooah posted:

PyCharm claims the arguments to pickle.load() are all unexpected arguments. What's it talking about?
Does the code work? PyCharm does that for a few standard library things I've seen; the alternate way of constructing an Enum is another example.

hooah
Feb 6, 2006
WTF?
Yeah, it works fine. Just trying to make sure my code is Python-esque rather than C++-esque.

'ST
Jul 24, 2003

"The world has nothing to fear from military ambition in our Government."
It's common in Python to use context managers when dealing with files. Context managers are pretty cool because they abstract try/finally and other setup/teardown for objects that implement the context manager interface.

Python code:
import pickle

with open('pickleout.p', 'rb') as input_file:
    tag_counts = pickle.load(input_file)
    tag_pair_counts = pickle.load(input_file)
    entity_tags = pickle.load(input_file)
    word_features = pickle.load(input_file)
    total_tag_num = pickle.load(input_file)

hooah
Feb 6, 2006
WTF?
How would you know if something went wrong?

SurgicalOntologist
Jun 17, 2004

Same as without the with statement---Python exits with an exception. Only, this time it will close the file first.

What's going on there, anyway? Why load the same file a handful of times?

Nippashish
Nov 2, 2005

Let me see you dance!

SurgicalOntologist posted:

Same as without the with statement---Python exits with an exception. Only, this time it will close the file first.

What's going on there, anyway? Why load the same file a handful of times?

The file gets closed anyway when python exits with an exception, you don't need a context manager to make that happen. The context manager auto-closing is useful if you catch that exception further up the callstack though.

That code is reading several pickled objects out of a single file.
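
A minimal round trip showing how that works. The two objects are small stand-ins for the real data, and the file goes in a temp directory so the sketch doesn't clobber anything.

```python
import os
import pickle
import tempfile

# Hypothetical stand-ins for the real objects.
tag_counts = {"NN": 3, "VB": 1}
total_tag_num = 4

path = os.path.join(tempfile.mkdtemp(), "pickleout.p")

# Each dump() appends one pickle to the stream...
with open(path, "wb") as output_file:
    pickle.dump(tag_counts, output_file)
    pickle.dump(total_tag_num, output_file)

# ...and each load() reads exactly one back, in the same order.
with open(path, "rb") as input_file:
    loaded_counts = pickle.load(input_file)
    loaded_total = pickle.load(input_file)
```

One more load() past the last object raises EOFError, so the reader has to know how many objects were written (or catch that).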

SurgicalOntologist
Jun 17, 2004

Nippashish posted:

The file gets closed anyway when python exits with an exception, you don't need a context manager to make that happen. The context manager auto-closing is useful if you catch that exception further up the callstack though.

That code is reading several pickled objects out of a single file.

Cool, I didn't know you could store pickles in the same file sequentially.

And yeah on the closing, though I think that depends on the implementation/platform so it's good practice not to rely on it---close the file on your own terms.

hooah
Feb 6, 2006
WTF?
Yeah, I didn't know anything about pickling before yesterday, but it's really nice. I never really looked into serialization in C++ since it seemed like a huge bother, but it's really nice! I also thought it was cool to be able to pickle more than one object into the same file. It's nice and compact.

I swear, the more I use Python, the more I like it.

Nippashish
Nov 2, 2005

Let me see you dance!
Good luck finding an os that doesn't close files when the process exits. But yes, closing your files is good practice, and context managers make this easy to do without thinking about it.
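
A sketch of the case where it matters: the exception is caught instead of killing the process, and the file is already closed by then. The temp file path is just for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.txt")

try:
    with open(path, "w") as handle:
        handle.write("partial")
        raise ValueError("something went wrong")
except ValueError as exc:
    # Caught further up: the with block has already closed the file.
    caught = exc

closed_after_error = handle.closed
```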

SurgicalOntologist
Jun 17, 2004

Oh yeah, it's not the file closing when the process exits that's implementation-dependent, it's the file closing when the reference count hits zero.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

I haven't had to serialize something in Python for a long time, but this made me think of a neat standard library module a lot of people don't know about: shelve.

quote:

A “shelf” is a persistent, dictionary-like object. The difference with “dbm” databases is that the values (not the keys!) in a shelf can be essentially arbitrary Python objects — anything that the pickle module can handle. This includes most class instances, recursive data types, and objects containing lots of shared sub-objects. The keys are ordinary strings.

Python code:
import shelve

# Using shelve.open() as a context manager needs Python 3.4+
with shelve.open('spam') as db:
    db['eggs'] = 'eggs'
