|
Or a pandas DataFrame.
|
# ¿ Oct 28, 2014 21:01 |
|
Jose Cuervo posted: I want to take the information present in a database table (.accdb) and read it into a Pandas dataframe. Is this possible? From Googling and looking on Stackoverflow I cannot find a way, but I have never used a database before, so maybe I am missing something.
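(For what it's worth, one concrete route, sketched - the driver string, path, and table name are placeholders, and it assumes pyodbc plus the Microsoft Access ODBC driver, i.e. typically Windows:)

Python code:
import pyodbc
import pandas as pd

# Placeholder connection details; the Access ODBC driver must be installed.
conn = pyodbc.connect(
    r'DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};'
    r'DBQ=C:\path\to\database.accdb'
)
df = pd.read_sql('SELECT * FROM some_table', conn)  # some_table is hypothetical
conn.close()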
|
# ¿ Nov 21, 2014 14:12 |
|
Since I'm a bad programmer, I very often do something that's basically

code:

But imagine I'm not constructing a dict, but adding to a Pandas data frame or such things. E.g.,

code:

E: fixed the dict comprehension
|
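(The original code blocks didn't survive; going by the surrounding posts, the pattern was presumably something like this sketch, with made-up names - an empty container filled in a for loop, versus the comprehension that replaces it:)

Python code:
# fill-a-dict-in-a-loop pattern
d = {}
for x in range(5):
    d[x] = x ** 2

# the dict comprehension that does the same thing
d = {x: x ** 2 for x in range(5)}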
# ¿ Nov 27, 2014 12:39 |
|
Ah yes, that's a much smarter implementation for the specific Pandas case that'll actually improve about half of my scripts. But is there a general answer? What if I want to assign bar(y) to some foo specified by x without a loop: code:
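(This code block is also lost; judging by the reply below, the question amounts to expressing foo[x] = bar(y) without a loop statement. A sketch with made-up names - a dict comprehension, and a "map-like thing" for the Pandas case:)

Python code:
import pandas as pd

xs = ['a', 'b', 'c']   # hypothetical keys
ys = [1, 2, 3]         # hypothetical inputs

def bar(y):
    return y * 10      # stand-in transformation

foo = {x: bar(y) for x, y in zip(xs, ys)}      # loop-free dict version
foo_series = pd.Series(ys, index=xs).map(bar)  # Pandas: Series.map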
|
# ¿ Nov 27, 2014 15:05 |
|
Ah, knew there'd be some map-like thing in there. Thanks.
|
# ¿ Nov 27, 2014 15:38 |
|
This is basically the only thing I'm saying in this thread, but TMH, have you considered using Pandas?
|
# ¿ Dec 10, 2014 15:29 |
|
Thermopyle posted: Also Guido wants to bring something like mypy into core python.

In fact, you can use mypy with type annotations right now! I understand that that's not the focus and the expected main benefit, though.
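(For illustration, a tiny example of the kind of annotation mypy checks - file and function names are made up:)

Python code:
# hello.py
def greet(name: str) -> str:
    return 'Hello, ' + name

greet(42)  # running mypy on hello.py flags this call: the argument should be a str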
|
# ¿ Dec 15, 2014 19:48 |
|
Is there a simple explanation (rule) for why (or when), in such contexts, dicts are preferable over objects (i.e., dict[key] vs object.attribute)?
|
# ¿ Dec 18, 2014 20:20 |
|
My main Python installation is Python 3 from Anaconda, and python calls the Anaconda Py3k. I want to install a package that is Python 2.x only and expects python to invoke a Python 2.x. What's the best way of going about this? Also, it seems readthedocs.org is down right now? Edit: I'm on a Mac, if this matters.
|
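(One standard answer, sketched - the environment name is arbitrary: create a dedicated conda environment, so python points at 2.7 only while that environment is active:)

code:
conda create -n py27 python=2.7
source activate py27   # on Mac/Linux; python is now the 2.7 interpreter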
# ¿ Dec 22, 2014 15:20 |
|
I'm using Anaconda for my Python distribution. To use the Intel MKL with Python, I either have to buy Anaconda's optimizer package thing, or compile Numpy manually, correct? And the latter option is probably highly dispreferred, as I'd have a good chance of messing up Anaconda's infrastructure?
|
# ¿ Jan 27, 2015 11:06 |
|
Yes, I already have the MKL and actually just compiled R to make use of it (... instead of going the comfortable route and downloading Revolution Analytics' R distribution). I've also set up a few conda envs (thanks to this thread) - though may I ask, in this context, how I can remove an entire environment at once via the CLI? I'm just wondering if a mostly computer-illiterate person such as myself should even bother trying to get MKL and an Anaconda Python to play along nicely by hand (e.g. by manually compiling Numpy), or if I should just go for my credit card. Do I understand correctly that you work for continuum.io?
|
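(For the record, removing a whole environment is a single command - environment name hypothetical:)

code:
conda remove -n py27 --all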
# ¿ Jan 27, 2015 19:47 |
|
dear bigreddot, please get the conda statsmodels package to the recent (0.6.1) version - I need the ordinal GEE API. Also I'll ask my

Edit: BigRedDot posted: I work for Continuum, and I wrote the original version of conda

I've started using Anaconda for all my Python needs and have just recommended our newest PhD students set up their systems using Anaconda.
|
# ¿ Jan 28, 2015 10:20 |
|
BigRedDot posted: It looks like it already is?
|
# ¿ Jan 28, 2015 18:52 |
|
I don't get parallel for loop syntax. At all. Like, I keep staring at the documentation to multiprocessing or whatever, and it's all Greek. I just want to do something like parfor x in range(0, 10): my_list[x] = foo(x). FWIW, this is on Python 2.7. IPython, that is.
|
# ¿ Feb 2, 2015 20:39 |
|
vikingstrike posted: This article may be helpful: http://chriskiehl.com/article/parallelism-in-one-line/

In particular, using the map() function on a multiprocessing Pool() object at the very end.

salisbury shake posted: What are you trying to accomplish?
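(For reference, a minimal Python 2.7-compatible sketch of that Pool().map() pattern - foo is a stand-in for the real per-item work:)

Python code:
from multiprocessing import Pool

def foo(x):
    return x * x   # stand-in for the real work

if __name__ == '__main__':
    pool = Pool()                       # one worker process per core by default
    my_list = pool.map(foo, range(10))  # the parallel 'parfor' asked about above
    pool.close()
    pool.join()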
|
# ¿ Feb 6, 2015 01:42 |
|
Okay, I think I basically got the multiprocessing thing now - my problem was that I was trying to avoid a functional style, but once I stopped trying to make everything be a for loop, it started making sense.

However - the thing I want to parallelise is already parallelised. What I mean is, I have a function that inherently utilises 10 or so of our (>100) cores. I want to run multiple instances of that function in parallel, to get closer to utilising 50 or so of our cores (and no, I can't really make the functions themselves able to parallelise more efficiently). Basically, I want to apply a large decomposition to large datasets. I have 20 independent large datasets and want to process them in parallel. But the decomposition function is already mildly parallelised.

When I simply do what's explained in vikingstrike's link (multiprocessing.dummy.pool), I actually make everything much slower, because the individual sessions only utilise 1 core. Can I somehow parallelise parallelised functions (execute multiple instances of a parallelised function in parallel)? Am I making sense?
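(One common way to handle this, sketched with made-up names and counts: multiprocessing.dummy is thread-backed, so use a real process Pool instead, and size it by how many dataset jobs should run at once rather than by core count - each worker's decomposition then grabs its own ~10 cores:)

Python code:
from multiprocessing import Pool   # real processes, not multiprocessing.dummy

def decompose(dataset_path):
    # hypothetical stand-in for the internally-parallel decomposition,
    # which uses ~10 cores on its own
    return dataset_path

if __name__ == '__main__':
    paths = ['dataset_%02d.npy' % i for i in range(20)]  # the 20 datasets (made up)
    pool = Pool(processes=5)   # 5 workers x ~10 internal cores = ~50 cores
    results = pool.map(decompose, paths)
    pool.close()
    pool.join()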
|
# ¿ Feb 11, 2015 01:03 |
|
I did conda update --all on my iMac and it's been doing this

quote: Fetching package metadata: ..

Could take a while indeed.
|
# ¿ Feb 17, 2015 20:00 |
|
What would the original, list-comprehension one be called? What "style" is that? (My Python is like 75% list comprehensions these days.)
|
# ¿ Mar 19, 2015 21:10 |
|
SurgicalOntologist posted: The only thing "wrong" with a functional style is it's not mainstream Python, so your typical Python programmer is not likely to have encountered it and will be confused.

BigRedDot posted: I haven't used map or filter in years. List comprehensions and generator expressions
|
# ¿ Mar 20, 2015 22:03 |
|
Bob Morales posted: We use 1 tech per state right now.

[state_to_tech.__setitem__(state, new_technician) for state, technician in zip(state_to_tech.keys(), state_to_tech.values()) if state_to_tech[state] == old_technician]
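(The joke being the side-effect comprehension - written as an honest loop, with made-up data, it's just:)

Python code:
state_to_tech = {'TX': 'alice', 'OK': 'alice', 'NM': 'bob'}  # made-up data
old_technician, new_technician = 'alice', 'carol'

for state, tech in state_to_tech.items():
    if tech == old_technician:
        state_to_tech[state] = new_technician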
|
# ¿ Mar 20, 2015 22:11 |
|
Dominoes, I don't think one could do something similar for e.g. the linear solvers in Numpy, though? (Compared to the Intel MKL build of Numpy.) As, probably, they're already run as compiled stuff in C or Fortran anyway.
|
# ¿ Mar 23, 2015 21:01 |
|
Dominoes posted: Faster GLM/RLM would be nice though; coincidentally it's the limfac in one of my projects.
|
# ¿ Mar 24, 2015 00:43 |
|
QuarkJets posted: This is the first time that I've ever heard someone suggest MATLAB in response to "I need this to run faster"

Though if anybody knows of a linear system solver faster than MATLAB's mldivide, please please do tell me.
|
# ¿ Mar 26, 2015 11:41 |
|
Dominoes posted: Prepend return to the function's last line.
|
# ¿ Mar 26, 2015 11:42 |
|
QuarkJets posted: This is a dirty lie (unless you're choosing a very narrow definition of "basic") and you shouldn't spread it
|
# ¿ Mar 26, 2015 21:41 |
|
Yes - basically, mldivide is seemingly very good at picking which specific package to call (e.g., SuiteSparse/UMFPACK), and of course Mathworks pays the cash for licensing the MKL and so on. So, it's state of the art.
|
# ¿ Mar 26, 2015 23:13 |
|
QuarkJets posted: I'm talking about all of the other "basic features" of MATLAB that are notoriously slow and archaic as gently caress. The features of Matlab that were inherited from older projects work great. Basically, any feature that can't be vectorized with a function written in Fortran before 1990 is complete garbage. Mathworks can't even provide basic list functionality without making it an O(n) operation, for gently caress's sake

mldivide is a basic MATLAB functionality, and it is state of the art, top of the line. I assume the same goes for e.g. dot products or matrix inversions. Thus, many of MATLAB's basic capabilities are state of the art, top of the line. I've never actually compared the numpy/scipy versions to MATLAB; I'd assume that with default installations, they're somewhat to noticeably slower.
|
# ¿ Mar 27, 2015 12:21 |
|
QuarkJets posted: And I agreed with you, MATLAB's features that were built pre-1990 in Fortran by someone other than Mathworks run really well, so long as you can completely vectorize the operation

And I guess this must be my final contribution to this slightly silly derail.
|
# ¿ Mar 27, 2015 22:36 |
|
Hm. In that case, I'm going to make my suggestion to Dominoes more clear: if solving linear systems is a limiting factor in your stuff, maybe take a look at MATLAB's mldivide, which

1. is somewhat well documented (and its parameters for every call can be laid bare) - e.g. SO's link
2. makes calls to the best actual number crunchers, so you can learn what the best actual number crunchers are
3. is pretty good at deciding which number crunchers to call based on the properties of the input (sparse, square, etc.)
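(For comparison, the numpy/scipy counterparts one would benchmark against look roughly like this - sizes made up; unlike mldivide, you pick the dense or sparse path yourself:)

Python code:
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import spsolve

A = np.random.rand(100, 100)
b = np.random.rand(100)

x_dense = np.linalg.solve(A, b)        # dense path: LAPACK gesv
x_sparse = spsolve(csc_matrix(A), b)   # sparse path: SuperLU (UMFPACK if installed)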
|
# ¿ Mar 28, 2015 11:25 |
|
Dominoes posted: Specifically, I'm running a few operations on millions of dataset combinations. I'm able to optimize most of them well with Numba, or they're quick enough not to be an issue. The holdup is using a linear regression (specifically statsmodels' GLM at the moment) to find residuals. It runs slow compared to the rest of my calculations.

- check if there is a special property of the matrices you can exploit - are they e.g. sparse, or square?
- are you using the best algorithm for solving that kind of problem? I assume statsmodels calls scipy or numpy for its linear systems, and IIRC neither ships UMFPACK
- do you repeatedly solve y = B*x with the same B? In that case, you can store and recycle the factorisation for massive speed boosts (see the sketch below)

For me, the optimal solution turned out to actually be making everything sparse and calling MATLAB's "\" once for all my observation matrices sharing the same predictor matrix, which then made solving the linear system actually the fastest part - much faster than building the predictor matrices in the first place, to some extent because I was using, and too lazy to avoid using, MATLAB for loops. I eventually switched over to building the predictors in Python. So this is my story, hope you liked it.
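(A sketch of the factor-once-solve-many idea from the last bullet, with made-up sizes and density: scipy.sparse.linalg.factorized computes the LU decomposition once and hands back a solve function you can reuse:)

Python code:
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import factorized

# made-up shared predictor matrix B (sparse, kept non-singular via the identity)
B = sparse.rand(1000, 1000, density=0.01, format='csc') + sparse.identity(1000, format='csc')

solve = factorized(B)   # factor B once

ys = [np.random.rand(1000) for _ in range(20)]  # the independent right-hand sides
xs = [solve(y) for y in ys]                     # each solve reuses the factorization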
|
# ¿ Mar 28, 2015 17:29 |
|
Hughmoris posted:
Python code:
|
# ¿ Apr 3, 2015 12:55 |
|
Dominoes posted: Different, probably less-readable approach.

Once you introduce lambda, I'm usually out though. Though maybe that's practice - I've used it before. If I wanted exactly the same output, I'd probably also have done that as a list comp, like [print(movie['title']) for ... ] or something like that.
|
# ¿ Apr 3, 2015 13:35 |
|
Hammerite posted: While we're bikeshedding (yes I know I started it), this list(map(print, ...)) stuff is bananas IMO. Argument-unpacking is your friend! Also you could do a list comprehension instead and avoid the sep = thing, [print(movie) for movie in movie_titles].

Edison was a dick posted: That's because map with a lambda is inferior when compared to list comprehensions or generator expressions.

I wish there was a parallel list comprehension - then I'd never ever optimise code ever again, and instead spend half my time apologising for crashing the server by filling up all the memory.
|
# ¿ Apr 3, 2015 14:22 |
|
Edison was a dick posted: FFS! Let's just

Python code:

Dominoes posted: I prefer map if I don't need to use lambda or a list comp with it. I.e., the function already exists. In this example, I might prefer it if the input list was already set up; if it didn't need the ['title'] lookup.

One thing I found googling just now was that you can maybe more easily switch between parallel and serial implementations by doing an optional map = multiprocessing.Pool().map ...
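(The switch being described, sketched with made-up names - pick the map implementation once, use it everywhere:)

Python code:
from multiprocessing import Pool

def work(item):
    return item ** 2   # made-up per-item function

if __name__ == '__main__':
    parallel = True
    if parallel:
        pool = Pool()
        my_map = pool.map   # parallel map
    else:
        my_map = map        # builtin serial map
    results = list(my_map(work, range(100)))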
|
# ¿ Apr 3, 2015 15:12 |
|
Hammerite posted: Generating a list as a side effect is ugly and makes it harder to appreciate at a glance what's happening. Shortness is secondary to clarity.

Although I think that in this case, shortness is clarity, as shortness comes from not introducing additional words (functions), like join or map.
|
# ¿ Apr 3, 2015 15:21 |
|
Yeah I've done some reading and I'm getting the point. Great, now I'm gonna rewrite like half my code.
|
# ¿ Apr 3, 2015 16:19 |
|
In case either of you feels this is all semantics: I'm definitely learning things.
|
# ¿ Apr 3, 2015 18:43 |
|
I run IPython notebooks remotely that often take days to run. So far I've been occasionally logging into the remote server to check top and see if it's still crunching, but that's a bit daft, isn't it? I'd like to be informed when the process finishes. I tried sending myself an email, but the default example requires storing passwords in plaintext. That's not really what I want for notebooks I'll share with other people often. I found a recommendation for the getpass module, but that's not really what I'm looking for either. Any ideas? It doesn't have to be email - I'd just like to get a notification somehow.
|
# ¿ Apr 8, 2015 20:18 |
|
Edison was a dick posted: Being set in an environment variable is only barely an improvement. A rogue program that gains sufficient permissions can still read it out of your process, but at least it's not passed on the command line, where any process can see it.

My coworkers married to MATLAB actually use a script with hard-coded plaintext passwords, so ...
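(The environment-variable approach under discussion, for concreteness - the variable name is arbitrary:)

Python code:
import os

# set beforehand in the shell, e.g. export SMTP_PASSWORD=...,
# so the password never appears in the script or on the command line
password = os.environ['SMTP_PASSWORD']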
|
# ¿ Apr 11, 2015 13:30 |
|
Dominoes posted: Resolved. It looks like the name 'quick' is protected on PyPI (although not fully on its test site), despite being unused.
|
# ¿ Apr 11, 2015 13:31 |