QuarkJets
Sep 8, 2008

So anaconda comes with an assload of its own libraries and binaries, and sometimes this causes problems for me. For instance, I can't just set my PATH to the anaconda/bin folder because then I'm overriding the default path for various tools in a build environment where the versioning is actually pretty important.

The only solution to this is to manually downgrade anaconda packages one at a time to match the system configuration. This isn't easy or convenient. I wish there was a way to tell anaconda that I want it to use and link against the system libraries for everything that it can find

I'm not really asking a question, just venting over some issues that I've been having lately

QuarkJets
Sep 8, 2008

I have not, but now I will look into it

QuarkJets
Sep 8, 2008

What if you're deploying code to multiple users on a common system? I usually write a shell wrapper that modifies PATH and LD_LIBRARY_PATH inline to use anaconda, if it's necessary (such as if my application links against the specific version of pythonxx.so that comes with anaconda).
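The same wrapper idea can be done from Python instead of shell. This is only a sketch: the install location /opt/anaconda3 and the helper names anaconda_env and run_with_anaconda are made up for illustration. It changes the environment of the child process only, so the system PATH stays untouched for everything else.

```python
import os
import subprocess

ANACONDA_ROOT = "/opt/anaconda3"  # hypothetical install location


def anaconda_env(root=ANACONDA_ROOT, base_env=None):
    """Return a copy of the environment with anaconda's bin and lib dirs prepended."""
    env = dict(os.environ if base_env is None else base_env)
    env["PATH"] = os.pathsep.join(
        p for p in (os.path.join(root, "bin"), env.get("PATH", "")) if p
    )
    env["LD_LIBRARY_PATH"] = os.pathsep.join(
        p for p in (os.path.join(root, "lib"), env.get("LD_LIBRARY_PATH", "")) if p
    )
    return env


def run_with_anaconda(cmd, root=ANACONDA_ROOT):
    """Run cmd (a list of arguments) with the modified environment; return its exit code."""
    return subprocess.call(cmd, env=anaconda_env(root))
```

Something like run_with_anaconda(["python", "my_app.py"]) would then pick up anaconda's interpreter and libraries for that one process.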

Things can also become very complicated if your project is using something like cmake to specify your Python area

QuarkJets
Sep 8, 2008

Thanks for the Miniconda recommendation, it's nice knowing that there is a minimalist version of anaconda out there. But for some reason I'm still having trouble with cmake; it claims to be using the HDF5 libraries and headers in my miniconda area, but it's apparently grabbing my system headers because when I launch my application I get a big gently caress-off warning about a library/header version mismatch (HDF5 at least gives a very thorough warning, as it specifies all of the paths and shows that system headers were used and that miniconda libraries are what is being linked against). This is probably going to require getting a lot more intimate with cmake and/or moc than I'd like

QuarkJets
Sep 8, 2008

Dominoes posted:

I like being able to get an idea of what the function does by looking at its signature.

Also makes compiling a function with Numba a good deal easier having the signature right there

QuarkJets
Sep 8, 2008

Boris Galerkin posted:

Because you write code to do things for you, not the other way around. Here's my lovely jumping through the hoops example:

https://repl.it/GP2U/0

code:
class Belgium(object):

    _names = {'en': 'Belgium',
             'de': 'Belgien',
             'nl': 'Belgie',
             'fr': 'Belgique',
             'default': 'Belgium'}

    def __init__(self):

        class Name(dict):
            def __repr__(self):
                return self['default']

        self.name = Name(self._names)

country = Belgium()

print('My name is {}'.format(country.name))
# >>> My name is Belgium

print('My name is also {}'.format(country.name['de']))
# >>> My name is also Belgien
I can understand if this isn't something that exists in Python. I can understand that this isn't "pythonic." But I want to understand why it's a bad idea to do this? I mean ignore the fact that I defined Name inside the init or whatever. I have no idea what __repr__ does but from a quick search it seems to be the thing that gets called when I just type the name of the class.

Typing "country.name_in_german" sounds like more of a hoop, and honestly making it a method and having to call "country.name()" is just ugly when 99% of the times I just need the default value. Plus the () implies function so isn't it semantically wrong? I'm not computing anything, I'm just getting something that already exists.

e: nevermind this only half works. __repr__ returns an actual Name object so I can't do things like "country.name+country.name['de']". Sad.

What you're asking for is for .name to sometimes be a function call if you provide an argument, or a string if you don't call it like a function. That's only going to make your code obtuse and difficult to understand. The code that breaks wrote in the first part of his post looks great and does what you want, but you have to type () when you want to get the default name, which is fine
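A minimal sketch of the "just call it with ()" approach. This is not the exact code from the earlier post, which isn't reproduced here, and the lang parameter name is an assumption:

```python
class Belgium(object):
    _names = {'en': 'Belgium',
              'de': 'Belgien',
              'nl': 'Belgie',
              'fr': 'Belgique'}

    def name(self, lang='en'):
        """Return the country name in the requested language (English by default)."""
        return self._names[lang]


country = Belgium()
print('My name is {}'.format(country.name()))
print('My name is also {}'.format(country.name('de')))
# Both calls return plain strings, so concatenation works too
print(country.name() + ' / ' + country.name('de'))
```

Because name() always returns a str, there's no surprise object hiding behind the attribute.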

QuarkJets
Sep 8, 2008

Boris Galerkin posted:

Because it was just an example.

e: And also because I come from a procedural/imperative programming background (my first language was Fortran back when it was still FORTRAN [77]) and I've never learned anything else other than modern versions of Fortran and MATLAB so honestly I have no clue what the point of objects are or when and why I should use them.

I just think it's pretty neat to be able to do:

code:
new_tensor = old_tensor.rotate()
Instead of

code:
new_tensor = rotate_tensor(old_tensor, rotation_matrix)
But honestly the only reason I think the first one is neat is because it looks better to me. I really wasn't joking when I said I liked "foo.thing" better than "foo.thing()" because it looks better without the parenthesis (but mostly because I don't think it's semantically correct to call a function to return a value that I've already computed).

Your points on the first codeblock are very good, and indeed that does look better and work better and it makes sense. But then consider this:

code:
new_tensor = old_tensor.rotate
new_tensor2 = old_tensor.rotate(rotation_matrix2)
The above code block is basically what you were asking for before, in that "rotate" maps to two different things (a matrix or a function) depending on usage. That's confusing and not actually any cleaner than:

code:
new_tensor = old_tensor.rotate()
new_tensor2 = old_tensor.rotate(rotation_matrix2)
In this code there's no question over what's going on here; rotate() is clearly a method that returns a new tensor, and it accepts an optional input in the form of a new rotation matrix. This is clean, simple, and easy to understand. The previous code block is ambiguous and definitely not any cleaner or simpler to understand

quote:

(but mostly because I don't think it's semantically correct to call a function to return a value that I've already computed)

That's called a get method, those are common in object-oriented languages but in Python you don't have to use them.
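In Python the usual stand-in for a get method is a property, which reads like a plain attribute at the call site. A small sketch, with a made-up Tensor class that stores nested lists (not the class from the quoted post):

```python
class Tensor(object):
    def __init__(self, components):
        self._components = components

    @property
    def components(self):
        """Read-only access to the stored value; no parentheses at the call site."""
        return self._components

    @property
    def trace(self):
        """A derived value, computed on access but still read like an attribute."""
        return sum(self._components[i][i] for i in range(len(self._components)))


t = Tensor([[1, 2], [3, 4]])
print(t.components)
print(t.trace)
```

Since no setter is defined, assigning to t.trace raises AttributeError, which is usually what you want for a value that's derived from other state.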

QuarkJets
Sep 8, 2008

Boris Galerkin posted:

code:
def rotate_tensor(tensor, rotation_matrix):
    A = tensor
    Q = rotation_matrix
    return Q*A*transpose(Q)

new_tensor = rotate_tensor(old_tensor, some_matrix)
vs

code:
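from numpy import transpose  # assuming numpy's transpose is the one intended

class Tensor(object):

    def __init__(self, tensor, rotation_matrix):
        self.A = tensor
        self.Q = rotation_matrix

    def rotate(self, rotation_matrix=None):
        # "rotation_matrix or self.Q" raises ValueError for numpy arrays, so test for None
        Q = self.Q if rotation_matrix is None else rotation_matrix
        # * is elementwise for ndarrays; use @ (or np.dot) if a matrix product is intended
        return Q*self.A*transpose(Q)

old_tensor = Tensor(a_numpy_ndarray, rotmat)
new_tensor = old_tensor.rotate()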
They both do the same thing. It's just more elegant looking to be able to call a rotate() method I think, but at the end of the day all I really care about is that the rotate method or rotate_tensor function gives me back a properly rotated tensor. Plus it's neat that I can assign the rotation matrix to the object already and not have to worry about passing the rotation matrix around anymore. That's the really cool part for me.

Yup, they both do the same thing and the decision to use one implementation vs the other is going to come down to some combination of personal preference, conforming to some preexisting style guide, and specific project details (for instance if you're writing high-performance computing code then a functional form is often better, since it's easier to port that to C/Fortran or to compile it with Numba, but if you're interacting with Java for some god-awful reason then Java is classes-only)

QuarkJets
Sep 8, 2008

Everything is a dictionary anyway so why pretend otherwise, just use dictionaries for everything

(I'm lying, don't actually do this)

Eela6 posted:

I am of this opinion.

This is a good video on the subject:
https://m.youtube.com/watch?v=o9pEzgHorH0

Classes are useful when they make your code easier to reason about. Too many classes are often a signal of code without forethought.

I agree as well; too many times I've seen unnecessary class implementations of things that really seemed like the developer was just looking for excuses to create classes rather than using classes to make the code easier to understand/use. I think it's easy for developers coming over from C++/Java to fall into that trap, whereas developers coming from Matlab/Fortran have the opposite problem (everything should be a function!)

QuarkJets
Sep 8, 2008

The Gunslinger posted:

I decided to pick up Python, I have a fair amount of experience with Perl and C++ but want to update my skillset a bit. My laptop that runs Linux is having hardware problems so I decided to fart around on my Windows gaming PC with this. I'm working through a tutorial book but just encountered an issue.

They want you to mess around with turtle which is part of the standard library. If I call my script or invoke Python in a command shell from my base user directory (C:\users\TGS) I can import turtle and work with it just fine. If I try to import it from a folder on my desktop, I can't. I just get an unknown attribute error since it can't find the library. I've checked the PATH and the installation directory for Python is there so uh, anyone know what gives?

You'll need to modify your PYTHONPATH to point to the folder where your script lives. Print sys.path to see what your PYTHONPATH looks like; if your file lives in a folder that isn't in sys.path then Python doesn't know about it

https://docs.python.org/2/using/windows.html#configuring-python
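A quick way to check what Python is actually searching, and which file an import would resolve to, is something like the following. The helper name module_origin is made up for illustration. One thing this can reveal is a file in the script's folder named turtle.py shadowing the standard library module; that's a guess at the cause, not something stated in the post.

```python
import importlib.util
import sys


def module_origin(name):
    """Return the file a top-level import of `name` would load, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# The directories Python searches, in order. When running a script, the first
# entry is the folder that contains the script.
for entry in sys.path:
    print(repr(entry))

# If this prints a path inside your own project folder rather than the Python
# installation, a local file is shadowing the standard library module.
print(module_origin("turtle"))
```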

If that doesn't work then I misunderstood the problem and you should probably copy-paste the actual error

Did you install vanilla Python or did you install Anaconda? If you're going to work in Windows then it's strongly recommended that you uninstall vanilla Python and install Anaconda, which will likely circumvent tons of headaches. Once you have Anaconda installed you can use the Spyder IDE (which comes with Anaconda) for your development / playing around. If you want to get serious, then you could look into installing the PyCharm IDE.

QuarkJets
Sep 8, 2008

Try using a comma instead of a colon in the typehinting

QuarkJets
Sep 8, 2008

Some stuff you may be able to pull out of __future__; according to the docs List Comprehensions were in 2.4, so presumably you can just run a list comprehension and then convert to a set
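For reference, the list-comprehension-then-set workaround looks roughly like this. The word list is made up, and it assumes the goal was to stand in for a set comprehension on an interpreter that doesn't have one:

```python
words = ['spam', 'eggs', 'spam', 'ham']

# On newer interpreters this could be written {w.upper() for w in words};
# building a list first and passing it to set() gives the same result.
unique_upper = set([w.upper() for w in words])
print(sorted(unique_upper))
```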

QuarkJets
Sep 8, 2008

KingNastidon posted:

Just getting started with pandas, so this is probably a really naive question. I'm trying to replicate a really basic algorithm from VBA for oncology forecasting. There is a data file with new patient starts and anticipated duration of therapy in monthly cohorts. Each new patient cohort loses a certain amount of patients each time period using an exponential function to mimic the shape of a kaplan-meier progression curve. The sum of a column represent the total remaining patients on therapy across all cohorts. Below is the code I'm using:

code:
for x in range(0, nummonths):
    for y in range(x, nummonths):
        cohorts.iloc[x,y] = data.ix[x,'nps'] * math.pow(data.ix[x,'duration'], y-x)
Calculating this n by n array with n>500 is instantaneous in VBA, but is taking 10+ seconds in pandas with jupyter. Any thoughts on what makes this so slow in pandas or better way to go about this?

Basically I was going to say what Foxfire_ said; for loops in Python are slow and usually aren't what you want to use for bulk-computation. VBA is a compiled language, but Python is not.

The first-order speedup you can apply to code like this is to use vectorized numpy operations instead of for loops.
code:
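import numpy as np
for x in range(nummonths):
    # .ix was removed in pandas 1.0; .loc does the same lookup if data has the default integer index
    decay = np.power(data.loc[x, 'duration'], np.arange(nummonths - x))
    cohorts.iloc[x, x:nummonths] = data.loc[x, 'nps'] * decay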
That's one loop removed and should be significantly faster. It's possible to get rid of the outer loop as well using something like meshgrid to take care of the indexing but it's too late and I'm too tired to type it all out
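For the record, a loop-free version might look like this. It's a sketch under assumptions: it works on plain arrays pulled out of the 'nps' and 'duration' columns, the function name cohort_matrix is made up, and it fills everything below the diagonal with zeros, which may or may not match how cohorts was initialized.

```python
import numpy as np


def cohort_matrix(nps, duration):
    """Entry [x, y] is nps[x] * duration[x] ** (y - x) for y >= x, else 0."""
    nps = np.asarray(nps, dtype=float)
    duration = np.asarray(duration, dtype=float)
    n = len(nps)
    # rows[x, y] == x and cols[x, y] == y, so elapsed is months since each cohort started
    rows, cols = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    elapsed = cols - rows
    # Clip so the entries that get masked out never compute a negative power
    decay = np.power(duration[:, None], np.clip(elapsed, 0, None))
    return np.where(elapsed >= 0, nps[:, None] * decay, 0.0)


result = cohort_matrix([100, 200, 300], [0.9, 0.8, 0.5])
print(result)
print(result.sum(axis=0))  # total remaining patients per month
```

With pandas, the inputs would be something like data['nps'].to_numpy() and data['duration'].to_numpy(), and the result can be wrapped back into a DataFrame afterwards.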

QuarkJets
Sep 8, 2008

Malcolm XML posted:

This has little to do with python being compiled or not: pandas has a ton of overhead if you don't do things the correct way

Well there you go, I didn't know that because I don't use pandas (like at all)

QuarkJets
Sep 8, 2008

shrike82 posted:

HumbleBundle is offering a bunch of Python books (mostly newbie/intermediate level) on the cheap -
https://www.humblebundle.com/books/python-book-bundle

Seems like a good stack of stuff for up to $15. I checked a couple of the prices on Amazon and they were all at least $20 each, albeit an ebook sometimes isn't as convenient as a paperback

QuarkJets
Sep 8, 2008

Dominoes posted:

Explicit is better than implicit... ;)

Wouldn't that necessitate also specifying that the step size is 1, then?

QuarkJets
Sep 8, 2008

[i for i in range(1, 100, 1)]

QuarkJets
Sep 8, 2008

sorry i meant

[_ for _ in range(0,1,1)]

QuarkJets
Sep 8, 2008

Cingulate posted:

For real though, how did you oldsters ever live with Python 2 and its leaking list comps?

I'm not sure that it has ever mattered to me; I just kind of assumed that the last value of the comprehension variable would be available in the local scope and didn't realize that this was actually undesired behavior.
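For anyone who hasn't run into it, the difference is easy to see with a tiny example. Run under Python 3 the outer variable survives; under Python 2 the comprehension variable leaks and overwrites it with the last value from the loop.

```python
i = 'outer value'
squares = [i * i for i in range(5)]

print(squares)
# Python 3 prints 'outer value' here; Python 2 would print 4
print(i)
```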

e: I still use Python2, because I can't choose what our production systems run. A surprising number of people are in this boat

QuarkJets fucked around with this message at 22:45 on Apr 12, 2017

QuarkJets
Sep 8, 2008

Methanar posted:

I'm not actually working with strings. It's a list.

A list of strings?

QuarkJets
Sep 8, 2008

Methanar posted:

The data is actually more structured like this. I gave the replace snippet a shot, but it didn't do anything.

code:
{ 
 "1": [
    "2.example.com",
    "2.example.com"
  ],
  "2": [
    "2.example.com"
  ],
  "3": [
    "3.example.com",
    "4.example.com"
  ]
} 
Now what I'm thinking is the json output is actually a string (I think), so maybe I can do my string manipulation after writing to json, but that's not going so well either.

code:
 
        groups = {key : filterIP(list(set(items))) for (key, items) in groups.iteritems() }

        s = self.json_format_dict(groups, pretty=True)
#       print(s)

        def filterSub(fullList2):
            return re.sub(r"example.com$", "sub.example.com", fullList2)

        print(filterSub(s))

What you have is a dictionary where the keys are strings and the values are lists of strings, and what you want to do is do a replace in all of those lists of strings. Tables-breaking pseudocode:

Python code:
# groups is the dict from before it gets serialized to JSON (Python 2, hence iteritems)
s_replaced = {key : [domain.replace('example.com', 'sub.example.com') for domain in val] for key, val in groups.iteritems()}
It's preferable to use string operations instead of regex whenever you can; there are far fewer pitfalls and it's usually simpler and faster
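A runnable Python 3 sketch of the same idea, doing the replacement on the dictionary first and only then serializing. The function name add_subdomain is made up, and plain json.dumps stands in for the json_format_dict helper from the quoted code:

```python
import json


def add_subdomain(groups, old='example.com', new='sub.example.com'):
    """Return a new dict with `old` replaced by `new` in every domain string."""
    return {key: [domain.replace(old, new) for domain in domains]
            for key, domains in groups.items()}


groups = {
    '1': ['2.example.com', '2.example.com'],
    '2': ['2.example.com'],
    '3': ['3.example.com', '4.example.com'],
}
replaced = add_subdomain(groups)
print(json.dumps(replaced, indent=2, sort_keys=True))
```

One caveat: str.replace hits every occurrence, so running it twice would produce sub.sub.example.com.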

QuarkJets fucked around with this message at 01:09 on Apr 15, 2017

QuarkJets
Sep 8, 2008

Vivian Darkbloom posted:

A central canvas, a few lines of text underneath, and a few buttons horizontally arranged beneath the text. The canvas would need to be updated to recolor different areas on the board. Don't really need the ability to read clicks from the canvas or interact with any fancy widgets -- just whipping up a demo.

e: I am thinking tkinter will be ok for what I need.

Can tkinter handle those canvas operations you described? That sounds more like something you'd use Qt for (which is multi-platform but would create a pyqt dependency)

QuarkJets
Sep 8, 2008

Tigren posted:

That's an actual marketable skill with virtually infinite learning resources vs some weird, arcane language on top of a language that produces interfaces from 1995.

These descriptions both describe Flask

QuarkJets
Sep 8, 2008

Probably execution is supposed to stop at that point, once the error is raised? Definitely throw a raise at the end of that except, if so
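The original code isn't shown here, so this is a generic sketch of the pattern with a made-up parse_port function: do whatever logging or cleanup you need in the except block, then use a bare raise so the original exception keeps propagating with its traceback.

```python
import logging

logger = logging.getLogger(__name__)


def parse_port(text):
    """Convert text to an int port number, logging and re-raising on bad input."""
    try:
        return int(text)
    except ValueError:
        logger.error('could not parse port from %r', text)
        raise  # bare raise re-raises the exception that is being handled
```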

QuarkJets
Sep 8, 2008

Eela6 posted:

A catch of Exception (or worse yet, BaseException) without a re-raise is a huge red flag.

That's often true, but there are circumstances where it can be okay, such as in a process that's supposed to live forever.

e: Which I'd caveat with "you should log the stack trace yourself in that case"
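A minimal sketch of that long-lived-process case, with a made-up run_jobs function. logging.exception records the message along with the full stack trace, so swallowing the exception doesn't mean losing the information:

```python
import logging

logger = logging.getLogger(__name__)


def run_jobs(jobs):
    """Run each callable in jobs; log any failure with its stack trace and keep going."""
    results = []
    for job in jobs:
        try:
            results.append(job())
        except Exception:
            # Must be called from inside an except block to pick up the traceback
            logger.exception('job %r failed; continuing', job)
            results.append(None)
    return results
```

Catching Exception rather than BaseException still lets KeyboardInterrupt and SystemExit stop the process.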

QuarkJets fucked around with this message at 23:55 on May 9, 2017

QuarkJets
Sep 8, 2008

Miniconda should have everything that you need to create a conda environment. If you have an internet-connected system with similar hardware, then you could just fully build your environment there and then move the whole anaconda directory to your not-connected system, saving you some time if anything goes wrong.

You should be able to just build both of the packages, in the right order, without creating multiple environments. Create the environment, build package1, build package2

QuarkJets
Sep 8, 2008

Boris Galerkin posted:

To the first point, on my laptop anaconda was installed to $HOME/anaconda3. I can just upload this entire folder up to the other machine and add it to the PATH?

Yup, the anaconda directory is portable, it can be moved to a different similar-enough platform and it will still work. The only caveat is that some of the files in anaconda3/bin will use $HOME/anaconda3/bin/python in their shebangs, so to get everything working perfectly you either need to A) install to $HOME/anaconda3 on the remote machine or B) install locally to the same path that you intend to use on the destination machine. Or I guess you can write a script to modify all of the shebangs to point to your final destination path
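A rough sketch of the shebang-rewriting script idea. The function name and the paths in the usage note are hypothetical, and it only touches files whose first bytes are a shebang starting with the old prefix; anything else that embeds the old path is left alone.

```python
from pathlib import Path


def rewrite_shebangs(bin_dir, old_prefix, new_prefix):
    """Point '#!<old_prefix>...' lines in bin_dir at new_prefix; return changed file names."""
    old = b"#!" + old_prefix.encode()
    new = b"#!" + new_prefix.encode()
    changed = []
    for path in sorted(Path(bin_dir).iterdir()):
        if path.is_symlink() or not path.is_file():
            continue
        data = path.read_bytes()  # bytes, so real binaries don't cause decode errors
        if data.startswith(old):
            path.write_bytes(new + data[len(old):])
            changed.append(path.name)
    return changed
```

Usage would be something like rewrite_shebangs('/opt/anaconda3/bin', '/home/me/anaconda3', '/opt/anaconda3'), ideally tried on a copy of the directory first.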

QuarkJets
Sep 8, 2008

huhu posted:

code:
snip
code:
snip
Is there a cleaner way to not have so many os.path.join() or would that require knowing the operating system I'm working with?

The cleanest way is to import join from os.path, then you can just call join instead of os.path.join. You could use glob and/or walk to help generalize

Python code:
from os.path import abspath, dirname, join
from os import walk

# look for files named 'x' anywhere under <script dir>/temp and print their contents
rootPath = join(dirname(abspath(__file__)), 'temp')
for rootdir, dirnames, filenames in walk(rootPath):
    if 'x' in filenames:
        with open(join(rootdir, 'x'), 'r') as fi:
            for line in fi:
                print(line)

QuarkJets
Sep 8, 2008

Dominoes posted:

I tried everything! At one point I had all 3 versions installed at once, but couldn't get pip to work with 3.6. Somehow broke my system beyond repair (ie fixing would be more difficult than starting fresh) troubleshooting it.

I have no idea how you do that without intentionally installing into system paths. The installer defaults to your home area and will create a standalone anaconda distribution that doesn't mess with any system paths and will definitely just work

It sounds like you maybe tried to replace the system python with anaconda, which is a bad idea

QuarkJets
Sep 8, 2008

I think in general it's ill-advised to try and upgrade the system Python. So many things will break because they were made to use the specific version of Python that came with the system, so trying to replace it will cause problems

QuarkJets
Sep 8, 2008

I create functions for common code chunks that I either have reused before or know I will reuse later. For instance the file permission features that come bundled with Python are pretty simplistic, and at a time when I was working often with file permissions I wrote a little recursive chmod function. Now whenever I want to recursively change the permissions and ownership of a directory I just import that function.
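Something along these lines, as a sketch rather than the actual function. The name chmod_recursive and the default modes are made up, and ownership changes (os.chown) are left out since they usually need elevated privileges.

```python
import os


def chmod_recursive(root, dir_mode=0o755, file_mode=0o644):
    """Apply dir_mode to root and every directory under it, file_mode to every file."""
    os.chmod(root, dir_mode)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            os.chmod(os.path.join(dirpath, name), dir_mode)
        for name in filenames:
            os.chmod(os.path.join(dirpath, name), file_mode)
```

Note that os.chmod follows symlinks, so links pointing outside the tree would have their targets changed too.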

If I've forgotten how something works I either google for it or just play around with it in a terminal. Say if I forgot whether os.walk iterated over (rootname, dirs, filenames) or (filenames, dirs, rootname); opening a Python session and invoking os.walk will show you right away what's going on. If I completely forgot that os.walk was even a thing then googling around should pull it up, since Python is an extremely popular language

QuarkJets fucked around with this message at 00:57 on Jun 9, 2017

QuarkJets
Sep 8, 2008

Malcolm XML posted:

:psyduck:

Did u think that 64 bits was enough to represent all of the reals?

That has nothing to do with floating point errors though; exact Decimal types exist without having to represent all of the reals

QuarkJets
Sep 8, 2008

Boris Galerkin posted:

It's more of a hardware limitation than an actual thing. In real life 0.3 is exactly 0.3, but we lose some accuracy when we represent it in a computer.

I imagine for most people none of this has any direct relevance, but if you're doing any sort of numerical/computational work then this type of stuff comes up all the time. Doing simulations for example we already accept that there is an inherent error from the actual math equations and simplifications to them, but we also accept that there are errors from discretization (dividing the problem up into multiple smaller parts), and we also must be aware of problems in roundoff/precision due to the computer. Lots of fields use 64 bit floats for example because it gives us more precision (more decimals).

I remember one of my earliest courses in this field our professor intentionally misled us to use MATLAB to write our first finite difference solver to solve a problem that produced nonsense results because the default floating point precision in MATLAB (at that time? not sure if it's the case still) was 32bit. Due to error propagation these errors eventually ruined the solution. Telling MATLAB (or using C/Fortran double precision numbers) to use 64 bit reals on the other hand fixed the problem because these errors never had a chance to propagate.

I don't think that it's accurate to call that a hardware limitation, otherwise Decimal couldn't exist

QuarkJets
Sep 8, 2008

Malcolm XML posted:

but the point is that no finite representation can represent real numbers w/o some loss and imprecision, fixed point decimal does it differently than IEEE-754


arbitrary precision is a different game but still limited by practicality

And I agree with that, just not the vaguer post that you made earlier (because you can crank up the precision and use clever software to circumvent many of the common problems in floating point arithmetic without having to represent all real numbers; for instance financial software can use the Decimal type instead of float, but Decimal itself does not represent all reals with arbitrary precision)
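A small illustration of both halves of that point, using only the standard library: Decimal gets decimal fractions exactly right where binary floats don't, but it is still finite precision, so something like 1/3 gets cut off at the context precision (set to 28 digits here, which is also the module default).

```python
from decimal import Decimal, getcontext

getcontext().prec = 28

# Binary floats can't represent 1.1, 2.2 or 3.3 exactly
float_equal = (1.1 + 2.2 == 3.3)
# Decimal stores these decimal fractions exactly
decimal_equal = (Decimal('1.10') + Decimal('2.20') == Decimal('3.30'))

# Still finite: 1/3 is truncated to 28 significant digits
third = Decimal(1) / Decimal(3)
round_trip = (third * 3 == Decimal(1))

print(float_equal, decimal_equal, round_trip)
```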

QuarkJets
Sep 8, 2008

KernelSlanders posted:

As it should be (open to beginners). My issues was not with him, it was with the guy who thinks an acceptable answer is there's something wrong with you if you think you have to worry about the 64th decimal place, and the guy who thinks theres something wrong with needing to know arcane implementation details.

Some of us work on projects where we do care about the 64th decimal place and we do care about messy internals.

Nippashish is not saying either of those things. He gave an extremely reasonable rule of thumb for how to think of floats and suggested that most people don't have to read the Floating Point Arithmetic chapter of a CS textbook just to use floats effectively

QuarkJets
Sep 8, 2008

funny Star Wars parody posted:

I'm assuming that you should use doubles but please expand on this

Back when bitcoin was still newer a bunch of people learned first-hand that floating-point arithmetic has precision issues.

Say I have $2 in my account and I deposit an additional $0.10. Because my financial software is bad and written by someone with almost no coding experience, now my account balance reads $2.09 (in memory the sum is something like 2.099999.... and the software truncates instead of rounding). But even if your website is smarter than that and chooses to round up, these precision errors will slowly accumulate over time and the balance will be different than if someone had gone through with a calculator and tried to calculate the balance by hand.

Similar example, compound interest calculations are performed repeatedly and can result in numerous floating point errors compounding into a significant deviation between what's reported in the account and what should actually be in the account.
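A toy version of the accumulation problem: ten deposits of $0.10 added up as floats, as integer cents, and as Decimal. The numbers are made up for illustration; the explicit loop is deliberate, since it mimics a balance being updated one transaction at a time.

```python
from decimal import Decimal

float_balance = 0.0
cents_balance = 0
decimal_balance = Decimal('0.00')

for _ in range(10):
    float_balance += 0.10               # binary float, tiny error on each add
    cents_balance += 10                 # integer cents, exact
    decimal_balance += Decimal('0.10')  # decimal arithmetic, exact

print(float_balance == 1.0)
print(cents_balance == 100)
print(decimal_balance == Decimal('1.00'))
```

Only the float version fails the comparison, and that's after just ten transactions.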

QuarkJets
Sep 8, 2008

Malcolm XML posted:

Actually this is the worst possible takeaway because Numerical analysis is a) hard and b) for even most practical purposes requires an understanding of when it'll fail on you. Catastrophic cancellation and loss of precision can lead to cases where your component will fail and fail hard. Unless you don't want to engineer robustly, sure you can ignore it. I have run into many cases where poor or no understanding of float approximations have lead to pernicious bugs in production systems costing lots of money

While I don't expect people to understand IEEE-754 the standard in its entirety, it is immensely unhelpful to present it as a real number/exact fraction abstraction since it's leakier than a sieve (but designed in such a way to be useful for those in the know) and frequently beginners will smash their heads against the wall for days until they are told how the semantics of floats work when they run into "random" "non-deterministic" errors that are 100% deterministic.


For example, addition is commutative but not associative. This isn't trivial nor particularly expected.

>>> 1e16 + 1 == 1e16
True

A way of salami slicing

Perfectly fine if your algorithms are insensitive to things like that (i use floats in quant finance optimization models) but try simulating a chaotic system and watch as floats are completely useless

It gets better when you have no idea how chaotic your system is!


Malcolm XML posted:

Floats are 100% deterministic. They will do 100% of what you tell them to do, but that's a pretty complex operation involving rescales and possibly higher precision intermediaries.

If you are calculating mathematical functions, use a library or look up a numerically stable algorithm.

This is the level of worrying that Nippashish said a newbie shouldn't be at, and he was 100% right. "Why is this single line of floating-point arithmetic giving me a funny-looking answer" requires a very basic understanding of the issue, not "read this essay on the difficult subject of numerical analysis"
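The "very basic understanding" fits in a few lines. A sketch using only the standard library: float results depend on the order of operations, exact equality checks on floats are fragile, and comparing with a tolerance is the usual fix.

```python
import math

# Same three numbers, different grouping, different answers
a = (1e16 + 1.0) + 1.0   # each +1.0 is lost to rounding
b = 1e16 + (1.0 + 1.0)   # 2.0 is big enough to register
print(a == b)

# Exact comparison is fragile; compare with a tolerance instead
exact = (0.1 + 0.2 == 0.3)
close = math.isclose(0.1 + 0.2, 0.3)
print(exact, close)
```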

QuarkJets fucked around with this message at 21:58 on Jun 11, 2017

QuarkJets
Sep 8, 2008

Malcolm XML posted:

Newbies shouldn't worry about how to use lathes, just have them lose a finger or break a few thousand parts instead of having them learn how to use their tools

Is writing bad analogies required for numerical analysis, now?

You're seriously fighting for the position that says "a person new to programming has to have a solid understanding of the difficult topic of numerical analysis before writing a program that uses floats". Is that really the hill that you want to die on?

QuarkJets
Sep 8, 2008

Malcolm XML posted:

well it keeps me in a job, so pragmatically no


but yes you should understand your tools before you use them, maybe read the fine manual as well

Malcolm XML posted:

You're wrong and should feel wrong. If you're advocating writing scientific software or numerical software without understanding what's going on, please let me know what you make so I can never use it

Not everyone programs fart apps exclusively so its worth knowing when you can and when you can't ignore abstractions


OP go read http://floating-point-gui.de/basic/

We're not talking about scientific software or numerical software. You keep saying that a careful analysis of numerical precision is necessary before writing any code at all if there's a chance that it will have a double variable somewhere in it. You're basically preaching premature optimization of anything more complex than a Hello World program

QuarkJets
Sep 8, 2008

jusion posted:

I think you have to be intellectually dishonest to think something like 0.1 + 0.2 == 0.3 would be rare in any sort of program. If you want to program you have to understand boolean logic. If you want to program you have to at least understand the basics of floating point math and its pitfalls. This isn't really an extreme position. Maybe Malcolm was a little mean, but come on...

No no no, I'm arguing for an understanding of the basics. Malcolm is arguing for a far more extreme position than that
