Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL

Your line:

code:
folium.Marker(position, icon=folium.Icon(color=cl)).add_to(map_1)
is only firing once because it isn't part of the for loop. The for loop only acts on lines that are indented more than it, so what your program is doing is reassigning your color variable until you are done iterating over your column, exiting the for loop, and applying the last value it computed to the map. Just indent the folium line and you should be golden.

What you are doing is like this:

code:
num = 0
for i in range(5):
    num += i
print(num)
The behavior you want is like this:

code:
num = 0
for i in range(5):
    num += i
    print(num)


Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
Here's the documentation for the re module itself. And a how to that's a bit more digestible.

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
Pretty sure this would also work:

Python code:
import re

for song in songs:
    # This regex substitutes matches of a pattern with a supplied string and returns the result.
    lyrics = re.sub(r'(\[.*\])', '', song['lyrics'])
    song['lyrics'] = lyrics   
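
For what it's worth, here's a quick sketch of what that substitution does on made-up data (the songs list below is an assumption, not the real data). One thing to watch: .* is greedy, so if two bracketed tags share a line, everything between them goes too. \[.*?\] is the non-greedy variant if that matters for your lyrics.

```python
import re

# Hypothetical data standing in for the real song list.
songs = [{"title": "Example", "lyrics": "[Verse 1]\nla la la\n[Chorus] oh [x2]\n"}]

for song in songs:
    # Greedy: on the third line this removes "[Chorus] oh [x2]" in one match.
    greedy = re.sub(r"\[.*\]", "", song["lyrics"])
    # Non-greedy: each tag is removed on its own, so " oh " survives.
    lazy = re.sub(r"\[.*?\]", "", song["lyrics"])

print(repr(greedy))
print(repr(lazy))
```

Since . doesn't match newlines by default, tags that sit on their own line come out the same either way.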

Dr Subterfuge fucked around with this message at 19:31 on Jan 25, 2018

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL

Baronash posted:

Is there a good way to have the extra series added as columns, rather than additional rows?

I've never actually used pandas, but I can tell you that you should be working with a DataFrame as your result. Series are one-dimensional arrays, which is why appending one to another only extends the length of full_list.

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL

Thermopyle posted:

Here's a neat thing if you're one of the cool people like me who can mostly move most of their projects to keep up with the latest python versions. Coming in Python 3.7 are data classes!


Say I have a program that uses a dictionary to pass data between functions. But at this point it's getting unwieldy, and attribute access sounds more appealing. Is there something in PyCharm that makes that refactoring less painful?

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL

Thermopyle posted:

I can't think of anything specific to PyCharm that would make that easier other than maybe some fancy regex search/replace.

You can write a class and override the appropriate dunder methods for attribute access...or you can use a library like Box which allows you to access your dict via attribute or item access. That would allow you to keep your functions as is and slowly convert them over to dotted access.

That being said, I'm not convinced it's a good idea. On the one hand, I also find dotted access smoother and less irritating to use, but on the other hand the way you access dicts and the way you access objects is an established thing in Python-land.

Whatever you do, I don't think there's an easy way to refactor this.

I've done the method where I implement the appropriate dunder methods on a class to let me access a dict via attribute and the bad part when it comes to PyCharm is that you don't get any help on autocompletion since the attributes are dynamic in nature.

Well you've given me the idea of just running a regex script on the files themselves that subs out a bracketed key in the code for the dotted equivalent. Shouldn't be too hard to make a pattern for, at least for the majority of cases. Hopefully. At least it gives me an excuse to mess with functional replacement in re.sub. Thanks!
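
A rough sketch of what that could look like, assuming the keys are plain string literals (the pattern, the to_dotted helper, and the sample line are all made up for illustration). Passing a function to re.sub lets you skip keys that can't be attribute names instead of blindly rewriting everything.

```python
import keyword
import re

# Matches name['key'] or name["key"] where key looks like an identifier.
BRACKET_KEY = re.compile(r"""(\b\w+)\[(['"])([A-Za-z_]\w*)\2\]""")

def to_dotted(match):
    name, key = match.group(1), match.group(3)
    # Keys like 'class' are valid dict keys but not valid attribute names.
    if keyword.iskeyword(key):
        return match.group(0)
    return f"{name}.{key}"

source = "total = data['price'] * data[\"count\"] + data['class'] + data['not valid']"
converted = BRACKET_KEY.sub(to_dotted, source)
print(converted)
```

This only handles a single level per pass (data['a']['b'] becomes data.a['b']) and knows nothing about strings or comments that happen to contain the pattern, so the diff still needs eyeballing.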

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL

wasey posted:

but I can't seem to properly assign the info. Thanks for helping out

How have you been trying to do it so far?

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL

Wallet posted:

If I have a list of numbers, like so:
[0, 14, 18, 20, 36, 41, 62, 70, 72]

And I want to extract all longest possible lists from it where no two consecutive items are more than, say, 5 apart:
[[14, 18, 20], [36, 41], [70, 72]]

It's not hard to construct a loop to do this with a few conditionals, but is there an elegant python solution?

Unless that first term is always supposed to be ignored, it would seem that your output list should start with a [0].

e: No 62 either. Are you just ignoring all singleton lists?
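
Assuming the answer is yes and singletons get dropped, a plain loop is probably still the clearest way to do it. This is just a sketch; group_close_runs and its parameters are names I made up.

```python
def group_close_runs(numbers, max_gap=5, min_length=2):
    """Split a sorted list into runs whose neighbours are at most max_gap apart."""
    runs = []
    current = []
    for n in numbers:
        # A gap bigger than max_gap closes the current run.
        if current and n - current[-1] > max_gap:
            runs.append(current)
            current = []
        current.append(n)
    if current:
        runs.append(current)
    # Drop runs that are too short (singletons by default).
    return [run for run in runs if len(run) >= min_length]

print(group_close_runs([0, 14, 18, 20, 36, 41, 62, 70, 72]))
```

With min_length=1 the [0] and [62] runs come back as well.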

Dr Subterfuge fucked around with this message at 23:57 on Feb 8, 2018

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
map sounds like what you're talking about?

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL

Cingulate posted:

baka is, in fact, literally talking about map:

:doh:

Trying to python from my phone was a bad idea.

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
json.loads is turning your pandas json file into a nested dict. You're getting yelled at because "collector_key" is in fact a key whose value is a dict, and you can't turn a dict into an int.

E: It looks like your code is assuming that you have a collector key for each number, but what you actually have is one collector key with a bunch of numbers.

E2: But that still wouldn't work with your code now, because you're accessing "collector_key" from the same outer dict that you are trying to iterate over.

Dr Subterfuge fucked around with this message at 02:44 on Mar 3, 2018

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
It puts every public name from the imported module into your global namespace, so you can do things like call shuffle() directly instead of calling random.shuffle(). Practically, its advantage is that it cuts down on typing. Maybe there are other reasons to do it that I am not aware of. It's generally not a good idea though, because it imports everything implicitly, which makes it harder to understand where something like shuffle is defined, and it could cause hidden conflicts if you have something else with the same name in your global namespace. You can get the same behavior more explicitly by doing "from random import shuffle", and then you only get what you want.
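
If you want to see the difference for yourself, here's a small sketch that runs both kinds of import in scratch namespaces (the dictionaries are only there so the names can be counted without cluttering your own globals).

```python
# Run the star import in a scratch namespace so we can inspect what it adds.
star_namespace = {}
exec("from random import *", star_namespace)
star_names = sorted(n for n in star_namespace if not n.startswith("__"))
print(len(star_names), "names, including shuffle:", "shuffle" in star_names)

# The explicit form adds exactly one name.
explicit_namespace = {}
exec("from random import shuffle", explicit_namespace)
explicit_names = [n for n in explicit_namespace if not n.startswith("__")]
print(explicit_names)
```

Any of those star-imported names would silently replace a function or variable of the same name that you had already defined.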

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
There are also ways to automate Excel file creation from python if you haven't already gone that route.

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
You'll want to put an __init__.py everywhere so you can import your folders as packages

which will allow you to do this

Python code:
import output.run_1.code.input_generation as ig

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
It's not clear to me that

code:
import input_generation as ig
would work even in your first case, unless maybe /scripts is already in your sys.path somehow? Adding __init__.py to other subfolders shouldn't change that behavior one way or another.

This will work with an __init__.py in /scripts though:

code:
import scripts.input_generation as ig
E: added link to sys.path and /scripts import

Dr Subterfuge fucked around with this message at 22:26 on Mar 14, 2018

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
__init__.py is basically a flag that tells python the folder that contains it is a package that can be imported. It can optionally contain some code to do some setup work (but it's going to be called every time you import anything in that package, so things should only go there if they really are general for the whole package). Python files can also be imported directly and are known as modules.

The lookup behavior for python is determined by a list of search paths: the directory containing the script you launched, then any folders listed in an environment variable called PYTHONPATH, then the default locations for the standard library and installed packages. When python reaches an import statement, it works down that list in order and stops as soon as it finds something, so if you have a local module called random, python will import that instead of the standard library one because your script's directory takes precedence over everything else. Any module that is in a folder in your list of search paths can be imported directly, which means, for example, if you have main.py in the same folder as foo.py, you can just import foo from main like this:

code:
import foo
Python stores its list of search paths at runtime in a variable in the sys module called path, which you can see like this:

code:
import sys

print(sys.path)
Python is actually looking through sys.path when doing the imports, so you can change where your script searches and when to your heart's content. Note that just because you can doesn't mean that you should, so it's best to work with the default behavior unless you have a good reason not to.
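
Here's a self-contained sketch of that, in case it helps to see it happen. It writes a throwaway module (demo_foo is a made-up name) into a temporary folder, shows that python can't find it, and then shows that it can once the folder is on sys.path.

```python
import importlib
import importlib.util
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Write a throwaway module into a folder python doesn't know about yet.
    with open(os.path.join(folder, "demo_foo.py"), "w") as handle:
        handle.write("ANSWER = 42\n")

    # Not importable yet: the folder isn't on sys.path.
    found_before = importlib.util.find_spec("demo_foo") is not None

    sys.path.insert(0, folder)
    importlib.invalidate_caches()
    try:
        import demo_foo
        answer = demo_foo.ANSWER
    finally:
        # Put sys.path back the way we found it.
        sys.path.remove(folder)

print(found_before, answer)
```

Inserting at position 0 is what gives the folder precedence over everything else in the list.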

Dr Subterfuge fucked around with this message at 19:32 on Mar 15, 2018

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
Yeah I was mostly just trying to present a more complete picture of what was going on. The simplest way is making /scripts into a package and... not doing whatever it is you think is making editing sys.path necessary.

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
Say I have two dataframes:
df1
code:
      A    B 
0    'a'   NaN   
1    'b'   NaN   
2    'c'   NaN  
3    'd'   NaN
and df2:
code:
      A    B 
0    'a'  'one'   
1    'c'  'two'
2    'd'  'three'
How do I replace the column B values in df1 with df2?
code:
      A    B 
0    'a'  'one'   
1    'b'   NaN
2    'c'  'two'  
3    'd'  'three'  
This seems to work as a solution to the example, but it feels like I'm just smashing pieces together until something works.
Python code:
import pandas as pd

df1 = pd.DataFrame({'A': ['a', 'b', 'c', 'd'],
                    'B': ['.']*4},
                    index=[0, 1, 2, 3])
df2 = pd.DataFrame({'A': ['a', 'c', 'd'],
                    'B': ['one', 'two', 'three']},
                    index=[0, 1, 2])
df1.set_index('A', drop=False, inplace=True)
df2.set_index('A', inplace=True)
df3 = pd.concat([df1, df2], axis=1)
df4 = df3.iloc[:,[0, 2]]
print(df4)
It doesn't preserve the initial indexing, but that's recoverable.

Dr Subterfuge fucked around with this message at 10:31 on Mar 18, 2018

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL

Cingulate posted:

You mean like this?

code:
df1 = pd.DataFrame({'A': ['a', 'b', 'c', 'd'],
                    'B': ['.']*4},
                    index=[0, 1, 2, 3])
df2 = pd.DataFrame({'A': ['a', 'c', 'd'],
                    'B': ['one', 'two', 'three']},
                    index=[0, 1, 2])
df1b = df1.set_index("A")
df1b["B"] = df2.set_index("A")["B"]

print(df1b)
       B
A       
a    one
b    NaN
c    two
d  three
(The assignment aligns on index.)

That feels much better. Somehow I was fixated on directly building the df I wanted instead of relying on col assignment. On the other hand


vikingstrike posted:

That's a one liner with pd.merge().

Oh hell. So it is.

Python code:
df1['B'] = df1.merge(df2, how='left', on=['A'])['B_y']
Thanks to you both. I need to get better at joins.
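
For anyone following along, here's the same thing spelled out as a sketch with the example frames from before. The B_y in the one-liner is just the name merge gives df2's B column when both frames have a column called B.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["a", "b", "c", "d"], "B": ["."] * 4})
df2 = pd.DataFrame({"A": ["a", "c", "d"], "B": ["one", "two", "three"]})

merged = df1.merge(df2, how="left", on="A")
# Both frames have a B column, so merge keeps both and tags them:
# B_x came from df1 (left), B_y came from df2 (right).
print(list(merged.columns))

# A left merge keeps df1's row order, so the result lines up with df1's default index.
df1["B"] = merged["B_y"]
print(df1)
```

The assignment relies on df1 having the default 0..n-1 index. With any other index the merged column would need its index set to match first.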

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL

baka kaba posted:

While we're on this, this is the best one of those visualisation videos I've seen

https://www.youtube.com/watch?v=sYd_-pAfbBw

The hue is where the dot should be in the array, and the closeness to the centre is how out of position it is. As each one gets sorted it flies out to its position on the circle edge. Some wild stuff happens in there :eyepop:

Holy poo poo Radix is amazing.

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
That's what json.loads has done. Printing it just converts message (which is now a python dict) to a string so it can be displayed.

Dr Subterfuge fucked around with this message at 00:53 on Mar 30, 2018

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL

SnatchRabbit posted:

Right, but what I need is message to be this portion, in JSON, so I can use individual values (strings) like resourceID, and put that in a statement later. Not sure that makes sense, like I said I'm terrible with python.

code:
{"awsAccountId":"#######","configRuleName":"restricted-sshv2","configRuleARN":"arn:aws:config:us-west-2:########:config-rule/config-rule-htirkv","resourceType":"AWS::EC2::SecurityGroup","resourceId":"sg-######","awsRegion":"us-west-2","newEvaluationResult":{"evaluationResultIdentifier":{"evaluationResultQualifier":{"configRuleName":"restricted-sshv2","resourceType":"AWS::EC2::SecurityGroup","resourceId":"sg-#######"},"orderingTimestamp":"2018-03-29T02:03:20.664Z"},"complianceType":"NON_COMPLIANT","resultRecordedTime":"2018-03-29T02:47:53.647Z","configRuleInvokedTime":"2018-03-29T02:47:53.207Z","annotation":null,"resultToken":null},"oldEvaluationResult":null,"notificationCreationTime":"2018-03-29T02:47:54.468Z","messageType":"ComplianceChangeNotification","recordVersion":"1.0"}

Like Data Graham said, getting the resource ID from message would just be message['resourceId'] (keys are case-sensitive, and in your JSON it ends in a lowercase d). If you need to turn it back into a JSON string you would do json.dumps(message).
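
A quick sketch with a cut-down version of that payload (most fields dropped, and the statement string at the end is just a placeholder for whatever you need to build):

```python
import json

# A cut-down version of the payload quoted above.
raw = (
    '{"resourceType":"AWS::EC2::SecurityGroup","resourceId":"sg-######",'
    '"newEvaluationResult":{"complianceType":"NON_COMPLIANT","annotation":null}}'
)

message = json.loads(raw)  # now a plain python dict
resource_id = message["resourceId"]
# Nested objects become nested dicts, and JSON null becomes None.
compliance = message["newEvaluationResult"]["complianceType"]
statement = "{} is {}".format(resource_id, compliance)
print(statement)

# Going the other way gives you a JSON string again.
round_trip = json.dumps(message)
```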

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
You have a typo in image 2.

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
Instead of explicitly searching intersections it would probably be easier (at least from a coding perspective) to use pandas and merge dataframes.

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL

Cingulate posted:

Thread title

I had this feeling I had read some joke about how often pandas was recommended in this thread. Somehow I actually hadn't known about it until I read about it here. Intersections between tabular data sounds like a pretty good use case at least!

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
You can't open a list of file paths. You have to do it one at a time.

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL

Seventh Arrow posted:

Yikes, really? So if it generated a hundred files, I'd be pretty hosed. I guess for now I'll just be thankful that the exercise only has five files.

So should I just do five "for img in img_url:" passes, each with a different filename?

You can iterate over your urls and locations at the same time with zip.

Python code:
for img, loc in zip(img_url, locs):
    response = requests.get(img, stream=True)
    with open(loc, "wb") as handle:
        for chunk in response.iter_content(chunk_size=512):
            if chunk:
                handle.write(chunk)
Also, I changed your open() call to the preferred syntax. It handles all the background stuff of making sure the file closes and whatnot for you.

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
You don't have to have any more infrastructure than what I posted. In your code there isn't actually any reason to want to open a bunch of files simultaneously anyway, since each url is accessed sequentially, and each url corresponds to one location.

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
I've been trying to figure out decorators, but one thing that I've always had trouble with is composition of functions. Basically, I realized that I was creating a bunch of methods that were each supposed to modify a different class attribute. I was getting the attribute from self, doing whatever I needed to change the value, and then setting it back. This seems like an ideal use case for a decorator, but I'm struggling to write one that does this.

It seems like I should be able to change this:
code:
class A(object):
    def __init__(self, data):
        self.a = [1]
        self.data = data

    def update_a(self):
        a = self.a
        a.append(self.data)
        self.a = a
to something like this:
code:
class A(object):
    def __init__(self, data):
        self.a = [1]
        self.data = data

    @modify_self('a')
    def update_a(self, a=None):
        a.append(self.data)
        return a
where
code:
>> obj = A(2)
>> obj.update_a()
>> obj.a
[1, 2]
I know I can use inspect to find the class that called update_a (this seems promising), and then use getattr(cls, 'a') to get the value to inject. I think my questions are
1. how to get arbitrary arguments out of a decorator call and
2. how to use functools.update_wrapper to inject the new value. I think I actually need functools.partialmethod?

On the other hand, if there is a better pattern than this class structure to go about updating a bunch of fields (in different ways) from a given input, that would be good to know, too. I'm well past the point where my Python abilities have outstripped my design abilities.

E: Well, getting the class is proving to be difficult. The object the decorator says it's wrapping is the function A.update_a, which breaks the linked class sniffer because the module __main__ doesn't have an attribute A.

E2: I think this works. Classes to the rescue! Modified code from here.

Python code:
class modify_self(object):
    """
    Injects the supplied  class attributes into the decorated method and updates those attributes
    with the method's output
    
    usage:
    @modify_self('a')
    def update_a(self, a=None):
        #do stuff
        return a,
        
    is equivalent to
    def update_a(self):
        a = self.a
        #do stuff
        self.a = a
    """
    def __init__ (self, *args):
        # store arguments passed to the decorator
        self.args = args

    def __call__(self, func):
        def newf(*args):
            #the 'self' for a method function is passed as args[0]
            slf = args[0]

            # get the passed attributes from the containing class
            kwargs = {attr: getattr(slf, attr) for attr in self.args}

            # call the method
            result = func(slf, **kwargs)

            # put things back
            for field, value in zip(self.args, result):
                setattr(slf, field, value)

        newf.__doc__ = func.__doc__
        return newf

class A(object):
    def __init__(self, data):
        self.a = [1]
        self.b = [2]
        self.c = [3]
        self.data = data

    @modify_self('a', 'b')
    def update_a_and_b(self, a=None, b=None):
        a.append(self.data)
        b.append(self.data)
        return a, b

    @modify_self('c')
    def update_c(self, c=None):
        c.append(self.data)
        return c,


>> test = A('test')
>> test.update_a_and_b()
>> print(test.a)
[1, 'test']
>> print(test.b)
[2, 'test']
>> test.update_c()
>> print(test.c)
[3, 'test']
Works as long as the return value of the modified function is always a tuple, e.g. return c, instead of return c.

E3: using functools.wraps(func) as a decorator on newf is probably superior to just making the __doc__ attributes equal?

E4: If someone could explain how the args in the __init__ and __call__ methods are magically different (and why args in __init__ doesn't consume func) I'd be really interested to know. Because this still feels like witchcraft to me.

Dr Subterfuge fucked around with this message at 23:57 on Apr 7, 2018

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL

breaks posted:

I don't comprehend at all what you're trying to accomplish. What you've got there at the moment seems to be one hell of a way to write self.some_list.append(some_stuff). On the other hand I've probably never managed to successfully understand a post in this thread when reading it at 2AM (or maybe ever) so I'll just answer these two specific questions:

Yes, if you're wrapping a function and want to make your wrapper appear to be what it's wrapping, use wraps.

The decorator syntax is only a shortcut for something that's otherwise ugly:

code:
def f(*args):
    print(*args)

f = decorator(f)  # If there was no @decorator syntax you would just write this
There is nothing magical about @decorator(a, b). Writing it out by hand results in a very literal translation:

code:
f = decorator(a, b)(f)
# or to really spell it out
actual_decorator = decorator_factory(a, b)
f = actual_decorator(f)


Disregarding multiple decorators and within the rather restrictive limits of what Python's grammar allows after the @, @X before f is the same as f = X(f) after it.

Does this help you understand what is going on? A decorator always gets called with one argument, which is the thing it's decorating. A "decorator with arguments" is a callable that returns a decorator. I think it's more clear to think of that as a decorator factory, but for whatever reasons the common terminology is just to lump it all together as "decorator". In what you wrote, __init__ is called as part of the process of instantiating the class. That's the factory part. Once the instance is created it's then called, which is possible since you defined __call__, and that's the decorator part.

Basically, I have some huge scraper functions that I'm trying to refactor into smaller components (with maybe the eventual goal of messing around with asyncio), and the structure I came up with was to turn each scraper into a class (or rather a subclass of a base scraper) and use attribute access in class methods. So I'm not really just appending things. It was mostly to illustrate that I want to be able to modify an existing value. I'm aware there are whole scraping packages like scrapy that have already solved most of these problems. I'm mostly just messing around with rolling something myself to see if I can learn anything in the process. (But one of those is getting better at designing things, so if there's a better way to go about updating a bunch of different fields in a data structure that would be cool to know.)

Looking at my code again and seeing your examples, I can see now how the two sets of args are different. The first args come from the arguments of the class, and the second args come from the fact that newf replaces the decorated function and gets called on the decorated function's arguments. The big thing I was missing was how the call that modifies the decorated function works.
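
For comparison, here's a sketch of the same idea written as nested functions instead of a class, using functools.wraps as suggested. The outer function is the factory that receives the attribute names, and the function it returns is the decorator that receives the method. Counter and bump are made-up names for the example.

```python
import functools

def modify_self(*attr_names):
    """Decorator factory: called with the attribute names, returns the decorator."""
    def decorator(func):
        # The decorator itself only ever receives the function being decorated.
        @functools.wraps(func)
        def wrapper(self):
            # Pull the named attributes off the instance and pass them in.
            current = {name: getattr(self, name) for name in attr_names}
            result = func(self, **current)
            # Put the returned values back, in the same order.
            for name, value in zip(attr_names, result):
                setattr(self, name, value)
        return wrapper
    return decorator


class Counter:
    def __init__(self):
        self.count = 0

    @modify_self("count")
    def bump(self, count=None):
        """Add one to count."""
        return (count + 1,)


c = Counter()
c.bump()
c.bump()
print(c.count, Counter.bump.__name__)
```

Written out without the @ syntax, the decoration is bump = modify_self("count")(bump): the first call consumes the names, and the second call consumes the function.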

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
It looks like Sharepoint doesn't like your authentication method. I don't know anything about Sharepoint, and there are conflicting pieces of information online about how to properly authenticate your session, but I can say that it doesn't look like basic authentication is the way to go here.

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
Generally speaking it would seem like using query instead of loc doesn't give you a whole lot since you lose pretty much all the niceties of your editor by moving everything into a string that needs to be parsed eventually anyway. Am I missing something? F-strings maybe? Which might not be a good idea if you care about security?

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL

vikingstrike posted:

Sure. Glad it’s working!

Query can be easier to use when method chaining and I believe on larger frames it gets processed using pandas’ eval() which speeds things up. I haven’t timed it recently though so I could be misremembering. I think this is actually discussed in the docs. I normally just use it when chaining methods. I would only worry about the input attack vector if this is public facing code. For personal use I don’t believe it matters. I’m also unsure what editor features you’re referring too but that could be an issue if you rely on something particular.

I'm thinking mostly about basic autocomplete and syntax checking. I make stupid typos way too often and sometimes get screwed over when I'm using strings because PyCharm doesn't know any better.

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL

Zero Gravitas posted:

I'm trying to get an example of exactly how long it takes for the youtube algorithm to start pitching far right wing videos at a new user who views soft right or entryist political material.

This is an interesting problem. Something to keep in mind is YouTube keeps track of how much of a video their users watch, so an agent that jumps quickly between videos isn't going to register the same way that an attentive listener (someone who watches the whole thing) would. Same thing with liking videos, of course. Probably comments as well. Probably neither of those are things that you want to be doing with the videos of the ever more deranged parts of the political spectrum, though.

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL

baka kaba posted:

Another thing you could try is having a list containing the last 10 seconds' of IPs (filtered to only include the request types you're looking for). So when you get a new IP, add it to the list, then pop off the entries from the front of the list that have a timestamp > 10s earlier than your new entry. Then you can scan your list for matches on that new IP and see if you get 3+

Like a moving window, basically

This is basically the ideal use case for a deque and more efficient than the general purpose list.
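
Something like this, as a sketch. It assumes the log has already been parsed into (timestamp, ip) pairs sorted by time and filtered to the request types you care about. find_bursts and the constants are names I made up.

```python
from collections import deque

WINDOW_SECONDS = 10
THRESHOLD = 3

def find_bursts(events):
    """events: iterable of (timestamp_seconds, ip), already sorted by time."""
    window = deque()
    flagged = []
    for timestamp, ip in events:
        window.append((timestamp, ip))
        # Drop entries from the left that are more than 10s older than the newest one.
        while timestamp - window[0][0] > WINDOW_SECONDS:
            window.popleft()
        # Count how many entries still in the window share this IP.
        hits = sum(1 for _, seen_ip in window if seen_ip == ip)
        if hits >= THRESHOLD:
            flagged.append((timestamp, ip))
    return flagged

events = [
    (0, "10.0.0.1"), (2, "10.0.0.2"), (4, "10.0.0.1"),
    (9, "10.0.0.1"), (30, "10.0.0.1"), (31, "10.0.0.1"),
]
print(find_bursts(events))
```

popleft on a deque is constant time, whereas list.pop(0) has to shift every remaining element, which is where the efficiency difference comes from.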

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
I learned about deques from Fluent Python, which I picked up because of reading this thread. So cheers all around. :)

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
Should be this:

code:
#This is a Guess the Number game
import random

print('Hello, what is your name?')
name = input()

print('Well ' + name + ' I am thinking of a number between 1 and 20.')
secretNumber = random.randint(1, 20)

for guessesTaken in range (1, 7):
    print('Take a guess.')
    guess = int(input())

    try:
        if guess > secretNumber:
            print('Your guess is too high.')
        elif guess < secretNumber:
            print('Your guess is too low.')
        else:
            break #This condition is for the correct guess
    except ValueError:
        print ('You did not enter a number.')

if guess == secretNumber:
    print('Good job ' + name + '!') 
else:
    print('Nope. The correct guess was ' + str(secretNumber))

print('You took ' + str(guessesTaken) + ' guesses')
Each time the program gets a number, it needs to process it to see if it is correct. That's what is happening in the try block, so it needs to start at the same indentation as everything else in the for loop. Also, except is part of the try, so it needs to be at the same indentation. (Like how else needs to have the same indentation as the if it is a part of.) The break statement needs to be inside a loop because all it does is stop the enclosing loop from iterating further. (In this case you want this to happen so the program stops asking for guesses when the right one has been reached.)

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
Whoops. Yeah. Guess I was too busy looking at the indents to notice the guess.
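
For reference, a sketch of the fix: the int() conversion is the line that actually raises ValueError, so it has to sit inside the try. The checking is split into two hypothetical helpers here (read_guess and check_guess are made-up names) so it can be tried without typing anything in.

```python
def read_guess(text):
    """Return the guess as an int, or None if text isn't a number."""
    try:
        # int() is what raises ValueError, so it belongs inside the try.
        return int(text)
    except ValueError:
        return None


def check_guess(text, secret_number):
    guess = read_guess(text)
    if guess is None:
        return 'You did not enter a number.'
    if guess > secret_number:
        return 'Your guess is too high.'
    if guess < secret_number:
        return 'Your guess is too low.'
    return 'correct'


for attempt in ['banana', '15', '3', '7']:
    print(attempt, '->', check_guess(attempt, 7))
```

In the game itself, the for loop would call check_guess(input(), secretNumber) and break when it comes back as 'correct'.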

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
pretty sure this would work

Python code:
from c1_algorithmic_toolbox.utilities.timing import timing


Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
And I just realized the print statement isn't a function, so this is python 2. I know some import behavior changed between 2 and 3, but I don't know if that's relevant here.
