My health care software platform is written in POLIO
|
|
# ? Mar 23, 2021 15:57 |
|
|
xtal posted:I honestly don't know that much Yeah, couldn't tell at all.
|
# ? Mar 23, 2021 16:36 |
Should be TypeScript obviously
|
|
# ? Mar 23, 2021 16:36 |
|
To be clear, this is a research data set which lives on a HIPAA compliant server. It is currently a bunch of excel files that were pulled by the IT team who run the health system databases. I would like to turn the excel files into a database so that things are much more organized and I can learn about databases. Hence why I was asking for help. Da Mott Man posted:It's been a while since I worked with sqlalchemy but if I remember correctly your id columns should have autoincrement=True and for speed of lookups you should have an index=True for any column you would use to select records. I have changed the column to DateTime, that was a typo. Thanks for the keyword 'cascade' - that allowed me to look up what I wanted. I think the new definition of Patient with cascade='all, delete' will accomplish the desired behavior, right? Python code:
Hollow Talk posted:I'm not a fan of pro-forma integer id columns. I see these columns often, and they are useless more often than not (auto-incrementing columns are only really useful if you want to check for holes in your sequence, i.e. for deleted data, and that can be solved differently as well). As I elaborated above I am trying to place the data stored in Excel files into a database. I expect at some point in the future to have more data to add to the database. The data timestamps are only specific to the date (i.e., I only know that a lab value arrived on 2020/03/21, and not at a specific time on that date). Would the following change to the Diabetic class be along the lines of what you are suggesting? Python code:
M. Night Skymall posted:You should probably not use the MRN as the identifier throughout your database. It's generally best practice with PHI to isolate out the MRN into a de-identification table and use an identifier unique to your application to identify the patient. It's easy enough to do a join or whatever if someone wants to look people up by MRN, but it's useful to be able to display some kind of unique identifier that isn't immediately under all the HIPAA restrictions as PHI. Even if it's as simple as someone trying to do a bug report and not having to deal with the fact that their bug report must contain PHI to tell you which patient caused it. Not vomiting out PHI 100% of the time in error messages, things like that. Just make some other ID in the patient table and use that as the foreign key elsewhere. Would you have this same concern given the use case (a research database where the only people with access to it are on the IRB protocol - currently just me for now)?
|
# ? Mar 23, 2021 17:46 |
|
Jose Cuervo posted:To be clear, this is a research data set which lives on a HIPAA compliant server. It is currently a bunch of excel files that were pulled by the IT team who run the health system databases. I would like to turn the excel files into a database so that things are much more organized and I can learn about databases. Hence why I was asking for help. There's a pervasive attitude on this website and elsewhere throughout the internet that "important software" should not be written in anything dynamically typed. Usually no consideration is given to the awful paradigms/language sets that previous versions of "important software" have been written in.
|
# ? Mar 23, 2021 17:52 |
|
Personally I'd go the extra mile and import it all into whatever schemaless JSON store is popular on hackernews right now, just to really rub it in. (I also work on healthcare data w/ Python, it's completely fine.)
|
# ? Mar 23, 2021 18:16 |
|
Jose Cuervo posted:Would you have this same concern given the use case (a research database where the only people with access to it are on the IRB protocol - currently just me for now)? It doesn't matter what the data is used for. The history of MRN as PHI is pretty dumb in my opinion, but per guidance from the government your MRN is as much PHI as your DOB or name. Having a de-identification table isn't a big deal, you can still store all your PHI in the patients table along with your new unique identifier. It's really *just* to remove the awkwardness of having everything in your DB keyed to a piece of PHI. I mean you're right, it's just you and it probably won't affect much now. But making good decisions about your schema is much..much easier now than it is later, and there's basically no way you will live to regret de-identifying your data in advance, and many ways you can live to regret spreading the MRN all over your database. Anyone who's looking at this going "No no not python!" is just forcing this data to be sucked into a PowerBI tool instead. Don't make this poor person use PowerBI, that's just mean. Edit: MRN is problematic because it's specifically listed as an identifier. If you have an internal ID in use for patients besides MRN that'd also be better than using the MRN directly. One more dumb thing about MRNs, they're specific to the hospital EMR so they aren't guaranteed to be unique. M. Night Skymall fucked around with this message at 18:46 on Mar 23, 2021 |
# ? Mar 23, 2021 18:17 |
|
Is there a good library/framework for creating command line interactive programs?
|
# ? Mar 23, 2021 19:09 |
bobmarleysghost posted:Is there a good library/framework for creating command line interactive programs? click. Unless you mean TUI type apps in which case urwid is pretty solid.
|
|
# ? Mar 23, 2021 19:15 |
|
Jose Cuervo posted:Would the following change to the Diabetic class be along the lines of what you are suggesting? Yes, that would be it. Essentially, the reason why I dislike auto-increment integer columns is that they serve no particular purpose, and they provide no guarantees -- if you want to enforce constraints (i.e. to avoid duplicate data if you run your tool against the Excel files more than once), auto-incrementing columns don't help, because the only constraint for the database is that the primary key is unique (which a sequence ensures), plus any foreign key relations if you specify them. If you insert the same data again, the original option will happily do so, given this data: Python code:
Bash code:
Unrelated: If you can avoid it, try to stick to lowercase column names, so use mrn instead of MRN. Many databases don't care, but for those cases where they do, this saves you from lots of quoting (or general incompatibility). *: This does not apply for data warehousing, where constraints might not be feasible. Or if you use something like Amazon Redshift, because "we'll just ignore all constraints because that will make things faster" is certainly one way to speed up inserts.
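A minimal sketch of the difference described above, using the standard-library sqlite3 module instead of SQLAlchemy so it runs on its own. The table and column names (lab_with_id, lab_with_key, mrn, lab_date, value) are made up for illustration and are not the original models.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Option 1: surrogate auto-incrementing id, no other constraints.
conn.execute(
    "CREATE TABLE lab_with_id ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " mrn TEXT, lab_date TEXT, value REAL)"
)
# Option 2: compound primary key over the columns that identify a row.
conn.execute(
    "CREATE TABLE lab_with_key ("
    " mrn TEXT, lab_date TEXT, value REAL,"
    " PRIMARY KEY (mrn, lab_date))"
)

row = ("012345", "2020-03-21", 7.1)

# Loading the same row twice succeeds when only the id has to be unique.
for _ in range(2):
    conn.execute(
        "INSERT INTO lab_with_id (mrn, lab_date, value) VALUES (?, ?, ?)", row
    )
dup_count = conn.execute("SELECT COUNT(*) FROM lab_with_id").fetchone()[0]

# With the compound key the second insert is rejected by the database.
conn.execute("INSERT INTO lab_with_key VALUES (?, ?, ?)", row)
try:
    conn.execute("INSERT INTO lab_with_key VALUES (?, ?, ?)", row)
    second_insert_rejected = False
except sqlite3.IntegrityError:
    second_insert_rejected = True
key_count = conn.execute("SELECT COUNT(*) FROM lab_with_key").fetchone()[0]

print(dup_count, key_count, second_insert_rejected)
```

The id-only table ends up with the duplicate row, while the compound-key table keeps a single copy and raises an integrity error on the repeat.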
|
# ? Mar 23, 2021 19:25 |
|
Bundy posted:click. click definitely looks like what I want, thanks!
|
# ? Mar 23, 2021 19:44 |
|
M. Night Skymall posted:It doesn't matter what the data is used for. The history of MRN as PHI is pretty dumb in my opinion, but per guidance from the government your MRN is as much PHI as your DOB or name. Having a de-identification table isn't a big deal, you can still store all your PHI in the patients table along with your new unique identifier. It's really *just* to remove the awkwardness of having everything in your DB keyed to a piece of PHI. I mean you're right, it's just you and it probably won't affect much now. But making good decisions about your schema is much..much easier now than it is later, and there's basically no way you will live to regret de-identifying your data in advance, and many ways you can live to regret spreading the MRN all over your database. I am on board with doing this, but I am having trouble wrapping my head around how I would accomplish this. Do I define a new column, say pid as the primary_key for the Patient model as below, and use pid as the foreign_key in the other models as below for the Diabetic model? Python code:
|
# ? Mar 23, 2021 20:03 |
|
Hollow Talk posted:Yes, that would be it. Got it. The case of trying to insert data which already exists in the database is something I was thinking about, and the way you suggested defining the compound key neatly solves that issue. As a follow-on question, would you wrap the insert statement in a try-except block, and catch the "sqlalchemy.exc.IntegrityError" exception to deal with not crashing when trying to insert data which is already in the database?
|
# ? Mar 23, 2021 20:10 |
|
nvm
|
# ? Mar 23, 2021 20:11 |
|
M. Night Skymall posted:You should probably not use the MRN as the identifier throughout your database. It's generally best practice with PHI to isolate out the MRN into a de-identification table and use an identifier unique to your application to identify the patient. It's easy enough to do a join or whatever if someone wants to look people up by MRN, but it's useful to be able to display some kind of unique identifier that isn't immediately under all the HIPAA restrictions as PHI. Even if it's as simple as someone trying to do a bug report and not having to deal with the fact that their bug report must contain PHI to tell you which patient caused it. Not vomiting out PHI 100% of the time in error messages, things like that. Just make some other ID in the patient table and use that as the foreign key elsewhere. There's actually a more immediately practical reason - MRNs are assigned per hospital. Therefore, if a patient gets an exam at hospital 1, and goes to hospital 2, the MRN will not match in images taken at hospital 1 vs hospital 2.
|
# ? Mar 23, 2021 20:32 |
|
Jose Cuervo posted:Got it. The case of trying to insert data which already exists in the database is something I was thinking about, and the way you suggested defining the compound key neatly solves that issue. That's an option, yes -- logging the respective rows or writing them into a debug table are both options depending on what is easiest to deal with. Using the engine as a context manager as above takes care of committing or rolling back the current transaction (and it starts a transaction, which can be nested etc.) so that you don't end up with partially written data, but it still raises the exception, so I'd also wrap that if you care most about not crashing.
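As a rough illustration of that pattern (let a context manager own the transaction, and catch the integrity error outside it), here is a sketch with the standard-library sqlite3 module, whose connection object also works as a commit-or-rollback context manager. The load_rows helper and the lab table are hypothetical; with SQLAlchemy the exception to catch would be sqlalchemy.exc.IntegrityError instead.

```python
import sqlite3


def load_rows(conn, rows):
    """Insert rows one transaction at a time; return the rows that were rejected."""
    skipped = []
    for row in rows:
        try:
            # `with conn` commits on success and rolls back if the block raises.
            with conn:
                conn.execute("INSERT INTO lab VALUES (?, ?, ?)", row)
        except sqlite3.IntegrityError:
            # The exception still propagates out of the `with`, so catch it
            # here and keep the row for logging or a debug table.
            skipped.append(row)
    return skipped


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lab (pid TEXT, lab_date TEXT, value REAL,"
    " PRIMARY KEY (pid, lab_date))"
)
rows = [("a1", "2020-03-21", 7.1), ("a1", "2020-03-22", 6.9)]
first = load_rows(conn, rows)
second = load_rows(conn, rows)  # re-running the import inserts nothing new
total = conn.execute("SELECT COUNT(*) FROM lab").fetchone()[0]
print(len(first), len(second), total)
```

One transaction per row is the simplest way to skip only the offending rows; wrapping the whole loop in a single transaction would instead roll back everything on the first duplicate.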
|
# ? Mar 23, 2021 21:24 |
|
Jose Cuervo posted:I am on board with doing this, but I am having trouble wrapping my head around how I would accomplish this. Do I define a new column, say pid as the primary_key for the Patient model as below, and use pid as the foreign_key in the other models as below for the Diabetic model? You have a meaningless-outside-your-application patient ID that uniquely identifies a patient. All of your deidentified tables reference that for knowing what patient a row is about. Then you have a table that maps from the patient ID to all the PHI about that patient (MRN, names, DOB, ...). That table could be either inside the same database or elsewhere if you wanted to segregate the dataset into PHI and deidentified parts. You might also need multiple PHI tables for some stuff, like if you want to store multiple different-system/different-site MRNs for one actual person. For each table in the schema, you should be able to classify it as PHI or not and shouldn't be mixing PHI and not-PHI together in the same table. If you've done stuff right, you ought to be able to hand someone all your non-PHI tables and they'd be able to follow information about a particular subject, but not have any PHI about them.
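One way to picture that layout, sketched with plain sqlite3 rather than SQLAlchemy so it is self-contained. The table names (patient, patient_phi, lab), the column names and the sample values are all assumptions for illustration, and which columns actually count as PHI is something to settle under your own protocol.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One row per patient, keyed by an application-only id.
    CREATE TABLE patient (pid TEXT PRIMARY KEY);

    -- Identifying data lives here; one pid may have several MRNs.
    CREATE TABLE patient_phi (
        pid TEXT NOT NULL REFERENCES patient (pid),
        health_system TEXT NOT NULL,
        mrn TEXT NOT NULL,
        PRIMARY KEY (health_system, mrn)
    );

    -- Every other table refers to the patient through pid only.
    CREATE TABLE lab (
        pid TEXT NOT NULL REFERENCES patient (pid),
        test_name TEXT NOT NULL,
        value REAL
    );
    """
)
conn.execute("INSERT INTO patient VALUES ('p-001')")
conn.executemany(
    "INSERT INTO patient_phi VALUES (?, ?, ?)",
    [("p-001", "hospital_1", "012345"), ("p-001", "hospital_2", "98765")],
)
conn.execute("INSERT INTO lab VALUES ('p-001', 'a1c', 7.1)")

# Looking a patient up by MRN is a join through the PHI table.
labs_for_mrn = conn.execute(
    "SELECT lab.pid, lab.test_name, lab.value"
    " FROM lab JOIN patient_phi ON lab.pid = patient_phi.pid"
    " WHERE patient_phi.health_system = ? AND patient_phi.mrn = ?",
    ("hospital_2", "98765"),
).fetchall()
print(labs_for_mrn)
```

Handing someone only the patient and lab tables would let them follow p-001 across rows without seeing either MRN.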
|
# ? Mar 23, 2021 21:25 |
|
Jose Cuervo posted:I have changed the column to DateTime, that was a typo. That looks correct to me.
|
# ? Mar 23, 2021 23:54 |
|
Anyone else noted how awful it is googling for anything python-related these days? SEO means the first couple pages of results are all geeksforgeeks.com and other awful beginner tutorial spam. (Which, like, even for beginners, seem bad.) Anyway, here's my actual question: in matplotlib.pyplot.plot there's this concept of a format string, so 'r--' means a red dashed line, etc. Somewhere in matplotlib, I assume, there has to be a format string parser that takes in 'r--' and spits out {color: 'r', linestyle: '--'} or whatever. Point me in the right direction, please? Whenever I write convenience plotting methods I end up fighting to balance the convenience of the very-compact format string style and actual structured argument handling. Zoracle Zed fucked around with this message at 06:32 on Mar 24, 2021 |
# ? Mar 24, 2021 06:29 |
|
As/for someone with very basic programming knowledge who wants to look into programming as a career: Is this Udemy course https://www.udemy.com/course/100-days-of-code/ worth looking into? Seems like a good deal for 63 hours of content, but if it's not good I don't want to put my time into it.* * Specifically, it's on sale right now for $12.99 which sounds like a good deal, but that price for the bold claims they make sounds kinda sus to me. Edit: there's also this one https://stacksocial.com/sales/the-2021-premium-python-certification-bootcamp-bundle from StackSocial that claims to be 41 hours of content worth $2585 for only 34.99 and again, sus. Framboise fucked around with this message at 13:02 on Mar 24, 2021 |
# ? Mar 24, 2021 12:55 |
Talk Python has some good practical courses. Can't speak about udemy but I did some Treehouse content a couple of years back and that was solid.
|
|
# ? Mar 24, 2021 13:38 |
|
Zoracle Zed posted:Anyone else noted how awful it is googling for anything python-related these days? SEO means the first couple pages of results are all geeksforgeeks.com and other awful beginner tutorial spam. (Which, like, even for beginners, seem bad.) You want to know the kwargs alternative to the format string, or the location of the parser module? If the latter why not just call it with an invalid string and see where the exception gets thrown from?
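The trick can also be done programmatically by reading the traceback instead of eyeballing it. The sketch below uses fractions.Fraction from the standard library purely as a stand-in for a function that parses a string, and where_raised is a hypothetical helper name; pointing it at a matplotlib plot call with a nonsense format string should work the same way, but that part is untested here.

```python
import traceback
from fractions import Fraction


def where_raised(func, *args):
    """Call func with bad input and report the innermost Python frame that raised."""
    try:
        func(*args)
    except Exception as exc:
        # The last traceback entry is the frame where the exception originated.
        frame = traceback.extract_tb(exc.__traceback__)[-1]
        return frame.filename, frame.name, frame.lineno
    return None  # the call did not raise


location = where_raised(Fraction, "r--")
print(location)
```

The returned file name and function name tell you which module to open to find the parser.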
|
# ? Mar 24, 2021 13:45 |
|
Framboise posted:As/for someone with very basic programming knowledge who wants to look into programming as a career: Is this Udemy course https://www.udemy.com/course/100-days-of-code/ worth looking into? Seems like a good deal for 63 hours of content, but if it's not good I don't want to put my time into it.* Udemy courses are always "on sale" for like 10-15 dollars or something, regardless of what their claimed price is, so judge all the courses based on their standard pricing being 12 bucks. No idea about that particular one, I've done some Udemy courses and they were useful for getting up to speed on stuff. Highly rated ones like the one you linked tend to be good, but also MIT has a pretty good python course that at least used to be free. I think this one? You can take it on Edx, maybe this one. The edx course should be free if you just want to watch the videos and receive the assignments, you might have to click around a bit to convince it not to charge you. I don't know, it was a lot more obviously free when I did it many years ago. Either way, Udemy typically just provides video lectures and problems without much interaction/grading, so for something as general as Python you can almost assuredly do just as well with the free offerings from coursera or edx or something I'm pretty sure. In the end it's going to be up to you to actually do the projects and learn things, programming's much more a skill you have to practice than knowledge you memorize or take notes on through lectures.
|
# ? Mar 24, 2021 13:53 |
|
DoctorTristan posted:You want to know the kwargs alternative to the format string, or the location of the parser module? If the latter why not just call it with an invalid string and see where the exception gets thrown from? the latter, and that's exactly what I needed, hah!
|
# ? Mar 24, 2021 15:44 |
|
Zoracle Zed posted:Anyone else noted how awful it is googling for anything python-related these days? SEO means the first couple pages of results are all geeksforgeeks.com and other awful beginner tutorial spam. (Which, like, even for beginners, seem bad.) I can’t tell if I’m just getting old and pining for the imaginary good ol’ days or what, but I definitely feel this. It goes beyond anything Python related, Google takes your search string and returns mostly trash for just about any subject now. Anyway, I usually use site:stackoverflow.com with my Google searches which helps filter out the fluff (and potentially some other good stuff). The main annoyance here is that the results are not sorted by date so you’ll get answers from like 2009 which usually aren’t all that helpful.
|
# ? Mar 24, 2021 15:46 |
|
punished milkman posted:I can’t tell if I’m just getting old and pining for the imaginary good ol’ days or what, but I definitely feel this. It goes beyond anything Python related, Google takes your search string and returns mostly trash for just about any subject now. The thing annoying me about this recently is the top 1 or 2 hits are just pages that poorly scrape the top SO answer and put it on a page filled with shady ads. If you're used to just quickly clicking the first or second result, since it's almost always SO...surprise!
|
# ? Mar 24, 2021 16:45 |
|
Yeah. Nobody's feeling lucky these days.
|
# ? Mar 24, 2021 17:53 |
|
Foxfire_ posted:You have a meaningless-outside-your-application patient ID that uniquely identifies a patient. All of your deidentified tables reference that for knowing what patient a row is about. Then you have a table that maps from the patient ID to all the PHI about that patient (MRN, names, DOB, ...). That table could be either inside the same database or elsewhere if you wanted to segregate the dataset into PHI and deidentified parts. I guess I am struggling to understand how I would structure this. Based on what has been said I would only need the MRN of the patient and the health system the patient belongs to to uniquely identify them. In the definition of the Patient model below I have set the primary key to be a compound key (pid, mrn, and health_system). Python code:
1. Where do I get the value for pid from? Do I need to generate that separately each time I want to enter a new patient? 2. Suppose I now have data from a given patient (I know their MRN and health system). In order to get their pid, do I first perform a search in Patient to find the pid, and then use the pid to insert their data into the relevant tables? EDIT: If the above is correct, I just tried adding data to the database while adding "autoincrement=True" to the pid: Python code:
Jose Cuervo fucked around with this message at 20:55 on Mar 24, 2021 |
# ? Mar 24, 2021 20:41 |
|
I need a confirmation that my hunch is right here, as it's Python-related but involves some stuff I don't normally have to think about : I teach a Python web development course that uses bcrypt - CFFI is a dependency of bcrypt. I have a student on the new M1 Mac who, when running a Django project where bcrypt gets imported in views, gets a nice long exception that terminates with this: code:
(it feels like the obvious answer is "yes, she's boned" but I also feel like I'm missing something here)
|
# ? Mar 24, 2021 21:46 |
|
Yup. I'm surprised a package like bcrypt has a dep that hasn't been packaged for the M1 yet, but without that you'll have to build it yourself and hope it doesn't need any changes to compile and work.
|
# ? Mar 24, 2021 21:58 |
|
I guess there's also having her run it under Rosetta but that's kind of tedious
|
# ? Mar 24, 2021 22:00 |
|
Jose Cuervo posted:I guess I am struggling to understand how I would structure this. Based on what has been said I would only need the MRN of the patient and the health system the patient belongs to to uniquely identify them. In the definition of the Patient model below I have set the primary key to be a compound key (pid, mrn, and health_system). I haven't been following in detail, but I suppose the answers to your questions are: 1) If (MRN, Health System) are how you uniquely identify a Patient, then you would have a (MRN, Health System) → PID mapping that is stored in its own table, and the Patient table only includes your PID. You create the PID however you like. If you needed to use the MRN or Health System variables in something, you would join that PHI table into the non-PHI data on the PID mapping you've created. 2) You would look up the correct PID for the Patient in the table that contains the mapping.
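Both answers can be folded into one lookup-or-create helper. This is only a sketch with the standard-library sqlite3 and uuid modules: get_or_create_pid, the patient and patient_phi tables, and the use of a random UUID for the pid are all illustrative choices, not anything taken from the original models.

```python
import sqlite3
import uuid


def get_or_create_pid(conn, health_system, mrn):
    """Return the pid mapped to (health_system, mrn), creating one if needed."""
    found = conn.execute(
        "SELECT pid FROM patient_phi WHERE health_system = ? AND mrn = ?",
        (health_system, mrn),
    ).fetchone()
    if found is not None:
        return found[0]
    pid = uuid.uuid4().hex  # random id with no meaning outside this database
    with conn:  # commit both inserts together, or neither
        conn.execute("INSERT INTO patient (pid) VALUES (?)", (pid,))
        conn.execute(
            "INSERT INTO patient_phi (pid, health_system, mrn) VALUES (?, ?, ?)",
            (pid, health_system, mrn),
        )
    return pid


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (pid TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE patient_phi ("
    " pid TEXT NOT NULL REFERENCES patient (pid),"
    " health_system TEXT NOT NULL, mrn TEXT NOT NULL,"
    " PRIMARY KEY (health_system, mrn))"
)

pid_a = get_or_create_pid(conn, "hospital_1", "012345")
pid_b = get_or_create_pid(conn, "hospital_1", "012345")  # same patient again
pid_c = get_or_create_pid(conn, "hospital_1", "12345")   # a different MRN string
print(pid_a == pid_b, pid_a == pid_c)
```

Data for a known (health system, MRN) pair then gets inserted into the other tables under the returned pid.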
|
# ? Mar 24, 2021 22:34 |
|
vikingstrike posted:However, this does not automatically add a new unique integer to the pid column each time a new patient is added. Not sure what I am doing incorrectly here. I also have not been following that closely but you may want to check this out since my recollection is that you're using SQLite.
|
# ? Mar 25, 2021 00:54 |
|
What you're trying to do seems generally right to me. Would toss in a unique constraint on pid if you want to use (MRN, HealthSystem) as the primary key. Also, MRNs probably should be strings, not integers. "012345" and "12345" are probably distinct MRNs. For the autoincrement, like Wallet said, SQLite has weird autoincrement behavior and specific workarounds in sqlalchemy. Since you don't particularly care that the values are ascending integers as long as they're unique, a reasonable portable thing would be to externally make a UUID and use that. Or you could beat sqlite into giving you a unique integer, but you'll have something somewhat db engine dependent. e: also be aware that foreign key constraints in SQLite by default do nothing. Foxfire_ fucked around with this message at 05:07 on Mar 25, 2021 |
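The foreign key point is easy to check with the standard-library sqlite3 module. In the sketch below the pragma is set explicitly in both cases so the result does not depend on how SQLite was compiled (typical builds default to off); the patient and lab tables and the orphan_insert_allowed helper are made up for the demonstration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE patient (pid TEXT PRIMARY KEY);
CREATE TABLE lab (
    pid TEXT NOT NULL REFERENCES patient (pid),
    value REAL
);
"""


def orphan_insert_allowed(enforce_foreign_keys):
    """Try to insert a lab row whose pid has no matching patient row."""
    conn = sqlite3.connect(":memory:")
    # The pragma is per connection; set it right after connecting,
    # before any transaction is open.
    setting = "ON" if enforce_foreign_keys else "OFF"
    conn.execute(f"PRAGMA foreign_keys = {setting}")
    conn.executescript(SCHEMA)
    try:
        conn.execute("INSERT INTO lab VALUES ('no-such-pid', 7.1)")
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()


print(orphan_insert_allowed(False), orphan_insert_allowed(True))
```

With SQLAlchemy the usual approach is to issue the same pragma from a connect event listener, so every new connection gets it.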
# ? Mar 25, 2021 05:04 |
|
Postgres even has a UUID type, but sqlite doesn't support that.
|
# ? Mar 25, 2021 18:20 |
|
Hollow Talk posted:Postgres even has a UUID type, but sqlite doesn't support that. Ok, so to get around the autoincrement issue in SQLite I decided to simply generate the unique ID (integer) myself (see check_add_patient below). I am now working on trying to put everything together with code to add data to the database: Python code:
In the add_dose_data function, should I only perform a session.commit() call once I have added all the rows of data? I have not been able to figure out how to use context managers with this code - the documentation seems to say I should be using a context manager... Any pointers? Finally, is the try-except block a standard way to deal with the IntegrityError which is thrown when I try to add in a row of data which has previously been added? Thanks for all of the help so far.
|
# ? Mar 25, 2021 19:59 |
|
Simplest way to implement a context manager: open a single session outside of your for loop with one. Example: Python code:
To get even finer-grained than that, you could pass the Session (i.e. the sessionmaker) and create sessions on the spot with a context manager using Session.begin(); then when the context closes the transaction is also committed for you, or a rollback occurs if an exception is raised. Much more slick than doing your own commits or rollbacks. Example: Python code:
|
# ? Mar 25, 2021 23:55 |
|
QuarkJets posted:Simplest way to implement a context manager: open a single session outside of your for loop with one. Example: So (I think) I tried what you suggested, by wrapping the call to check_add_patient in a context manager and wrapping the .add() in a context manager as well: Python code:
|
# ? Mar 26, 2021 01:44 |
|
From the docs, "When you write your application, the sessionmaker factory should be scoped the same as the Engine object created by create_engine(), which is typically at module-level or global scope". So maybe I made the mistake of having you pass the sessionmaker to a function, when instead it should just be called directly; my bad. So since your engine is global scope, the sessionmaker should be global scope as well. Wherever you instantiate the engine, also instantiate the sessionmaker. Then just have your functions invoke the session_maker in a context manager whenever a session is needed. Their example looks like this: Python code:
I'm also spotting an append() being used with data returned from a closed context manager; be careful with that, I think that's likely the source of your error. Try this: Python code:
QuarkJets fucked around with this message at 02:21 on Mar 26, 2021 |
# ? Mar 26, 2021 02:18 |
|
|
Anyone ever futz with extending Python using C or Rust to gain performance? I've been reading some articles on it and the idea seems intriguing but I'm sure the reality can be a headache.
|
# ? Apr 4, 2021 13:43 |