|
I'm kind of surprised we don't have one of these. This thread can be the general catch-all for NoSQL tech discussion and questions. There seems to be a bit of confusion around what NoSQL actually is, so this OP will be a crash course. What IS NoSQL? In a nutshell, NoSQL is any database tech that stores information in any manner besides tables formed of rows and columns linked together by joins. This is a very broad category because, needless to say, there are a lot of ways to store data besides in tables! Such as what? There are four primary flavors of NoSQL that make up most of what's out there: key-value stores (e.g. Redis, FoundationDB), document stores (e.g. MongoDB, Couchbase, MarkLogic), wide-column stores (e.g. Cassandra, HBase), and graph databases (e.g. Neo4j).
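To make the key-value vs. document distinction concrete, here's a minimal sketch. Plain Python dicts stand in for the actual stores, and the data is made up:

```python
import json

# A key-value store only understands opaque keys mapped to opaque values.
# The application is responsible for interpreting the value.
kv_store = {
    "user:61872": '{"name": "Alice", "city": "Springfield"}',  # value is just a blob
}

# A document store understands the *structure* of the value, so the database
# itself can query and index on fields inside it.
doc_store = {
    "user:61872": {"name": "Alice", "city": "Springfield"},
}

# With a KV store you must fetch and parse the whole blob yourself:
user = json.loads(kv_store["user:61872"])
assert user["city"] == "Springfield"

# With a document store, a query like "find all users in Springfield"
# can be answered by the database via an index on the 'city' field:
matches = [doc for doc in doc_store.values() if doc["city"] == "Springfield"]
assert len(matches) == 1
```

The other two flavors follow the same theme: wide-column stores organize values by row key plus column family, and graph databases make the *relationships* between records first-class.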
Additional Reading: Check out NoSQL For Dummies. You can also find digital versions of this for free if you Google around a bit. PIZZA.BAT fucked around with this message at 11:06 on Aug 27, 2019 |
# ¿ Dec 3, 2018 05:01 |
|
|
rarbatrol posted:Part of my job a couple years ago involved evaluating a bunch of NoSQL technologies for integration with our current stack, for the better part of a year. One of the candidates on my personal short list was FoundationDB. It's a high-performance key-value store with ACID transactions, and a plugin system they call "layers" that allow you to map more complex models down to the KV store. Frustratingly enough, the company was bought by Apple halfway through our project and taken off the market. Well apparently they've open-sourced it earlier this year: https://github.com/apple/foundationdb, and have just released a document store layer in the last few days. Ha, I used to be able to brag about how the only other ACID NoSQL DB on the market was gobbled up and blackholed by Apple. Good to know they're kind of back, though. This will be interesting to play with.
|
# ¿ Dec 4, 2018 03:52 |
|
My experience with MarkLogic is that it can handle documents of any size pretty well, up to its limit of half a gig. Also, full disclosure for this thread: I work for MarkLogic so I'm gonna have a bit of a bias itt.
|
# ¿ Dec 5, 2018 01:42 |
|
Scaramouche posted:Closest I've gotten to NoSQL is wacking together Lucene/SOLR search implementations but I won't lie and say I had a deep understanding of the underlying. From my quick googling this seems to be solely an API tool rather than a storage solution. The wiki article on graph DBs is actually pretty thorough and does a good job describing them: https://en.m.wikipedia.org/wiki/Graph_database
|
# ¿ Dec 5, 2018 20:26 |
|
Naar posted:When I worked at a certain well-known broadcasting corporation, we gave MarkLogic a shitload of money. Apparently they did buy some of the team very expensive tequila, though? Kudos Rex-Goliath, it actually works pretty well (though please stop making those cringeworthy videos). lol yeah our marketing is all around bad and everyone here knows it. it’s bad enough that every client has told me it sucks straight to my face. ¯\_(ツ)_/¯
|
# ¿ Dec 6, 2018 19:40 |
|
Amazon just fired a broadside into MongoDB’s hull: https://aws.amazon.com/blogs/aws/new-amazon-documentdb-with-mongodb-compatibility-fast-scalable-and-highly-available/ Not only did they make a full document database that largely mimics MongoDB, they straight up cloned the API. IMO this is really going to hurt Mongo’s ability to move forward, as their customer base is going to cement on the APIs that allow them to switch back and forth with Amazon’s service. But maybe not.
|
# ¿ Jan 14, 2019 14:48 |
|
Abel Wingnut posted:oh hell yes. been hoping this thread would pop up. Razzled posted:Anyone have any good reading or class suggestions for data modeling knowledge with Cassandra? It's an area where I have so little experience that almost all of my suggestions or work in that area amounts to trial and error. I understand the basics but when it comes to best practices etc I'm just totally in the dark Unfortunately I think this thread is gonna be pretty quiet for a while until more people find us. If you guys find answers to your stuff though please come back to fill us in!
|
# ¿ Feb 3, 2019 00:49 |
|
Mr Shiny Pants posted:Is this also the place to ask questions about Hadoop? Sure!
|
# ¿ Feb 3, 2019 15:51 |
|
Abel Wingnut posted:alright, so let me know what you guys think of this It sounds like you're trying to build a data hub. Essentially, you split your database into two databases, which it seems like you're already doing. Your first database is the 'staging' database, which basically acts as your classical data lake that you dump your data into. You then have scheduled jobs which iterate over that dumped data, harmonize it into your clean product, and place it into your second database, 'final'. Something you want to keep in mind is that with a SQL DB you're going to have a lot of hurdles to clear any time you want to ingest new data sources or an existing source changes. It's much easier to ingest the data as-is as quickly as possible and then figure it out in your harmonization stage. Likewise, making your final DB relational also places a hurdle in front of you any time you decide you want different information out of your harmonization, which is going to happen a lot as new data sources change your understanding of what you want out of your final product. It's much easier to keep your staging and final DBs non-relational and then possibly export that final data under a relational lens.
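A minimal sketch of that two-database flow. All names, schemas, and data here are made up for illustration; in practice 'staging' and 'final' would be separate document databases, not Python lists:

```python
# Staging: ingest source records exactly as they arrive, no schema enforced.
staging = []

def ingest(raw_record):
    """Dump incoming data as-is; figure out what it means later."""
    staging.append(raw_record)

# Two sources with different shapes land in staging untouched.
ingest({"customer_name": "Alice", "zip": "62704"})
ingest({"name": "Bob", "postal_code": "62701", "extra": "ignored"})

def harmonize(record):
    """Scheduled job: map whatever shape arrived onto the clean model."""
    return {
        "name": record.get("customer_name") or record.get("name"),
        "postal_code": record.get("zip") or record.get("postal_code"),
    }

# Final: the harmonized, query-ready product.
final = [harmonize(r) for r in staging]

assert final == [
    {"name": "Alice", "postal_code": "62704"},
    {"name": "Bob", "postal_code": "62701"},
]
```

The point of the split: when a new source shows up with yet another shape, only `harmonize` changes; ingestion never blocks on schema work.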
|
# ¿ Feb 5, 2019 16:34 |
|
Abel Wingnut posted:with couchbase, i could see making each hash a document, thereby making hash the key. then from there, i could create the necessary sub-objects (not really sure on the correct term--i need to research this more. arrays maybe?). in any case, then i could go wild with indices to satisfy the queries. indices in couchbase, from what i understand, are stored in memory, and are superfast. but this doesn't strike me as the best strategy, as i'll just be doubling the data in memory and being wasteful. I don't have experience with Couchbase but I DO have experience with document-based DBs, so I can maybe help with your data modeling a bit. Generally, if you're using a document store you want your use case to drive your model. What are the business requirements driving this? If the queries going into the other database produce a single 'entity' to be consumed, then that's generally what you'll want your document to look like. Secondly, on assigning keys: I'm assuming this is similar to a range index in MarkLogic, where the keys are stored in memory for fast retrieval. My general rule of thumb is to store primary keys in memory if you can afford it, due to the dramatic performance gain you get from it. Typically your keys are going to be a few dozen bytes tops, which, multiplied across a few million documents, gives you what... a few hundred megs? And that's assuming a naive storage method rather than something like a spanning tree. Very small price to pay considering the increased performance you get. I'd recommend benchmarking the database with the index enabled just to see how much memory it takes. It can't hurt to try. Resources are meant to be spent!
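Back-of-the-envelope math for that claim. The key size and document count are illustrative, and this assumes naive flat storage rather than something like a tree:

```python
# Assume each key averages ~32 bytes and we hold 5 million documents.
key_bytes = 32
doc_count = 5_000_000

total_bytes = key_bytes * doc_count
total_mb = total_bytes / (1024 * 1024)

# ~153 MB: "a few hundred megs" is the right ballpark.
assert 100 < total_mb < 200
```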
|
# ¿ Mar 19, 2019 16:12 |
|
Again, I don't know about the specific technologies you're using, but most document stores don't naively store documents as-is. They should have some sort of compression occurring, so if you have a lot of redundant data you'll only really need to worry about it when it's in memory. Also, when I see clients starting to worry about 'lots of redundant data', I find when I drill down into what they're doing that it's because they're still using old SQL habits. For example, say you have a primary key that's used as an identifier across dozens or hundreds of different tables. A lot of times I'll see clients storing that identifier all throughout the document because they feel they need to keep it in every location it showed up in the original tables. Why? You only need to store it once at the root and that's it. Also keep in mind that if you have repeating elements, you don't have to dedicate a field to describing what each particular element is. That's necessary in the SQL world, but there's a more elegant way of storing it in a document. Take addresses, for example. Your SQL table could look something like this:

USER_PK  ADDRESS_TYPE  ADDRESS
61872    HOME          '123 FAKE STREET'
61872    WORK          '456 EXAMPLE LN'

The naive approach to turning this into a document would look something like this:

{ 'Person': {
    'ID': '61872',
    'ADDRESSES': [
      { 'USER_PK': '61872', 'ADDRESS_TYPE': 'HOME', 'ADDRESS': '123 FAKE STREET' },
      { 'USER_PK': '61872', 'ADDRESS_TYPE': 'WORK', 'ADDRESS': '456 EXAMPLE LN' }
    ]
} }

However, all you're really doing here is storing a relational table inside a big document. Don't forget that your element names are allowed to be descriptive here!

{ 'Person': {
    'ID': '61872',
    'HOME_ADDRESS': '123 FAKE STREET',
    'WORK_ADDRESS': '456 EXAMPLE LN'
} }

It's a very simple example, but hopefully I'm getting my point across.
You'll find that you'll be able to make your documents much more information-dense if you take some time to sit down and think how to best represent the data as a document rather than copying over a bunch of relational tables. As a rule of thumb- just ask yourself how you would want the data to look if this were something being printed on an actual piece of paper for a human to read and use. That's usually the best direction
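That table-to-document flattening can be sketched as a transform. Pure Python, with made-up data matching the address example:

```python
# Rows as they'd come out of the relational ADDRESS table.
rows = [
    {"USER_PK": "61872", "ADDRESS_TYPE": "HOME", "ADDRESS": "123 FAKE STREET"},
    {"USER_PK": "61872", "ADDRESS_TYPE": "WORK", "ADDRESS": "456 EXAMPLE LN"},
]

def rows_to_document(user_rows):
    """Collapse repeating rows into descriptive field names instead of
    carrying ADDRESS_TYPE as data inside every sub-object."""
    doc = {"ID": user_rows[0]["USER_PK"]}  # the key lives once, at the root
    for row in user_rows:
        doc[row["ADDRESS_TYPE"] + "_ADDRESS"] = row["ADDRESS"]
    return {"Person": doc}

doc = rows_to_document(rows)
assert doc == {"Person": {
    "ID": "61872",
    "HOME_ADDRESS": "123 FAKE STREET",
    "WORK_ADDRESS": "456 EXAMPLE LN",
}}
```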
|
# ¿ Mar 23, 2019 21:30 |
|
Skim Milk posted:oh jeeze im so sorry. i thought this was YOSPOS. i was phone posting. my mistake no silly i linked it from yospos
|
# ¿ Mar 26, 2019 02:44 |
|
Can you give more info on what you mean by ‘jumble of nonsense’?
|
# ¿ Apr 20, 2019 15:55 |
|
Kudaros posted:I'm a data scientist coming from an academic background. Good on the machine learning, stats, etc., but not so great with databases. I can query relational databases well enough but I'm thinking now about how to organize enterprise data. This isn't my role, but apparently it is nobody's role at my company, at this point. This is what I specialize in & do for a living. First thing you need to do is determine whether you want a data federation, a data lake, or a data hub. This MarkLogic blog post goes into what those are and their pros and cons: https://www.marklogic.com/blog/data-lakes-data-hubs-federation-one-best/ Answering the question of which of those three you need will help guide which tech and processes you'll need to adopt.
|
# ¿ May 11, 2019 01:32 |
|
Pollyanna posted:Turns out we didn’t need that index at all. The _new_ problem is that somewhere in our massive collection of crap is a string with a null character in it - and I have no idea where. All my attempts at dumping the collection and grepping for \0 have failed. What’s the easiest way to search through a collection and find any fields that gave a string with a null character in it? Depends on the tech you’re using. Sounds like you want to generate an index over whatever element/field you want to search and then search it for null values once it’s done
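If the tech doesn't give you an index for that, a brute-force scan is always an option. Here's a sketch in plain Python that assumes the documents have already been pulled down as dicts (e.g. from a JSON dump); the field names are made up:

```python
def find_null_chars(value, path=""):
    """Recursively walk a document and yield the path of every
    string field containing a null character."""
    if isinstance(value, str):
        if "\0" in value:
            yield path
    elif isinstance(value, dict):
        for key, child in value.items():
            yield from find_null_chars(child, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        for i, child in enumerate(value):
            yield from find_null_chars(child, f"{path}[{i}]")

doc = {"name": "ok", "notes": ["fine", "bad\0value"], "meta": {"tag": "x\0"}}
assert list(find_null_chars(doc)) == ["notes[1]", "meta.tag"]
```

Slow over a massive collection, but it tells you exactly *where* the offending strings live rather than just that they exist.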
|
# ¿ Jul 10, 2019 22:27 |
|
so it could be any element anywhere that’s null and that’s what you want to find? don’t want to sound like an evangelical but any time you’re looking for a needle in a haystack that’s when you want marklogic. you could run a wildcard in the element name for a value of null and see what pops out
|
# ¿ Jul 10, 2019 22:37 |
|
yeah we’re not cheap. if that type of problem is something you frequently run into though you should consider us
|
# ¿ Jul 10, 2019 22:45 |
|
New kid on the block: https://www.fusiondb.com/
|
# ¿ Oct 3, 2019 20:30 |
|
abelwingnut posted:what does this offer exactly? I have no clue!
|
# ¿ Oct 8, 2019 03:15 |
|
XQuery is good on MarkLogic, but it also uses its own proprietary compiler so ¯\_(ツ)_/¯
|
# ¿ Oct 8, 2019 17:03 |
|
Pollyanna posted:Okay, so. We have several collections of documents with up to 69 (nice) fields each. A large subset of these fields are either null or hold an array of acceptable values for that given field (e.g. age_of_car has [0, 1, 2] representing the age of the car in years). In our program flow we have a set of data for a single instance of our data type (a potential insurance buyer) that has attributes for all these 69 fields. Are there any particular issues you’re running into with Mongo that drove you guys to investigate DocumentDB?
|
# ¿ Oct 18, 2019 19:31 |
|
Yeah this really does sound like your architect is making it more complex for the sake of it.
|
# ¿ Oct 18, 2019 21:41 |
|
couchbase should be able to handle that just fine
|
# ¿ Feb 12, 2020 19:29 |
|
eschaton posted:write a tiny bit of code to import the repetitive, implicit-schema JSON bullshit into a real database, even just SQLite, and do your queries against that Please don’t be that guy itt, thanks
|
# ¿ Feb 14, 2020 15:17 |
|
Dumb Lowtax posted:Would I be able to do those queries online (not locally)? How hard would this be compared to just hosting the JSON somewhere, if I am completely unfamiliar with the types of software used for SQL, and don't remember anything at all from my databases class? Yeah 5 GB is pushing it if you’re looking for free hosting. The only thing I can think of is DynamoDB which has 25 GB of free hosting but I also have zero experience with that. Honestly with the fact you’re *just* above the free tiers in demand you may just want to bite the bullet and pay for something. I doubt it will cost you more than a few bucks.
|
# ¿ Feb 14, 2020 15:58 |
|
I know this shouldn’t be anything new to anyone itt but here’s a pretty thorough takedown of Mongo if anyone needs to talk their management out of sticking their dick in that mousetrap: http://jepsen.io/analyses/mongodb-4.2.6
|
# ¿ May 15, 2020 21:48 |
|
|
This would be pretty straightforward with NoSQL DBs that have triples. You should be able to get what you're looking for with the $lookup feature, though. I don't know how it works under the hood or its performance implications, but it'll let you write the query as a left-outer join.
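For reference, $lookup's left-outer-join semantics look roughly like this when spelled out in plain Python. The collection names and fields are made up, and the real thing runs inside Mongo's aggregation pipeline rather than client-side:

```python
orders = [
    {"_id": 1, "item": "widget", "customer_id": 10},
    {"_id": 2, "item": "gadget", "customer_id": 99},  # no matching customer
]
customers = [{"_id": 10, "name": "Alice"}]

def lookup(local, foreign, local_field, foreign_field, as_field):
    """Left-outer join: every local doc survives, with an array of
    matching foreign docs (possibly empty) attached under as_field."""
    out = []
    for doc in local:
        matches = [f for f in foreign if f.get(foreign_field) == doc.get(local_field)]
        out.append({**doc, as_field: matches})
    return out

joined = lookup(orders, customers, "customer_id", "_id", "customer")
assert joined[0]["customer"] == [{"_id": 10, "name": "Alice"}]
assert joined[1]["customer"] == []  # unmatched local doc is still present
```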
PIZZA.BAT fucked around with this message at 17:42 on Dec 30, 2020 |
# ¿ Dec 30, 2020 17:39 |