Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.
I'm trying to help someone install "SHAPEIT" on a machine that only has glibc v2.11 available. I see static and dynamically linked binaries available, but no mention of what libraries are required other than what glibc version they were built against. I'm assuming that the static binary must only be statically linking everything except glibc, because it's refusing to run on that machine with glibc v2.11.

Is there any sane way to run this without upgrading glibc? I thought that a fully statically linked executable would be ready to run and not need the system's glibc, am I misunderstanding something?
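
Roughly the checks I have in mind, for reference (the binary path is a placeholder):

code:
file ./shapeit                     # reports "statically linked" vs "dynamically linked"
ldd ./shapeit                      # prints "not a dynamic executable" if it's truly static
objdump -T ./shapeit | grep GLIBC  # versioned glibc symbols a dynamically linked binary requires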

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

Neslepaks posted:

Any hardware suggestions for a small home Linux server?

I need something smaller and quieter to take over the reins from the HP9000 that's been my trusty server for 10 years or so. Requirements are so modest that I could almost get away with an RPi or something, but I'd like something that's more reliable and has some real storage. Intel NUCs look juicy but are quite pricey. There are similar cheaper things from Acer or whatever, but I'm not sure I could necessarily count on them to run Linux well? They typically come preloaded with some Windows version that it irritates me to pay for, however little. Also, these media server type things are not exactly meant for the purpose of headless server, but I don't suppose that'd be a problem as long as they don't have some weird destabilizing hardware.

What about the atom NUCs, $130 for dual core and $170 for quad core? Something like this

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

Vulture Culture posted:

Can confirm that the N3700 NUC is a powerhouse relative to the 6W TDP

You have one? It does Plex transcoding? drat, I should have gotten one. I found the idea of 4 very slow cores way more interesting, but I picked up a refurb Haswell i5-4250U for $150 on eBay 6 months ago instead. The Haswell NUCs aren't as nice as the Broadwell / Braswell ones though.

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

Verge posted:

So I need to replace my PC (possibly the only half-respectable components are the GPU and RAM chips) and it will be pretty much a gaming PC. That means a "steam machine" (side question: is that just a glorified PC with Steam's logo? Because it sounds like a glorified PC with Steam's logo marketed at consoleers) if you will. I'd rather not have much to do with Windows. Before I get into the hardware side (I'm a fan of AMD, if this matters) I wanna know...is there a linux build that's easy to use and will run games as easily (after setting the OS up) as Windows in the install and play manner I've become accustomed to? I understand I'll have to Wine somethings (right?) and I have no issue if that's a relatively straight forward process.

I'm not afraid of learning a new OS, I'm afraid I'm going to have to 'hack in' half the games I play.

Oh, and does linux gag to either xbox one or ps4 controllers specifically or are they both about equal compatibility with linux?

I tossed SteamOS on one of my partitions to test drive it, and ran into a couple issues, some expected, some not. Machine is a 2500K @ 4.5GHz, and an R9-290.

  • Xbox 360 controller compatibility: Great.
  • Xbox One controller wireless compatibility: Not gonna happen. They didn't even work wirelessly on Windows 7 until very recently.
  • Sound: Integrated sound or GPU sound work fine, I have an Asus sound card that doesn't.
  • Wine is either relatively straightforward, or won't support the thing you're trying to do at all. It can be frustrating overall.
  • ATI/AMD GPU performance on Linux is unpredictable at best, garbage at worst.
  • Some recent games (XCOM 2 for one) do not support any GPUs except nVidia on linux.
  • Performance is worse overall, and an AMD CPU is shooting yourself in the foot for gaming.

You're probably better off running Linux Mint or Ubuntu or something and installing steam on that rather than running SteamOS. I wouldn't consider a Linux-only system as a gaming platform unless all of the games you play have a Linux version. You'll save yourself a lot of pain if you get an nVidia GPU for gaming on linux: http://www.phoronix.com/scan.php?page=article&item=amd-r9-fury&num=1

I'm really optimistic about Linux gaming, but it's definitely in cowboy territory right now unless you only want to play Dota 2 and/or CS: GO, both of which work flawlessly.

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

Verge posted:

Looks like Linux isn't for me then. Well, drat. Thanks for being honest and not fanboys, guys. That's...nonexistent anywhere else, haha.

It'll probably get better in a year or two, especially if Linux releases of AAA games keep gaining momentum.

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.
What's a lightweight monitoring tool that could be used to see CPU and memory usage across 5-10 linux machines? I've seen Nagios, Shinken and Sensu but they all look like massive overkill. I don't even need historical graphs, just current memory consumption. This is for monitoring a couple of servers running batch jobs doing computational genetics work. Just need to see at a glance which are close to full and which have room for another 60GB of RAM and 20 threads of work. Minimal setup and configuration would be a huge plus, and having a local agent on each monitored machine would be fine.
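
Not a real monitoring tool, but a zero-setup way to eyeball the fleet in the meantime; hostnames are placeholders and it assumes passwordless ssh to each node:

code:
for h in node01 node02 node03; do
    echo "== $h =="
    ssh "$h" 'free -h | grep Mem; uptime'
done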

Twerk from Home fucked around with this message at 01:22 on Mar 25, 2016

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

evol262 posted:

First off, your batch scheduler should be able to manage that. Seriously consider Mesos if you're not using it.

Secondly, what kind of interface do you want?

This is for a university research lab with moderately technical users who just want to be able to run their Python scripts and specific applications without too much fuss. I was picturing just a simple web UI dashboard that showed all the machines and their current status. There's no batch scheduler right now; people are just manually kicking off things that run for days. I'm checking out Mesos now. Out of the box the applications aren't super friendly to distributed computing: it's stuff like Impute2 that scales well across a single multi-socket machine but would take some additional setup to span multiple machines, plus a lot of Python scripts holding calls to such programs together and translating data formats.

I'm trying to justify buying a couple of <=256GB RAM machines rather than another huge one, because right now the research lab is using a single 80 logical core, 384GB RAM server for all its computing needs, but the RAM is starting to be a major bottleneck. Budget is tight and it would be nice to save money by buying multiple cheaper machines instead of a single 1.5TB or 768GB machine, especially because no one job needs more than 100GB of RAM ever. Thanks for the input!

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

atomicthumbs posted:

two questions about Fedora 23

  • is it possible to install FGLRX or otherwise use my Radeon 7870 to run OpenCL? I've tried twice (second time involved downgrading the X server and patching the driver) and both times ended up with nothing but an installation that only boots to a black screen with a blinking cursor.
  • is there any fix for the issue that causes an Intel 7260 WiFi card to stop responding completely after a short period of using it, and refuse to work until a restart? I found a bug report on kernel.org about it, but all it had were a bunch of workarounds that didn't work before they marked it "CLOSED WILL_NOT_FIX".

This should help with #1: https://bluehatrecord.wordpress.com/2016/03/25/installing-the-proprietary-amd-crimson-driver-on-fedora-23-with-linux-kernel-4-4-6-300/

It's really messy getting modern Radeon DRM drivers into Fedora 23. Downgrading the X server is the recommended way to do it.

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

atomicthumbs posted:

Those are the directions I followed and they did not work. I'm just going to install Ubuntu.

For what it's worth, I had a similar experience and then went to CentOS.

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

Takes No Damage posted:

So my mom's PC 'upgraded' itself to Win10 while she was out of town and now she's pissed because everything's moved around and some of her old programs don't work anymore. Faced with either learning how to hack around with Win10 myself or figuring out how to roll back to Win7 and stay there, I'm considering giving a 'Nix a try.

Is Ubuntu still the most GUI-centric distro that someone only familiar with Windows would be able to stumble around in? 90% of her PC use is Facebook, office apps and burning CDs for church choir anyway so anything with a 'Start Menu' should be OK.

This sounds like a really bad idea, if I were in your shoes I'd pick between getting her set up on 7 permanently or getting office / her CD burner set up on 10. If she has programs that don't work on Windows 10, they're very unlikely to work well on modern Ubuntu.

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

RFC2324 posted:

CentOS/RHEL.

Seconded. You want 10 year support cycles and attention to security? You got it.

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

fuf posted:

yup hacked as gently caress mates. Thanks.

ugh why did I agree to host wordpress sites?

Two totally unrelated questions:

1) What's a good book on linux security?
2) If you had to host wordpress sites, how would you do it? Docker for each separate site or something?

I'm gonna further agree with evol262 that you don't want to try and manage a bunch of Docker containers without using something for container orchestration. We use Kubernetes here, but that's quickly getting to a whole other level of complexity.

Best of luck. Be really careful about what plugins you use, disable all features you don't need, and get SELinux set up.

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

Dancer posted:

So I'm a molecular biology student, about to do some bioinformatics courses, and I need to be running Linux for them. During a bioinfo project I did a while back, the university computers I used were running Red Hat, but I just looked and you need to pay to use that long term. On their website I also found this thing called Fedora that's free. Should I assume that that's similar to "proper" Red Hat? Alternatively, any other distro you guys would recommend for scientific purposes (for example, I do intend to work with some rather large datasets so it would be nice if the OS itself wasn't bloated)?

Fedora has more experimental features and will make more radical changes. You probably don't want to run Fedora unless you want to upgrade your OS every 6 months. I'd suggest CentOS, which is just like Red Hat but free and you do your own support.

My anecdotal evidence is that my wife's bioinformatics lab all runs some flavor of SUSE Linux Enterprise and I have no idea why. Really, if you want the OS to get out of the way and be long-term stable with good package updates, pick between CentOS and Debian.


edit: to agree with what evol said, the kind of in-memory dataset analysis bioinformatics work I've seen done wants 256GB RAM at an absolute minimum with a comfortable number more like 768GB or 1TB. OS overhead isn't something worth worrying about.
VVVVV

Twerk from Home fucked around with this message at 16:18 on Jul 12, 2016

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.
If somebody here is looking for a somewhat fiddly project that actually has benefits, running ZFS on an Ubuntu root both gets you some real benefit and has that old-timey Linux feel, getting your hands dirty in disk sectors and partition layouts.

https://github.com/zfsonlinux/zfs/wiki/Ubuntu-16.04-Root-on-ZFS

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.
That's a pretty drat big downside, especially because I've seen a server with 300MB free in a 27 Terabyte hardware RAID recently, and ext4 managed to recover well from that.

Edit: It looks like you can use quotas to protect users from themselves, but that's yet another thing to worry about.

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

Odette posted:

I don't understand why laptop manufacturers *still* sell laptops with that screen resolution. Surely 1080p panels are cheaper than 1366x768 panels by now?

God no, and a growing segment of the laptop market is 11"-13" Chromebooks for $200 or less. When push comes to shove, I'd put having 4GB of RAM over having a 1080p display, or even having an x86 CPU. The $150 chromebooks with an ARM chip, 1366x768 display, and 4GB of RAM are pretty usable machines!

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

RFC2324 posted:

Pretty sure it can still run a 64bit OS, but I could be wrong.

That said, I'm just derailing your issue; it shouldn't be a result of running a 32-bit OS (since it's a 32-bit application).

I had a first gen eeePC, before they were available with Windows. Dothan-based 700MHz Pentium M, hoo boy that thing was slow.

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

Horse Clocks posted:

I'm looking at building a new machine, and want to get on the GPU passthrough bandwagon for gaming.

Is there any hardware to avoid, and anything to actively go for?

You probably want an Intel CPU and an AMD GPU for the easiest GPU passthrough experience.
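
Whatever hardware you end up with, a quick sanity check like this (assuming intel_iommu=on or amd_iommu=on is already on the kernel command line) shows whether the GPU lands in its own IOMMU group, which is what makes passthrough painless:

code:
#!/bin/sh
# list every IOMMU group and the PCI devices inside it
for g in /sys/kernel/iommu_groups/*; do
    echo "IOMMU group ${g##*/}:"
    for d in "$g"/devices/*; do
        lspci -nns "${d##*/}"
    done
done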

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.
I got roped into troubleshooting a machine that has 12x10TB drives in a hardware RAID-6. They've got a single ext4 filesystem set up on it, and it's 99% full right now. There's still 1.5TB of free space in absolute terms, but they're complaining about writes failing.

I'm assuming that they've fragmented the filesystem all to hell by running it so full, but I also am not very familiar with volumes this large. Is percentage free or absolute amount free the right thing to be worrying about? Is this filesystem likely to need some manual help to recover to a good state other than freeing up some space?
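
Roughly what I'm planning to check first (the device and mount point names are placeholders):

code:
df -h /data && df -i /data               # inode exhaustion also shows up as "No space left on device"
tune2fs -l /dev/sdb1 | grep -i reserved  # root-reserved blocks (default 5%) don't count as writable space
e2freefrag /dev/sdb1                     # free-space fragmentation summary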

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

Computer viking posted:

Ah, backups. That would be nice.

(100TB of human genetics research stuff on a tiny budget with a "fire you first, ask later" approach to cloud solutions. At least I have my clearly worded feelings about the arrangement in writing - and the most irreproducible parts sneakernetted onto one other machine. And promises we'll set up a hot spare machine or something "soon".)

Haha this is literally the same thing that I asked a question about, as a 12x10TB hardware RAID6 array in a single box.

They've got 4 more 12x12TB boxes coming and are intending to set up GlusterFS shortly, though.

Edit: They have no backups, not even tape. A bunch of the important stuff is also on the PI's old USB Drobo stuffed full of 8TB drives, though

Twerk from Home fucked around with this message at 17:59 on Oct 9, 2019

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.
What's the least bad solution for huge file transfers over high bandwidth high latency WAN connections? Must be encrypted of course.

SSH and anything tunneled through SSH is still dog slow if latency is high. WebDAV? Implicit FTPS? Use case is gigabit+ connections on each end but >100ms latency.

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

Gajarga posted:

How huge, how many files, how often?

10s to 100s of GB, could obviously be chunked if necessary, and I'm looking to automate some batch processing jobs with compute far away from the storage. Don't want to deal with keeping storage in two places in sync.

Use case is a proof of concept that an academic lab could use cheap cloud or VPS compute without paying for expensive cloud storage and egress. High level idea: spin up an instance, upload huge input files, do work, download small results. Upload is free with most compute vendors and we've got multi-gigabit upload speed.
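
One option I'm considering, as a hedged sketch only: push with several parallel streams so a single latency-bound TCP connection isn't the ceiling. This assumes an rclone sftp remote named "compute:" is already configured; the names and paths are made up.

code:
rclone copy ./inputs compute:/scratch/job01 --transfers 8 --checkers 4 --progress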

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

nem posted:

Since we’re on filesystems what’s your take on Ceph/GlusterFS/OCFS/GFS in 2020?

I've been scrutinizing the Red Hat docs, and I'm going to guess that part of the reason they bought Ceph was that Gluster doesn't sufficiently take care of storage for OpenStack, but Red Hat also still tells you to prefer Gluster whenever you need a filesystem. CephFS still seems immature and not performant.

Ceph object seems pretty great. I also remember reading that Digital Ocean is using Ceph for I believe pretty much everything, so I guess both block and object.

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.
On the national lab topic, it looks like CERN is running Ceph on their compute nodes:

https://www.msi.umn.edu/sites/default/files/Pablo_CERN.pdf

University of Minnesota seems to host supercomputing focused events: https://www.msi.umn.edu/ceph-hpc-environments-sc19

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.
I know that this is a complicated question with lots of moving parts, but I'm in way over my head and could use some pros / cons of running with no swap on systems with a LOT of RAM.

In a university research lab, we've got researchers running tasks without any job queue right now. We're working to set something up, but the status quo is users directly running tasks that might consume enormous resources. The nodes have more RAM than disk in them, typically 1TB of RAM and either a 256GB or 512GB boot disk, with capacity storage accessible only over the network.

Is there ever a situation here where we would want to allow swap? How would you even size a swapfile or partition for a machine with 1TB of RAM? They were running with the default Ubuntu 2GB swap, and after seeing them swap under load and grind to a halt before things get OOMKilled, I got rid of any swap in the hope that misbehaving tasks will get OOMKilled faster.

I know there's better solutions here, like containers, cgroups in general, or a job queue like Slurm, but I'm just looking for a quick swap yes/no at the moment while we work towards setting up Slurm. For what it's worth, the OOMkiller has done a really great job so far of managing to kill the runaway process and not important system processes!
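
As a stopgap until Slurm is in place, I'm also considering capping individual jobs with cgroups via systemd-run, so the kernel kills the offending job rather than thrashing the whole node. This is only a sketch; the unit name, user, limit, and script path are made up.

code:
sudo systemd-run --unit=impute-job -p MemoryMax=100G -p User=alice \
    /usr/bin/python3 /data/scripts/big_impute_job.py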

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.
Thanks for the quick responses guys, I really appreciate it.

While I'm here: the lab has been synchronizing UIDs & GIDs across the servers and allowing local login with the same credentials by copying the same /etc/passwd to all machines. Is there any reason that's a bad idea?

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

xzzy posted:

As long as you can live with any local additions getting overwritten, it's fine. It's an old-school approach but it works. Configuration management tools such as Puppet or Ansible are the more modern way to do that sort of enforcement and make exceptions more manageable... but getting the infrastructure set up is a Big Job.

I'm using Ansible for everything that I've changed since I got here, and hope to get everything into Ansible eventually, but I haven't thought about how to manage local accounts with it.
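
Presumably something along these lines would do it, pinning names to fixed UIDs/GIDs across hosts with the group and user modules; this is an untested sketch and the names and IDs are made up:

code:
ansible all -b -m group -a "name=researchers gid=2000"
ansible all -b -m user  -a "name=alice uid=2001 group=researchers shell=/bin/bash"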

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

BlankSystemDaemon posted:

NIS (formerly YP, short for YellowPages) is a directory service that was created for exactly this purpose. Since then, Kerberos and LDAP have mostly replaced it, and I'd really recommend using that over relying on ansible for it.
Fun fact: in order to use NFS in v4-only mode, you have to synchronize UID and GID.

Are you suggesting NIS or Kerberos/LDAP?

I do think we're going to want a real solution for this, but lightweight is the name of the game. We have more nodes than users, about 10 users and 20 huge nodes, and my primary job focus is actually software development for this group; I'm just wearing another hat as a sysadmin because they have none.

Before I joined last month it was exclusively admin-ed by postdocs who did their best despite little experience in the area.

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

BlankSystemDaemon posted:

Lightweight Directory Access Protocol it is, then! :science:
With how easy it is on FreeBSD I shouldn't wonder if it's about the same on Linux nowadays.

A lot of places do ad-hoc sysadmining and get away with it only because things haven't gone wrong yet.

They are / were in extreme danger of catastrophic failure. On my first day at this job, I found that:
  • There was 1.3PB worth of disk across many nodes sitting on the floor of an office, because nobody knew how to get it into the datacenter, installed, and provisioned
  • The single-node RAID6 that they kept everyone's home directories on had 2 failed disks that had been physically replaced, but nobody had told the RAID controller to start a rebuild, so they were unconfigured and the array was degraded with no parity
  • Another shared filesystem was at 99% full, with 100TB on a single filesystem
  • Nobody was patching; uptimes were measured in years, without kernel livepatching.

I knew it was like this and came into it with eyes fully open, aware of problems of this level. I've already made significant progress towards a more stable computing environment, and hope to spend most of my time productionizing software rather than janitoring Linux soon.

Edit: I've dealt with slapd before, but not super recently so I'll have to re-learn a bit. Thanks!

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

Antigravitas posted:

Well, I'm someone who does a lot of storage, so I can confidently say AAAAAAAAAAAAAAAAAAAAAAAAAAAAAHHHHHHHHH

Have you heard of your lord and saviour, ZFS, by the way? :sun:

I'm planning to use ZFS on a single large box for disaster recovery continuity in an offsite datacenter. How big a box have you dealt with on a single node? My current plan for that single offsite DR node is a 36-bay box full of 16TB drives, targeting somewhere around 400TB usable space. Napkin math with 4 vdevs of 9-drive-wide Z2s says 448TB, but I don't know how acceptable that's going to be for big continuous writes.

For primary storage I'm currently setting up Ceph, with a replicated pool on SSDs for home directories and CephFS filesystem metadata, and erasure coding on HDD for bulk storage. The name of the game is low IOPS capacity storage, it's a human genetics research lab.
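
The rough shape of the DR pool I have in mind, as a sketch only; the /dev/sd* names are placeholders, and in practice I'd use /dev/disk/by-id paths:

code:
zpool create -o ashift=12 drpool \
    raidz2 sda sdb sdc sdd sde sdf sdg sdh sdi \
    raidz2 sdj sdk sdl sdm sdn sdo sdp sdq sdr \
    raidz2 sds sdt sdu sdv sdw sdx sdy sdz sdaa \
    raidz2 sdab sdac sdad sdae sdaf sdag sdah sdai sdaj
zfs set recordsize=1M drpool   # big sequential genomics files, so larger records
zfs set compression=lz4 drpool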

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

BlankSystemDaemon posted:

The real fun starts when you've got over one exabyte on a single system in one rack.

Supermicro had a 90 bay disk shelf that they've unfortunately discontinued which could be combined with a 36bay chassis that could fit a motherboard and 4 SAS HBAs.
11 of the disk shelves and one of those system chassis, and you could fit over 1000 disks in one rack. Then it's just a question of adding 78 vdevs that are each 14 disks wide (probably with draid, nowdays), and all you'd need are 2TB disks in order to reach 1 ExaByte.

Unfortunately the new 90 bay chassis that Supermicro have are top-loading, which means every single time you need to service one of those 90 disks, you're moving 90 disks at a time, instead of just two at a time with the old design. :mad:

EDIT: I just did some napkin math, and if I'm not mistaken, using 14TB disks that are available, you could reach just under 10EiB in a single rack nowadays.

My gut says that something like that would be wildly impractical and awful to actually try and use, just because it'd be very difficult to load data onto / off of all of those at any point, and I bet you'd run into weird bottlenecks. I'm working to keep single node size reasonable and avoid a setup like that, am I crazy?

Especially because we have a ton of 10GbaseT networking available but very little higher speed, the whole thing is being designed around each node only having 2x10G connections aggregated.

I'm not following your math: ~1000 14TB disks in one rack would be ~14PB, right? And an exabyte is 1000PB?
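
Napkin math, assuming decimal units:

code:
# ~1000 bays x 14 TB per disk
echo $(( 1000 * 14 ))        # 14000 TB, i.e. ~14 PB of raw capacity in the rack
echo $(( 1000000 / 14000 ))  # 1 EB = 1,000,000 TB, so you'd need ~71 racks like that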

Twerk from Home fucked around with this message at 18:10 on Sep 14, 2021

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

Antigravitas posted:

That's almost textbook tbh.

Though with ZFS you kind of want it on both ends because ZFS send is straight up revolutionary for backups. We had some datasets where if someone changed a file somewhere rsync would run for >24h. But now that we are ZFS everywhere our backup jobs run in seconds most of the time. Plus, because we have snapshots enabled, errant scripts can be recovered from either by our users (by copying from the .zfs/snapshots/ subfolders) or in extreme cases by us rolling back the file system without having to go to backup. And in a DR scenario crazy folder structures with tiny little files make no difference, because ZFS send doesn't need to walk the file system. Imagine transferring a file system with two million tiny files at link speed.

And drat, lz4 compression is magic. On some of our more Matlab-ish datasets we get hilarious compression ratios thanks to zero-padded nonsense, all while being faster than without compression. It's like eating your cake and keeping multiple copies of it.

Also, a single 9-wide RAIDz2 vdev of 16TB disks is going to saturate a 10Gbps link as long as those reads or writes are async and reasonably chunky.


It may not come across properly in my post, but I love working with this file system.

What does a ZFS HA solution look like? Two servers attached to the same set of disks with dual SAS expanders, one as a standby?

When we evaluated filesystem-level compression, it didn't look like it made sense for our usage, specifically because these datasets (human genetics sequencing, genotyping, and variant call data) are all so compressible that they're only ever handled in compressed formats anyway; handling them uncompressed and relying on filesystem-level compression would just shift the bottleneck from disk to network, given the compression ratios achievable.
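
For reference, my mental model of the snapshot + incremental send flow described above is roughly this; the dataset, snapshot, and host names are placeholders:

code:
zfs snapshot tank/genomes@2021-09-15
zfs send -i @2021-09-14 tank/genomes@2021-09-15 | ssh dr-box zfs receive -F backup/genomes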

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

BlankSystemDaemon posted:

That post 100% reminds me of some of the war stories I've heard over drinks, it's all kinds of yikes in some places.

I didn't include this originally because it's the dumbest thing I've seen in quite some time, but I noticed while patching these poor things for the first time in forever that one node was taking an eternity to patch a relatively minimal Ubuntu 18.04 install, so I checked out what was going on. High iowait, and its boot disk seemed pretty unhappy. Let's check what's in there:

code:
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.0-156-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, [url]www.smartmontools.org[/url]

=== START OF INFORMATION SECTION ===
Device Model:     Hitachi HTE723225A7A364
Serial Number:    REDACTED
LU WWN Device Id: 5 000cca 61cccce30
Firmware Version: ECBOA70B
User Capacity:    250,059,350,016 bytes [250 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Wed Sep 15 12:45:31 2021 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Well, it's booting off of a SATA 2 laptop hard disk, specifically a Hitachi Travelstar from 2010. I have no idea what the gently caress happened, but I bet there's a story here.

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.
I'm back with more questions since you're all so helpful.

I've got a prebuilt binary that depends on the GNU Scientific Library, which is distributed in Ubuntu 18.04 and 20.04 as libgsl23. However, I noticed that libgsl23 doesn't exist in 20.10 and 21.04 because they ship a newer version of the library package, libgsl25, so I'm assuming that when 22.04 rolls around it'll have libgsl25 and not libgsl23.

What happens to versioned library packages like this on moving to a newer distro? Will having libgsl23 installed automatically become libgsl25? Is there some virtual or metapackage that can install libgsl without needing to specify a version in the package name? I'm not planning to run any of the non-LTS Ubuntus so I don't have to do anything right now, but this is a pattern that I'd like to better understand and solve.
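
The sort of poking around I've been doing, for reference (the binary path is a placeholder; package names are the 18.04/20.04 ones):

code:
ldd ./prebuilt-tool | grep gsl      # which soname the binary actually wants (libgsl.so.23)
apt-cache policy libgsl23 libgsl25  # which versioned runtime package this release carries
apt-cache depends libgsl-dev        # the unversioned -dev package depends on whichever runtime is current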

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

drunken officeparty posted:

Hi again, it’s me the baby with the textbook from 2008. I’m having trouble understanding permissions so I’ll ask here.

So you’ve got the owner, groups, and others. First off what does others even encompass? And groups don’t make sense to me either.

Say I’ve got
Bob - Admin, can do everything
Sally - In group Accounting
Sam - Accounting
Jeremy (we hate jeremy) - Can only see his own stuff and he should be thankful we don’t just fire him
Carol - HR person tasked with monitoring Jeremy

How would it be set up that Bob can do everything to everyone, Sally and Sam can do each others /home but not /jeremy, Jeremy can only do /jeremy, and Carol can do /jeremy but not /sally or /sam?

So, "others" is any user on the machine, regardless if group membership.

Groups are kind of a broad, simple access control solution, and modern real-world systems will probably be using ACLs and SELinux or some other more sophisticated access control tool.

For your problem, you could have an accounting group and a group for monitoring Jeremy, and Bob could either be a member of all groups or be a sudoer, which would let him become other users (or root, who can do anything).

If you want some more info about how this works in a more real world example, here's a bit about ACLs: https://www.redhat.com/sysadmin/linux-access-control-lists
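
A minimal sketch of that setup, using only the users and groups from your example (the group names are made up, and a real system would layer SELinux or more ACLs on top):

code:
groupadd accounting
usermod -aG accounting sally
usermod -aG accounting sam
chgrp -R accounting /home/sally /home/sam
chmod -R 770 /home/sally /home/sam        # owner + accounting group get everything, others get nothing
chmod 700 /home/jeremy                    # only Jeremy (and root) by default
setfacl -R -m u:carol:rX /home/jeremy     # ACL so Carol can read Jeremy's existing files
setfacl -d -m u:carol:rX /home/jeremy     # default ACL so new files pick up the same grant
usermod -aG sudo bob                      # Bob gets everything via sudo instead of joining every group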

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.
Is there a reason why the /dev/shm mount isn't in fstab on default Ubuntu and Debian configs? Is it always set to half of RAM, and is there any good reason for that?

I stumbled on this when, like a moron, I accidentally did a umount -a when I meant mount -a, and then /dev/shm was not remounted when I did mount -a. I'd appreciate a quick blurb about the history of /dev/shm and why we don't have tmpfs configuration info in /etc/fstab.
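
For anyone else who fat-fingers the same thing: a manual remount along these lines seems to bring it back (the options are just my guess at the usual tmpfs defaults, since there's no fstab entry to copy from):

code:
mount -t tmpfs -o rw,nosuid,nodev,size=50%,mode=1777 tmpfs /dev/shm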

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

ExcessBLarg! posted:

These days it's automounted by systemd. I assume you can put an entry in /etc/fstab for /dev/shm and control options for it--if you really want--but systemd mounts it even without an /etc/fstab entry since some (essential?) userspace programs are dependent on /dev/shm existing and it would be bad to break those because the configuration doesn't exist.

So that if something accidentally fills /dev/shm it doesn't fill all your RAM.

If you mean, "why does half my RAM go to /dev/shm?" that doesn't actually happen. tmpfs-based file systems allocate pages on demand, so they take up very little memory when not actually used.

Thanks so much for this. I'm assuming that I could have asked systemd to remount it after my mistaken umount -a, rather than trying to remount it via mount -a?

I noticed that Python multiprocessing couldn't work without /dev/shm; all sorts of userspace applications put flags there to communicate.

Are tmpfs filesystems formatted with ext4 / xfs / distro-appropriate default filesystems, or are they using something that's better suited for in-memory filesystems somehow?

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.
I'm setting up Slurm, which requires users and groups to be synchronized across the cluster. I've found by testing that this means entirely synchronized: no machine can have users that another machine does not, even if all of the users Slurm itself uses are present on all machines. I discovered this when my test cluster had postfix on one instance and otherwise identical /etc/passwd files, and Slurm couldn't connect internally until the postfix user existed on all machines. What's a good way to have some nodes be login nodes that actual end users are able to log in to, but still have identical users / groups across the whole cluster? Just give some users /sbin/nologin on some nodes? I guess I'll just maintain two /etc/passwd files via Ansible and distribute them based on role.

Edit: I think I've figured out what's going on with the problem below, but I still need to know why, and the right way to fix it: upgrading from Ubuntu 18.04 to 20.04 renamed the logical name of the onboard Intel X722 NIC from enp96s0f0 to eno0 and disabled it. It looks like I can hack around this by having netplan rename the interface at startup, but that feels wrong.
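
Another thing I might try instead of netplan (an untested sketch, and the MAC address is a placeholder) is pinning the old name with a systemd .link file:

code:
cat <<'EOF' | sudo tee /etc/systemd/network/10-onboard-10g.link
[Match]
MACAddress=aa:bb:cc:dd:ee:ff

[Link]
Name=enp96s0f0
EOF
sudo update-initramfs -u   # udev runs in the initramfs on Ubuntu, so rebuild it to pick up the .link file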

Also, on an entirely different subject, I've got troubles with the integrated Intel X722 + X557 on my Supermicro boards. Upon upgrading from Ubuntu 18.04 to 20.04, one of the adapters has disappeared on two of my servers. I'm holding off any more updates until I figure out what is going on, because this seems bad. Everything is still working fine because I had 802.3ad link bonding set up and the remaining adapter is taking all the traffic, but it still feels bad and I want to figure this out.

They still show up in lspci, but do not appear at all in ip address or ifconfig; they just do not exist. I did ask the networking guys about it, who told me that this specific connection had been bouncing occasionally for about 2 years, and we'd just never noticed because of the NIC teaming.

My checklist for tomorrow based on some light googling today is to update BMC firmware, and then to try installing drivers straight from Intel here, which I'm not really looking forward to because they will not be automatically updated and I've never had to use non-free drivers before for NICs.

Has anybody had a bad time with Intel 10GBaseT NICs? Is having to grab drivers straight from Intel normal? Here's the situation I'm dealing with. I've got two machines now in this state, and others still on 18.04 that are working fine and have enp96s0f0 up and taking traffic in the bond0:

code:
$ lspci | grep X7
60:00.0 Ethernet controller: Intel Corporation Ethernet Connection X722 for 10GBASE-T (rev 09)
60:00.1 Ethernet controller: Intel Corporation Ethernet Connection X722 for 10GBASE-T (rev 09)

$ ethtool enp96s0f0
Settings for enp96s0f0:
Cannot get device settings: No such device
Cannot get wake-on-lan settings: No such device
Cannot get message level: No such device
Cannot get link status: No such device

$ ethtool enp96s0f1
Settings for enp96s0f1:
	Supported ports: [ TP ]
	Supported link modes:   1000baseT/Full 
	                        10000baseT/Full 
	Supported pause frame use: Symmetric Receive-only
	Supports auto-negotiation: Yes
	Supported FEC modes: Not reported
	Advertised link modes:  1000baseT/Full 
	                        10000baseT/Full 
	Advertised pause frame use: No
	Advertised auto-negotiation: Yes
	Advertised FEC modes: Not reported
	Speed: 10000Mb/s
	Duplex: Full
	Port: Twisted Pair
	PHYAD: 0
	Transceiver: internal
	Auto-negotiation: on
	MDI-X: Unknown
Cannot get wake-on-lan settings: Operation not permitted
	Current message level: 0x00000007 (7)
			       drv probe link
	Link detected: yes

$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eno0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether $MAC brd ff:ff:ff:ff:ff:ff
3: ens1f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether $MAC brd ff:ff:ff:ff:ff:ff
4: enp96s0f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether $MAC brd ff:ff:ff:ff:ff:ff
5: ens1f1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether $MAC brd ff:ff:ff:ff:ff:ff
6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether $MAC brd ff:ff:ff:ff:ff:ff
    inet $IP brd $IP scope global bond0
       valid_lft forever preferred_lft forever
    inet6 $IPV6 scope link 
       valid_lft forever preferred_lft forever

Twerk from Home fucked around with this message at 02:45 on Oct 15, 2021

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.
I found my smoking gun on the network interfaces changing names after upgrading from Ubuntu 18.04 to 20.04! Next question is why.

pre:
dmesg output on 18.04:
1672:[    3.504445] i40e 0000:60:00.0 enp96s0f0: renamed from eth0

dmesg output on 20.04:
1688:[   10.598038] i40e 0000:60:00.0 eno0: renamed from eth0

I'm not very familiar with dmesg output; is i40e the name of the driver that is doing this? The kernel upgrade brought a newer default kernel driver for the Intel NICs, like so:

pre:
$ ethtool -i  enp96s0f0
driver: i40e
version: 2.1.14-k
firmware-version: 3.33 0x80000e48 1.1876.0

$ ethtool -i eno0
driver: i40e
version: 2.8.20-k
firmware-version: 3.33 0x80000e48 1.1876.0
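
Next on my list is asking udev where the new name comes from (eno0 is the post-upgrade name on this box):

code:
udevadm test-builtin net_id /sys/class/net/eno0 2>/dev/null | grep ID_NET_NAME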

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

ExcessBLarg! posted:

"eno#" are the names of the on-board (as in, soldered to your motherboard) network adapters, while "enp#s#f#" refers to devices on the PCI bus. Of course, on-board devices are also on the PCI bus, leaving it up to the device driver to determine whether it's "on-board" or not. Presumably the i40e driver updated some device list and now acknowledges that it's an on-board adapter.

Those bastards: both ports are onboard, it's 2 ports on the same device. What a weird call, deciding that one of the X722 ports is onboard and the other is on the PCI bus when one is at 60:00.0 and the other at 60:00.1.

  • Reply