mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

There will be at least one more beta for OPNG, likely happening this afternoon (US time). Also, it looks like the OPNG points issue is finally getting sorted out: they will be scaled to 20X compared to CPU points, to account for the vastly shorter runtimes. I'm in the "who cares; we're doing science" camp on this, but there has been a vocal "why am i not being awarded points" camp as well, all the way through the series of betas.

I admit that I've been confused by the amount of noise people have made about this. I know that people enjoy the gamification aspect (Exhibit A: the title of this thread), but points are the single least useful metric there is. Pretty much everyone awards badges based on runtime, and WCG's points are particularly meaningless, being scaled to roughly 7X what every other BOINC project in existence awards, so you can't even use them to talk about cross-project performance without doing math.
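To be clear about the "doing math" part, here's a hedged one-liner using my eyeballed ~7X figure above (not an official WCG number), for anyone who wants to compare their WCG total against other BOINC projects:

code:

# Assumes WCG grants credit at roughly 7x the rate of a typical BOINC
# project -- my rough figure from above, not anything official.
WCG_SCALE = 7.0

def wcg_to_baseline_credit(wcg_points):
    return wcg_points / WCG_SCALE

# e.g. 700,000 WCG points is ballpark 100,000 points of work elsewhere
print(wcg_to_baseline_credit(700_000))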

Edit: This ended up not being a separate beta run, just a new batch of WUs for the existing run. The batches currently running were generated from already-completed CPU WUs, to verify that the GPU results closely match the CPU results.

I'm personally happy about this, because I have unbroken my AMD cards by installing the OpenCL portion of the AMDGPU-PRO drivers on them -- that implementation is OpenCL 1.2 compliant, unlike the Mesa OpenCL driver. I didn't want the hassle, but my desire to crunch more science won out after a half-day of arguing with myself about it. I've got a really simple bash script that I'd be happy to share in the incredibly unlikely event that anyone else has had this problem.

Edit 2: Really interesting post from a WCG admin on how they build and "score" WUs for this project:

quote:

For a CPU work unit, we estimate they can run X jobs based on what each job has inside of it. This is based off how many atoms are in a given ligand.

( 0.0000000122 * Atoms^2 + 0.0000000751 * Atoms + 0.0000105946 ) * ga_num_evals * ga_run = how long we estimate it'll take for an average cpu.

Each job has a different number of atoms and structure, which changes the equation by evals being different and higher generally with more atoms in a ligand. This is 100% just an estimate but gets us a pretty good average runtime on similar processors.

When a work unit is created, we package multiple jobs together or split them up based on how difficult they are. We try to target say 3 hours per CPU work unit. For the GPU version, we create them with 20 times the difficulty as CPU version. These are split the exact same way, thus they get 20 times more points because they were originally created 20 times harder.

If we ran one of the GPU work units on CPU, it would on average take them 60 hours to complete the same task.
We also learned today that OPN WUs "short circuit," much the same way a chain of 'or' statements short-circuits by halting evaluation as soon as a single true value is found: when OPN(G) WUs find a good match on a ligand (based on some threshold), they halt work and declare themselves complete rather than exhaustively testing all possible values.
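Going back to that estimation formula: it's simple enough to play with. Here's a quick Python sketch of the packaging logic as I read the quote above -- the coefficients are theirs, but the function names, the job tuples, and the greedy bundling are my own illustration, not WCG's actual code:

code:

# Per-job cost estimate from the WCG admin's post above. The quote doesn't
# name a unit for the result, so treat it as a relative cost figure.
def job_cost(atoms, ga_num_evals, ga_run):
    per_eval = 0.0000000122 * atoms**2 + 0.0000000751 * atoms + 0.0000105946
    return per_eval * ga_num_evals * ga_run

# Bundle jobs into one WU until the estimated cost reaches a budget.
# GPU WUs are built with 20x the difficulty of CPU WUs (per the quote),
# which is also where the 20x points multiplier comes from.
def pack_work_unit(jobs, cpu_budget, gpu=False):
    budget = cpu_budget * (20 if gpu else 1)
    wu, total = [], 0.0
    for job in jobs:            # job = (atoms, ga_num_evals, ga_run)
        wu.append(job)
        total += job_cost(*job)
        if total >= budget:
            break
    return wu, total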

mdxi fucked around with this message at 06:56 on Mar 31, 2021

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

WCG OPNG progress is moving really fast now, after a glacially slow bring-up and a cautious early beta period. From one of the admins this afternoon, after yet another batch of beta WUs:

quote:

We are having some virtual high fives here in communications with the researchers. They have some additional checks they want to look at, but currently we are getting really good results and things look awesome.

With this latest beta batch, I have changed the assimilator to be production ready with how it packages results for the researchers.

So next steps in my mind are this:

1. Eat pizza for lunch
2. Send additional results back to researchers for validation (tomorrow)
3. Await final thumbs up from researchers (Hopefully we'll have this by Friday)
4. Build batches and upload 7.28 to opng (Tomorrow)
5. Perform other final checklists (ongoing)
6. Have go/no go conversation with the researchers (TBD)
Then, a later update, regarding a release of WUs later tonight:

quote:

I plan on adding 10 batches just to make sure the points and everything match what I'm expecting to see when we go live during production. As far as I can tell, it is, but to be 100% sure instead of 99%, I'm running these 10 extra batches. These are going to be the last 10 batches for beta as I do not plan on running any more.
So if this set looks good, OPNG beta is in the bag.

Vir
Dec 14, 2007

Does it tickle when your Body Thetans flap their wings, eh Beatrice?
Will this allow MacOS and ARM Linux users to use GPUs for Open Pandemics?

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

Vir posted:

Will this allow MacOS and ARM Linux users to use GPUs for Open Pandemics?

There is no Mali GPU support. Nvidia, AMD, and Intel only.

People have reported both success and failure for OPNG WUs on Macbooks, so it's available -- at least for Intel GPUs. Didn't find anyone crunching on a desktop Mac in my quick search of the forums.

Vir
Dec 14, 2007

Does it tickle when your Body Thetans flap their wings, eh Beatrice?
There are some old Mac Pros with discrete AMD GPUs in them, and even some with Nvidia cards before that, but Folding@Home doesn't support any of them: there was a bug in OpenCL for macOS back when those machines were more worthwhile, and macOS has since deprecated OpenCL in favor of Apple's own Metal API.

Binary Badger
Oct 11, 2005

Trolling Link for a decade


Curiously, Apple has kept the legacy OpenCL (v1.2) even in its latest version of macOS; they even re-wrote it to run natively under the new Apple Silicon CPUs.

Apple did this so that the current plethora of scientific software written for Intel chips would either run without modification under Rosetta, or just require a recompile with the latest Xcode (with an extra flag set) to produce M1 code.

I doubt you'll ever find anyone on the forums willingly running Folding@Home on a Mac, because the Folding software authors are either totally uninterested in writing old-style OpenCL code or feel that they don't need to give Macs GPU support. That pisses Mac users off, and then they join a project that DOES support Apple GPUs, like Einstein. And the CPU client is definitely not well optimized on the Mac.

Binary Badger fucked around with this message at 04:49 on Apr 4, 2021

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

OPNG is live, but rather than the flood everyone was expecting, the current WU release rate is 1700 every 30-ish minutes (there's some randomness built in to keep people from fetching WUs on a clock, because yes that is a thing that people will do -- looking at you HSTB).

Currently unknown if that's gonna increase or not.
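If you're wondering what "randomness built in" looks like in practice, here's a toy sketch of that kind of jittered release loop -- the batch size and base interval come from the numbers above, the jitter range is completely made up by me, and none of this is WCG's actual scheduler code:

code:

import random
import time

BATCH_SIZE = 1700          # WUs per release, per the current rate
BASE_INTERVAL = 30 * 60    # "every 30-ish minutes", in seconds
JITTER = 5 * 60            # +/- a few minutes; my invented value

def release_loop(publish):
    while True:
        publish(BATCH_SIZE)  # hand a batch to the feeder
        # randomized sleep so fetching "on the clock" doesn't pay off
        time.sleep(BASE_INTERVAL + random.uniform(-JITTER, JITTER))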

My machines have crunched 97 so far. They're very unevenly distributed across the farm (low: 0; high: 43), as you might expect for something with limited and somewhat-irregular availability. I'll be evaluating how this goes, and turning Einstein@Home back on as a low-priority project if my GPUs have consistent downtime.

Vir
Dec 14, 2007

Does it tickle when your Body Thetans flap their wings, eh Beatrice?
The F@H statistics page has been given a makeover. It now displays team logos for those that have it, and it has also broken the client's web frontend.

SAGoons is ranked 69, but lacks a logo: https://stats.foldingathome.org/team/150

Old page: https://statsclassic.foldingathome.org/team/150

Rexxed
May 1, 2010

Dis is amazing!
I gotta try dis!

Vir posted:

The F@H statistics page has been given a makeover. It now displays team logos for those that have it, and it has also broken the client's web frontend.

SAGoons is ranked 69, but lacks a logo: https://stats.foldingathome.org/team/150

Old page: https://statsclassic.foldingathome.org/team/150

:nice:

Chikimiki
May 14, 2009

Vir posted:

The F@H statistics page has been given a makeover. It now displays team logos for those that have it, and it has also broken the client's web frontend.

SAGoons is ranked 69, but lacks a logo: https://stats.foldingathome.org/team/150

Old page: https://statsclassic.foldingathome.org/team/150

Oh, so that's why I couldn't see my total score on the web client. I was gonna ask in this thread why that was :ms:

Vir
Dec 14, 2007

Does it tickle when your Body Thetans flap their wings, eh Beatrice?
F@H has rolled back the new stats pages because they broke the web client. The premature rollout also revealed that many third-party websites were hammering the statistics API with excessive calls. F@H has only one programmer on staff, which I guess is a typical symptom of how research grants pay for PhD projects while infrastructure and support functions go under-funded.

Here's the beta page: https://statsbeta.foldingathome.org/team/150

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

WCG Monthly Update - March 2021

I had been writing this up on Reddit, because Markdown is so much nicer than bbcode (can I pay the :mods: to add this now that Lowtax is dead?). But Reddit sucks balls as a community, so I quit. But I really enjoyed doing it, so I'm putting it here now.

It was a quiet month overall, except for OpenPandemics becoming GPU-enabled.

OpenPandemics
  • OPN1 is now GPU-enabled (see Betas section for more info)

Africa Rainfall Project
  • Research team member Camille Le Coz was recently accepted as a presenter at the EGU General Assembly 2021, a virtual conference for the European Geosciences Union. The conference is currently scheduled for late April.
  • The project's principal investigator, Professor Nick van de Giesen, will be giving a presentation about the project on March 11 at an IBM event.

Microbiome Immunity Project
  • Dr. Julia Koehler Leman, one of the research team members, will be speaking about the project at Winter RosettaCon 2021, a virtual conference for users of the Rosetta macromolecular modeling suite. [Ed: Yes, WCG's MIP1 and Rosetta@Home use the same software]
  • Researchers are working simultaneously on three papers that are at various stages in the creation process. One of the papers has already been submitted to an academic journal for review.

Help Stop TB
No update.

Mapping Cancer Markers
  • Researchers continue to process work on World Community Grid while working on a paper about lung cancer markers.

Smash Childhood Cancer
SCC is on hiatus from WCG's perspective, but researchers are working on the next set of targets, and working in the lab with proteins targeted by previous WCG work:
  • Beta catenin -- The research team has decided to move forward with further testing on three compounds that show promise against this protein.
  • Osteopontin -- Testing continues on several compounds that may be effective at targeting this protein.
  • PAX3:FOXO1 -- The researchers are conducting lab testing on a compound that may be effective at targeting this protein.

Betas
This month was full of OPN GPU testing -- OPN is the first WCG project in several years (for a variety of reasons) to use GPUs. OPN1 on GPU has been given the short name OPNG, so it's easy to tell WUs apart. Here's some info from the beta:
  • In addition to systemic testing, ten batches of completed WUs were rebuilt as GPU WUs (batch 30010 through 30019)
  • On GPU, a batch took an average of 3.5 days of compute time, vs 1162 days on CPU
  • This represents an average speedup of 336X (max speedup for a batch was 516X, but OPN WUs exit early when they find a "good enough" match so there's no apples-to-apples comparison)
  • OPNG uses Autodock GPU, which uses a modified algorithm that exhibits a greater probability of finding strong interactions between the molecules and viral proteins, and is well suited to dock larger or more complex molecules.
  • Overall, Autodock GPU exhibited a 1.6X increase in efficiency compared to Autodock 4 (this is algorithmic efficiency, not the raw speedup from parallelization on GPU)
  • Future OPNG WUs will test more complex compounds, while CPU will continue to focus on the current work

mdxi fucked around with this message at 00:19 on Apr 19, 2021

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

OPNG stress test currently underway:

quote:

This afternoon around 18:00 UTC, we'll begin an extreme stress test of the World Community Grid infrastructure with the help of the OpenPandemics - COVID-19 research team in the Forli Lab. We're grateful for the support and hard work of the Forli Lab team in co-creating and refining this test, and we all look forward to seeing what our entire system can do.

For the purposes of this stress test, we have been given 30,000 batches of work units to run on the GPU version of OpenPandemics - COVID-19. These are real work units that will provide data for the project.

We anticipate this test will take approximately 3 days to run through the 30,000 batches that have been provided. However, the test will end as soon as the 30,000 batches have been processed, whether this processing takes less than 3 days or more than 3 days.

The stress test will involve all parts of the World Community Grid pipeline, from generating batches to post-analysis. This will help us identify bottlenecks, and see where and how we can improve. Below is an outline of the pipeline:
  • Researchers identify targets and/or ligands to compute
  • Researchers create batches of work units to be run by World Community Grid volunteers
  • World Community Grid downloads work units from the researchers' server
  • World Community Grid builds work units and loads them into BOINC (Berkeley Open Infrastructure for Network Computing)
  • Volunteer computers and devices download these work units
  • Volunteer computers process the work units
  • Volunteer computers upload the files back to World Community Grid servers
  • World Community Grid validates results
  • World Community Grid assimilates the results
  • World Community Grid packages batches into tar files for researchers
  • World Community Grid uploads the packages to the research server
  • Researchers re-hydrate the results and place data into their database
  • Researchers perform analysis on the results
Three more important points for those who want to participate in the stress test:
  • After the stress test is complete, we will revert to sending out results at a pace of 2,000 work units every 30 minutes. Depending on the researchers' needs, we may modify this in the future, but for the present our plan is to continue at the 2,000 per 30-minute pace.
  • GPU work units for OpenPandemics - COVID-19 are designed to run on OpenCL version 1.2 and above. However, there are certain cards that still have issues due to having GPU drivers that aren't 100% compatible with OpenCL 1.2. Most of the issues are with cards that were released before 2016.
  • Please post any issues or questions in this thread where we can see them more easily, rather than creating new threads that may be harder for us to track.
Edit: I'm reading through the thread for good info to add to this post.

- This test is working on a specific target:

quote:

The batches for the stress test are targeting the spike protein, the most important surface protein of the virus, using a structure that was determined using cryo-electron microscopy (cryoEM) by our collaborators at the Ward lab at Scripps Research. Approximately 280 million small molecules from the ZINC database will be docked against a promising, hypothetical binding site. Our goal is to identify a few of these molecules that will bind with sufficient affinity to the spike to interfere with the replication process of SARS-CoV-2.
- When the stress test went live, 8000 WUs per minute were being created.
- There's enough traffic that WCG's load balancers were dropping connections, and techs were/are working on it.

That's all the news for now

mdxi fucked around with this message at 06:30 on Apr 27, 2021

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

Spot the GPU betas and stress test:

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

Pacing update on the WCG OPNG stress test:

quote:

We crunched through about 7.5k batches in the first 36 hours. We will continue with this pace until the full 30k batches have been completed.
I'm curious what they're going to do as a result of this exercise. They've already said that, initially, nothing will change. But there's no reason to do something like this other than some form of capacity planning.

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

I just watched Dr. Ian Cutress's Hot Chips 2021 preview and discovered that there is an ASIC designed for molecular dynamics, which is the class of software that most biomedical grid computing projects are based around (Folding, Rosetta, etc.).

Apparently they're going to be talking about their new design, the Anton 3. I don't think they sell them at all though, so even if those packages were ported to run on it, it wouldn't do us any good :pseudo:

Vir
Dec 14, 2007

Does it tickle when your Body Thetans flap their wings, eh Beatrice?
John Chodera from Folding@Home posted a status update about the Covid Moonshot:

quote:

To very quickly update where we are:
  • We have very potent lead compounds that show great antiviral activity against all SARS-CoV-2 variants of concern
  • The compounds appear to be very good in standard in vitro safety panels
  • We're working through an issue with pharmacokinetics with rodents, which have very rapid metabolisms. This is critical for drug approvals because preclinical work to identify appropriate safe doses for humans typically use rat and dog, and if you can't use rodents, you're generally forced to use primates instead, which would significantly slow down our entry into the clinic
  • We've been working incredibly hard to identify partners and funding mechanisms that will carry us through preclinical work (which can cost millions of dollars) into clinical trials (which cost many more millions) so that as soon as we nominate a clinical candidate, we can move it into human trials as rapidly as possible
The Folding@home sprints still continue! We're up to Sprint 8, but I haven't been releasing the dashboards publicly because we're still debugging some issues with the dashboard that lead to scrambling of the compounds in data display. As soon as we get these fixed, we'll get all of these dashboards online and make more regular announcements. Here's a (sadly scrambled) preview, though!

https://fah-public-data-covid19-moo...ined/index.html

The continued sprints will be vital for both aiding in the final replacement of problematic parts of the compound for improving rodent metabolism and for the further efforts to ensure second-generation compounds are active against multiple coronavirus variants.

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

After way more downtime than anticipated (one day shy of two months) due to moving and various related fuckery, I'm coming back online and I'm really happy about that.

In other news, AMD is still handing out small (480 core) EPYC clusters to support COVID research. https://www.hpcwire.com/2021/07/16/amd-donates-neowise-cluster-to-genci-inria-for-covid-19-research/

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

July WCG Update

A surprising amount has happened since I went on hiatus, including a very surprising (and, I believe, unprecedented) event which I'm going to lead with.

Microbiome Immunity Project

A MIP Researcher posted:

With the recent advent of new protein folding techniques based on AI and deep learning, it is now possible to compute these structures hundreds of times faster than when we started the project. We are therefore drawing the project to a close at the end of June, bridging results to new, faster techniques that can be performed on traditional high-performance compute clusters and do not need the power of World Community Grid.

We already have one scientific article published resulting from the techniques developed from this project, with a second on the way that will apply those protocols to the results that you generated. We will keep you updated on more findings as we publish them.

WCG posted:

This month, researchers are concentrating on further analysis and writing for <another paper>, which they believe will have broad appeal to other scientists working on the microbiome and in related fields.

As we announced in last month's update, the project's time on World Community Grid will be ending as soon as the current work units are completed.

This will be the final monthly update for the project, but we are keeping in touch with the research team over the coming months as their data analysis progresses. To start, the researchers are preparing a project update, which we plan to release as soon as it is complete. We will also make announcements as their papers are published or if they have other news.

MIP was far from done, and it's true that there are large-scale data analysis techniques available today that weren't as feasible 5 years ago, but what I personally feel the feel-good messaging from the researchers is papering over is "we can buy piles of cores for cheap now, and we don't have to use volunteers anymore". Just look at my previous post for an example. Some people are also speculating that this move has to do with the fact that WCG requires an open data commitment from researchers, but of course there's no proof behind that.

Africa Rainfall Project

WCG posted:

At the current pace, the project will be able to simulate an entire rainy season by the end of 2021. However, our tech team has determined that we can send out work more quickly.

If you're currently supporting the Africa Rainfall Project, will you consider helping to speed up the project by making a simple change to your current settings?

Research team member Camille Le Coz successfully defended her dissertation last month and is now Dr. Le Coz! We congratulate her on this achievement.

Smash Childhood Cancer

WCG posted:

Since early this year, the researchers have been analyzing data from work run on World Community Grid.

Below are the key proteins for which we have new updates this month. Each of these proteins is involved in the development of at least one type of childhood cancer.

Beta catenin: Early testing on the three compounds mentioned earlier this year is promising, particularly for two of the compounds. The researchers have begun writing about their findings. They are also considering bringing in additional collaborators to help with further testing of the compounds.

Osteopontin: A biochemist collaborator is beginning further testing on this key protein.

PRDM14: The researchers have tested 11 compounds to see if and how they might target this protein. One shows promise so far, and will be tested further.

Note: Lab testing compounds for effectiveness generally requires several phases, and each phase can take at least several months.

Help Stop TB

WCG posted:

In October 2020, the World Health Organization released the most recent global statistics on TB, including the following:

* In 2019, an estimated 10 million people contracted TB.
* 1.4 million people died from TB in 2019.
* TB remains one of the top 10 causes of death worldwide, and the leading cause from a single infectious agent (above HIV/AIDS).

Since the newest research team member joined the team permanently, the group has been able to develop a few more methods to help with their data analysis. Right now they are checking that these methods run consistently and making sure that the data is ready.

For this particular project, the researchers often need to analyze the batches we send back to them before they can build more work units. This can sometimes lead to an intermittent work unit supply.

Mapping Cancer Markers

WCG posted:

Last month, we asked current volunteers who were not already donating computing power to this project to start contributing. Thank you to everyone who responded to this request! Below are the results so far:

The number of batches processed in July (to-date) is higher than last month (1,230 as compared to 1,130).
The daily average number of batches has increased from 37.7 to 41.0.
(And this is during the summer in the northern hemisphere, when we generally see a dip in participation.)

OpenPandemics

OPN Researchers posted:

In late 2020, we announced the selection of 70 compounds (from an original group of approximately 20,000) that could be promising to be investigated as potential inhibitors of the virus that causes COVID-19. Lab testing is currently underway for some of these compounds (see the end of this report for details).

In late April and early May, we provided World Community Grid with approximately 30,000 batches of GPU work units. This was part of a stress test of the World Community Grid infrastructure and analysis work flow, and quickly generated an extremely large amount of data for us.

The stress test was a great exercise to uncover bottlenecks in our workflow. Because of the almost unbelievable magnitude of results returned—the equivalent of about 3/4 of the number of CPU results for one year in one week—it became apparent to us that the major bottleneck was what we internally call "rehydration/analysis." This is the step where we convert the so-called "genome" describing the location, rotation, and torsion state of a given docking result into xyz atom coordinates and perform the analysis.

The stress test motivated us to develop considerable optimizations in our code for the GPU version. These optimizations sped up rehydration/analysis by more than ten-fold, which led to an overall speedup of 5x of our workflow. These optimizations are lined up to be incorporated in our mainstream code source in the AutoDock-GPU GitHub page and will be available to the whole community, benefiting all the researchers that use our code for their simulations.

Currently, all computation is focusing on the spike protein of the SARS-CoV2 virus. The first work units targeting the spike were the "stress test," which docked about 300 million small molecules against one of many possible binding sites. Subsequently, we targeted multiple possible binding pockets with both reactive and non-reactive molecules.

The reactive molecules contain a chemical group capable of reacting selectively with either tyrosine or lysine amino acids (which are common building blocks of proteins) using a particular kind of sulfur chemistry (sulfonyl fluoride exchange, SuFEx). If any of these molecules really does bind to the spike protein, it could interfere with viral entry into human cells and, in turn, slow down the replication of the virus.

In our analysis, we filtered raw docking results to identify the most promising compounds to be synthesized and tested in biological assays. During this process, the number of results was reduced from hundreds of millions of molecules to a few dozen that showed the most interesting interaction patterns with the viral enzymes.

With our collaborators at Enamine, we identified those more accessible through synthetic chemistry, ultimately selecting molecules that could target two of the main proteases of the SARS-CoV2 virus: 28 for the protease Plpro, and 47 for the protease Mpro. Enamine rapidly synthesized, purified, and shipped the molecules to our experimental collaborators laboratories at Scripps Florida (Griffin and Kojetin labs) and Emory University (Sarafianos lab). As soon as biological results are available, we will share them with the community.

WCG posted:

Since the update was published, they've also begun looking ahead to possibly testing a second set of compounds. They will share further details if and when this happens.

OhFunny
Jun 26, 2013

EXTREMELY PISSED AT THE DNC
Thanks for the updates mdxi! It's nice to be able to read all the news in one place and in one post.

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

I've had a node down since moving (roughly a month). Finally got around to diagnosing it yesterday: dead SSD. Here's the interesting/annoying thing though: instead of posting but failing to boot, the failure mode was to sit at the UEFI logo screen forever, regardless of keyboard input. This might be a good reason to disable boot splash on all my nodes -- I never bothered because commodity UEFI on consumer hardware gives you poo poo for diagnosing early boot problems.

Since bringing everything back online, I've had to use PPT settings (thanks, AMD!) to dramatically under-watt everything. As suboptimal as my old apartment was, with its under-specced window unit AC, the fact that the nodes were in the room with the AC meant that they ran cooler there. Here, with proper central HVAC, the irony is that the nodes are now off in a bedroom that I'm using as my office and while airflow is fine, this room becomes a hotspot that the centrally-located thermostat doesn't care about.

Given (1) this new reality, and (2) that many-core CPUs are becoming the norm across all market segments, and (3) the performance of chips like Apple's M1, I'm (4) starting to think that my next round of upgrades -- which is likely a year away, minimum -- might be in a more efficiency-oriented direction. I've always been a huge fan of efficient computing, but when Ryzen happened and there were suddenly so many cores available for so few dollars, I got greedy and pivoted toward just wanting to be able to crunch more.

The other option would be to go in the other direction, and have, say, two Threadripper based nodes instead. But that would be paying for a lot of hardware I don't need. Plugging a 64-core CPU with 144 (or whatever) PCIe lanes into a $500 mobo, then attaching a single 120G NVMe drive and a low-end GPU feels wasteful in the extreme.

Obvs, the answer is "wait and see". What's everybody else doing/thinking, hardware-wise?

SamDabbers
May 26, 2003



Maybe try distributing your nodes around the house rather than concentrating them in a single room. Then you would have less of a hotspot in your office for the central air to deal with. Since networking speed isn't all that important for this application you could connect them via wifi if you don't want to run ethernet everywhere.

As for hardware I just use what I have and don't buy anything specifically for distributed number crunching. In the winter I will spin up my desktop and use it as a space heater, and my desktop-class "server" in the basement can crunch year round.

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

Yeah, that's the thermally optimal solution. Unfortunately I am now an old who (I know this may be hard to believe) is over hardware for hardware's sake. I don't want PCs scattered around my house anymore :)

For now I've actually just reduced their power usage even more. I'm down to 55W on the 3900s and 65W on the 3950s. Despite their higher limit the 3950s are running cooler because each core is getting so much less power -- they're now basically running at 2GHz, while the 3900s are still hitting around 2600MHz.
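The per-core arithmetic is why it works out that way. Quick sketch, assuming the package limit divides roughly evenly across cores (it doesn't exactly -- the IO die and SoC take a cut -- but it's close enough to show the point):

code:

# First-order look at why the 3950s run cooler despite the higher PPT.
nodes = {"3900 @ 55W": (55, 12), "3950 @ 65W": (65, 16)}
for name, (watts, cores) in nodes.items():
    print(f"{name}: ~{watts / cores:.1f} W/core")
# 3900 @ 55W: ~4.6 W/core
# 3950 @ 65W: ~4.1 W/core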

PBO is the best thing AMD has ever done, in my opinion. I'm sure the other 7 people in the universe who dramatically underwatt their machines feel the same way.

mdxi fucked around with this message at 01:40 on Aug 20, 2021

Vir
Dec 14, 2007

Does it tickle when your Body Thetans flap their wings, eh Beatrice?

mdxi posted:

Here, with proper central HVAC, the irony is that the nodes are now off in a bedroom that I'm using as my office and while airflow is fine, this room becomes a hotspot that the centrally-located thermostat doesn't care about.
If you could have all the air in the house go through this room before re-entering the system for distribution, that might work out, like a sort of light furnace. But the money spent on an HVAC redesign might not be worth it. It depends on how long you're going to use the room like that, the projected future cost of energy, etc.

Crunchy Black
Oct 24, 2017

by Athanatos

mdxi posted:



Given (1) this new reality, and (2) that many-core CPUs are becoming the norm across all market segments, and (3) the performance of chips like Apple's M1, I'm (4) starting to think that my next round of upgrades -- which is likely a year away, minimum -- might be in a more efficiency-oriented direction. I've always been a huge fan of efficient computing, but when Ryzen happened and there were suddenly so many cores available for so few dollars, I got greedy and pivoted toward just wanting to be able to crunch more.

The other option would be to go in the other direction, and have, say, two Threadripper based nodes instead. But that would be paying for a lot of hardware I don't need. Plugging a 64-core CPU with 144 (or whatever) PCIe lanes into a $500 mobo, then attaching a single 120G NVMe drive and a low-end GPU feels wasteful in the extreme.

Obvs, the answer is "wait and see". What's everybody else doing/thinking, hardware-wise?

I'm glad you asked about this, because I noticed something yesterday while looking at efficiencies. (Mind you, this is all for F@H, so my main concern is hosting GPUs, but) my Threadripper 1900x is VASTLY outclassed by both my 5800x and my 3600x. Like, by 3x. Was there an instruction set that first gen didn't get that's important? Don't recall. Now, both of the non-Threadripper machines are only hosting a GPU apiece; for F@H purposes you want a thread per GPU available to feed it, so it's really only playing with 12 cores, but still.

All that observation is to say: you gotta pay to play. And yes, for BOINC-only usage I think you'd get more bang for your buck building 2 or maybe even 3 Ryzen nodes for the cost of a single modern Threadripper, if long-term efficiency is your goal and these are always-on crunchers, even considering modern, idiotic hardware costs. (Microcenter combos would be your friend here.)

Quaint Quail Quilt
Jun 19, 2006


Ask me about that time I told people mixing bleach and vinegar is okay

mdxi posted:

For now I've actually just reduced their power usage even more. I'm down to 55W on the 3900s and 65W on the 3950s. Despite their higher limit the 3950s are running cooler because each core is getting so much less power -- they're now basically running at 2GHz, while the 3900s are still hitting around 2600MHz.

PBO is the best thing AMD has ever done, in my opinion. I'm sure the other 7 people in the universe who dramatically underwatt their machines feel the same way.
Are you aware of this underclock stability benchmark that was posted a few pages ago on the AMD thread?
https://forums.somethingawful.com/showthread.php?threadid=3817104&pagenumber=597&perpage=40&userid=0#post517091802

I use 1usmus tools for auto OC. I'd use the RAM OC calculator that he also wrote, but I foolishly bought RAM that isn't explicitly on my mobo's compatibility list, and I'm dumb.

I also use the 1usmus power plan.
(I'm aware you undervolt; it does that too, as well as OC.)

Do you run RAM stock? I'll fold in winter.

Quaint Quail Quilt fucked around with this message at 16:24 on Aug 25, 2021

pseudorandom
Jun 16, 2010



Yam Slacker

mdxi posted:

Here, with proper central HVAC, the irony is that the nodes are now off in a bedroom that I'm using as my office and while airflow is fine, this room becomes a hotspot that the centrally-located thermostat doesn't care about.

Vir posted:

But the money spent on an HVAC redesign might not be worth it. It depends on how long you're going to use the room like that, the projected future cost of energy, etc.

While it wouldn't be as effective as a full HVAC redesign, you could move toward smart thermostats to help with hot-spot rooms like that. I have an ecobee thermostat, and I bought two of their temperature/occupancy sensors for my office and bedroom. The main reason for buying them was a situation similar to this: my office is the farthest room from the AC and it's got 2 windows, 3 computers, and plenty of other tech stuff running (plus me) to warm the room all day. So having a little sensor on the wall to tell the thermostat, "hey, I'm in here and the room is hot, please make it cooler," is a nice benefit.

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

Crunchy Black posted:

my Threadripper 1900x is VASTLY outclassed by both my 5800x and my 3600x. Like, by 3x. Was there an instruction set that first gen didn't get that's important?

Zen/Zen+ took 2 cycles to do AVX2. Zen2 dropped that to a single cycle, which led to enormous uplift for workloads using vector ops.

On top of that, a Zen3 core is 1.4X as fast as a Zen core, clock for clock, due to accumulated IPC gains (3% for Zen+, 15% for Zen2, 19% for Zen3). So Zen3 would retire an AVX2 instruction 2.8X faster than a Zen core at the same clockspeed.
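Showing the work, since the compounding isn't obvious at a glance (the IPC percentages are the commonly cited AMD generational figures; the 2x is the AVX2 two-cycles-to-one change):

code:

ipc_gains = {"Zen+": 0.03, "Zen2": 0.15, "Zen3": 0.19}

ipc = 1.0
for gain in ipc_gains.values():
    ipc *= 1 + gain
print(f"Zen3 IPC vs Zen1: {ipc:.2f}x")    # ~1.41x

avx2_width = 2.0  # Zen/Zen+ took two cycles per 256-bit AVX2 op; Zen2+ takes one
print(f"Zen3 AVX2 throughput vs Zen1, clock for clock: {ipc * avx2_width:.1f}x")  # ~2.8x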

mdxi fucked around with this message at 17:14 on Aug 25, 2021

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

Here's an article on the Anton 3 -- a custom-built, 528-core supercomputer optimized for molecular dynamics processing, which is what an awful lot of distributed computing projects (Folding, Rosetta, and about half of WCG, to name a few) are doing.

https://www.nextplatform.com/2021/08/25/the-huge-payoff-of-extreme-co-design-in-molecular-dynamics/

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

WCG Monthly Update - August 2021

Mapping Cancer Markers

quote:

Thanks to the volunteers who responded to our recent request for additional donated computing time for Mapping Cancer Markers! The project continues to speed up; according to our tech team, it's now running approximately 30 percent faster than it was a month ago (which was already a faster pace than normal).

Help Stop TB

quote:

With the recent addition of new team members, the researchers are making decisions about how to proceed with two papers that are in the works. They are currently running tests and re-checking analyses for Paper 1 using new methods they recently developed. Once they've completed these analyses, they will determine the best direction for Paper 2.

Smash Childhood Cancer

quote:

Japanese government requests input from Dr. Nakagawara

Dr. Akira Nakagawara, the principal investigator for Help Fight Childhood Cancer and founding principal investigator for this project, was recently approached by the Japanese government to provide input on a proposed national research study of childhood cancer.

Data analysis updates

Since early this year, the researchers have been analyzing data from work run on World Community Grid.

Below are the key proteins for which we have new updates this month. Each of these proteins is involved in the development of at least one type of childhood cancer.

Beta catenin
The researchers continue testing on the two promising compounds that were mentioned in last month's update as showing activity with this protein.

Osteopontin
The biochemist who was brought on to help with testing for this protein is now using new instruments to help with affinity testing (which is a type of testing commonly used in drug discovery to help refine initial results).

TrkB
Since the last monthly update, the researchers submitted a grant proposal to the National Cancer Institute to help fund further research on this protein.

Note: Lab testing compounds for effectiveness generally requires several phases, and each phase can take at least several months.

Africa Rainfall Project

quote:

In last month's update, we asked volunteers who were already contributing to this project to consider making a change to their World Community Grid settings which would allow them to process more than one work unit at a time.

Thanks to the volunteers who answered this call, the project is seeing the following improvements:

* The pace has increased from a generation every 5.1 days to a generation every 4.4 days since last month (and down from around 6 days earlier this year).

* The next generation of work units is running on volunteers machines within 15 minutes of the previous generation being validated.

OpenPandemics

quote:

The research team recently sent 600 batches of accelerated work to World Community Grid. These batches contained simulated experiments on several important binding sites and additional compounds that could be promising as potential treatments.

Sulfonyl fluoride (SuFEx) is a molecule that is known to react and covalently bind to lysine (K) and tyrosine (Y) amino acid sidechains in proteins.

The researchers selected four possible binding sites in the main protease (Mpro) of SARS-CoV-2 that are adjacent to K and Y sidechains. They then prepared about 600 packages to be docked against nearly 300,000 molecules from Enamine that contain SuFEx. These batches were recently completed on World Community Grid.

The most promising molecules will be purchased or synthesized, then shipped to the laboratory of Prof. Chris Schofield at the Chemistry Research Laboratory at the University of Oxford to verify experimentally if they bind to SARS-CoV-2 Mpro.

Hasturtium
May 19, 2020

And that year, for his birthday, he got six pink ping pong balls in a little pink backpack.
Out of curiosity, what is the status of BOINC on Power architectures? I’m hoping to build a Raptor Computing Power9 machine and was wondering what projects I could contribute to. I’ve had my 7940x+RTX 3070 throwing work at Einstein@Home for a while.

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

Short answer: I can't find any in a few minutes of searching.

Long answer -- on the client side, BOINC has two parts: (1) the BOINC client itself, which acts as a scheduler and network transfer client, and (2) the application binaries provided by the projects that you attach BOINC to, which are downloaded and run transparently by the BOINC client

I would not be surprised if someone, or some distro, has ported the BOINC client to Power. My understanding is that it's pretty straightforward.

I have been unable to find projects that have built Power binaries of their application code. They tend to be short on resources of all sorts -- that's almost an "ipso facto" of BOINC usage.

My best guess would have been Folding, because they roll their own client and are funded by Stanford (who have plenty of money and nerds), but even they don't have a Power binary.

How's x86 binary translation on Power these days, I guess? :smith:

Hasturtium
May 19, 2020

And that year, for his birthday, he got six pink ping pong balls in a little pink backpack.

mdxi posted:

Short answer: I can't find any in a few minutes of searching.

Long answer -- on the client side, BOINC has two parts: (1) the BOINC client itself, which acts as a scheduler and network transfer client, and (2) the application binaries provided by the projects that you attach BOINC to, which are downloaded and run transparently by the BOINC client

I would not be surprised if someone, or some distro, has ported the BOINC client to Power. My understanding is that it's pretty straightforward.

I have been unable to find projects that have built Power binaries of their application code. They tend to be short on resources of all sorts -- that's almost an "ipso facto" of BOINC usage.

My best guess would have been Folding, because they roll their own client and are funded by Stanford (who have plenty of money and nerds), but even they don't have a Power binary.

How's x86 binary translation on Power these days, I guess? :smith:

Sorta bearable. I don't have direct experience with it yet but QEMU can apparently get the job done. This makes me wonder how hard it would be to take a client executable intended for a Power Mac and then bludgeon it about the face and neck to make it into a starting point for a native Power executable. I also wonder if any BOINC projects have bothered to maintain those in the year 2021. The G5 was basically Power4 with a grafted-on Altivec unit, a few alterations, and hobbled cache, after all.

edit: It looks like ppc64-linux-gnu is supported, but nobody's currently stepping up to bat. Dang.

Hasturtium fucked around with this message at 22:50 on Sep 1, 2021

Lawman 0
Aug 17, 2010

I joined the something awful Milkyway@home team! :sun:

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

IBM has announced that they will no longer be funding or hosting World Community Grid. The project, in its entirety, will be transferred to the Krembil Research Institute, which is a unit of Toronto's University Health Network.

This is a link to the full announcement; my summary of the announcement and QA thread (linked in announcement) follows: https://www.worldcommunitygrid.org/about_us/viewNewsArticle.do?articleId=732

  • Krembil is a basic research unit, focusing on brain, spine, bone, joint, and eye disorders
  • It is home of Dr. Igor Jurisica, who is one of the researchers on the current Mapping Cancer Markers project, and was involved in the Help Conquer Cancer project (2012-2015; WCG's first GPU-enabled project)
  • All current projects will continue
  • A new project focused on Parkinson's will be coming
  • The website will get an update (very soon)
  • IBM computers will still be able to participate in WCG (it turns out that IBM compute participation in the project has been voluntary)
  • Predictably, some volunteers are more concerned about their points than about any actual impact (positive or negative) that this will have for the project. Yes, the numbers in the DB will be migrated over
  • There is no word on why or how this decision has been made

Edit: more info added as it becomes available

  • WCG admins are answering questions about IBM and staffing very carefully, but it sounds like no staff will be transferring over. Current admins are assisting through the transition period
  • Stated areas of interest for new research in the post-transition era: "Alzheimer’s and arthritis, in addition to viruses and pathogens"
  • The WCG open data pledge will remain in place
  • It sounds like Krembil will not be the instigator of all future projects, but that has not been explicitly stated
  • It has been explicitly stated that updates from research teams of past projects will continue
  • Krembil now has their own announcement up: https://www.uhn.ca/corporate/News/Pages/Community_based_supercomputer_coming_to_Krembil_Research_Institute.aspx
  • There will be a new logo
  • I asked about the other two climate projects (announced pre-covid and originally in the pipeline behind ARP1) and got no response from admins

mdxi fucked around with this message at 00:07 on Sep 14, 2021

Vir
Dec 14, 2007

Does it tickle when your Body Thetans flap their wings, eh Beatrice?
The Covid Moonshot (of which Folding@Home is a part) is getting an 8 million GBP grant from Wellcome and the COVID-19 Therapeutics Accelerator to develop a treatment for coronaviruses.

Press release: COVID Moonshot funded by COVID-19 Therapeutics Accelerator to rapidly develop a safe, globally accessible and affordable antiviral pill
News post on Folding@Home: https://foldingathome.org/2021/09/27/covid-moonshot-wellcome-trust-funding/?lng=en

John Chodera posted:

This funding will enable the Moonshot to rapidly complete its final stages of lead optimization and perform the preclinical studies needed to reach the equivalent of Investigational New Drug (IND) filings to begin clinical trials.

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

This is super cool. I have switched my GPUs over to Folding, attached to team SAGoons.

mdxi
Mar 13, 2006

to JERK OFF is to be close to GOD... only with SPURTING

Well, after a day of running Folding, I am now unhappy with most of my GPUs. They were very much an afterthought in my boxes, and in the lead-up to OpenPandemics going GPU-enabled, only 3 of my 6 machines even had GPUs slotted. When the OPNG alpha was announced I scrambled to get cards, and since the GPU crunch was already underway I had to take what I could get. That meant that I ended up with this situation:

  • 2x GTX 1650 (pre-existing, bought for crunching)
  • 1x GTX 750 Ti (pre-existing, formerly in a desktop, bought in like 2015 lol)
  • 1x RX 550 (bought for OPNG)
  • 1x WX 3200 (the workstation version of an RX 550, bought for OPNG)

I crunch on Linux, where the driver situation has been very interesting since AMDGPU happened. Prior to that, it was generally agreed that if you wanted to do anything other than display a desktop on a Linux machine, you wanted an Nvidia GPU -- because as terrible as the drivers were compared to Windows, they were still far better than what AMD could occasionally be bothered to poo poo out. With the AMDGPU driver that flip-flopped: support and bugfixes were now much better with AMD, and in some cases the Linux driver outperformed Windows.

In terms of compute, however, nothing has changed. CUDA still outperforms OpenCL, and more projects support CUDA. ROCm support (supposedly AMD's answer to CUDA) has moved very, very slowly on the AMD side, and I'm not aware of any projects which actually use it. Meanwhile, if you want OpenCL 2.2 support (a spec which was released in 2017), you have to install the AMDGPU-PRO drivers rather than the standard one that comes with every distro.

So it was a hassle to get the 550s going, and yeah they sucked a little, but on OPNG the WUs were small enough that the performance deltas were insignificant. Who cares if something takes 40 seconds instead of 30?

Folding is a whole other story though. These WUs are chunky, taking 6-8 hours on my 1650s and a little over 2 days on the RX 550s. That's not a 30% delta anymore; it's like 700%. Even the 750Ti, which is an older card, is roughly twice as fast as the 550s. That performance is so bad that it almost feels like it would be a net positive to just turn the 550s off.
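Rough numbers behind that, taking the midpoint of the 1650 range and reading "a little over 2 days" as about 50 hours (both my approximations):

code:

gtx_1650_hours = 7   # midpoint of the 6-8 hour range
rx_550_hours = 50    # "a little over 2 days", roughly
print(f"RX 550 takes ~{rx_550_hours / gtx_1650_hours:.1f}x as long")  # ~7.1x, i.e. ~700%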

Also, if you were counting, you may have noticed that I only listed 5 GPUs for 6 machines. I'm currently down one, so that node isn't contributing to Folding at all. I guess it's time to start shopping for hilariously overpriced 1050/1650s on eBay because you still can't buy new GPUs, and even if I could there is no such thing as a 2050/3050, and I don't want anything that needs more than 75W. I'm suddenly genuinely interested in how Intel's dGPUs are going to perform and how available they'll be, since (1) leaks indicate that there will be a 75W entry part and (2) Intel usually has exceptional driver support on Linux.

mdxi fucked around with this message at 18:49 on Sep 29, 2021

Vir
Dec 14, 2007

Does it tickle when your Body Thetans flap their wings, eh Beatrice?
I know what you mean about AMD GPU support - I messed around a lot with that in Linux until my AMD card became obsolete in terms of PPD.

Folding@Home has beta support for Intel GPUs. So far, some projects can be folded on integrated Intel GPUs, but the reason they're even bothering with this support is in anticipation of higher powered discrete Intel GPUs.

Crunchy Black
Oct 24, 2017

by Athanatos

mdxi posted:

Well, after a day of running Folding, I am now unhappy with most of my GPUs. They were very much an afterthought in my boxes, and in the lead-up to OpenPandemics going GPU-enabled, only 3 of my 6 machines even had GPUs slotted. When the OPNG alpha was announced I scrambled to get cards, and since the GPU crunch was already underway I had to take what I could get. That meant that I ended up with this situation:

  • 2x GTX 1650 (pre-existing, bought for crunching)
  • 1x GTX 750 Ti (pre-existing, formerly in a desktop, bought in like 2015 lol)
  • 1x RX 550 (bought for OPNG)
  • 1x WX 3200 (the workstation version of an RX 550, bought for OPNG)


You're going to have a bad time folding if you're not running at least Pascal-class GPUs. About a year ago NVIDIA did what they never do and optimized the folding base library to work better with modern CUDA capable GPUs. I run a single RX5700 living room machine just as a proof of concept. That update was good for a million extra PPD cumulatively over the rest of my cards and I'll continue to pay the NVIDIA tax because they're just that much better. Frankly, that 750, 550 and 3200 ain't worth the power you're feeding them to fold; you're going to bump up against time limits quickly.

e: found it and it was EXACTLY a year ago https://www.pcgamer.com/nvidia-cuda...20acceleration.
