| Computing desk | | |
|---|---|---|
| < May 9 | << Apr \| May \| Jun >> | Current desk > |
| Welcome to the Wikipedia Computing Reference Desk Archives |
|---|
| The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages. |
...well, that's how I think of them anyway - basically a small, boxy thing without peripherals but serious processing power. I'm looking for something to run month-long single processor simulation models on, and if that pairs an i7 processor and RAM with one USB slot and nothing else, it'd be fine. I just don't know what term to even search for here. Suggestions? -- Elmidae ( talk · contribs) 16:47, 10 May 2019 (UTC)
Thanks guys, that is lots of material to ponder! I admit the question was vague - that's because I'm not even sure what characteristics will be desirable. The use case here is agent-based models that run in an environment which is wedded to a single memory space (built in MASON (Java), FYI). As distributing the model is thus not an option, it all depends on local processor speed; and since we are looking at weeks to months per run here, it seems sensible to put the money into a fast processor and save on graphics card (there's no graphical output), peripherals etc. Hence my eyeing what I perceive as stripped-down workhorse systems. If that ultimately comes out as more expensive than just hijacking an ex-lease gaming rig, then it's probably not the way to go.
- Related question: The MASON documentation states: "MASON is not a distributed toolkit. [...] MASON was designed to be efficient when running in a single process, albeit with multiple threads. It requires a single unified memory space, and has no facilities for distributing models over multiple processes or multiple computers."
- I'm interpreting that as being able to use multiple cores within one system for threading (as AFAIK thread distribution to cores is handled by the OS anyway, and cores share one memory space), but no functionality for distributing to anything that uses physically separate RAM. Does that sound right? -- Elmidae ( talk · contribs) 17:22, 11 May 2019 (UTC)
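That reading matches how Java threading works in general. As a rough sketch (this is my own illustration, not MASON code, and the class name and numbers are invented): every thread created inside one JVM process sees the same heap, so work can be split across however many cores the machine has without any explicit distribution machinery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.LongStream;

// Sketch: several threads inside one JVM process, all reading and writing
// the single shared heap. The OS decides which core runs each thread.
public class SharedHeapDemo {
    // Sum 0..n-1 split across `threads` worker threads.
    public static long parallelSum(long n, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long chunk = n / threads;
        List<Future<Long>> parts = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            long lo = t * chunk;
            long hi = (t == threads - 1) ? n : lo + chunk;
            // Each task works on ordinary heap data; no message passing or
            // serialisation is involved, unlike a distributed toolkit.
            parts.add(pool.submit(() -> LongStream.range(lo, hi).sum()));
        }
        long total = 0;
        for (Future<Long> p : parts) total += p.get();
        pool.shutdown();
        return total;
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores=" + cores
                + " sum=" + parallelSum(1_000_000, Math.max(2, cores)));
    }
}
```

The point of the sketch is only that "single unified memory space" and "multiple threads across multiple cores" are fully compatible; it's separate processes or separate machines that MASON rules out.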
P.S. I read slightly more about AMD's Threadripper design, and I think that for those processors that support NUMA, although there is obviously a difference between local and 'remote' memory access (as implied by NUMA), the actual difference is fairly small [11], as the processor-to-processor interconnect on the MCM is fairly advanced (and short), so 'order of magnitude' is not likely to be accurate. It may be true for multi-socket systems relying on Intel QuickPath Interconnect or HyperTransport, though. Not sure how things are for Epyc multi-socket systems using Infinity Fabric [12] or Xeons with Intel Ultra Path Interconnect [13].
That said, I perhaps didn't emphasise enough that I find it unlikely to be something that will need to concern you. My main point was to consider whether there was a performance reason to restrict yourself to single-socket systems; in reality, even when buying refurbished ex-lease systems, multi-socket machines tend to be very expensive, so they probably aren't what you want to look at. (Discounting stuff too old to be worth it, like Core 2 era systems.) And besides, even if they are in your price range, before worrying about NUMA it's probably worth making sure your workload can reasonably take advantage of the 16 or more threads you'd expect from such a system.
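One cheap way to check that before buying anything is to time a fixed amount of work at increasing thread counts on whatever machine you already have. A hedged sketch (the busyWork() body is just a stand-in for a real model step, not anything from MASON):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: run the same total amount of work with 1, 2, 4, ... threads.
// If wall time stops dropping well before the core count, a CPU with
// fewer but faster cores may be the better buy than a many-core box.
public class ScalingProbe {
    // Stand-in for one chunk of model computation.
    static double busyWork(long iters) {
        double x = 0;
        for (long i = 1; i <= iters; i++) x += Math.sqrt(i);
        return x;
    }

    // Returns wall time in milliseconds for `totalIters` of work split
    // across `threads` threads.
    public static long timeRun(int threads, long totalIters) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long per = totalIters / threads;
        long start = System.nanoTime();
        List<Future<Double>> fs = new ArrayList<>();
        for (int t = 0; t < threads; t++) fs.add(pool.submit(() -> busyWork(per)));
        for (Future<Double> f : fs) f.get();   // wait for all workers
        pool.shutdown();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        for (int t = 1; t <= Runtime.getRuntime().availableProcessors(); t *= 2)
            System.out.println(t + " threads: " + timeRun(t, 50_000_000L) + " ms");
    }
}
```

If the curve flattens at, say, 6 threads, there is little reason to pay for a 32-thread dual-socket machine.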
BTW, I earlier mentioned performance of the memory subsystem, but of course performance has various measures, like bandwidth and different aspects of latency. And these aren't always proportional; in fact, increased bandwidth can often come at the expense of latency.
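The distinction is easy to see on any machine by traversing the same array two ways. A small illustration (my own construction, nothing vendor-specific): a sequential pass is limited mainly by bandwidth, because the hardware prefetcher can stream cache lines ahead of the loop, while a random pointer chase is limited by latency, because each load must complete before the next address is even known.

```java
import java.util.Random;

// Sketch: the same array, traversed sequentially (bandwidth-bound) and
// as a random pointer chase (latency-bound). The two sums are equal,
// but the wall times typically differ by a large factor.
public class MemAccessDemo {
    public static long sequentialSum(int[] a) {
        long s = 0;
        for (int v : a) s += v;
        return s;
    }

    // next[i] holds the index to visit after i. If `next` encodes one big
    // cycle, following it for next.length steps touches every element once.
    public static long chase(int[] next) {
        long s = 0;
        int i = 0;
        for (int step = 0; step < next.length; step++) {
            i = next[i];   // the next load address depends on this load
            s += i;
        }
        return s;
    }

    // Build a random single-cycle permutation (Sattolo's algorithm).
    public static int[] randomCycle(int n, long seed) {
        int[] p = new int[n];
        for (int i = 0; i < n; i++) p[i] = i;
        Random r = new Random(seed);
        for (int i = n - 1; i > 0; i--) {
            int j = r.nextInt(i);          // j < i guarantees a single cycle
            int tmp = p[i]; p[i] = p[j]; p[j] = tmp;
        }
        return p;
    }

    public static void main(String[] args) {
        int[] p = randomCycle(1 << 24, 42);   // ~64 MB, far bigger than cache
        long t0 = System.nanoTime();
        long s1 = sequentialSum(p);
        long t1 = System.nanoTime();
        long s2 = chase(p);
        long t2 = System.nanoTime();
        System.out.printf("sequential %d ms, chase %d ms (sums %d / %d)%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, s1, s2);
    }
}
```

Which of the two an agent-based model resembles depends on how its data structures are laid out, which is why raw bandwidth numbers on a spec sheet only tell part of the story.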
P.P.S. Reading the MASON thing you quoted, my read is that it's only really talking about trying to run the model across multiple processes, perhaps on physically separated computers, clusters etc. It may be that you will see performance disadvantages from running on a system with multiple NUMA nodes, since it probably doesn't know how to manage this, but maybe not, if the workload isn't highly dependent on memory access performance. Remember, NUMA/UMA is about (non-)uniform memory access. AFAIK all such systems will still have a unified address space, so programs are free to simply ignore that the machine is NUMA if they want, just with possible performance disadvantages. [14] (The OS should hopefully recognise it's NUMA and try to schedule things accordingly as best it can, but I would imagine this can be limited when the program itself is using all the threads.)