Efficiency improvement by larger GPU tasks?
Author | Message |
---|---|
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
At the moment, this project gives out tiny GPU tasks that finish in under a minute. By contrast, tasks at projects like Einstein take up to an hour. Perhaps this would involve a big rewrite, but wouldn't it drastically reduce the server load if an hour or so's worth of work were included in each task? I read somewhere from Tom that currently each task is a bundle of 5, and each of those 5 has its parameters passed on the command line. Could this be changed so that each task downloads a text file of 1000 parameter sets and works through all of them before reporting back? |
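As a rough illustration of the proposal, here is a minimal Python sketch of a task that works through a whole bundle of parameter sets read from a text file instead of a handful passed on the command line. The file format (one job per line, whitespace-separated numbers, `#` comments) and `evaluate_job()` are hypothetical placeholders, not MilkyWay@home's actual format or science code.

```python
# Minimal sketch, assuming a hypothetical plain-text parameter file with one
# job per line and whitespace-separated floating-point parameters.
# evaluate_job() is a placeholder, NOT the real Separation likelihood code.

def parse_parameter_file(text):
    """Return a list of parameter lists, skipping blank lines and # comments."""
    jobs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        jobs.append([float(tok) for tok in line.split()])
    return jobs


def evaluate_job(params):
    """Placeholder for the per-job GPU computation."""
    return sum(p * p for p in params)


def run_bundle(text):
    """Process every job in the bundle and return one result per job, in order."""
    return [evaluate_job(params) for params in parse_parameter_file(text)]


if __name__ == "__main__":
    sample = "# job parameters\n1.0 2.0\n\n3.0 0.5\n"
    print(run_bundle(sample))  # [5.0, 9.25]
```

The bundle size then becomes a property of the input file rather than of the command line, so going from 5 to 1000 jobs would not change how the app is launched.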
Joined: 12 Nov 21 Posts: 236 Credit: 575,038,236 RAC: 0 |
I would think it would cut down on the number of client/server contacts. That would be a help. So just what is the number of client/server contacts per hour? Does anybody know? |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
The problem I'm seeing with my machines is that, because the server won't give out new tasks while it is getting finished ones back, my computers pester the server every couple of minutes asking for work when their buffer runs low, but never get any, because there's always one of those tiny tasks finished and waiting to be reported. So in my case it can be as often as every 1.5 minutes per computer (the minimum interval the server allows between requests). But my main thought was that if each task were 100 times larger, there would be 100 times fewer tasks for the server to keep track of in the database. |
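To put numbers on the question of contacts per hour, here is a back-of-envelope Python sketch. The 1.5-minute (90-second) interval comes from the post above; the 30-second task time and the factor of 100 are assumptions used only to illustrate the scaling, not measured project figures.

```python
# Back-of-envelope arithmetic. The 90-second minimum interval comes from the
# post above; the 30-second task time and the 100x factor are assumptions.

SECONDS_PER_HOUR = 3600


def contacts_per_hour(min_interval_seconds):
    """Upper bound on scheduler requests per hour from one computer."""
    return SECONDS_PER_HOUR / min_interval_seconds


def tasks_per_hour(task_seconds, concurrent_tasks=1):
    """Tasks finished (and later reported) per hour by one computer."""
    return concurrent_tasks * SECONDS_PER_HOUR / task_seconds


if __name__ == "__main__":
    print(contacts_per_hour(90))     # 40.0
    print(tasks_per_hour(30))        # 120.0
    print(tasks_per_hour(30 * 100))  # 1.2
```

Forty requests per hour is the ceiling for a single computer; the larger-task proposal targets the second pair of numbers, which drop by the same factor of 100 as the task size grows.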
Joined: 12 Nov 21 Posts: 236 Credit: 575,038,236 RAC: 0 |
Yeah, that's one way to look at it. If I could keep my GPU Separation work running 24/7 with no waits for downloads, I'd be a happy camper. I'm OK with the small size if I don't run out of work. I kinda get a kick out of seeing a GPU task finish every 30 seconds, while the same task on a CPU takes 50 minutes. Mind-boggling. But yes, MW is leaving a lot of work on the table that could be completed if there were no gaps in task delivery to the clients. Make the tasks larger, or keep the client buffer full. Or, as the Pointy-Haired Boss in Dilbert says, "Let's do both!" |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
I like seeing them go fast too and would miss that. But the server must be doing a lot of work matching up so many millions of tasks. I remember back before they put them in bundles of 5, when you could do one in 15 seconds! |
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
The only problem I can see with this is whether they have a separate server to analyze the data we send back or use the same one. If it's the same one, longer tasks would take longer to analyze, so the server would be busy for more of the time. This again could be a money problem, but getting a server to analyze the data could be fairly easy, as it doesn't need to be the best of the best. The IT folks should be able to help them figure out what specs they need so they aren't buying another one next year to replace this year's purchase. |
Joined: 24 Jan 11 Posts: 715 Credit: 555,447,973 RAC: 38,746 |
Is this what you were looking for, Mikey? https://grafana.kiska.pw/d/boinc/boinc?orgId=1&from=now-32d&to=now&var-project=milkyway@home |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
At the moment each task is a bundle of 5; I was thinking a bundle of 100 would be better. Overall the same number of individual jobs within the tasks still need to be processed, so it shouldn't create more work for the server. But with fewer, larger BOINC tasks in flight, the database that tracks which of us has which task, and which results need to be compared with a wingman's, could perhaps be smaller. |
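The database argument can be sketched in a few lines of Python. This assumes each work unit is sent to 2 hosts (you and a wingman) and uses a hypothetical total of 1,000,000 jobs; the project's real replication and quorum settings are not stated in the thread.

```python
# Sketch: work units and result records the server tracks for a fixed amount
# of science. Assumes replication of 2 (you plus a wingman); the real
# project settings may differ. total_jobs is a hypothetical figure.

def ceil_div(a, b):
    """Integer ceiling division: a partial bundle still needs its own task."""
    return -(-a // b)


def tracked_records(total_jobs, jobs_per_task, replication=2):
    """Return (work units, result records) needed to cover total_jobs."""
    workunits = ceil_div(total_jobs, jobs_per_task)
    return workunits, workunits * replication


if __name__ == "__main__":
    total_jobs = 1_000_000
    for bundle in (5, 100):
        wus, results = tracked_records(total_jobs, bundle)
        print(f"bundle of {bundle}: {wus} work units, {results} results")
```

Going from bundles of 5 to bundles of 100 turns 200,000 work units into 10,000, a 20-fold cut in tracked records for the same number of jobs. The sketch does not model validation or work generation, which may still involve per-job work on the server side.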
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
YES, thank you very much!! |
Joined: 3 Mar 13 Posts: 84 Credit: 779,527,712 RAC: 0 |
Wasn't there something about more than 5 jobs in a workunit causing the command-line parameters to overflow / run out of space, or was that something else? |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
You are correct, but why do they have to be command-line parameters? Why can't the parameters go in a text file containing enough for 100? |
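The trade-off being discussed can be illustrated with a small Python sketch comparing two hypothetical ways of handing a bundle to the app: every parameter inline on the command line, or a text file plus a fixed command line that only names it. The program name `separation`, the flag `--param-file`, and the 20-parameters-per-job figure are made up for illustration; the thread does not state the real syntax or the exact length limit that was hit.

```python
# Hypothetical comparison of two ways to hand a bundle of jobs to the app.
# "separation" and "--param-file" are made-up names for illustration, not the
# real MilkyWay@home command-line syntax.

def format_params(params):
    """Render one job's parameters as space-separated numbers."""
    return " ".join(f"{p:.16g}" for p in params)


def inline_command_line(jobs):
    """All parameters go on the command line, so its length grows with the bundle."""
    return "separation " + " ".join(format_params(p) for p in jobs)


def file_based(jobs, path="params.txt"):
    """Parameters go into a text file; the command line only names the file."""
    file_text = "".join(format_params(p) + "\n" for p in jobs)
    return f"separation --param-file {path}", file_text


if __name__ == "__main__":
    job = [0.1 * i for i in range(20)]  # hypothetical 20 parameters per job
    for n in (5, 100):
        jobs = [job] * n
        cmd, text = file_based(jobs)
        print(n, len(inline_command_line(jobs)), len(cmd), len(text))
```

The inline command line grows roughly linearly with the number of jobs, while the file-based command line stays the same length. The cost moves into a per-work-unit file that has to be created, downloaded, and cleaned up, which is the overhead raised in the next reply.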
Joined: 16 Mar 10 Posts: 213 Credit: 108,362,595 RAC: 4,488 |
To get a definitive answer to that, one would need to ask the original programmer(s) of both Separation and n-Body :-) However, my guess is that, because the parameters in question are the only data items specific to a single work unit, keeping them out of files avoids the overhead of creating and cleaning up those files. (There are a couple of files associated with all work units in a given batch, but I think those are immutable from start to eventual convergence of results and end of batch.) By the way, when the results come back it isn't just a case of "Log them in the database and that's that..." I believe they are using TAO, the "Toolkit for Asynchronous Optimization", in which case there is extra work associated with both validation and work unit generation! I have no idea how well that might scale with an increase in jobs per work unit :-) One issue which some GPU users don't seem to consider is that work units that are good for GPUs are highly unlikely to be good for users with laptops and older PCs who want to contribute to a project, which is why places like Einstein@home tend to have distinct projects for different hardware combinations! Bear in mind that not all projects can easily be cut up into sub-projects for different sorts of hardware, and the two projects here are likely to fall into that category (unless TAO can cope with multiple projects working on a single optimization data set). In such situations it is then up to the project team to decide on a balance between how quickly they want results and how many volunteers they are willing to exclude! Cheers - Al. P.S.
For examples of what can happen if a project tries to feed "GPU fever" to excess, consider how SETI@home struggled in its later days as more folks with NVIDIA GPUs got access to a high-performance CUDA app: regular transitioner backlogs and all sorts of other problems... Also, when WCG did a deliberate stress test on their OPN1/OPNG project by just "opening the floodgates", the consequences there were very similar to what we have been seeing here, and they had a far more powerful infrastructure to run things on... "Be careful what you wish for -- you might get it!" |
©2024 Astroinformatics Group