Help diagnosing my network rendering issues, once and for all

Started by tfinlay, April 02, 2014, 10:31:10 AM


tfinlay

Here is the situation:

6 users, KS 4.3.10
1 master/9 slave 256 core render farm  KS 4.3.4 Network Rendering
The same render farm is used for Rhino/Vray rendering as well.  The VRay host is a different machine than the KS host/master.

All machines are Win 7 Pro x64.

Every few days, all the machines need to be rebooted because KS network rendering stops working.  A file can often be sent to render, but the rendering will show 0% the whole time.  I want to track down what is causing this issue once and for all.  Rebooting every time KS is needed is not an acceptable solution for us.

I have admin rights to my user machine and all 10 Render Farm machines. 

Can I turn on some sort of logging to see exactly what is happening with the network rendering on each machine? 
I want to see some definitive information I can pass on to KS support OR our IT department.
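
For anyone wanting this kind of evidence without relying on the application itself, one vendor-independent option is a small watchdog that records, every few minutes, whether each render machine still accepts a TCP connection on the network rendering port. This is only a sketch under assumptions: the host names and the port number below are placeholders (not documented KeyShot values), and it uses no KeyShot API. It only shows when a machine stopped answering, which can then be lined up against the time a job got stuck at 0%.

```python
import socket
import time
from datetime import datetime

# Hypothetical values: replace with the real machine names and with the
# port your network rendering service actually listens on.
HOSTS = ["render01", "render02", "render03"]
PORT = 12345  # placeholder, NOT a documented KeyShot port


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_hosts(hosts, port):
    """Return a dict mapping each host to True (reachable) or False."""
    return {host: is_port_open(host, port) for host in hosts}


def format_log_line(results, now=None):
    """Build one timestamped log line, hosts sorted by name."""
    now = now or datetime.now()
    status = ", ".join(
        f"{host}={'up' if ok else 'DOWN'}" for host, ok in sorted(results.items())
    )
    return f"{now:%Y-%m-%d %H:%M:%S} {status}"


def run(iterations, interval_seconds=300, path="render_farm_reachability.log"):
    """Append one log line per check; sleeps interval_seconds between checks."""
    for i in range(iterations):
        line = format_log_line(check_hosts(HOSTS, PORT))
        with open(path, "a") as log_file:
            log_file.write(line + "\n")
        if i < iterations - 1:
            time.sleep(interval_seconds)
```

Calling run(288) from one workstation would cover roughly a day at the default five-minute interval. A successful TCP connection only proves the service is listening, not that it is rendering, so treat the log as a timeline to hand to support or IT rather than a diagnosis.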

DriesV

Latest version of NR is 4.4.7. Maybe that solves things?

Dries

tfinlay

Quote from: DriesV on April 02, 2014, 10:42:15 AM
Latest version of NR is 4.4.7. Maybe that solves things?

Dries

This is the first thing KS support suggests whenever I bring the issue up to them, all the way back to 3.0.  The problem here is that by the time I get it installed and all of the users on the same page, an even newer version has come out.  We've decided to hold back upgrades for a bit until we can figure out the problem.  I figure all 16 machines on the same version is better than a mixed bag.

I'm happy to upgrade if someone can point out a reason 4.4.7 would eliminate this problem, but upgrading just because it's newer is not something I want to do each time a new version comes out.

thomasteger

For one, you are using two outdated pieces of software. Please make sure to upgrade both KeyShot and Network Rendering.

Why are you rebooting the machines? What does the slave status in the Network Render Queue tell you? Are all slaves active?

tfinlay

Well, right now all is working (figures) so I can't take a screen shot, but IIRC usually Network Queue will show all machines connected/active, but none of the tiles are counting down.  I reboot because that's what my boss tells me to do to fix the problem.  I'm sure there are other solutions as well.

I'll confirm exactly what is happening when I can replicate the problem again.

tfinlay

So, all is still "working".  I'm suspecting this is due to a lack of Vray usage recently.

However, this has started happening (see attachment)

What would cause this?


tfinlay

Checked again and only 8 of the 10 machines were connected to the KeyShot network queue.  I logged into one of the missing machines remotely to check on things.  It still looked OK - I can't figure out how to tell why it isn't connected...

I tried to run KS Network Config to maybe start/stop and got this error...

I have 2 questions: 

1) If only 12 gigs are in use, why can KS Network Config not run?

2) Is there a good log to see where all this memory is going?  I made the mistake of rebooting before looking in Task Manager.  I can have a look at that next time.
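
On question 2, a rough way to capture where memory is going before a reboot wipes the evidence is to save the output of the standard Windows tasklist command and rank processes by its "Mem Usage" column. The sketch below assumes it runs locally on the affected Windows machine; the helper names are made up for illustration, and "Mem Usage" is only the figure tasklist reports per process, so it may not account for everything behind an out-of-memory error.

```python
import csv
import io
import subprocess


def parse_tasklist_csv(text):
    """Parse `tasklist /FO CSV /NH` output into (image_name, pid, mem_kb) tuples."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        name, pid, mem = fields[0], fields[1], fields[4]
        # "Mem Usage" looks like "123,456 K"; keep only the digits so the
        # thousands separator of the local Windows locale does not matter.
        digits = "".join(ch for ch in mem if ch.isdigit())
        if not digits or not pid.isdigit():
            continue
        rows.append((name, int(pid), int(digits)))
    return rows


def top_memory(rows, n=10):
    """Return the n rows with the largest reported memory, biggest first."""
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]


def snapshot():
    """Run tasklist on the local Windows machine and parse its output."""
    result = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_tasklist_csv(result.stdout)
```

Running top_memory(snapshot()) on a schedule and appending the result to a file gives a before-and-after picture that can be compared once a machine drops off the queue.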


I logged into the second machine, and it has no memory issues - it just isn't showing on the network queue for some reason.  Running network config and start/stop connected it again.  Any ideas what I can do about this?

thomasteger

This is very odd. Can you try and run the installer as admin?

tfinlay

I'm an admin on all the machines... do you mean right click, run as administrator?  When you say "installer" - which installer are you referring to?  Just curious, what would that do?

The offending machines have been either reconnected or rebooted to get all 10 working.  When one goes down again, I will certainly give it a shot.


thomasteger

I meant to say "application". Yes, right click, run as administrator. Can you open the render queue without any problems?