Render Farm Issue

Started by eze123, March 26, 2015, 03:53:17 AM

eze123

We have a 4-node render farm that runs on the following hardware:

1x HP DL360, dual Xeon 2.5 GHz, 12 GB RAM, Windows Server 2008 R2
3x Dell R410, dual Xeon 2.27 GHz, 12 GB RAM, Windows Server 2008 R2

We are running KeyShot version 5.1.35.

I'm having very strange issues with the Dell servers: the machines get into a state where you cannot RDP or console into them. They still show up as online to KeyShot, but from a management standpoint they appear to be offline.

Reviewing the event logs from around the time they stop responding, I see entries that indicate a lack of resources, along with some DCOM errors and some Kernel-General events.

One particular entry from around that time reads:

The Windows Modules Installer service failed to start due to the following error:
Not enough storage is available to process this command.


This other entry also points me toward resource issues:

An I/O operation initiated by the Registry failed unrecoverably.The Registry could not flush hive (file): ''.

As far as I can tell, we have more than enough disk space (192 GB free out of 250 GB), and I would think 12 GB of RAM is enough.

I have seen this error on all 3 Dell servers. I have updated all firmware and drivers on the Dells and done a clean install of the OS on all 3.

Has anyone had a similar issue? The logs lead me to believe that there's some sort of memory leak during the render process (I haven't been able to correlate the crash with any particular job yet). The fact that it has only happened on the Dells leads me to believe it's hardware related.
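
One way to check whether the hangs line up with particular jobs is to compare the event log timestamps against the start and end times of each render job. Below is a minimal Python sketch, assuming the event times and job times have already been collected by hand. The job names, timestamps, and the `jobs_near_event` helper are all made up for illustration and are not part of KeyShot or Windows.

```python
from datetime import datetime, timedelta


def jobs_near_event(event_time, jobs, window=timedelta(minutes=10)):
    """Return the names of jobs that were running within `window` of event_time.

    `jobs` is a list of (name, start, end) tuples. This is a hypothetical
    layout for illustration, not a KeyShot export format.
    """
    hits = []
    for name, start, end in jobs:
        # Pad the job interval on both sides so events just before a job
        # starts or just after it ends still count as a match.
        if start - window <= event_time <= end + window:
            hits.append(name)
    return hits


# Hypothetical sample data: two render jobs and two event log timestamps.
jobs = [
    ("job_a", datetime(2015, 3, 25, 22, 0), datetime(2015, 3, 25, 23, 30)),
    ("job_b", datetime(2015, 3, 26, 1, 0), datetime(2015, 3, 26, 2, 15)),
]
events = [datetime(2015, 3, 26, 2, 20), datetime(2015, 3, 26, 5, 0)]

for t in events:
    print(t.isoformat(), jobs_near_event(t, jobs))
```

If most of the resource errors fall inside or just after a job window, that would point toward the render process; if they are scattered regardless of jobs, hardware or the OS looks more likely.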

Any ideas would be great.

E


guest84672

As a first step, can you please do me a favor and upgrade to KeyShot 5.2 and the latest version of Network Rendering?

eze123

I'll have to work with my team to make sure we can move to 5.2. I do know, though, that we've had this issue throughout the past few months and have upgraded through a number of iterations.

We started off on a version of 4.x where we didn't see this issue, but had issues with missing buckets. We were asked to move to 5.0, at which point we started seeing the server issues (it fixed the missing buckets issue, though). Now on 5.1 we are still seeing the issue. Is this something you believe 5.2 addresses, or do you just want us on the latest version?

E

guest84672

I would think that this is a hardware-specific issue.

However, we are constantly improving and fixing things, so it is always recommended that you run the latest version.

angelina22

I would suggest taking this to Dell customer support.