Fatal bounds error detected

Post #1 by camoesas, December 22, 2010, 11:36: Stopped in routine ENFORCE_BOUNDS

Hello everybody,

Shortly before Christmas I have a CFD problem. Maybe somebody can help me:

I am running a transient simulation with cavitation. First I ran a transient simulation with cavitation turned off to use as an initial guess. Now I have turned cavitation on, and the solver blows up after a few iterations. I get pressure values of about 300 bar, which cause the solver to abort.
See the out file below:

Quote:

Slave: 3
Slave: 3 Fatal bounds error detected
Slave: 3 ---------------------------
Slave: 3 Variable: Fluid 1.Density
Slave: 3 Locale : Innen
Parallel run: Received message from slave
-----------------------------------------
Slave partition : 3
Slave routine : ErrAction
Master location : RCVBUF,MSGTAG=1022
Message label : 001100279
Message follows below - :

+--------------------------------------------------------------------+
| ERROR #001100279 has occurred in subroutine ErrAction. |
| Message: |
| Stopped in routine ENFORCE_BOUNDS |
+--------------------------------------------------------------------+
+--------------------------------------------------------------------+
| An error has occurred in cfx5solve: |
| |
| The ANSYS CFX solver exited with return code 1. No results file |
| has been created. |
+--------------------------------------------------------------------+
End of solution stage.
+--------------------------------------------------------------------+
| The following transient and backup files written by the ANSYS CFX |
| solver have been saved in the directory |
| D:kavitation_01_004: |
| |
| 1353_full.trn |
+--------------------------------------------------------------------+

+--------------------------------------------------------------------+
| The following user files have been saved in the directory |
| D:kavitation_01_004: |
| |
| mon |
+--------------------------------------------------------------------+

This run of the ANSYS CFX Solver has finished.

A few iterations earlier I got this strange notice:

Quote:

Parallel run: Received message from slave
-----------------------------------------
Slave partition : 3
Slave routine : EX_TABLE
Master location : End of Continuity Loop
Message label : 009100008
Message follows below - :
+--------------------------------------------------------------------+
| ****** Notice ****** |
| While evaluating Fluid 1.Temperature, |
| Absolute Pressure |
| went outside of its upper limit. Its maximum value was |
| 3.5814E+07. The bounds error was handled by extrapolation. |
| If this situation persists, consider increasing the table range. |
+--------------------------------------------------------------------+

Some details of my case:

- reference pressure: 0 atm
- timestep: 5.5e-5 s
- inlet: total pressure
- outlet1: opening
- outlet2: pressure outlet

Any hints appreciated! Thanks in advance.

Simon

More details required?

Post #2 by ghorrocks (Glenn Horrocks), December 22, 2010, 18:21

Your simulation is not converging well. You need to improve the numerical stability: use smaller timesteps, improve the mesh quality and check the physics.

Post #3 by camoesas, December 23, 2010, 07:11

Hi Glenn,

I have double-checked the mesh quality, and it is quite good:
Minimum angle > 27°
Determinant > 0.5

I have decreased the timestep to nanoseconds (which is a reasonable size for cavitation problems) and will see after Christmas whether it helped.

Here are some details of my cavitation model:
- Rayleigh Plesset
- Mean diameter: 2e-6 m
- Saturation pressure: 0.02 bar

Merry Christmas!

Post #4 by ghorrocks (Glenn Horrocks), December 23, 2010, 07:21

Mesh quality requirements are different for different physics models. The rules of thumb for single phase flow are often not appropriate for multi phase flow. I would spend some time to get the mesh as good as you can in the area of cavitation as it will pay dividends with improved convergence, better accuracy and reduced run time.

I would use adaptive time stepping to let it find its own time step size.

Post #5 by camoesas, January 12, 2011, 03:57

Hello everybody, and a happy new year!

I've got my case running. The cause was a wrong expert parameter setting. I had:
solve volfrc = f

Setting this parameter to true keeps the solver running, but convergence is still bad.

Anyway, now I am facing a really annoying problem:
My out file clearly states that Pressure is written to the transient file:

Code:

     TRANSIENT RESULTS: Transient Results 1
       File Compression Level = Default
       Include Mesh = No
       Option = Selected Variables
       Output Variables List = Fluid 1.Density,Fluid 1.MassTransfer,Fluid 
         1.Velocity u,Fluid 1.Velocity v,Fluid 1.Velocity w,Fluid 2.Volume 
         Fraction,MassTransfer,Pressure,Total Pressure

But in Post I can't read the pressure data, although all other variables are available!

Does anybody have a solution for this stupid problem?

Thanks

Post #6 by ghorrocks (Glenn Horrocks), January 12, 2011, 06:10

Your expert parameter turns the solving of the volume fraction equation off. You are not going to get far when you are not solving the equations.

I have no idea why pressure is not in the output file.

Post #7 by camoesas, January 13, 2011, 07:43

The pressure is in the out files after all, but I have to select Solver Pressure instead of Pressure. This comes with the cavitation model and is explained in the user help.

Post #8 by LUO DAN (luo dan), February 24, 2019, 22:20

I have also encountered this problem. What is the cause of it? I am so confused. I hope I can get your help. Thank you very much!

Post #9 by Opaque, February 24, 2019, 22:49

Could you post the message in the output file?

The <unknown> variable is out of bounds. The suggestion will depend on which variable is listed.

Opaque is offline

 

Reply With Quote

Old
  February 25, 2019, 00:03

Default

 
#10

New Member

 

luo dan

Join Date: Sep 2018

Posts: 27

Rep Power: 6

LUO DAN is on a distinguished road

The error is as follows.
Thank you!

Slave: 9
Slave: 9 Fatal bounds error detected
Slave: 9 ---------------------------
Slave: 9 Variable: Absolute Pressure
Slave: 9 Locale : S1

Parallel run: Received message from slave
-----------------------------------------
Slave partition : 9
Slave routine : ErrAction
Master location : Message Handler
Message label : 001100279
Message follows below - :

+--------------------------------------------------------------------+
| ERROR #001100279 has occurred in subroutine ErrAction. |
| Message: |
| Stopped in routine ENFORCE_BOUNDS |
+--------------------------------------------------------------------+

+--------------------------------------------------------------------+
| ERROR #001100279 has occurred in subroutine MESG_RETRIEVE. |
| Message: |
| Stopping the run due to error(s) reported above |
+--------------------------------------------------------------------+

+--------------------------------------------------------------------+
| An error has occurred in cfx5solve: |
| |
| The ANSYS CFX solver exited with return code 1. No results file |
| has been created. |
+--------------------------------------------------------------------+

End of solution stage.

+--------------------------------------------------------------------+
| The following user files have been saved in the directory |
| D:/LUODAN/may/seal_pending/dp0_CFX12_Solution/CFX12_021: |
| |
| pids, mon |
+--------------------------------------------------------------------+

+--------------------------------------------------------------------+
| For CFX runs launched from Workbench, the final locations of |
| directories and files generated may differ from those shown. |
+--------------------------------------------------------------------+

+--------------------------------------------------------------------+
| Warning! |
| |
| After waiting for 60 seconds, 1 solver manager process(es) appear |
| not to have noticed that this run has ended. You may get errors |
| removing some files if they are still open in the solver manager. |
+--------------------------------------------------------------------+

Post #11 by ghorrocks (Glenn Horrocks), February 25, 2019, 00:30

You have a bounds error on the Absolute Pressure. The highest pressure CFX can handle is very high (1E10 Pa, I suspect), but the absolute pressure cannot be zero or negative in compressible simulations. So something is causing the absolute pressure to exceed these limits.

Most of the time this is caused by numerical instability and the simulation is diverging. In this case this FAQ is relevant: https://www.cfd-online.com/Wiki/Ansy…do_about_it.3F

In a small number of highly specialised cases the negative absolute pressure is real. These cases cannot be modelled by CFX. But they really only occur in extreme MEMS modelling cases so most people don’t come across this (fortunately).
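As a rough illustration of the limits described in this post (this is not CFX code), the check can be sketched in C. The sketch assumes that the absolute pressure is the reference pressure plus the solved (relative) pressure. The upper limit is only the value suspected above, and the function names are made up for this example.

```c
#include <stdbool.h>

/* Assumed upper limit: the post only suspects a value of about 1E10 Pa. */
#define ASSUMED_MAX_ABS_PRESSURE_PA 1.0e10

/* Assumption: absolute pressure = reference pressure + relative pressure. */
double absolute_pressure_pa(double reference_pa, double relative_pa)
{
    return reference_pa + relative_pa;
}

/* True if the absolute pressure is strictly positive and below the assumed limit. */
bool absolute_pressure_in_bounds(double reference_pa, double relative_pa)
{
    double p_abs = absolute_pressure_pa(reference_pa, relative_pa);
    return p_abs > 0.0 && p_abs <= ASSUMED_MAX_ABS_PRESSURE_PA;
}
```

With a reference pressure of 0, as in the first case in this thread, any zero or negative solved pressure fails this check directly.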

__________________
Note: I do not answer CFD questions by PM. CFD questions should be posted on the forum.

Post #12 by LUO DAN (luo dan), February 25, 2019, 01:41

Could multi-core parallel computing cause this problem? The same case did not show such problems when I simulated it before. Thank you!

Post #13 by ghorrocks (Glenn Horrocks), February 25, 2019, 05:44

It is unusual for multiprocessor to cause this type of error, but not impossible. It still suggests numerical instability, so the FAQ is applicable for that.

Post #14 by Gert-Jan, February 25, 2019, 07:32

Quote (originally posted by LUO DAN):

the error is as follows:
thank you!

Slave: 9
Slave: 9 Fatal bounds error detected
Slave: 9 ---------------------------
Slave: 9 Variable: Absolute Pressure
Slave: 9 Locale : S1

[...]

+--------------------------------------------------------------------+
| Warning! |
| |
| After waiting for 60 seconds, 1 solver manager process(es) appear |
| not to have noticed that this run has ended. You may get errors |
| removing some files if they are still open in the solver manager. |
+--------------------------------------------------------------------+

In order to really help you, we need to know what problem you are solving and what settings you used. Please share them in text format.

Post #15 by LUO DAN (luo dan), February 26, 2019, 02:25

I simulated the carbon dioxide two-phase flow in the seal. The settings are listed in the following text. Thank you!

Post #16 by ghorrocks (Glenn Horrocks), February 26, 2019, 04:40

You are using a complex and custom material model. They often cause strange errors and are often very hard to converge.

I see you have set a very small timescale factor, which suggests you have already tried lowering the time step to get convergence.

This simulation is too complex to debug on the forum. All I can recommend is the general procedure for getting complex material models to converge:
1) Do a run using simple, built-in material models, maybe single phase ideal gas. Make sure this runs well and stably before proceeding.
2) Then add the complex components one at a time. In your case maybe do a single phase model using your RGP model.
3) Then do a model using multiphase, but simple fluid properties (maybe ideal gas)
4) If all your complex models work OK by themselves then try combining them. Be prepared for all sorts of new and unexpected error messages

Post #17 by Gert-Jan, February 26, 2019, 14:18

Looks like you are trying to solve a degassing process, with nucleation, while assuming equilibrium conditions. At least I don’t see any user defined source terms.

Nevertheless, this is a very difficult problem. I fully agree with Glenn. Start as simple as possible and increase the complexity step by step by adding physics.


Last edited by Gert-Jan; February 27, 2019 at 04:05.

Post #18 by LUO DAN (luo dan), February 27, 2019, 01:29

Yes, I have simulated my case with the single-phase RGP model and it converged well. I then continued with the two-phase CO2 flow, using those results as initial conditions. But it is very hard to get convergence, even at a very small time step. Is this reasonable?

Post #19 by ghorrocks (Glenn Horrocks), February 27, 2019, 06:42

If the simulation is unstable then a very small time step will be required. If the simulation is not correctly set up then it will diverge no matter what time step you use.

You are doing a very complex simulation and you should expect it to be difficult to get working. It is also too complex to debug over the forum, as I suspect it would take an expert many hours, and nobody has that sort of time to give to the forum. You are going to have to work this one out yourself, I suspect.

Post #20 by meusha, April 4, 2022, 08:10: Very helpful answers

Quote (originally posted by ghorrocks):

Your simulation is not converging well. You need to improve the numerical stability: use smaller timesteps, improve the mesh quality and check the physics.

This is all you ever hear on this forum, unfortunately.


Q. After adding temperature-dependent properties to my model, I get the following error message. What does it mean?

Fatal bounds error detected
---------------------------
Variable: AirVarProps.Specific Heat Capacity at Constant Pressure
Locale : Domain 1

+--------------------------------------------------------------------+
| Writing crash recovery file |
+--------------------------------------------------------------------+

Fatal bounds error detected
---------------------------
Variable: AirVarProps.Specific Heat Capacity at Constant Pressure
Locale : Domain 1

+--------------------------------------------------------------------+
| An error has occurred in cfx5solve: |
| |
| The CFX-5 solver exited with return code 1. No results file has |
| been created. |
+--------------------------------------------------------------------+

End of solution stage.

A. Although the error message refers to a "bounds error", it is really telling you that the solver has calculated an impossible value for that property. Most likely, this means that the value is negative, which is a fatal error for properties such as density, viscosity, thermal conductivity, and specific heat capacity.

If you have written an expression, go to the expression editor and plot it to be sure no values are negative over the expected temperature range.

Also, keep in mind that CFX generates a table from the input expression, and that the default range of the table is 100 K to 3000 K for temperature. Plot the expression for this range and, if any values become unreasonable, either change the expression or change the range to be in line with expected results. Often the range of temperatures expected in the problem will be far narrower than the default table range.
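The same check can be done numerically instead of by plotting. The following C sketch samples a property expression over a temperature range and reports the first sampled temperature at which the value is not strictly positive. It is only an illustration: the function names and the example fit `cp_example` are made up, and this is not how CFX builds its table internally.

```c
/* A property expression: value as a function of temperature in K. */
typedef double (*property_fn)(double temperature_k);

/* Samples f at `samples` evenly spaced temperatures in [t_min, t_max]
   (samples must be at least 2). Returns the first sampled temperature at
   which f is not strictly positive, or -1.0 if every sample is positive. */
double first_nonpositive(property_fn f, double t_min, double t_max, int samples)
{
    for (int i = 0; i < samples; ++i) {
        double t = t_min + (t_max - t_min) * i / (samples - 1);
        if (!(f(t) > 0.0))
            return t;
    }
    return -1.0;
}

/* Made-up linear fit: fine near room temperature, but it reaches zero at
   2200 K and is negative beyond that when extrapolated. */
double cp_example(double t)
{
    return 1100.0 - 0.5 * t;
}
```

Scanning `cp_example` over the default 100 K to 3000 K range flags a problem, while scanning it over a narrower range such as 200 K to 400 K does not. This mirrors the two remedies above: fix the expression or narrow the table range.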



The Effective Type Sanitizer — Dynamically Typed C/C++

EffectiveSan is a compiler tool that automatically inserts dynamic
(i.e., runtime) type and bounds checking into C/C++ programs. The aim of
EffectiveSan is to detect memory errors and type bugs in arbitrary C/C++ code.

Background

C and C++ are examples of statically typed programming languages, meaning
that types are checked at compile time and not at runtime. Furthermore, C
and C++ are weakly typed programming languages that allow the type system
to be bypassed, including:

  • Arbitrary Casts, e.g., casting from a (T *) to an (S *) is
    possible (both explicitly and implicitly via operations like memcpy); and
  • No Bounds Checking, e.g., if reading the ith element
    of a (int[50]) array object, then it is never checked
    (statically or dynamically) that (i < 50); and
  • Use-after-free (allowing possible type mutation) is also possible.

Weak static typing is primarily motivated by flexibility and efficiency
(dynamic type and bounds checking is expensive). However, this also means
that the programmer is responsible for ensuring that types are not violated at
runtime. In practice, the programmer does not always get it right, and bugs
relating to type violations are common and potentially serious. For example,
consider the following "benign" code snippet:

    struct S {int a[3]; char *p;};
    struct T {float f; struct S s;};

    int get(struct T *t, int idx)
    {
        return t->s.a[idx];
    }

This snippet is well-typed according to standard C/C++ static type checking.
However, at runtime, a lot can go wrong:

  • Type Confusion Errors: Pointer t may be of the wrong type:

      S *s = (S *)malloc(sizeof(struct S));
      get((T *)s, 2);
    
  • Use-after-free Errors: Pointer t may have already been free'ed when
    get() is called; and

  • (Sub-)Object Bounds Errors: Index idx may be outside the bounds of
    the (sub-)object (a), i.e., outside the range 0..2.
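The use-after-free and bounds cases can be sketched as follows. This is an illustrative reconstruction, not code from the original text: the function names are invented, and the index 4 is only an example of an out-of-range value. Only `valid_access` has defined behavior. The other two functions compile, but calling them is undefined behavior, which is exactly the kind of error EffectiveSan aims to report.

```c
#include <stdlib.h>

struct S {int a[3]; char *p;};
struct T {float f; struct S s;};

int get(struct T *t, int idx)
{
    return t->s.a[idx];
}

/* Well defined: t is live and idx stays inside a[3]. */
int valid_access(void)
{
    struct T *t = calloc(1, sizeof *t);
    if (t == NULL)
        return -1;
    t->s.a[2] = 42;
    int v = get(t, 2);
    free(t);
    return v;
}

/* Use-after-free: t is freed before get() reads through it. */
int use_after_free(void)
{
    struct T *t = malloc(sizeof *t);
    if (t == NULL)
        return -1;
    free(t);
    return get(t, 2);           /* undefined behavior */
}

/* Sub-object bounds error: index 4 lies past the end of a[3], even though
   the address is still inside the struct T allocation. */
int subobject_overflow(void)
{
    struct T *t = calloc(1, sizeof *t);
    if (t == NULL)
        return -1;
    int v = get(t, 4);          /* undefined behavior */
    free(t);
    return v;
}
```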

In practice, type and memory errors can be a lot more subtle, and are a common
source of security vulnerabilities, program bugs, and other undefined
behavior. For example, such errors are commonly exploited for control flow
hijacking
attacks, e.g., by overwriting the virtual function table pointer
(vptr) of C++ objects. This can be achieved in several ways using the
runtime errors described above, including:

  • Using an object bounds overflow from object A to B to
    directly overwrite B.vptr;
  • Using a sub-object bounds overflow within the same object B to
    directly overwrite B.vptr;
  • Using type confusion to cast a pointer p to B to a different type,
    then overwrite B.vptr indirectly using a "valid" operation on p; and
  • Using a use-after-free similar to type confusion, where a previously
    free'ed pointer p points to a different type.

Assuming an attacker can overwrite the vptr with a suitable value, control
flow can then be hijacked using a call to a virtual function.

Beyond security, it is often useful to detect and eliminate deliberate
type-based undefined behavior—so-called type abuse—since it can harm
code quality/portability. For example, one idiom we have observed in the wild
is to implement C++-style inheritance using structures with overlapping
members, e.g.:

    struct Base { int x; float y; };
    struct Derived { int x; float y; char z; };

We have observed such idioms in SPEC2006's perlbench and povray benchmarks
(despite povray being a C++ program).
Such idioms may violate the compiler's Type-Based Alias Analysis (TBAA)
assumptions, causing code to be miscompiled, or else requiring special compiler
options such as -fno-strict-aliasing. Type abuse may also mask dangerous
(security critical) type errors.
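For comparison, one conforming way to express the same inheritance idea in C is to embed the base structure as the first member of the derived one. This sketch is an illustration only and is not taken from the benchmarks. The helper names `as_base` and `base_x` are invented.

```c
struct Base { int x; float y; };

/* Derived embeds Base as its first member instead of repeating x and y. */
struct Derived { struct Base base; char z; };

/* A pointer to a structure, suitably converted, points to its first member,
   so this "upcast" does not rely on overlapping layouts of unrelated types. */
struct Base *as_base(struct Derived *d)
{
    return &d->base;
}

int base_x(const struct Base *b)
{
    return b->x;
}
```

With this layout every access to x and y goes through an object whose type really is struct Base, so the compiler's aliasing assumptions hold.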

Dynamic Typing for C and C++

The Effective Type Sanitizer (EffectiveSan) is a tool for instrumenting
C/C++ programs with dynamic type checks, effectively transforming C/C++
into a dynamically typed programming language. The instrumented dynamic
type check compares the runtime type of an object (a.k.a. the effective
type, using C standard terminology) against the static type declared in the
code. An error will be logged if there is a mismatch.

For example, EffectiveSan will instrument the get() function by adding
type and bounds checks:

    int get(struct T *t, int idx)
    {
        BOUNDS b = type_check(t, struct T);     // Inserted type check
        b = bounds_narrow(b, t->s.a);           // Inserted bounds narrow
        int *tmp = &t->s.a[idx];                // Address that will be accessed
        bounds_check(tmp, b);                   // Inserted bounds check
        return *tmp;
    }

Here, three additional operations are inserted:

  • type_check checks that the dynamic type of pointer t matches the
    static type (struct T). This means that t must point to either an
    object of type struct T, a sub-object of type struct T of some
    larger object, or a (sub-)object of some other type coercible to type
    struct T (e.g., a character array char[]). If the type is compatible,
    the dynamic (sub-)object bounds is returned.
  • bounds_narrow narrows the bounds b to the sub-object of interest.
    In this case, the sub-object is s.a.
  • bounds_check verifies that the memory access is within the narrowed
    bounds.

If either type_check or bounds_check fails then an error will be logged.
By default, all logged errors are printed to stderr when the program exits
(EffectiveSan does not stop execution, although this is configurable).
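To make the narrowing and checking steps concrete, here is a toy model in C. It is a sketch under stated assumptions, not EffectiveSan's implementation: BOUNDS is modelled as a plain half-open address range, the type check is omitted entirely (the initial bounds are simply the whole struct T), and all `toy_` names are invented for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct S {int a[3]; char *p;};
struct T {float f; struct S s;};

/* Toy stand-in for BOUNDS: a half-open address range [lb, ub). */
typedef struct { uintptr_t lb, ub; } BOUNDS;

/* Narrowing intersects the current bounds with the sub-object's range. */
BOUNDS toy_bounds_narrow(BOUNDS b, const void *sub, size_t size)
{
    uintptr_t lo = (uintptr_t)sub, hi = lo + size;
    if (lo > b.lb) b.lb = lo;
    if (hi < b.ub) b.ub = hi;
    return b;
}

/* The check passes only if the whole access [addr, addr + size) is inside b. */
bool toy_bounds_check(uintptr_t addr, size_t size, BOUNDS b)
{
    return addr >= b.lb && addr + size <= b.ub;
}

/* Would reading t->s.a[idx] pass the narrowed bounds check? */
bool toy_get_allowed(const struct T *t, size_t idx)
{
    BOUNDS b = { (uintptr_t)t, (uintptr_t)t + sizeof *t };  /* whole object */
    b = toy_bounds_narrow(b, t->s.a, sizeof t->s.a);
    uintptr_t addr = (uintptr_t)t->s.a + idx * sizeof(int);
    return toy_bounds_check(addr, sizeof(int), b);
}
```

Indices 0 to 2 pass. Index 3 or 4 fails even though the address still lies inside the struct T object, which is what distinguishes a sub-object bounds check from a plain object bounds check.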

The inserted instrumentation can detect type and memory errors described
above. For example, consider the type error:

    S *s = (S *)malloc(sizeof(struct S));
    get((T *)s, 2);

Then running this program results in a type error:

    TYPE ERROR:
            pointer  = 0x30a12d3740 (heap)
            expected = struct T
            actual   = struct S { int32_t[3]; /*0..12*/ int8_t *; /*16..24*/ } [+0]
                       >int32_t [+0]

Here:

  • pointer is the pointer value, which happens to be allocated from the heap;
  • expected is the expected type, which in this case is (struct T); and
  • actual is the actual dynamic type of the pointer. The "actual" type
    is represented as a set of (type [+offset]) pairs, starting from the
    allocation type of the object (struct S), all the way to the type of
    the inner-most sub-object at the same offset (int32_t, a.k.a. int).
    Offset values are in bytes. Any pair with zero offset (i.e., [+0])
    represents a valid type for the pointer. In other words, this is a type
    error because there is no "actual" type pair corresponding to
    (struct T [+0]).

Next, consider the use-after-free error, where get() is called on a pointer
to an object that has already been free'ed.

EffectiveSan considers "free" objects to have a special "<free memory>"
type. This allows use-after-free errors to be detected as a special kind
of type error:

    USE-AFTER-FREE ERROR:
            pointer  = 0x4034b5bfd0 (heap)
            expected = struct T
            actual   = <free memory> [+0]

Finally, consider the (sub-)object bounds error, where get() is called with
an index that lies outside of the array a.

EffectiveSan uses dynamic typing and bounds narrowing to detect sub-object
bounds errors:

    SUBOBJECT BOUNDS ERROR:
            pointer  = 0x405efddc68 (heap)
            type     = struct T { float32_t; /*0..4*/ struct S; /*8..32*/ } [+8..+20]
                       >struct S { int32_t[3]; /*0..12*/ int8_t *; /*16..24*/ } [+0..+12]
                       >>int32_t [+0..+12]
            bounds   = 0..12 (8..20)
            access   = 16..20 (24..28)

Here:

  • pointer is the pointer value, similar to before;
  • type is a set of (type [+lb..+ub]) triples representing the dynamic type
    of the accessed (sub-)object, and the accessed sub-object’s bounds. Bounds
    are pairs of byte offsets;
  • bounds is the bounds of the accessed sub-object in (1) bounds relative
    to the start of the sub-object, and (2) bounds relative to the start of
    the allocation; and
  • access is the bounds of the memory access, relative to (1) and (2)
    explained above.
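The numbers in this report can be reproduced with a little arithmetic. The sketch below is only an illustration with invented helper names. It assumes a 4-byte int and an index of 4, both inferred from the reported access of 16..20 because the triggering call is not shown, and it takes the sub-object offset of 8 from the report itself.

```c
#include <stdbool.h>

typedef struct { long lb, ub; } range;   /* byte range lb..ub, upper end exclusive */

/* Byte range touched when reading element idx of an array whose elements
   are elem_size bytes, relative to the start of the array. */
range element_access(long idx, long elem_size)
{
    range out = { idx * elem_size, (idx + 1) * elem_size };
    return out;
}

/* Shift a range relative to the sub-object so that it becomes relative to
   the start of the allocation. */
range to_allocation(range r, long subobject_offset)
{
    range out = { r.lb + subobject_offset, r.ub + subobject_offset };
    return out;
}

/* True if the access lies entirely within the bounds. */
bool within(range access, range bounds)
{
    return access.lb >= bounds.lb && access.ub <= bounds.ub;
}
```

With bounds 0..12 and a sub-object offset of 8, `to_allocation` gives 8..20. `element_access(4, 4)` gives 16..20, which shifts to 24..28 and is not within the bounds, matching the report.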

Using the above instrumentation, EffectiveSan can detect multiple classes of
errors, including type confusion, object bounds errors, sub-object bounds
errors, and use-after-free errors—all using the same underlying methodology.

How EffectiveSan Works

We give a very brief overview of how some of the internals of EffectiveSan
work. For more detailed information, please see our paper (see further
reading below) or study the source code.

EffectiveSan consists of three main components:

  1. A "modified" clang front-end that preserves high-level
    C/C++ type information as LLVM IR meta-data.
  2. An LLVM instrumentation pass that inserts type/bounds checks, as well as
    replaces memory allocation with the "typed" version.
  3. A run-time support library that implements the meta data tracking scheme.

The runtime system for tracking dynamic types is the main innovation. The
basic idea is to build on top of low fat pointers which are a system for
tracking the bounds (size and base) of allocated objects, which was
originally developed for bounds checking. Instead, EffectiveSan uses low fat
pointers to store type meta data at the base of allocated objects. For
example, consider the memory allocation:

    q = (struct T *)malloc(sizeof(struct T));

Then, under EffectiveSan, the memory layout will be as follows:

[Figure: EffectiveSan object layout.]

Here (META) is the EffectiveSan object meta data comprising (1)
a reference to the type meta data for (struct T), and (2) the
size (array length) of the allocation. Note that META is stored in the
memory immediately before the allocated object. The memory layout of
the object itself is unchanged, which is critical for compatibility.
EffectiveSan similarly transforms stack and global objects.

The combined META and object are allocated using the low-fat pointer
allocator. This means that any interior pointer, e.g., (p) can be
efficiently mapped to the base address (base(p)) which contains META. The
EffectiveSan runtime therefore has access to the following information:

  1. The dynamic type of (q) (from META);
  2. The static type of (p) (from the source code); and
  3. The offset between (p) and (q) (calculated).
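The idea of recovering META from an interior pointer can be sketched with a toy allocator. This is a heavily simplified illustration, not the low-fat pointer allocator itself. It assumes a single static region divided into equally sized slots, so that base(p) is found by rounding the offset of p down to a slot boundary. The META fields and all `toy_` names are invented for this example.

```c
#include <stddef.h>
#include <string.h>

#define SLOT_SIZE 64u     /* a single size class, for brevity */
#define NUM_SLOTS 16u

/* Toy META: a type name stands in for the reference to type meta data. */
typedef struct { const char *type_name; size_t length; } META;

static _Alignas(max_align_t) unsigned char region[SLOT_SIZE * NUM_SLOTS];
static size_t next_slot = 0;

/* Takes the next slot, writes META at the slot base, and returns the
   address immediately after META as the object. */
void *toy_typed_malloc(size_t size, const char *type_name)
{
    if (next_slot == NUM_SLOTS || size > SLOT_SIZE - sizeof(META))
        return NULL;
    unsigned char *base = region + SLOT_SIZE * next_slot++;
    META m = { type_name, 1 };
    memcpy(base, &m, sizeof m);
    return base + sizeof(META);
}

/* base(p): round the offset of an interior pointer down to the start of its
   slot, then read the META stored there. p must point into region. */
META toy_meta(const void *p)
{
    size_t off = (size_t)((const unsigned char *)p - region);
    META m;
    memcpy(&m, region + off / SLOT_SIZE * SLOT_SIZE, sizeof m);
    return m;
}
```

No per-pointer bookkeeping is needed here: any pointer into an object is enough to find its META, which is the property the runtime relies on.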

The EffectiveSan runtime maps the (dynamic-type, static-type, offset)
triple to (sub-)object bounds (relative to p) using a layout hash table
stored in the type meta data. For example, the layout hash table for (struct T) is as follows:

    (T, T, 0) ---> -oo..oo     (T, float, 0) ---> 0..4     (T, S, 4) ---> 0..20
    (T, int, 4) ---> 0..12     (T, int, 8) ---> -4..8      (T, int, 12) ---> -8..4
    (T, char *, 16) ---> 0..8

Each entry represents a (sub-)object for type (struct T).

For example, if p has the static type (int *), the corresponding
triple will be (T, int, 12). This triple corresponds to the sub-object
(T.s.a), and the sub-object bounds (in bytes) is therefore p-8..p+4.

In contrast, if p has the static type (float *), the corresponding
triple will be (T, float, 12). This triple has no entry (i.e., a hash table
miss), meaning that a type error is detected.
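The lookup described above can be sketched in C as follows. The entries are copied from the table in the text. Everything else is an assumption made for illustration: types are identified by name strings, a linear search replaces the hash table, and LONG_MIN/LONG_MAX stand in for the unbounded -oo..oo entry.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *dynamic_type;
    const char *static_type;
    long offset;        /* byte offset of p from the allocation base */
    long lb, ub;        /* sub-object bounds in bytes, relative to p */
} layout_entry;

/* Layout entries for struct T, as listed in the text. */
static const layout_entry layout_T[] = {
    {"T", "T",      0,  LONG_MIN, LONG_MAX},
    {"T", "float",  0,  0, 4},
    {"T", "S",      4,  0, 20},
    {"T", "int",    4,  0, 12},
    {"T", "int",    8, -4, 8},
    {"T", "int",   12, -8, 4},
    {"T", "char *", 16, 0, 8},
};

/* Returns the matching entry, or NULL on a miss (reported as a type error).
   A linear search is used here only to keep the sketch short. */
const layout_entry *layout_lookup(const char *dyn, const char *stat, long offset)
{
    for (size_t i = 0; i < sizeof layout_T / sizeof layout_T[0]; ++i) {
        const layout_entry *e = &layout_T[i];
        if (e->offset == offset &&
            strcmp(e->dynamic_type, dyn) == 0 &&
            strcmp(e->static_type, stat) == 0)
            return e;
    }
    return NULL;
}
```

Looking up ("T", "int", 12) returns the bounds -8..4, while ("T", "float", 12) returns NULL, matching the two cases discussed above.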

Benchmarks

We compare EffectiveSan against an uninstrumented baseline. For these tests,
we use the standard SPEC2006 benchmark suite:

[Figure: EffectiveSan SPEC2006 timings.]

We also compare two different versions of EffectiveSan:

  • EffectiveSan; and
  • EffectiveSan (no logging).

Both versions use the EFFECTIVE_SINGLETHREADED runtime option (SPEC2006 is
single-threaded), and the "no logging" version uses the additional
EFFECTIVE_NOLOG runtime option. The former represents "normal" use, which
includes both instrumentation and logging overheads, while the latter is a
better estimate of the instrumentation overhead only. Some benchmarks,
notably perlbench and gcc, generate many errors meaning the logging
overhead is more significant.

Overall we see that EffectiveSan (logging) is 3.53x the uninstrumented
baseline, and EffectiveSan (no logging) is slightly faster at 3.41x the
baseline. For reference, a typical bounds checking sanitizer (e.g.,
AddressSanitizer), which does neither type nor sub-object bounds checking, has a
typical overhead of 1.5-2.0x. EffectiveSan is intended to be a trade-off:
although it is generally more expensive it is also more comprehensive in
the class and number of errors detected.

EffectiveSan exhibits a low memory overhead of 1.12x for SPEC2006.

EffectiveSan detects many type, (sub-)object bounds, and use-after-free errors
in the SPEC2006 benchmarks, some known and some new. These include:

  • A use-after-free bug in perlbench (test benchmark only,
    previously found by ASAN).
  • A bounds overflow error in h264ref (previously found by ASAN).
  • Sub-object bounds overflow errors in gcc, h264ref and soplex.
  • Multiple type errors.

The type errors are summarized in the paper (see further reading below).
Most type errors appear to be related to type abuse, i.e., deliberate type
errors introduced by the programmer, although sometimes it is hard to be sure
without knowing the programmer's intent. Some examples of type errors
include:

  • xalancbmk using bad C++ downcasts, e.g. casting a DTDGrammar to a
    SchemaGrammar;
  • gcc/sphinx3 casting objects to (int[]) to calculate hash values or
    checksums;
  • gcc using incompatible definitions (over different modules) for the «same»
    type;
  • bzip2/lbm confusing fundamental types (e.g., int vs float);
  • perlbench/povray’s ad hoc implementation of C++-style
    inheritance by defining structures with a common shared
    prefix.
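To make the hashing idiom in the list above concrete, here is a minimal C sketch. It is not taken from gcc or sphinx3: struct sample, hash_abuse and hash_portable are hypothetical names, and uint32_t is used instead of int so that the arithmetic cannot overflow. The first function reads float members through an integer pointer, which is the kind of access a dynamic type checker would be expected to report; the second copies the bytes into a real integer array first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record type; assumes sizeof(float) == sizeof(uint32_t). */
struct sample {
    float value;
    float weight;
};

#define SAMPLE_WORDS (sizeof(struct sample) / sizeof(uint32_t))

/* Type abuse: the float members are read through a uint32_t pointer. */
uint32_t hash_abuse(const struct sample *s)
{
    const uint32_t *words = (const uint32_t *)s;   /* deliberate bad cast */
    uint32_t h = 0;
    for (size_t i = 0; i < SAMPLE_WORDS; i++)
        h = h * 31u + words[i];
    return h;
}

/* Same hash, but the bytes are first copied into a real uint32_t array. */
uint32_t hash_portable(const struct sample *s)
{
    uint32_t words[SAMPLE_WORDS];
    uint32_t h = 0;
    assert(sizeof(words) == sizeof(*s));
    memcpy(words, s, sizeof(words));
    for (size_t i = 0; i < SAMPLE_WORDS; i++)
        h = h * 31u + words[i];
    return h;
}
```

Both functions usually compute the same value in practice, which is why this kind of type abuse tends to go unnoticed without a tool that tracks dynamic types.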

Further Reading

For more detailed information about EffectiveSan, please see our PLDI’2018 paper:

  • Gregory J. Duck and Roland H. C. Yap, EffectiveSan: Type and
    Memory Error Detection using Dynamically Typed C/C++,
    Programming Language Design and Implementation (PLDI’18), 2018

EffectiveSan is built on top of our earlier work on low fat pointers. More
information can be found here:

  • Gregory J. Duck, Roland H. C. Yap, Heap Bounds Protection with Low Fat
    Pointers, Compiler Construction (CC’16), 2016
  • Gregory J. Duck, Roland H. C. Yap, Lorenzo Cavallaro, Stack Bounds
    Protection with Low Fat Pointers,
    Network and Distributed System Security Symposium (NDSS’17), 2017
  • Implementation:
    https://github.com/GJDuck/LowFat

Usage

Installing

EffectiveSan releases can be downloaded from here:
https://github.com/GJDuck/EffectiveSan/releases

EffectiveSan is implemented as a modified version of clang/LLVM for the
x86_64/Linux architecture. To use, simply extract the distribution into
your desired location, e.g.:

    $ tar xvfJ effectivesan-VERSION.tar.xz

No other special installation steps are required.

Running

To instrument a program using EffectiveSan, simply compile using the
special modified clang/clang++ and the -fsanitize=effective -O2 options:

    $ effectivesan-VERSION/bin/clang -fsanitize=effective -O2 program.c
    $ effectivesan-VERSION/bin/clang++ -fsanitize=effective -O2 program.cpp

Note that EffectiveSan assumes the -O2 optimization level in order to work
correctly. Next, the resulting executable can be run as normal:

    $ ./a.out

Any logged error messages should be printed to stderr when the program exits.

Note that it is common for the same type or bounds error to occur multiple
times during program execution. By default, EffectiveSan will «group»
similar errors, so as not to make the error log too long. Grouping behavior
can be changed or disabled using the EFFECTIVE_VERBOSITY runtime option
(see options below).

Options

EffectiveSan supports several compiler options listed below. To pass options
to EffectiveSan, use the -mllvm clang option, e.g.,
-mllvm -effective-no-globals:

  • -effective-no-escapes: Do not instrument pointer escapes.
  • -effective-no-globals: Do not replace global variables.
  • -effective-no-stack: Do not replace stack allocations.
  • -effective-blacklist blacklist.txt: Do not instrument entries from
    the blacklist.txt file (in special case list format).
  • -effective-warnings: Enable instrumentation warning messages.
  • -effective-max-sub-objs max: Set max to be the maximum number of
    sub-objects per type meta data.

In addition to the compile-time options, EffectiveSan also supports
several runtime options that can be set via environment variables:

  • EFFECTIVE_NOTRACE=1: Do not print error stack traces (default off).
  • EFFECTIVE_NOLOG=1: Do not print the log altogether (default off).
  • EFFECTIVE_SINGLETHREADED=1: Assume the program is single-threaded
    (default off).
  • EFFECTIVE_ABORT=1: Crash the program after printing the error report.
  • EFFECTIVE_MAXERRS=N: Abort the program after N errors
    (default SIZE_MAX).
  • EFFECTIVE_VERBOSITY=(0|1|2|9): Set error verbosity level, where higher
    means less error grouping, and 9 means no grouping (default 0).
  • EFFECTIVE_LOGFILE=filename.txt: Dump the log to filename.txt rather
    than stderr.

Building

It is also possible to build EffectiveSan from source. To do so, simply
extract the source distribution and run the build.sh script:

    $ ./build.sh

The entire build process should be automatic.

To also build the binary distribution, use the following command instead:

FireFox

It is possible to build a version of the FireFox web browser using
EffectiveSan. For this, run the following commands:

    $ cd firefox
    $ ./setup-firefox-build.sh

If this succeeds, then:

    $ cd firefox-52.2.1esr
    $ ./mach build
    $ ./mach run

Notes:

  • Building has only been tested on our own machine:
    «Linux box 4.13.0-45-generic #50~16.04.1-Ubuntu SMP Wed May 30 11:18:27 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux»
    using a Xeon E5-2630 v4 CPU with 32GB of RAM.
    The build may not work on a different machine.
  • The build process will generate some TEST-UNEXPECTED-FAIL warnings about
    text relocations. These should not stop the build from running.
  • The resulting FireFox is noticeably slower. This is to be expected.
  • The resulting build has not been extensively tested and should be considered
    unstable. Some websites like YouTube do not work, but most others
    appear OK under our tests. Building a stable version of FireFox, including
    removing errors caused by custom memory allocators (CMAs), would require
    considerable effort. We do not intend to maintain a stable build at this
    time.

Features

In addition to the core features (type/bounds/UAF-checking) highlighted
above, EffectiveSan also supports the following:

  • Unions: unions are supported by EffectiveSan the same way
    structs are. The only difference between a union and a struct
    is that the offset of each member is always zero. Otherwise, the
    internal representation and handling is the same.
  • Flexible array members: These are structs where the last member
    has a flexible size. For example, with
    (struct vector {int len; int data[];}), then data is a
    flexible array member indicated by an unspecified array size.
    EffectiveSan supports types with flexible array members.
  • C++ inheritance: Base classes are treated as a special kind of
    sub-object. Virtual inheritance is also supported.
  • Automatic coercions: EffectiveSan can automatically coerce memory
    of type (char[]) to any type, and vice versa. Similarly, EffectiveSan
    will automatically coerce (void *) and (T *), and vice versa.
    Coercions usually incur extra overhead (more hash table lookups).
  • Good compatibility: EffectiveSan has been designed to (1) not
    change the layout of objects in memory, and (2) not change the
    Application Binary Interface (ABI). For these reasons, EffectiveSan
    should achieve good compatibility with most existing software.
  • Fast (for what it is): Some effort has been invested into optimizing
    EffectiveSan’s instrumentation and runtime system. That said, there is
    probably more room for improvement.
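As a small illustration of the flexible array member feature listed above, the following C sketch uses the struct vector type from the text. The helper names vector_new and vector_sum are hypothetical and not part of EffectiveSan. The point is that the length of data is only known at run time, so the valid bounds of the member depend on how much memory was actually allocated.

```c
#include <assert.h>
#include <stdlib.h>

struct vector {
    int len;
    int data[];          /* flexible array member */
};

/* Allocate a vector with room for len elements, all set to fill. */
struct vector *vector_new(int len, int fill)
{
    assert(len >= 0);
    struct vector *v = malloc(sizeof(*v) + (size_t)len * sizeof(v->data[0]));
    if (v == NULL)
        return NULL;
    v->len = len;
    for (int i = 0; i < len; i++)
        v->data[i] = fill;
    return v;
}

/* Sum the elements; every index stays within data[0..len-1]. */
long vector_sum(const struct vector *v)
{
    long sum = 0;
    for (int i = 0; i < v->len; i++)
        sum += v->data[i];
    return sum;
}
```

An access such as v->data[v->len] would be one past the allocated elements, and is the kind of dynamic bounds error this feature is meant to cover.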

Limitations

EffectiveSan is a complex sanitizer and therefore has some limitations:

  • LowFat limitations:
    EffectiveSan is built on top of low fat pointers (LowFat) and inherits
    many of LowFat’s underlying limitations. The main inherited limitations
    are:

    • Escaping Pointers
    • Global Variables
    • Operating System
    • Modern 64bit CPUs
    • Stack Object Ordering
    • Custom Stacks
    • Runtime Hardening
    • Spectre
    • Low Level Hacks
  • Compilation crashes with a fatal error:
    For example, with «fatal error: error in backend: Cannot select: ...».
    This appears to be a bug in clang itself (and has already been reported).
    More information can be found here:
    GJDuck/LowFat#12.
    A work-around is to disable instrumentation for global variables:
    -mllvm -effective-no-globals -mcmodel=small.
  • Assembly: In order to support global variables, LowFat and EffectiveSan
    use the large code model. This also means that any inline or mixed
    assembly must also respect the large code model, else linker errors
    (relocation truncated to fit) will occur. A work-around is to disable
    instrumentation for global variables:
    -mllvm -effective-no-globals -mcmodel=small.
  • Malloc type:
    EffectiveSan assumes that the first cast determines the allocation type for
    pointers returned by malloc (and family). Otherwise, if there is no
    cast, the type will be left as (char[]). Globals, stack allocations,
    and C++ new have explicitly declared types so no guessing is required.
  • Simple errors may be missed:
    EffectiveSan may fail to detect «simple» errors that are statically visible,
    e.g., int x[100]; x[101] = 3;.
    This is mainly because clang/LLVM will «optimize» away such errors
    before the EffectiveSan instrumentation pass.
    EffectiveSan aims to detect «dynamic» errors only.
  • Use-after-free (UAF) error detection is incomplete:
    EffectiveSan does not detect use-after-free errors that occur after a type
    check. Furthermore, EffectiveSan does not detect use-after-free errors where
    the freed object is reallocated to an object of the same type. Complete
    use-after-free detection in multi-threaded environments is difficult because
    of the race between the pointer dereference, deallocation, and the
    UAF-check.
  • C/C++ undefined behavior:
    EffectiveSan does not aim to implement a strict interpretation of type-based
    undefined behavior under the C/C++ standards. To do so would require
    tracking properties such as pointer provenance. Nevertheless,
    EffectiveSan is a reasonable first approximation.
  • Custom Memory Allocators (CMAs):
    EffectiveSan assumes that the program uses standard memory allocators,
    such as malloc for C and new for C++. If the program uses Custom
    Memory Allocators (CMAs) then EffectiveSan may fail to correctly type
    objects, leading to missed errors or false positives. CMAs are a problem
    for many dynamic analysis tools, not just EffectiveSan.
  • Limited multi-dimensional array support:
    EffectiveSan will flatten top-level multi-dimensional array objects, e.g.
    int *x = new int[3][4] will be treated the same as
    int *x = new int[12]. This is because the current implementation of
    the object meta data (META) can only encode one array length.
    Multi-dimensional array sub-objects (e.g. a struct member) are handled better.
  • Some types are treated as equivalent:
    Due to the limitations of the clang frontend, some types are treated
    as equivalent. For example, int and enums, void * and vptrs,
    pointers and references, etc.
  • Type errors on never-executed paths:
    EffectiveSan typically inserts the type check near the location where
    a pointer is «created», e.g., the start of a function for a pointer
    argument. This means it is possible that a type error will be reported
    even if the dereference is never reached. Early type checks are generally
    faster, e.g., once outside a loop versus every loop iteration.
  • Sub-object merging:
    EffectiveSan will «merge» overlapping sub-objects with the same type but
    different bounds. For example, given
    (union U { struct { int pad; float x[2]; } s; float y[2]; }),
    then overlapping sub-objects x and y will be «merged» into a single
    sub-object of type (float[3]). This can only occur for unions.
  • Incomplete type annotations:
    EffectiveSan relies on a «modified» clang front-end to pass C/C++ type
    information down to the LLVM IR level. However, clang was never designed
    to do this, and we are not expert clang hackers, so some type information
    is likely to be incomplete. Our main focus was on ensuring that the SPEC2006
    benchmarks were reasonably covered.
  • Meta data size limits:
    If a type has too many sub-objects (see -effective-max-sub-objs limit,
    default 10000), the generated meta data may be incomplete, resulting in
    missed or spurious errors.
  • Invalid type meta data error:
    This error may occur if the runtime meta data gets overwritten or unloaded
    somehow (e.g., by uninstrumented code).
  • Error classification:
    EffectiveSan attempts to classify errors as TYPE, BOUNDS,
    SUB-OBJECT-BOUNDS and USE-AFTER-FREE. However, these are «best guesses»
    and sometimes EffectiveSan may mis-classify errors. For example:

    • Type confusion may manifest as a sub-object bounds error;
    • Bounds errors (for escaping pointers) may manifest as use-after-free
      errors or type errors.
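The sub-object merging limitation listed above can be checked with a little arithmetic. The C sketch below uses the union U from the text and computes the byte range covered by x and y together. The function names are hypothetical, and the float[3] result assumes that int and float have the same size (4 bytes on x86_64/Linux).

```c
#include <assert.h>
#include <stddef.h>

union U {
    struct { int pad; float x[2]; } s;
    float y[2];
};

/* Lowest byte offset covered by either float array. */
size_t merged_lo(void)
{
    size_t x_lo = offsetof(union U, s.x);
    size_t y_lo = offsetof(union U, y);
    return x_lo < y_lo ? x_lo : y_lo;
}

/* One past the highest byte offset covered by either float array. */
size_t merged_hi(void)
{
    size_t x_hi = offsetof(union U, s.x) + sizeof(((union U *)0)->s.x);
    size_t y_hi = offsetof(union U, y) + sizeof(((union U *)0)->y);
    return x_hi > y_hi ? x_hi : y_hi;
}

/* Number of floats in the merged range. */
size_t merged_floats(void)
{
    /* The float[3] result in the text assumes int and float have equal size. */
    assert(sizeof(int) == sizeof(float));
    return (merged_hi() - merged_lo()) / sizeof(float);
}
```

With 4-byte int and float, y covers bytes 0..8 and x covers bytes 4..12, so the merged range is 0..12, i.e. three floats. An access to index 2 through y would therefore fall inside the merged bounds.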

The EffectiveSan option -effective-warnings will enable warnings about
meta data limitations, if any. The following explains the meaning of
each possible warning message:

  • missing type meta data for value...: The modified clang frontend failed
    to annotate a value. This usually indicates there is a missing case
    somewhere in the frontend.
  • type (T) is a forward declaration...: A type annotation exists but it
    is a forward declaration (empty definition). The frontend has been modified
    to avoid emitting forward declarations, however, the modifications are not
    perfect. This may result in type errors that are false positives.
  • type (T) has too many sub-objects...: The -effective-max-sub-objs
    limit was reached, so EffectiveSan attempts to delete some (probably unused)
    sub-objects. However, this may result in type errors that are
    false positives.
  • unable to instrument constant pointer cast...: There appears to be
    no way to annotate constant expressions (as far as we know),
    so EffectiveSan cannot yet instrument such casts.

We have mainly focused on minimizing such warnings for the SPEC2006
benchmarks. Other software may yield different results.

FAQ

Q: Why do we need EffectiveSan when we already have AddressSanitizer?

AddressSanitizer is a popular tool for detecting memory errors
such as bounds overflows and use-after-free errors. EffectiveSan can also
detect these kinds of errors, as well as other classes of error that
AddressSanitizer cannot detect, such as:

  • Sub-object overflows: Bounds errors within the same object.
    AddressSanitizer can only detect overflows that escape the bounds of
    the allocation; and
  • Type errors: Accessing memory using the wrong type.

In addition to AddressSanitizer, there are many other dynamic
memory and type error detection tools.
The main difference is that EffectiveSan attempts to be as comprehensive
as possible, i.e., detecting all type/memory errors using a single
underlying methodology.
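A minimal C sketch of a sub-object overflow, as described in this answer, is given below. struct account and both function names are hypothetical. Writing to index 8 of name stays inside the struct, so the write does not leave the allocation, yet it is outside the name sub-object.

```c
#include <assert.h>
#include <stddef.h>

struct account {
    char name[8];
    int  balance;
};

/* Store c at name[i]. Nothing here stops i from running past name[]. */
void set_name_char(struct account *a, size_t i, char c)
{
    assert(a != NULL);
    char *p = a->name;
    p[i] = c;        /* i >= 8 leaves name[] but may stay inside *a */
}

/* 1 if index i is outside name[] yet still inside the same struct. */
int is_subobject_overflow(size_t i)
{
    return i >= sizeof(((struct account *)0)->name)
        && i < sizeof(struct account);
}
```

For example, set_name_char(&a, 8, 'x') would be a sub-object overflow (assuming the usual layout where balance directly follows name), whereas an index beyond sizeof(struct account) would escape the object entirely.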

Q: Why does EffectiveSan report so many type errors? Most type errors are
harmless.

EffectiveSan reports anything deemed to be a type violation. Programmers
sometimes introduce «deliberate» type errors for various reasons, such as
convenience, efficiency, «cool hacks», etc. This is so-called «type abuse».
EffectiveSan does not (and cannot) distinguish between accidental and
deliberate type errors.

Although type abuse may seem harmless, it is nevertheless undefined behavior
under the C/C++ standards. One common problem is the interaction of type
abuse and Type Based Alias Analysis (TBAA), which can result in the program
being «mis-compiled».
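The interaction with TBAA can be sketched with a classic example. The function name store_both is hypothetical, and the exact behavior depends on the compiler and optimization level, so this is an illustration only.

```c
#include <assert.h>

/*
 * Under type-based alias analysis the compiler may assume that an int *
 * and a float * never refer to the same memory, so it is allowed to
 * return the constant 1 here without reloading *ip after the store to *fp.
 */
int store_both(int *ip, float *fp)
{
    assert(ip != 0 && fp != 0);
    *ip = 1;
    *fp = 0.0f;
    return *ip;
}
```

When the two pointers refer to distinct objects the function returns 1, as expected. If a caller abuses types and passes the same address twice, e.g. store_both((int *)&f, &f), the behavior is undefined: the result may be 1 or 0 depending on whether the compiler reloads *ip.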

Q: Why does EffectiveSan report type errors for std::map and std::set?

For example, one typical type error is something like the following:

    TYPE ERROR:
            pointer  = 0x1a00d02adb0 (heap)
            expected = struct std::_Rb_tree_node<std::pair<xalanc_1_8::XalanQNameByReference const, xalanc_1_8::ElemTemplate const*> >
            actual   = ...
                       ...
                       >>>>>struct std::_Rb_tree_node_base { int32_t; /*0..4*/ struct std::_Rb_tree_node_base *; /*8..16*/ struct std::_Rb_tree_node_base *; /*16..24*/ struct std::_Rb_tree_node_base *; /*24..32*/ } [+0]

These errors are caused by a combination of (1) bad casts in C++ standard
library header files, and (2) the fact that EffectiveSan may report type
errors on never-executed paths. For example, the end() method contains a bad
cast from a _Rb_tree_node_base to a _Rb_tree_node (a.k.a. _Link_type):

    _Link_type
    _M_end() _GLIBCXX_NOEXCEPT
    { return reinterpret_cast<_Link_type>(&this->_M_impl._M_header); }

This appears to be «type abuse» rather than an actual bug. These bad casts are
also detected by other dynamic type checking tools such as CaVer.

Follow-up Work

  • C. Poncelet et al.,
    So Many Fuzzers, So Little Time,
    ASE 2022.
    This paper combines EffectiveSan with fuzzing, and found several
    vulnerabilities in the Contiki-NG Network Stack that were not exposed by ASAN.

Versions

EffectiveSan should be considered alpha quality software. Since EffectiveSan
is relatively complex, there are probably a lot of bugs. Please report issues
here: https://github.com/GJDuck/EffectiveSan/issues

The released version of EffectiveSan has been improved since the prototype
evaluated in the paper. Generally, the released version is faster (3.41x
alpha overhead versus 3.88x for the prototype), contains fewer bugs, and
offers more comprehensive error detection. The featured SPEC2006 issues
reported in the paper text should be reproducible using the released alpha
version. The issue count (Figure 7) should be generally consistent with the
released alpha version, although the exact counts have changed due to bug
fixes and different C++ standard library versions.

Thanks

This research was partially supported by a grant from
the National Research Foundation, Prime Minister’s Office,
Singapore under its National Cybersecurity R&D Program
(TSUNAMi project, No. NRF2014NCR-NCR001-21) and administered
by the National Cybersecurity R&D Directorate.

drrrl
n00b

Joined: 27 Nov 2004
Posts: 70
Location: Warszawa, Poland

Posted: Fri Nov 25, 2005 1:09 pm    Post subject: cyrus and database (bdb) serious problem

Hi,

I’ve got a serious recurring problem with my cyrus databases. It happens irregularly, about once per month on average. There is no visible reason for this behaviour and I have no idea why it happens.

Symptoms:

— logs start to show something like that:

Code:

Nov 25 09:27:05 [ctl_cyrusdb] checkpointing cyrus databases

Nov 25 09:27:05 [ctl_cyrusdb] DBERROR db4: DB_LOGC->get: log record checksum mismatch

Nov 25 09:27:05 [ctl_cyrusdb] DBERROR db4: DB_LOGC->get: catastrophic recovery may be required

Nov 25 09:27:05 [ctl_cyrusdb] DBERROR db4: PANIC: DB_RUNRECOVERY: Fatal error, run database recovery

Nov 25 09:27:05 [ctl_cyrusdb] DBERROR: critical database situation

Nov 25 09:27:14 [imap] DBERROR db4: fatal region error detected; run recovery

Nov 25 09:27:14 [imap] DBERROR: error exiting application: DB_RUNRECOVERY: Fatal error, run database recovery

Nov 25 09:27:18 [imap] DBERROR db4: fatal region error detected; run recovery

Nov 25 09:27:18 [imap] DBERROR: dbenv->open ‘/var/imap/db’ failed: DB_RUNRECOVERY: Fatal error, run database recovery

Nov 25 09:27:18 [imap] DBERROR: init() on berkeley

Nov 25 09:29:28 [imap] DBERROR db4: fatal region error detected; run recovery

Nov 25 09:29:28 [imap] DBERROR: dbenv->open ‘/var/imap/db’ failed: DB_RUNRECOVERY: Fatal error, run database recovery

Nov 25 09:29:28 [imap] DBERROR: init() on berkeley

.

repeating every 1-2 minutes

About 10 minutes after the original problem appeared, the following logs from postfix began to appear as well (which is a fairly logical consequence):

Code:

Nov 25 09:43:17 [lmtpunix] DBERROR db4: fatal region error detected; run recovery

Nov 25 09:43:17 [lmtpunix] DBERROR: dbenv->open ‘/var/imap/db’ failed: DB_RUNRECOVERY: Fatal error, run database recovery

Nov 25 09:43:17 [lmtpunix] DBERROR: init() on berkeley

Nov 25 09:43:17 [lmtpunix] DBERROR db4: environment not yet opened

Nov 25 09:43:17 [lmtpunix] DBERROR: opening /var/imap/deliver.db: Invalid argument

Nov 25 09:43:17 [lmtpunix] DBERROR: opening /var/imap/deliver.db: cyrusdb error

Nov 25 09:43:17 [lmtpunix] FATAL: lmtpd: unable to init duplicate delivery database

Nov 25 09:43:17 [master] service lmtpunix pid 14560 in READY state: terminated abnormally

(repeating all the time I mean every second)

— mail stops to be delivered locally (it’s obvious)

— server load goes up (metalog takes a lot of CPU resources)

There is, fortunately, a way to recover the services: a simple ‘/etc/init.d/cyrus restart’ helps immediately:

Code:

Nov 25 11:25:55 [master] exiting on SIGTERM/SIGINT

Nov 25 11:25:55 [deliver] backend_connect(): couldn’t read initial greeting: Connection reset by peer

                — Last output repeated 2 times —

Nov 25 11:25:55 [postfix/pipe] 14FAE53092: to=<anyuser@anything.com-example>, relay=cyrus, delay=3559, status=deferred (temporary failure. Command output: couldn’t connect to lmtpd: Connection reset by peer_ 421 4.3.0 deliver: couldn’t connect to lmtpd_ )

Nov 25 11:25:55 [deliver] connect(/var/imap/socket/lmtp) failed: Connection refused

Nov 25 11:25:55 [postfix/pipe] E2B7D5309A: to=<anyuser@anything.com-example>, relay=cyrus, delay=572, status=deferred (temporary failure. Command output: couldn’t connect tolmtpd: Connection reset by peer_ 421 4.3.0 deliver: couldn’t connect to lmtpd_ )

Nov 25 11:25:55 [postfix/pipe] 459BA53091: to=<anyuser@anything.com-example>, relay=cyrus, delay=3811, status=deferred (temporary failure. Command output: couldn’t connect to lmtpd: Connection refused_ 421 4.3.0 deliver: couldn’t connect to lmtpd_ )

.

.

.

Nov 25 11:25:56 [master] setrlimit: Unable to set file descriptors limit to -1: Operation not permitted

Nov 25 11:25:56 [master] retrying with 1024 (current max)

Nov 25 11:25:56 [master] process started

Nov 25 11:26:00 [ctl_cyrusdb] recovering cyrus databases

Nov 25 11:26:00 [ctl_cyrusdb] skiplist: recovered /var/imap/mailboxes.db (152 records, 11764 bytes) in 0 seconds

Nov 25 11:26:00 [ctl_cyrusdb] skiplist: recovered /var/imap/annotations.db (0 records, 144 bytes) in 0 seconds

Nov 25 11:26:00 [ctl_cyrusdb] done recovering cyrus databases

Nov 25 11:26:01 [master] ready for work

Nov 25 11:26:01 [ctl_cyrusdb] checkpointing cyrus databases

Nov 25 11:26:01 [tls_prune] tls_prune: purged 1 out of 3 entries

Nov 25 11:26:01 [cyr_expire] duplicate_prune: pruning back 3 days

Nov 25 11:26:01 [cyr_expire] duplicate_prune: purged 0 out of 431 entries

Nov 25 11:26:01 [cyr_expire] expunged 0 out of 0 messages from 0 mailboxes

Nov 25 11:26:04 [ctl_cyrusdb] done checkpointing cyrus databases

So the real question is: what is the reason for such unexpected problems?

Is it a problem with cyrus or with bdb? Maybe it would be worth changing the database from bdb to something else (as proposed in this topic for ldap), but to be honest I have no idea whether it is possible or how to do it. cyrus does not offer any USE flags to use another database:

Code:
# equery u cyrus-imapd

[ Searching for packages matching cyrus-imapd… ]

[ Colour Code : set unset ]

[ Legend    : Left column  (U) — USE flags from make.conf              ]

[           : Right column (I) — USE flags packages was installed with ]

[ Found these USE variables for net-mail/cyrus-imapd-2.2.12 ]

 U I

 — — afs      : Adds OpenAFS support (distributed file system)

 — — drac     : Enable dynamic relay support in the cyrus imap server

 — — idled    : Enable idled vs poll IMAP IDLE method

 — — kerberos : Adds kerberos support

 + + pam      : <unknown>

 — — snmp     : Adds support for the Simple Network Management Protocol if available

 + + ssl      : Adds support for Secure Socket Layer connections

 + + tcpd     : Adds support for TCP wrappers

So — please help!

The current versions of my software are:

Code:
# eix net-mail/cyrus-imapd

* net-mail/cyrus-imapd

     Available versions:  2.2.10 ~2.2.10-r1 2.2.12 ~2.2.12-r1 ~2.2.12-r2 ~2.2.12-r3

     Installed:           2.2.12

     Homepage:            http://asg.web.cmu.edu/cyrus/imapd/

     Description:         The Cyrus IMAP Server.

# eix sys-libs/db

* sys-libs/db

     Available versions:  1.85-r1 1.85-r2 ~1.85-r3 3.2.9-r7 3.2.9-r10 4.0.14-r2 4.0.14-r3 4.1.25_p1-r3 4.1.25_p1-r4 ~4.2.52_p1 4.2.52_p2 [M]4.3.27

     Installed:           1.85-r2 4.1.25_p1-r4 4.2.52_p2

     Homepage:            http://www.sleepycat.com/

     Description:         Berkeley DB

Thanks in advance, Grzegorz

PS. Half a year ago my cyrus databases were corrupted and a restart did not help (see that topic), so I had to do a manual recovery. Now that I have DB4.2 installed, recovery is easier (a cyrus restart is enough), but the problem still exists…

Janne Pikkarainen
Veteran

Joined: 29 Jul 2003
Posts: 1143
Location: Helsinki, Finland

Posted: Sun Nov 27, 2005 4:54 pm

I had this problem with older versions of Cyrus, but 2.2.12 has been rock-solid since day one, from about March of this year. If you’re still running an older Cyrus, I urge you to upgrade as soon as possible. Newer versions use skiplist as the backend for most things, and that has helped me enormously from both a reliability and a performance point of view.

For me, older versions kept crashing with default settings because of too many concurrent users on all the db files. deliver.db was especially problematic: for example, it once became very badly corrupted, causing Cyrus to malfunction every 30 minutes. Nuking deliver.db (since it’s not a critical file) helped that time. The problem is that by default BDB uses very small values for its cache (256 kilobytes) and transaction log size (32 kilobytes), which soon becomes a Major Pain if your server has lots of users and traffic.

First of all, if you encounter crashes like you’ve seen with Cyrus:

— stop Cyrus

— cd /var/imap/db && db4.2_recover

— start Cyrus

If that doesn’t help:

— stop Cyrus

backup your current /var/imap/*.db and everything under /var/imap/logs/, just in case.

— dump your current mailboxes.db with ctl_mboxlist -d >some_temp_file.txt

— remove your deliver.db

— If using tls sessions, remove also tls_sessions.db

— Put this to /var/imap/db/DB_CONFIG (to give BDB 16 MB cache and 512 kb transaction log size):

Code:
set_cachesize   0    16777216    0

set_lg_bsize   524288



— Reconstruct your mailboxes.db with ctl_mboxlist -u < some_temp_file.txt

— Make sure all the file permissions seem to be ok

— Restart Cyrus and make sure it still works. If it doesn’t, immediately restore your backups.

BerkeleyDB has caused me reliability problems under other programs and distributions (Red Hat), too, so I try to avoid it whenever possible. For example, when I used BDB as the backend for SpamAssassin’s bayesian filtering and auto-whitelisting, it caused all kinds of deadlocks (not for the whole server, only amavisd-new/SpamAssassin) and the mail spool started to grow… and grow… Ever since I moved all SA data to MySQL + InnoDB, it has been rock-solid and fast.

Edit: Whoops, didn’t notice at first you’re already using Cyrus 2.2.12. And since your mailboxes.db seems to be already in skiplist format, you should consider migrating deliver.db to skiplist or alternatively disable duplicatesuppression in /etc/imapd.conf.
_________________
Yes, I’m the man. Now it’s your turn to decide if I meant «Yes, I’m the male.» or «Yes, I am the Unix Manual Page.».

drrrl
n00b

Joined: 27 Nov 2004
Posts: 70
Location: Warszawa, Poland

Posted: Sun Nov 27, 2005 9:23 pm

Hello,

Janne Pikkarainen wrote:

BerkeleyDB has caused me reliability problems under other programs and distributions (Red Hat), too, so I try to avoid it whenever possible.

that’s true — I had a lot of problems with BDB and slapd. It was one of the reasons I gave up with openldap…

Quote:

Whoops, didn’t notice at first you’re already using Cyrus 2.2.12. And since your mailboxes.db seems to be already in skiplist format, you should consider migrating deliver.db to skiplist or alternatively disable duplicatesuppression in /etc/imapd.conf.

I have disabled duplicatesuppression long time ago — I like to see every single email coming, especially in case of problems :)

By migrating deliver.db to skiplist you mean the following?

Quote:

If that doesn’t help:

— stop Cyrus

backup your current /var/imap/*.db and everything under /var/imap/logs/, just in case.

— dump your current mailboxes.db with ctl_mboxlist -d >some_temp_file.txt

— remove your deliver.db

— If using tls sessions, remove also tls_sessions.db

— Put this to /var/imap/db/DB_CONFIG (to give BDB 16 MB cache and 512 kb transaction log size):

Code:
set_cachesize   0    16777216    0

set_lg_bsize   524288



— Reconstruct your mailboxes.db with ctl_mboxlist -u < some_temp_file.txt

— Make sure all the file permissions seem to be ok

— Restart Cyrus and make sure it still works. If it doesn’t, immediately restore your backups.

The thing I don’t quite understand is why a simple cyrus restart helps. It seems that either database corruption is not so bad in fact or cyrus has very powerful self-repairing mechanism. Or — the third option — I don’t know something ;-)

Maybe there is a way to migrate from BDB to any other DB for cyrus?

G.

Janne Pikkarainen
Veteran

Joined: 29 Jul 2003
Posts: 1143
Location: Helsinki, Finland

Posted: Mon Nov 28, 2005 6:02 am

You can migrate any Cyrus .db file to another format with command cvt_cyrusdb.

http://asg.web.cmu.edu/cyrus/download/imapd/install-upgrade.html

Look under «Upgrading from 2.1.x or earlier» — it only tells you how to upgrade mailboxes.db, but the magic can be applied to other .db files as well.

At first I was also very frustrated with the OpenLDAP + BDB backend, but after tuning the DB_CONFIG file it has been reliable for me. I still don’t trust it 100%, so I take frequent plain text backups of all my BDB databases. But I honestly think BDB becomes more reliable just by increasing its cache size and transaction log size; that prevents many deadlock situations and reduces disk I/O.

Anyway, I believe that corruption only occurs whenever Cyrus commits its changes from the transaction logs to actual database files, and even then only if some corrupted part is accessed. That’s why crashes can be very rare.

And why a simple restart helps… well, maybe newer versions of Cyrus do that db4.2_recover thing for us during the restart.

drrrl
n00b

Joined: 27 Nov 2004
Posts: 70
Location: Warszawa, Poland

Posted: Mon Nov 28, 2005 8:00 am

Thanks !

I’ll try this on Friday night ;-)

G.

drrrl
n00b

Joined: 27 Nov 2004
Posts: 70
Location: Warszawa, Poland

Posted: Sat Dec 03, 2005 10:57 pm

Janne Pikkarainen wrote:
You can migrate any Cyrus .db file to another format with command cvt_cyrusdb.

http://asg.web.cmu.edu/cyrus/download/imapd/install-upgrade.html

Look under «Upgrading from 2.1.x or earlier» — it only tells you how to upgrade mailboxes.db, but the magic can be applied to other .db files as well.

Unfortunately just the migration is not a solution.

I ran:

Code:
# /etc/init.d/cyrus stop

# /usr/lib/cyrus/cvt_cyrusdb /var/imap/deliver.db berkeley /var/imap/deliver.db.new skiplist

# mv deliver.db.new deliver.db

# /etc/init.d/cyrus start

the conversion was successful:

Code:
Dec  3 23:43:57 [cvt_cyrusdb] skiplist: checkpointed /var/imap/deliver.db.new (449 records, 42832 bytes) in 1 second

but after starting cyrus I got:

Code:
Dec  3 23:45:41 [master] setrlimit: Unable to set file descriptors limit to -1: Operation not permitted

Dec  3 23:45:41 [master] retrying with 1024 (current max)

Dec  3 23:45:41 [master] process started

Dec  3 23:45:42 [ctl_cyrusdb] DBERROR db4: /var/imap/deliver.db: unexpected file type or format

                — Last output repeated 6 times —

.

.

.

Dec  3 23:45:43 [cyr_expire] DBERROR db4: /var/imap/deliver.db: unexpected file type or format

Dec  3 23:45:43 [cyr_expire] DBERROR: opening /var/imap/deliver.db: Invalid argument

Dec  3 23:45:43 [cyr_expire] DBERROR: opening /var/imap/deliver.db: cyrusdb error

So… it seems that cyrus «knows» somehow, that deliver.db must be in BDB format, while both mailboxes.db and annotations.db as well as /var/imap/user/?/*.seen are in skiplist format. The question is how to change it?

G.
