FieldByName, FindField: too convenient to be honest

I don’t know about you, but I can’t count anymore the number of times I’ve seen this code pattern (in code snippets online as well as in production code):

  while not ADataSet.Eof do
  begin
    [...]
    ADataSet.FieldByName('MyFieldName1').SomePropertyOrMethod;
    ADataSet.FieldByName('MyFieldName2').SomePropertyOrMethod;
    [...]
    ADataSet.Next;
  end;

Using FieldByName in a case like this is certainly very convenient and has a lot of advantages:

  • Clarity and readability. The intent of the code is obvious and there is no question about which Data point you work with.
  • Proximity. You create your Field reference exactly where you need it.
  • Flexibility. You’re not stuck with design-time Fields, your DataSet can represent multiple queries, you just need to know that you have a ‘MyFieldName’ column in this context.
  • Security. FieldByName never returns nil because it raises an exception if the Field does no exist. So, you’re sure not to get an AV when directly using a property or method.

But is has a major problem:

  • It is searching -again- for the same already found Field at each and every record of the DataSet. Like this code has Alzheimer.
  • Knowing that each call to FieldByName is mainly a call to FindField, that can introduce up to 3 new sub-loops within your main loop. With our 2-Field example above, that’s 6 sub-loops that can be added for every row.

It does not seem that bad if it is just an example or some demo code, but as soon as it is published, someone will grab a snippet and a few copy’n'haste later it ends up in production code.

And indeed, I can tell you that this (untold) company’s production code actually contained dozens of cases like the following (anonymized but real) code:

  aQuery.ParamByName('Zone_mask').DataType := ftLargeint;
  aQuery.ParamByName('Zone_mask').Value    := aZoneMask;
  aQuery.ParamByName('Start_date').AsDateTime := fStartDate;
  aQuery.ParamByName('End_date').AsDateTime   := fEndDate;

  aQuery.Open;
  while not aQuery.EOF do
  begin
    effectiveDeviceInfo :=
      TEffectiveDeviceInfo.Create(
         aQuery.FieldByName('Device_code_name').Value,
         aQuery.FieldByName('Device_descr').Value,
         aQuery.FieldByName('Toggle_times').Value,
         aQuery.FieldByName('Comments').Value,
         aQuery.FieldByName('Effective_start_date').Value,
         aQuery.FieldByName('Effective_end_date').Value);

    fChannelDevice.AddObject(
      self.MakeSortKey(
        aQuery.FieldByName('Device_code_id').Value,
        aQuery.FieldByName('Channel_int').Value),
      effectiveDeviceInfo);

    aQuery.Next;
  end;
  aQuery.Close;

That’s between 8 and 24 added loops for each record in this dataset which could have tens of thousands of records!
Granted, that would be small loops if the number of fields is reasonable, but still…

So, I cringe when I see someone posting on StackOverflow a code snippet that looks good (verify Active, DisableControls, try..finally) but has the FieldByName inside the loop:

begin
  Assert(AdoQuery1.Active, 'Dataset is not active!');
  try
    AdoQuery1.DisableControls;
    AdoQuery1.First;
    while not AdoQuery1.Eof do
    begin
      AdoQuery1.Edit;
      AdoQuery1.FieldByName('MyFieldName').Value := Edit1.Text;
      AdoQuery1.Post;
      AdoQuery1.Next;
    end;
  finally
    AdoQuery1.EnableControls;
  end;
end;

And kudos to the poster who changed it after just a little nudge to create a local Field variable before the loop. It adds 2 more lines per Field, but the code is much better:

var
  AField : TField;  // <= line added
begin
  Assert(AdoQuery1.Active, 'Dataset is not active!');
  try
    AdoQuery1.DisableControls;
    AField := AdoQuery1.FieldByName('MyFieldName'); // <= line added
    AdoQuery1.First;
    while not AdoQuery1.Eof do
    begin
      AdoQuery1.Edit;
      AField.Value := Edit1.Text;
      AdoQuery1.Post;
      AdoQuery1.Next;
    end;
  finally
    AdoQuery1.EnableControls;
  end;
end;

Now, don’t go “premature optimization” on me, as the cases where the number of records is too small to benefit from ousting the FieldByName from the DataSet loop are pretty rare.

I would be curious to know what you get if you do a grep in your code base to find FieldByName or FindField in loops…
Do you have any?

This entry was posted in Coding standard, Delphi, Quality and tagged , , , , , . Bookmark the permalink.

34 Responses to FieldByName, FindField: too convenient to be honest

  1. Steven Kamradt says:

    I also have run into this pattern over and over again.

    Your suggestion is very valid, however another option that I have used is to set a local integer to determine the field index and use the dataset.fields[index] method to get at the field. Before the loop, set this local variable by using the field by name or find field and then inspecting its index property. If you name your local variables so that they almost mirror the real names, it makes the code just as easy to follow.

    Of course this only works well when your loop is local. When your loop is a little larger then you may have to be more creative.

  2. Lachlan Gemmell says:

    It’s worth mentioning that code using the TDataSet default indexed property suffers from this also. e.g.

    while not ADataSet.Eof do begin
    ADataSet['SomeField'] := SomeValue;
    ADataSet.Next;
    end;

  3. Usually I optimize this code:
    var
    fld1: TField;
    fld2: TField;
    [...]
    begin
    fld1 := ADataSet.FieldByName('MyFieldName1');
    fld2 := ADataSet.FieldByName('MyFieldName2');
    while not ADataSet.Eof do
    begin
    [...]
    fld1.SomePropertyOrMethod;
    fld2.SomePropertyOrMethod;
    [...]
    ADataSet.Next;
    end;
    end;

  4. Alan Clark says:

    It’s not possible when creating queries at design time, but I always use persistent fields, e.g.

    query.SomeFieldName.AsString := 'Eh?';

    This has the advantage that you can’t access a field that doesn’t exist, avoids typos in the field names, the app won’t compile when a field is removed from a query but still referenced in code, and no FindField is implicitly called.

    • André says:

      Yes I hate hard coded fieldnames too!
      Your app compiles fine, but when you click a button: invalid fieldname.
      It is also very bad for the performance!

      So I always use persistent fields: Delphi checks for me if everything is OK.

      My current customer uses generated data objects, with typed fields. So if the DB changes (which occurs a lot) you get compiler errors if you use an old field, which is great if you have to manage a large app!

    • Pete R. says:

      I always use persistent fields, too.

  5. A.Bouchez says:

    This is when ORM comes to mind.

    If your table is accessed via a class, its fields/columns are accessed via properties. So the code generated is optimal, no loop is used to access the field, and strong typing will prevent you putting a string into a numerical value e.g.

    See http://synopse.info/forum/viewtopic.php?id=14

    In all cases, I’m not sure the time spent in the “FieldByName” name lookup is a big issue, in comparison to the time spent in the BDE or network connection to the database. With real profiling of an application, creating a local Field variable before the loop won’t make any speed improvement in practice, IMHO.

    • François Gaillard says:

      “…creating a local Field variable before the loop won’t make any speed improvement in practice, IMHO.”
      Have you tried?
      Most of the time the Data is loaded in memory, so moving to the next record is quite fast (no round trip to the Server, no Network and no BDE since quite some time). In comparison, throwing a bunch of loops for each record is *not* negligible.

      • A.Bouchez says:

        Yes, I tried it! In the current applications I maintain, the bottleneck is definitively not in the “FieldByName”, but in the network access to a very concurrent Oracle database.
        I really like the ORM approach, in all cases.

        One cause of why FieldByName was not so slow in my apps is that I overwrote the AnsiCompareText() default implementation by a much faster version, which is not calling the CompareString() deadly slow Windows API.
        In our ORM, field name handling and JSON data from server parsing is performed in a very optimized pascal code, in asm when needed.

        • LDS says:

          “not calling the CompareString()”. Hope your function takes into account everything CompareString() does. Delphi cannot know which codepage the database allows for field names – and which is used.

  6. Marco Cantu says:

    I keep fixing similar code I see my clients use, and in some case the performance can increase 5 to 10 times, for large loops. Good you are raising this problem.

  7. Linas says:

    FieldByName is evil if you have many fields in your dataset and you’re trying to access them in a huge loop. That’s why I’ve made my own FindField implementation by locating field with log n speed, not by iterating through all the them. In big loops this method improves speed drastically.

  8. Eric says:

    Marco, I’m a bit puzzled by the speedups you’re reporting, IME the databases (even from a few local ones) are orders of magnitude slower that a field name lookup doesn’t even register in the performance figures (and in the cases I saw them register, it was less than 5%).

    We’re not using the standard DB layers though, but more direct connectivity layers (for compatibility and performance reasons), so I would guess that the actual issue lies in something very wrong in the way FieldByName/FindField are implemented…

    • Eric says:

      Ok, just had a quick look at DB.pas in Delphi XE, it looks like FindField & TNamedItem.HashName are poorly written enough to be the source of your performance issues.
      In a multithreaded situation HashName (it’s invoked once per FindField) could all by its own explain the 5x/10x speedups you’re seeing by cutting FieldByName out of the loops.

      Funny how the implementor went to great length to try to cut out what didn’t matter (implementing his own GetStringLength…), but missing the obvious in a way worthy of the Daily WTF.

      • François Gaillard says:

        Don’t forget also the cases where you have an in-memory-dataset or a read only query with cached data like in the “real code” example.
        In that case, the data access is not so slow.
        And more generally, the more you can move out of the loop, the better it is, if only marginally.
        I have yet to see cases in real life where FieldByName inside the loop was faster in the end than outside.
        And in my experience, removing FieldByName from inside the loop was always faster or *much* faster.

        • Eric says:

          We don’t use in-memory datasets anymore (at least not the DB-variety ones), but IIRC, the #1 speedup when we did was to not use TDataset & co, rather than work around the limitations.
          ie. we use DB code only to read/write from the database, and after that, move it to business classes, specialized containers, etc.

          For the above task, assuming the DB code doesn’t have the issues exhibited by HashName/FieldByName, it’s IME pretty much a case of premature optimization to move such lookups out of the loop: they just don’t matter enough to care.

          Though given what I saw within HashName, I guess that if you have to stick with TDataset, you just have to go for that kind of optimizations…

          Though in the long run, IMHO you’re probably better off moving away from TDataset as much as possible: less optimization needed, less code needed, and less worries.

      • A.Bouchez says:

        The worse in the current implementation is that it calls AnsiCompareText() for field name comparison, which is dead slow. It calls indeed the very slow CompareString() Windows API. A much faster implementation could be made just by replacing AnsiCompareText() with one of the FastCode CompareText() versions available. This is what I made and the result is good. Using hashing, as you made, is even better, of course!

  9. Don’t you love the refactoring ‘Add explaining var’ from ModelMaker Code Explorer?
    It is not just explaining, choosing the initialization point with care even improves your performance :-)

    –jeroen

  10. Isaque Pinheiro says:

    We use all FieldByName, very good article.

    What you tell me the code?
    Someone has something against you saying?

    unit Unit1;

    interface

    uses DB;

    type
    TMyDatSet = class
    strict private
    class var
    FDataSet: TDataSet;
    FField1: TField;
    FField2: TField;
    public
    class constructor Create;
    class property DataSet: TDataSet Read FDataSet Write FDataSet;
    class property Field1: TField Read FField1 Write FField1;
    class property Field2: TField Read FField2 Write FField2;
    end;

    implementation

    { TMyDatSet }

    class constructor TMyDatSet.Create;
    begin
    Field1 := FDataSet.FieldByName(‘Field1′);
    Field1 := FDataSet.FieldByName(‘Field2′);
    end;

    end.

    would like
    TMyDataSet.Field1.AsString: = ‘Test’;
    TMyDataSet.Field1.AsInteger: = 0;

    sorry, my english

    • Lachlan Gemmell says:

      I do something similar although I use a record rather than a class so I don’t have to worry about memory management.

      It keeps the use of a string literal for the field name limited to just the one location and gives a performance boost when used inside loops as discussed by Francois.

  11. I don’t see why they don’t just add a TDictionary<string, TField> to TFields and use that for the lookup. That would fix a lot of problems and get rid of the ugly temporary field variable pattern.

    I’ve seen one routine that involves copying the contents of one dataset to another, with some processing going on as well, that had more than 50 local variables of type T*Field declared at the top, to work around the FieldByName performance issue. It just about made me weep the first time I encountered it…

  12. Tonis Argus says:

    But how is the best to handle dataset events like OnNewRecord(Dataset : TDataset)
    Lets say that this event belongs to table TblOrder where is field TblOrderAmount
    Dataset.fieldbyname(‘Amount’).AsFloat can cause typing issues and errors at runtime.
    TblOrderAmount.AsFloat is somehow restricted. I do not know why but I have sometimes issues when during an event execution dataset state becomes dsBrowse. What will cause it I do not know.
    Dataset.fields[TblOrderAmount.Index].AsFloat The best way?

  13. Pingback: Anonymous

  14. Boys, I thing that your huge resultsets are the biggest performance issue. Having a query returning more than 2000 result-rows is a major design flaw. A human can`t fight with so much data. FieldByName does not have so big impact on .. say 500 records.
    Personally I use Steven Kamradt`s approach – access the fields by index after getting their names into local array (to print `em to produce json).

  15. yosvany says:

    please S.O.S.

    puede alguien ayudarme? no puedo leer el valor de retorno con StoredProcedure en Delphi XE2 desde una Funcion o Procedimiento en mi Base de Datos MYSql, pago lo que sea por ver esto funcionar, por favor necesito ayuda.

    Gracias

  16. yosvany says:

    Please S.O.S.

    can anyone help? I can not read the return value with StoredProcedure in Delphi XE2 from a function or procedure in my MySQL database, pay anything to see this function, please need help.

    thanks

  17. notation to at least use an “F” prefix for naming the fields of an object (class structure). You’ll find this throughout the source code provided by Borland. If you choose not to use Hungarian notation, then you should at least adopt this F-prefix standard. Also, if you ever write code specifically as an example for training purposes (e.g. to be included in an article), then you might choose to use the F-prefix convention, even if you do use Hungarian notation in your own work.

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>