ATA Center not starting since about 8/19/2017

    Question

  • I'm not sure if it was caused by / related to KB4025631 installation, which was near the same time, but ATA Center is no longer starting.

    This is the latest version of ATA (Advanced Threat Analytics 1.8).

    Microsoft.Tri.Center-Errors log shows:

    2017-08-21 14:02:14.7071 5832 5   00000000-0000-0000-0000-000000000000 Error [CenterConfigurationManager+<GetConfigurationAsync>d__7] System.NullReferenceException: Object reference not set to an instance of an object.
       at async Microsoft.Tri.Center.Service.CenterConfigurationManager.GetConfigurationAsync(?)
       at async Microsoft.Tri.Infrastructure.Framework.ConfigurationManager`2.UpdateConfigurationAsync[](?)
       at async Microsoft.Tri.Infrastructure.Framework.ConfigurationManager`2.OnInitializeAsync[](?)
       at async Microsoft.Tri.Center.Service.CenterConfigurationManager.OnInitializeAsync(?)
       at async Microsoft.Tri.Infrastructure.Framework.Module.InitializeAsync(?)
       at async Microsoft.Tri.Infrastructure.Framework.ModuleManager.OnInitializeAsync(?)
       at async Microsoft.Tri.Infrastructure.Framework.Module.InitializeAsync(?)
       at async Microsoft.Tri.Infrastructure.Framework.Service.OnStartAsync(?)
       at Microsoft.Tri.Infrastructure.Framework.Service.OnStart(String[] args)

    The MongoDB service is running.

    System restart made no difference in behavior.

    ATA is installed to a dedicated drive (not the OS drive) and that passes CHKDSK with no errors.

    Wednesday, August 23, 2017 7:04 PM

All replies

  • Looked at: https://docs.microsoft.com/en-us/advanced-threat-analytics/ata-configuration-file

    Last config backup was 8/20.

    Attempt to restore gives the following:

    Command line: "(drive):\Program Files\Microsoft Advanced Threat Analytics\Center\MongoDB\bin\mongoimport.exe" --db ATA --collection SystemProfile --file "(drive):\Program Files\Microsoft Advanced Threat Analytics\Center\Backup\SystemProfile_20170820234657.json" --upsert

    2017-08-23T19:12:08.853+0000    [........................] ATA.SystemProfile    0B/303B (0.0%)
    2017-08-23T19:12:10.407+0000    [........................] ATA.SystemProfile    0B/303B (0.0%)
    2017-08-23T19:12:10.407+0000    Failed: error connecting to db server: no reachable servers
    2017-08-23T19:12:10.407+0000    imported 0 documents

    Which seems to imply the MongoDB server isn't working after all, even though the service is started.

    (Yes, the ATA license is installed and was confirmed before this went offline)

    ...AND MongoDB was no longer running. Grr. Manual start, and the config restore worked as expected - last config is back on the DB.

    But restarting the Microsoft Advanced Threat Analytics service still gives Error 1067 The service terminated unexpectedly.

    2017-08-23 19:18:01.2688 2272 5   00000000-0000-0000-0000-000000000000 Error [CenterConfigurationManager+<GetConfigurationAsync>d__7] System.NullReferenceException: Object reference not set to an instance of an object.
       at async Microsoft.Tri.Center.Service.CenterConfigurationManager.GetConfigurationAsync(?)
       at async Microsoft.Tri.Infrastructure.Framework.ConfigurationManager`2.UpdateConfigurationAsync[](?)
       at async Microsoft.Tri.Infrastructure.Framework.ConfigurationManager`2.OnInitializeAsync[](?)
       at async Microsoft.Tri.Center.Service.CenterConfigurationManager.OnInitializeAsync(?)
       at async Microsoft.Tri.Infrastructure.Framework.Module.InitializeAsync(?)
       at async Microsoft.Tri.Infrastructure.Framework.ModuleManager.OnInitializeAsync(?)
       at async Microsoft.Tri.Infrastructure.Framework.Module.InitializeAsync(?)
       at async Microsoft.Tri.Infrastructure.Framework.Service.OnStartAsync(?)
       at Microsoft.Tri.Infrastructure.Framework.Service.OnStart(String[] args)

    And the underlying error has not changed.

    Wednesday, August 23, 2017 7:15 PM
  • Reinstall states ATA is already installed. Am trying removal of 8/21 updates KB4034658 and a Visual C++ redistributable...

    ...and reboot. ...and wait, and wait, and wait at the "Working on updates  100% complete  Don't turn off your computer" screen.

    Now it appears that the Microsoft Advanced Threat Analytics service has stuck in the "starting" state.

    Wednesday, August 23, 2017 7:45 PM
  • Hi,

    Can you try this?

    Open a command prompt and cd to

    \Microsoft Advanced Threat Analytics\Center\MongoDB\bin
    from there run:
    mongo.exe ATA --eval "printjson(db.SystemProfile.find({'_t':'CenterSystemProfile'}).toArray())" > out.txt

    Open the out.txt file. If there is any sensitive info in it, replace it with XXXX, then paste the "sanitized" content here. If the file contains errors or no json data, please paste it here as well.
    BTW - latest version is 1.8 update 1, not 1.8, but I would not upgrade before fixing the issue...

    Thanks,

    Eli


    Wednesday, August 23, 2017 7:56 PM
  • MongoDB shell version v3.4.2
    connecting to: mongodb://127.0.0.1:27017/ATA
    MongoDB server version: 3.4.2
    [ ]

    That's all...

    Wednesday, August 23, 2017 8:14 PM
  • and through the VLSC, 1.8 is still the latest available version. I'm OK applying Update 1 as soon as practical, but need to download it through a reputable source ;)

    Wednesday, August 23, 2017 8:19 PM
  • TELNET to 127.0.0.1:27017 does give me a connection (to a blank screen) and there is not an active firewall at this time, so I don't think it's MongoDB access.
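
    For anyone who wants to repeat the TELNET check from a script, here is a minimal sketch in Python. The function name port_is_open is just an illustrative choice; the host and port are the ones shown in the mongo shell output above. A successful connection only shows that something is listening on the port. It says nothing about whether the ATA database content is intact.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 127.0.0.1:27017 is the address the mongo shell reported connecting to.
    print("MongoDB port reachable:", port_is_open("127.0.0.1", 27017))
```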

    Wednesday, August 23, 2017 8:23 PM
  • This is what I was afraid of...

    It means that the record in the DB containing the Center main configuration is missing.
    There is nothing wrong with mongo itself, or else the command I gave you would not have worked at all.
    I'm not sure whether it was like that before or after the restore, or how it got into this situation in the first place.
    It might be that the backup file you used was already backing up faulty data...
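
    If you need to run the same check on several servers, the "empty array" case can be detected from the saved shell output. This is only a rough sketch: the function name is made up for illustration, and it assumes the output looks like what was pasted above (a few header lines, then the array printed by printjson).

```python
def center_profile_missing(shell_output: str) -> bool:
    """Return True if the mongo shell printed an empty array for the query."""
    for line in shell_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # An empty result is printed as "[ ]"; remove inner whitespace before comparing.
            return "".join(stripped.split()) == "[]"
    raise ValueError("no array found in the shell output")


if __name__ == "__main__":
    sample = (
        "MongoDB shell version v3.4.2\n"
        "connecting to: mongodb://127.0.0.1:27017/ATA\n"
        "MongoDB server version: 3.4.2\n"
        "[ ]\n"
    )
    print("Center profile missing:", center_profile_missing(sample))
```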

    I suggest you go over the json files in the backup folder, and find the latest one that has a line that contains the string:
    ["Entity","Profile","SystemProfile","ServiceSystemProfile","CenterSystemProfile"]
    then use it for the restore process again.
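
    Going through the files by hand gets tedious if there are many of them, so here is a minimal sketch of that search in Python. It rests on two assumptions: that the backup files follow the SystemProfile_<timestamp>.json naming seen in the mongoimport command line earlier in the thread, and that each line of a file is one JSON document. The function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional

TARGET_TYPE = "CenterSystemProfile"


def has_center_profile(path: Path) -> bool:
    """Return True if any line of the file is a JSON document whose _t list names the Center profile."""
    # utf-8-sig tolerates a byte-order mark at the start of the file.
    with path.open(encoding="utf-8-sig") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                doc = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip lines that are not a single JSON document
            if isinstance(doc, dict) and TARGET_TYPE in doc.get("_t", []):
                return True
    return False


def latest_center_backup(folder: str) -> Optional[Path]:
    """Return the newest backup file that contains a Center profile record, or None."""
    # Assumed: the timestamp in the file name sorts chronologically,
    # so reverse name order visits the newest file first.
    for path in sorted(Path(folder).glob("SystemProfile_*.json"), reverse=True):
        if has_center_profile(path):
            return path
    return None
```

    Call latest_center_backup with the path of the Backup folder and pass the returned file to the mongoimport restore. A result of None means none of the files contains the record.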


    As for the update, you can download it from download.microsoft.com.
    it will take some time for it to reach VLSC, but it's up to you :-)


    Wednesday, August 23, 2017 8:32 PM
  • Oldest .JSON file has this content: (1KB)

    {"_id":{"$oid":"59864a6b8b5da409007bd132"},"_t":["Entity","Profile","SystemProfile","LicenseSystemProfile"],"UpdateTime":"2017-08-05T22:44:59.9192565Z","LicenseType":"Evaluation","ProductId":null,"EvaluationExpirationTime":"2017-11-03T22:44:59.9192565Z","ValidationErrorMessageKey":"LicenseEvaluation"}

    This is NOT correct because I personally applied the license and saw it in the console.

    Should I try rolling the server to a backup from ~8/18 or 8/19?

    Wednesday, August 23, 2017 8:43 PM
  • As far as root cause, I'm not sure when it was last working. I do know it was NOT working on Monday morning 8/21, and I'm pretty sure it was on Friday 8/18.

    The server applied one update on 8/19 and two on 8/21.

    Wednesday, August 23, 2017 8:46 PM
  • Was ATA activated on Aug 5 ?
    Wednesday, August 23, 2017 8:48 PM
  • Not sure about rolling the server back.

    This is a VM snapshot?

    I am not sure how mongo will deal with it unless the snapshot was taken while the server was off.

    I can suggest something else.

    Shut down the server now, and take a "Safe snapshot" we can go back to later.

    Restore the machine to a working state where we can fetch the missing record, hopefully from a json backup file that contains it. Keep it in a file, restore to the safe snapshot, and try to restore the config again.

    Wednesday, August 23, 2017 8:50 PM
  • No, before then as I was on vacation starting 8/4.
    Wednesday, August 23, 2017 9:21 PM
  • This is not a VM snapshot. It is running on an AWS (gasp!) VM that (AFAIK) has not had any rollback operations or the like.

    I am investigating whether there are snapshots / backups available that I could roll back to ~8/18.

    Wednesday, August 23, 2017 9:23 PM
  • Then the record is fine. It shows that on Aug 5 the system was not activated, but that's not important; you can always activate it again with the same key.

    So we need to somehow get this record back, or reinstall ATA from scratch.
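
    To illustrate how the license record quoted earlier can be read, here is a small sketch in Python that pulls out the license type and the length of the evaluation window. The field names come straight from that record; the helper names are made up, and the timestamp parsing simply drops the fractional seconds.

```python
import json
from datetime import datetime

# The record from the oldest backup file, as pasted in this thread.
RECORD = (
    '{"_id":{"$oid":"59864a6b8b5da409007bd132"},'
    '"_t":["Entity","Profile","SystemProfile","LicenseSystemProfile"],'
    '"UpdateTime":"2017-08-05T22:44:59.9192565Z","LicenseType":"Evaluation",'
    '"ProductId":null,"EvaluationExpirationTime":"2017-11-03T22:44:59.9192565Z",'
    '"ValidationErrorMessageKey":"LicenseEvaluation"}'
)


def parse_utc(stamp: str) -> datetime:
    """Parse a timestamp like 2017-08-05T22:44:59.9192565Z, ignoring fractional seconds."""
    whole_seconds = stamp.split(".")[0].rstrip("Z")
    return datetime.strptime(whole_seconds, "%Y-%m-%dT%H:%M:%S")


def summarize_license(record_json: str) -> dict:
    """Return the license type and the number of days between update and expiration."""
    record = json.loads(record_json)
    updated = parse_utc(record["UpdateTime"])
    expires = parse_utc(record["EvaluationExpirationTime"])
    return {
        "license_type": record["LicenseType"],
        "evaluation_days": (expires - updated).days,
    }


if __name__ == "__main__":
    print(summarize_license(RECORD))
```

    For this record the type is Evaluation and the window is 90 days from Aug 5, which matches a system that had not been activated at that point.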

    Wednesday, August 23, 2017 9:40 PM
  • Would doing the 1.8 update 1 install recreate this?

    Thursday, August 24, 2017 2:22 AM
  • No, it has no way of guessing the configurations/settings you have set after deploy time.

    Thursday, August 24, 2017 7:39 AM
  • Joy.

    I also have a Premier Support ticket open and it sounds like this issue is impacting multiple customers.

    For future, is there a configuration backup procedure so, worst case, we could do a base install, then apply/restore the config?

    Friday, August 25, 2017 3:21 AM
  • If you write the case # & name of the engineer working with you I might be able to work with premier together.

    Premier is the best support route.

    As for official guidance, there is, you can find it here:

    https://docs.microsoft.com/en-us/advanced-threat-analytics/disaster-recovery

    and

    https://docs.microsoft.com/en-us/advanced-threat-analytics/ata-configuration-file

    There is also the auto config backup we tried to use, but I am not sure yet why we were not able to find a working configuration there. 

    Friday, August 25, 2017 7:13 PM
  • I had looked at the ATA config file export / import. Thanks for the DR link - missed that one.

    case 117082316230712 with Janelle Littlejohn

    Saturday, August 26, 2017 5:55 AM
  • Hi, we have the same issue, did anyone find a solution for this?
    Monday, September 11, 2017 7:10 AM
  • Can you run this command from the mongo bin folder, and paste the output in the generated file here?

    mongo ATA --eval "var collectionNames = db.getCollectionNames(), indexes = [];collectionNames.forEach(function (name) {printjson(name);printjson(db[name].getIndexes());print('-------------------------------------');}); " > indexes.txt

    This will help confirm if you have the exact same issue.

    Monday, September 11, 2017 6:01 PM
  • Hi, I will run this tomorrow and post the output here, but I want to share that I restored the ATA VM from a backup dated 19/08/2017 and ATA Center started successfully. We upgraded ATA to the latest version via Windows Update on 20/08/2017, so it seems the issues started from there, as the last notification email from ATA came on 29/08/2017.

    The installed version is now 1.7.xxx, but the Lightweight Gateways installed on the DCs are version 1.8.xxx and their status in the ATA console is disconnected. I checked the gateway service on them and it is not starting; the error in Event Viewer is the same as the old ATA Center error (service terminated unexpectedly). Should I remove the Lightweight Gateways from the DCs and install them again with the current version (1.7.xxx)?
    Monday, September 11, 2017 6:26 PM
  • The output is only interesting if you have a snapshot of the DB in the state where it showed the error.

    It's interesting that you say it's related to ATA upgrade.

    I suggest trying the upgrade again to confirm it's reproducible.

    How many gateways do you have?

    In general, you will have to reinstall them, but I would consider upgrading to 1.8.1 first to avoid the redundant work of upgrading them again.

    Monday, September 11, 2017 6:35 PM
  • Thanks.

    In fact, I just spoke with my colleague who is working with me on this case. He told me that he downloaded the upgrade package from https://www.microsoft.com/en-us/download/details.aspx?id=55536 and started installing it on the ATA server. Tomorrow we will check whether the ATA Center service has stopped again or keeps running after the upgrade, and whether the gateways have started connecting to the ATA server.

    We have 7 gateways (DCs) in our setup

    Monday, September 11, 2017 6:40 PM
  • Hi,

    In my case, after I upgraded using the package downloaded from https://www.microsoft.com/en-us/download/details.aspx?id=55536 instead of using Windows Update on the server, the ATA Center service continues to run and all is OK.

    Just one gateway (DC) out of the seven is disconnected; I found that the gateway was not installed on that DC.

    • Proposed as answer by AhmadJY Monday, September 25, 2017 11:22 AM
    Tuesday, September 12, 2017 6:40 AM
  • I am still interested in the mongo info if you have a snapshot of the DB while in the error state.

    All I have as a clue is that the issue might have been triggered when running via MU and not manually,

    but as long as I don't have the info, I can't even be sure it's the exact same problem...

    Tuesday, September 12, 2017 11:34 AM