When you are troubleshooting slow or failed Citrix logons, it helps to know a bit about the background events that take place to achieve a successful logon. Unfortunately, it isn’t quite as simple as handing your logon credentials to StoreFront and being granted access to your apps and desktops.
Instead, a number of communications go back and forth between several different components.
It is important to know which components are involved in the logon process, which configurations can improve logon speed, and which configurations have a negative effect on logon times.
♣ StoreFront Logon Process
♣ What can cause slow StoreFront authentication?
♣ NetScaler Logon Process
♣ NetScaler Logon Failure Reasons
Citrix Logon Process (internal StoreFront, no NetScaler)
- User contacts and authenticates to StoreFront
- StoreFront contacts Delivery Controller to enumerate resources for authenticated user
- Delivery Controller responds to StoreFront with list of resources
- StoreFront displays list of resources to user
- User clicks on desktop or application resource
- StoreFront contacts Delivery Controller to request an available/best available VDA to host the session
- Delivery Controller finds applicable host, and responds to StoreFront with hostname and IP address of VDA. If VDA is powered off (XenDesktop) it will be powered on at this stage
- StoreFront creates an ICA file containing information gathered from Delivery Controller and passes to end-user
- ICA file is downloaded by end-user and opened with Citrix Receiver
- Citrix Receiver checks that a connection can be made to the VDA and then makes ICA connection
- VDA communicates with RDS license server for license check-out
- Delivery Controller creates the user session and processes Citrix policies
- User Authentication takes place between AD and VDA
- Users profile loads
- User GPOs process
- Application or desktop launch is complete and the resource is usable by end-user
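The step where Citrix Receiver checks that a connection can be made to the VDA can be approximated from a client machine with a plain TCP probe. The sketch below is only an illustration: it assumes the commonly used default ports of 1494 (ICA) and 2598 (Session Reliability), which this article does not state, and the function names are made up for the example. It is not how Receiver itself performs the check.

```python
import socket

# Assumed default ports (not taken from this article): ICA and Session Reliability.
ICA_PORT = 1494
SESSION_RELIABILITY_PORT = 2598


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_vda(host: str) -> dict:
    """Probe both assumed ports and report which ones accept a TCP connection."""
    return {port: can_connect(host, port) for port in (ICA_PORT, SESSION_RELIABILITY_PORT)}
```

Calling check_vda with the VDA hostname from the ICA file returns a dictionary of port to True/False. A False result from the client but not from a machine on the VDA subnet usually points towards a firewall or routing issue rather than the VDA itself.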
What can cause each of these steps to perform “slowly”? The 16 items below map, in order, to the 16 steps above.
- Overloaded AD for authentication, overloaded StoreFront servers. AD Sites and Services incorrectly configured.
- Overloaded StoreFront, overloaded Delivery Controllers.
- Overloaded Delivery Controllers, overloaded StoreFront.
- Overloaded StoreFront.
- Slow PC, slow Internet connection.
- Overloaded StoreFront, overloaded Delivery Controller.
- Lack of available VDAs, overloaded StoreFront, overloaded Delivery Controller.
- Overloaded StoreFront, slow network connection.
- Slow network connection, slow PC.
- Slow PC, latent network connection.
- Overloaded license server.
- Overloaded Delivery Controller.
- Overloaded AD, Sites and Services incorrectly configured.
- Overloaded profile store, latent network connection between profile store and VDA.
- Overloaded VDA, overloaded AD servers. DNS misconfiguration, too many GPOs configured to be applied.
- VDA poor performance.
This isn’t a definitive list, but it gives some sample ideas of what can cause slowness during the logon and resource launch process.
What can cause all these steps to perform slowly?
- Constrained bandwidth, an overloaded network, slow storage, or a lack of server hardware or virtual machine resources. Infrastructure failure can also have an adverse effect.
How to overcome?
Use monitoring systems to monitor CPU, RAM, bandwidth, database and storage performance, and alert when thresholds are breached or failures occur.
Make sure each infrastructure component is redundant, from hardware up to clustering and virtual machines. Also make sure storage and network throughput can cope with user demand, especially during peak periods when logons and resource launches are going to be high.
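As a rough illustration of threshold-based alerting, the sketch below checks one sample of metrics against a set of limits. The metric names and limits are hypothetical and chosen only for the example. A real monitoring product will have its own way of collecting samples and raising alerts.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    metric: str
    limit: float  # alert when the sampled value exceeds this


# Hypothetical limits for illustration; tune them to your environment.
THRESHOLDS = [
    Threshold("cpu_percent", 85.0),
    Threshold("ram_percent", 90.0),
    Threshold("storage_latency_ms", 20.0),
]


def breached(sample: dict) -> list:
    """Return an alert string for every metric that is missing or over its limit."""
    alerts = []
    for t in THRESHOLDS:
        value = sample.get(t.metric)
        if value is None:
            # A metric that stops reporting is treated as a possible failure.
            alerts.append(f"{t.metric}: no data (possible failure)")
        elif value > t.limit:
            alerts.append(f"{t.metric}: {value} exceeds {t.limit}")
    return alerts
```

Treating a missing metric as an alert covers the "failures occur" case as well as the "thresholds are breached" case, since a failed component often stops reporting altogether.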
What other factors can cause a slow logon?
- Large user profiles, roaming profile or using UPM etc.
- New profile average load time using Citrix Profile Management (Always cache enabled, profile streaming disabled):
- 7GB profile load average time using Citrix Profile Management (Always cache enabled, profile streaming disabled):
- Notice that the reported profile load time is still virtually nothing. I don’t know why this is, because the profile load is what caused 99% of the 74 second logon duration.
- Size on disk reporting the profile as 7.19GB on the VDA.
- Now let’s enable Profile Streaming.
- And disable Always cache.
- The logon time is back down to 15-20 seconds and the size on disk reports 57.7MB for the profile, because we are now streaming the profile on demand and nothing is cached just yet. This is a good feature within Citrix Profile Management that allows for fast logon times even with large, bulky profiles.
- Many GPOs and GPO settings
- Logon scripts when applied via GPO vs AD user level
Some of these recommendations have already been documented at https://jgspiers.com/citrix-tips-tricks-tweaks-suggestions/
How authentication can be slowed:
- Overloaded AD, AD failure, AD Sites and Services not correctly configured.
How to overcome?
Ensure AD is highly available and you have enough AD servers to cope with demand based on Microsoft recommendations.
Ensure Sites and Services is correctly configured so that users authenticate with Active Directory servers close by. Regions with a large number of users should have their own AD servers. Subnets for each office location/site VLAN should be defined within AD Sites and Services and assigned to the sites that are closest to them so that AD authentication takes the optimal route.
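To see why missing or overly broad subnet definitions matter, the sketch below maps a client IP address to a site by picking the most specific subnet that contains it. The subnets and site names are hypothetical, and the code is only a model of the lookup, not anything Active Directory exposes.

```python
import ipaddress

# Hypothetical subnet-to-site assignments, as you might define them in AD Sites and Services.
SITE_SUBNETS = {
    "10.1.0.0/16": "Belfast",
    "10.1.50.0/24": "Belfast-DMZ",
    "10.2.0.0/16": "London",
}


def site_for(client_ip: str):
    """Return the site whose subnet most specifically contains client_ip, or None."""
    ip = ipaddress.ip_address(client_ip)
    best = None
    for cidr, site in SITE_SUBNETS.items():
        net = ipaddress.ip_network(cidr)
        # Prefer the longest prefix, i.e. the most specific matching subnet.
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, site)
    return best[1] if best else None
```

An address that returns None here represents a VDA or client whose subnet has not been defined, which is the situation where authentication may end up going to a domain controller in a distant site.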
How profile load can be slowed:
The sheer size of profiles is a common culprit of slow loads. Profile bloat is one of the most common reasons why a logon may be slow for affected users. Other reasons can include a failed file server which hosts the roaming profile, or network issues which prevent the profile from being fetched.
An overloaded or underperforming file server, too many users connecting to retrieve profiles, or insufficient storage resources are other failure points.
How to overcome?
Some solutions such as Citrix Profile Management include streaming and directory/file exclusion, which help improve logon speeds. See https://jgspiers.com/citrix-profile-management-overview/ for more information on Citrix Profile Management.
If Citrix Profile Management takes a long time to process, you can enable logging using the Citrix Profile Management ADMX template.
Redirect as many folders as possible within a user’s profile. Exclude directories and files that simply are not needed from being redirected or roamed/cached to the VDA. Do test in a pre-production environment first before deploying any profile optimisations within a production environment.
The comparison of large and small profile logon times above shows how Citrix Profile Management can combat large profiles to ensure a fast logon.
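Before deciding what to redirect or exclude, it helps to know which folders are actually making a profile large. The following is a minimal sketch that totals file sizes under each top-level folder of a profile. The function names are made up for this example, and it reports logical file sizes, which can differ from the "size on disk" figure Windows shows for a streamed profile.

```python
import os


def dir_size(path: str) -> int:
    """Total size in bytes of all regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                try:
                    total += os.path.getsize(full)
                except OSError:
                    pass  # file vanished or is locked; ignore it
    return total


def largest_folders(profile_path: str, top: int = 5) -> list:
    """Return (folder, bytes) pairs for the biggest top-level folders in a profile."""
    sizes = []
    for entry in os.scandir(profile_path):
        if entry.is_dir(follow_symlinks=False):
            sizes.append((entry.name, dir_size(entry.path)))
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]
```

Running largest_folders against a copy of a bloated profile in the profile store gives a short list of candidates for folder redirection or exclusion.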
How GPOs and logon scripts slow down logon
- Several small GPOs containing a few settings each, rather than one or two larger GPOs.
- Scripts that take a long time to run.
- Numerous Group Policy drive maps or maps to locations that are inaccessible.
- Not disabling User Configuration or Computer Configuration sections of a policy when they are not in use.
- Many policies, including Citrix policies, which should also be taken into consideration at all times.
- Printer mapping via GPOs can cause slow logons when many printers are created. Citrix does have the policy setting Wait for printers to be created, which is disabled by default and only applies to Server OS VDA. This allows a session to start without waiting for all printers to be mapped from the client device (printer redirection). This does not help in situations where GPOs are mapping printers directly to a Citrix session.
Applications can slow down logon
When a user launches an application, they judge how quick the logon process was by when they first see the application’s initial landing screen. If applications make backend connections to database servers or file shares, as some do, you must ensure those connections are established in minimal time. To do so, make sure database and file servers are well configured and have enough resources to cope with demand when under load. Application prelaunch can help achieve quicker launch times. Also, authentication should pass through to the application, eliminating any additional authentication steps so as not to affect the user experience.
To read up on Application Prelaunch see https://jgspiers.com/citrix-application-prelaunch/
NetScaler Logon Process with LDAP/RADIUS
To view authentication logging through the CLI see https://jgspiers.com/netscaler-authentication-failures-aaad-debug/
The authentication process is as follows:
User enters credentials -> NetScaler makes attempt to bind to LDAP -> LDAP search is performed for the user’s sAMAccountName -> user’s group membership is extracted from LDAP -> user is successfully authenticated -> RADIUS authentication is attempted (if used) -> RADIUS groups extracted if any -> authentication is accepted.
- start_ldap_auth attempting to auth username @ LDAP IP (may be load balanced VIP).
- Connecting to: LDAPIP:389 or 636 if using Secure LDAP. https://jgspiers.com/configuring-ldaps-citrix-netscaler/
- receive_ldap_bind_event Bind OK (LDAP bind successful using NetScaler bind account defined in LDAP server on NetScaler).
- receive_ldap_bind_event User name: dirty = <username> sanitized = <username>
- ns_ldap_search Searching for <<(& (sAMAccountName=sAMAccountName) (objectClass=*))>> from base <<OU=Users,DC=JGSPIERS,DC=COM>> (the base DN that the LDAP search is performed on is based on what you have specified in your LDAP server on NetScaler).
- receive_ldap_user_search_event Received LDAP user search event.
- ns_ldap_check_result checking LDAP result. Expecting 101 (LDAP_RES_SEARCH_RESULT).
- ns_ldap_check_result ldap_result found expected result LDAP_RES_SEARCH_RESULT.
- receive_ldap_user_search_event received LDAP_OK (user was found).
- receive_ldap_user_search_event Binding user… 1 entries.
- receive_ldap_user_search_event User DN= full DN location of user account.
- receive_ldap_user_search_event built group string for username of: Group membership displayed.
- send_accept sending accept to kernel for : Username (user is successfully authenticated).
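The search filter shown in the ns_ldap_search entry above can be reproduced when you want to test the same search from another LDAP client. The sketch below builds a filter of that shape and escapes the characters that RFC 4515 reserves inside filter values. It is an illustration only; the function names are made up, and it does not claim to match how NetScaler sanitizes the username internally.

```python
def escape_filter_value(value: str) -> str:
    """Escape the characters RFC 4515 reserves inside an LDAP filter value."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\x00": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def user_search_filter(username: str, attribute: str = "sAMAccountName") -> str:
    """Build a filter shaped like the one shown in the ns_ldap_search log entry."""
    return f"(&({attribute}={escape_filter_value(username)})(objectClass=*))"
```

Running the resulting filter against the same base DN with the same bind account is a quick way to confirm whether a "user not found" result comes from the directory or from the NetScaler configuration.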
At this stage LDAP authentication is complete. If you have configured RADIUS authentication for 2FA, authentication will continue:
- continue_radius_auth attempting to auth username @ RADIUS IP (may be a load balanced VIP).
- process_radius Got RADIUS event.
- process_radius radius accepts : username.
- send_accept sending accept to kernel for : username.
If you notice slow authentication, use aaad.debug. All event entries are timestamped, so you can narrow down which part of the authentication process is actually slow and diagnose from there.
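Once you have the timestamps, finding the slow part is a matter of looking for the largest gap between consecutive events. The sketch below assumes you have already pulled the entries out of aaad.debug into (timestamp, event name) pairs. Parsing is left out on purpose because the exact line layout is not shown in this article, and the function name is made up for the example.

```python
from datetime import datetime


def slowest_gap(events: list):
    """Given (timestamp, event_name) pairs in log order, return the largest gap.

    Returns (seconds, previous_event, next_event), or None for fewer than two events.
    """
    worst = None
    for (t_prev, name_prev), (t_next, name_next) in zip(events, events[1:]):
        gap = (t_next - t_prev).total_seconds()
        if worst is None or gap > worst[0]:
            worst = (gap, name_prev, name_next)
    return worst
```

For example, if the largest gap falls between receive_ldap_bind_event and receive_ldap_user_search_event, the LDAP search is the slow part, so the base DN and the health of the domain controller are the first things to look at.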
Other common authentication failure errors:
Invalid credentials entered by the user:
ns_ldap_check_result LDAP action failed (error 49): Invalid Credentials
send_reject_with_code Rejecting with error code 4001.
Wrong username entered (does not exist):
receive_ldap_user_search_event ldap_first_entry returned null, user not found.
send_reject_with_code Rejecting with error code 4009
Logon denied for a user’s account in Active Directory:
ns_ldap_check_result LDAP action failed (error 49) : Invalid Credentials
send_reject_with_code Rejecting with error code 4001.
User account disabled:
ns_ldap_check_result LDAP action failed (error 49) : Invalid Credentials
send_reject_with_code Rejecting with error code 4001.
Invalid BIND account username/password:
ns_ldap_check_result LDAP action failed (error 49) : Invalid credentials
receive_ldap_bind_event Got LDAP error
LDAP server unreachable:
ns_ldap_simple_bind ldap_simple_bind :Can’t contact LDAP server
send_reject_with_code Rejecting with error code 4001
LDAP bind credentials missing:
In order to perform this operation a successful bind must be completed on the connection.
Make sure the credentials of the LDAP bind account on the LDAP profile are not missing.
LDAP EAGAIN returns etc.
If you get any of the below types of log text, and ultimately LDAP authentication is not working, recreate your LDAP server object on NetScaler and try again.
User complexity failure on password change:
If using LDAPS with Allow password change enabled, a user is prompted to change their password if it expires or is set to change on first logon. This failure is logged when the new password specified does not meet the complexity requirements set in Active Directory.
ns_ldap_check_result LDAP action failed (error 19): Constraint violation
receive_ldap_passwd_modify_event Password complexity violation
RADIUS authentication failure:
process_radius Received RAD_ACCESS_REJECT for: username.
process_radius Sending reject.
send_reject_with_code Rejecting with error code 4001.
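The failure excerpts above can be turned into a small lookup that suggests a likely cause for a given log line. This is a sketch only: the substrings come from the excerpts in this article, the hint text summarises the headings above, and the function name is made up. Note that error 49 is logged for several different causes, so the hint for it is deliberately broad.

```python
# Substrings taken from the aaad.debug excerpts above, mapped to the causes listed there.
# Order matters: more specific patterns are checked before the generic "error 49".
FAILURE_HINTS = [
    ("ldap_first_entry returned null", "Username does not exist in the search base"),
    ("contact LDAP server", "LDAP server unreachable"),
    ("Constraint violation", "New password fails AD complexity requirements"),
    ("RAD_ACCESS_REJECT", "RADIUS server rejected the request"),
    ("error 49", "Invalid credentials: wrong password, logon denied, disabled account, or bad bind account"),
]


def explain(log_line: str) -> str:
    """Return the first matching hint for a log line, or a fallback message."""
    lowered = log_line.lower()
    for needle, hint in FAILURE_HINTS:
        if needle.lower() in lowered:
            return hint
    return "No known pattern matched"
```

Because error 49 covers wrong passwords, denied logons, disabled accounts and a bad bind account, check whether the line that follows is a send_reject_with_code or a receive_ldap_bind_event error to tell a user problem from a bind account problem.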
On NetScaler v11+ you can also navigate to Authentication -> Logs. This shows an output of all the authentication attempts including failure reasons. Data here is pulled from the ns.log file located in the /var/log/ directory.
Finally, NISC (NetScaler Insight Center) and NMAS (NetScaler Management and Analytics System) with Gateway Insight also record user authentication attempts and failures, including failure reasons. Navigate to Analytics -> Gateway Insight -> Authentication on NMAS.
Authentication attempts can be drilled down to a specific user by clicking on the user’s name, which makes troubleshooting that user’s authentication easier.
See https://jgspiers.com/ldap-load-balancing-citrix-netscaler/ for LDAP Load Balancing through NetScaler.
To enable Enhanced Authentication Feedback on NetScaler to provide more meaningful logon failures to users see https://jgspiers.com/netscaler-enhanced-authentication-feedback/
Prasanth
April 18, 2018
Hi,
VDA never communicates directly with Citrix license server in Citrix XenApp 7.x environment. Please correct me if I’m wrong.
George Spiers
April 18, 2018
Correct.
Anonymous
July 19, 2018
Hi George,
I am trying to add a RADIUS server in NetScaler node 1 (primary). The test connection fails to reach the RADIUS server, but when I add it in node 2 (secondary), the connection is successful.
Thanks,
George Spiers
July 20, 2018
Odd. When you say test connection do you mean testing authentication or using the actual test authentication button on the RADIUS profile? Have you tried taking a traffic trace and reviewing with WireShark to see what is happening to that RADIUS traffic?
Anonymous
July 20, 2018
When configuring the authentication RADIUS server, i.e. after entering the IP, port and secret key, I am hitting the test connection button, and it says the server is not reachable, or is not a valid server, or 1812 is not a valid RADIUS port.
The RADIUS client agent is configured on the RADIUS server, and traceroute and telnet from the NSIP to the RADIUS server work fine. From node 2 everything works fine.
Is there any command to trace what happens when I hit the test connection button? Thank you!
George Spiers
July 20, 2018
Not sure what RADIUS system you are using but if firewalls are not blocking the communication then either the passwords do not match between appliances, or the RADIUS server isn’t configured to expect any traffic from NetScaler.
An nstrace running whilst pressing the test connection button may give better indication as to what is happening and if any traffic is being blocked. https://jgspiers.com/citrix-netscaler-traffic-capture-using-nstrace-nstcpdump/
Anonymous
July 23, 2018
Thanks George, I ran nstrace on both NetScalers. From node 1 (primary), the request source IP shows the subnet IP as well as the NSIP, with ICMP protocol.
From node 2, the request source IP is only the NSIP and, as I said, the connection is successful.
So do we need to configure a RADIUS agent with the subnet IP on the RADIUS server? As per Citrix, the NSIP will contact the RADIUS server. It is a bit confusing with my node 1 🙁 Any thoughts would be very helpful. Thank you!
George Spiers
July 23, 2018
It is NSIP by default, as you noticed Node2 was using NSIP and the connection is successful. If you allow Node1 NSIP to communicate with RADIUS then you should be good.
Have a look inside the packets to see if there are any error messages. If you are not seeing any traffic returned from the RADIUS server, e.g. acknowledgements, then it is likely that traffic is being blocked.
Anonymous
July 23, 2018
Hi George,
Thank you! As I mentioned, both nodes are allowed on RADIUS (agent creation). I can see the SNIP sending traffic to RADIUS when I nstrace from node 1. From node 2 only the NSIP is sending traffic. Is there any way I can attach a screenshot or email it?
George Spiers
July 24, 2018
Send me an email with the traces from Node1 and Node2. Type in the email your RADIUS IP, your NSIP and SNIP. I’ll take a look.
Anonymous
July 24, 2018
Hi George,
I emailed the details. Thank you!
Anonymous
July 31, 2018
Hey George,
Thanks for your help, I really appreciate it.
As you know, the NetScalers were not reaching RADIUS. I created an agent with the SNIP on the RADIUS server and the connection was successful.
I know technically we have to create the agent with the NSIP, but here, for a reason I don’t know, the SNIP is sending the traffic, so I created the entry with the SNIP.
Thanks,
Srinivas
Srini
August 14, 2018
Hi,
In our infrastructure, the RADIUS policy is bound as primary authentication to the VIP. When I open the VIP URL, I enter LDAP credentials first and then it prompts for the Entrust token (RADIUS). How was this set up? Technically I should enter the RADIUS code, since it is bound as primary authentication.
Is the LDAP binding to the VIP hidden or something?
Thank you!
George Spiers
August 15, 2018
Maybe the RADIUS server is accepting the LDAP credentials. Some RADIUS systems do that, and authenticate to AD on your behalf before sending you a RADIUS challenge.
If you only have one authentication policy which is RADIUS as primary, that is likely to be what is happening.
Harry
December 9, 2020
Hi George,
I am experiencing issues while trying to “Change password” after successfully logging into the Gateway. If the user has the “password change on next log-in” setting enabled, it lets the user change the password successfully.