How Different Is the Perception of Vibrotactile Texture Roughness in Augmented versus Virtual Reality?

Wearable haptic devices can modify the haptic perception of an object touched directly by the finger in a portable and unobtrusive way. In this paper, we investigate whether such wearable haptic augmentations are perceived differently in Augmented Reality (AR) vs. Virtual Reality (VR) and when touching with a virtual hand instead of one’s own hand. We first designed a system for real-time rendering of vibrotactile virtual textures without constraints on hand movements, integrated with an immersive visual AR/VR headset. We then conducted a psychophysical study with 20 participants to evaluate the haptic perception of virtual roughness textures on a real surface touched directly with the finger (1) without visual augmentation, (2) with a realistic virtual hand rendered in AR, and (3) with the same virtual hand in VR. On average, participants overestimated the roughness of haptic textures when touching with their real hand alone and underestimated it when touching with a virtual hand in AR, with VR in between. Exploration behaviour was also slower in VR than with real hand alone, although subjective evaluation of the texture was not affected. We discuss how the perceived visual delay of the virtual hand may produce this effect.


Figure 1: Vibrotactile textures were rendered in real time on a real surface using a wearable vibrotactile device worn on the finger. Participants explored this haptic roughness augmentation with (Real) their real hand alone, (Mixed) a realistic virtual hand overlay in AR, and (Virtual) the same virtual hand in VR.

Introduction
Wearable haptic devices, worn directly on the finger or hand, have been used to render a variety of tactile sensations to virtual objects in Virtual Reality (VR) [13,38] and Augmented Reality (AR) [29,45].
They have also been used to alter the perception of roughness, stiffness, friction, and local shape perception of real tangible objects [3,13,33,41]. Such techniques place the actuator close to the point of contact with the real environment, leaving the user free to directly touch the tangible. This combined use of wearable haptics with tangible objects enables a haptic augmented reality (HAR) [6] that can provide a rich and varied haptic feedback. The degree of reality/virtuality in both visual and haptic sensory modalities can be varied independently, but wearable HAR has been little explored with VR and (visual) AR [9,33]. Although AR and VR are closely related, they have significant differences that can affect the user experience [21,28]. Therefore, it seems necessary to investigate and understand the potential effect of these differences in visual rendering on the HAR perception. For example, previous works have shown that the stiffness of a virtual piston rendered with a force feedback haptic system seen in AR is perceived as less rigid than in VR [20], or when the visual rendering is ahead of the haptic rendering [15,27].
The goal of this paper is to study the role of the visual rendering of the hand (real or virtual) and its environment (AR or VR) on the perception of a tangible surface whose texture is augmented with a wearable vibrotactile device worn on the finger. We focus on the perception of roughness, one of the main tactile sensations of materials [4,23,36] and one of the most studied haptic augmentations [3,12,18,33,44,47].
Our contributions are:
• A system for rendering virtual vibrotactile roughness textures in real time on a tangible surface touched directly with the finger, integrated with an immersive visual AR/VR headset to provide a coherent multimodal visuo-haptic augmentation of the real environment; and
• A psychophysical study with 20 participants to evaluate the perception of these virtual roughness textures in three visual rendering conditions: without visual augmentation, with a realistic virtual hand rendering in AR, and with the same virtual hand in VR.

Related Work
Many works have investigated the haptic rendering of virtual textures to modify the perception of real, tangible surfaces, but few have considered the influence of visual rendering, or integrated both in an AR/VR environment.

Augmenting Haptic Texture Roughness
When running a finger over a surface, the deformations and vibrations of the skin caused by the micro-height differences of the material induce the sensation of roughness [26]. An effective approach to rendering virtual roughness is to generate vibrations to simulate interaction with the virtual texture [11], relying on the user's real-time measurements of position, velocity and force. The perceived roughness of real surfaces can then be modified when touched by a tool with a vibrotactile actuator attached [12,47] or directly with the finger wearing the vibrotactile actuator [3,33], creating a haptic texture augmentation. An additional challenge in augmenting the finger touch is to keep the fingertip free to touch the real environment, thus delocalizing the actuator elsewhere on the hand [2,18,34,45]. Of course, the fingertip skin is not deformed by the virtual texture and only vibrations are felt, but it has been shown that the vibrations produced on the fingertip skin running over a real surface are texture specific and similar between individuals [30]. A common vibrotactile texture rendering is to use a sinusoidal signal whose frequency is modulated by finger position or velocity [3,18,44,47]. It remains unclear whether such vibrotactile texture augmentation is perceived the same when integrated into visual AR or VR environments or when touched with a virtual hand instead of the real hand.

Influence of Visual Rendering on Haptic Perception
When the same object property is sensed simultaneously by vision and touch, the two modalities are integrated into a single perception. The psychophysical model of Ernst and Banks [17] established that the sense with the least variability dominates perception. Particularly for real textures, it is known that both touch and sight individually perceive textures equally well and similarly [4,5,49]. Thus, the overall perception can be modified by changing one of the modalities, as shown by Yanagisawa and Takatsuji [50], who altered the perception of roughness, stiffness and friction of some real tactile textures touched by the finger by superimposing different real visual textures using a half-mirror. Likewise, visual textures have been combined in VR with various tangible objects, in both active touch [14] and passive touch [22] contexts. Normand et al. [33] also investigated the roughness perception of tangible surfaces touched with the finger and augmented with visual textures in AR and with wearable vibrotactile textures. Conversely, virtual hand rendering is also known to influence how an object is grasped in VR [7,39] and AR [34], or even how real bumps and holes are perceived in VR [43], but its effect on the perception of a haptic texture augmentation has not yet been investigated.
A few works have also used pseudo-haptic feedback to change the perception of haptic stimuli to create richer feedback by deforming the visual representation of a user input [46]. For example, the perceived softness of tangible objects can be altered by superimposing in AR a virtual texture that deforms when pressed by the hand [40], or in combination with vibrotactile rendering in VR [9]. The aforementioned vibrotactile sinusoidal rendering of virtual texture has also been combined with visual oscillations of a cursor on a screen to increase the perception of texture roughness [47]. But even before manipulating a visual representation to induce a haptic sensation, shifts and latencies between user input and co-localised visuo-haptic feedback may be experienced differently in AR and VR, which we aim to investigate in this work.
A few studies have specifically compared visuo-haptic perception in AR vs. VR. Rendering a virtual piston pressed with one's real hand using a video see-through (VST) AR headset and a force feedback haptic device, Di Luca et al. [15] showed that a visual delay increased the perceived stiffness of the piston, whereas a haptic delay decreased it. In a similar setup, but with an optical see-through (OST) AR headset, Gaffary et al. [20] found that the virtual piston was perceived as less stiff in AR than in VR, without participants noticing this difference. The use of a VST-AR headset has notable consequences, as the "real" view of the environment and the hand is actually a visual stream from a camera, which has a noticeable delay and lower quality (e.g., resolution, frame rate, field of view) compared to the direct view of the real environment with OST-AR [28]. While a large literature has investigated these differences in visual perception [1,37], less is known about visuo-haptic perception in AR/VR. In this work, we studied (1) the perception of a haptic texture augmentation of a tangible surface and (2) the possible influence of the visual rendering of the environment (OST-AR or VR) and the hand touching the surface (real or virtual) on this perception.

Design of Visuo-Haptic Texture Rendering in Mixed Reality
In this section, we describe a system for rendering vibrotactile roughness textures in real time, on any tangible surface, touched directly with the index fingertip, with no constraints on hand movement and using a simple camera to track the finger pose. We also describe how to pair this tactile rendering with an AR or VR headset to provide a coherent, multimodal visuo-haptic augmentation of the real environment.
Figure 2: Interaction loop of the visuo-haptic texture rendering system. The poses of the defined markers in the camera frame $\mathcal{F}_c$ are estimated, then filtered with an adaptive low-pass filter. These poses are used to move and display the virtual model replicas aligned with the real environment. A collision detection algorithm detects a contact of the virtual hand with the virtual textures. If so, the velocity of the finger marker $\dot{\mathbf{X}}_f^c$ is estimated using the discrete derivative of position and adaptive low-pass filtering, then transformed onto the texture frame $\mathcal{F}_t$. The vibrotactile signal is generated by modulating the (scalar) finger velocity $\hat{\dot{x}}_f$ in the texture direction with the texture period $\lambda$ (see Eq. 1). The signal is sampled at 48 kHz and sent to the voice-coil actuator via an audio amplifier. All computation steps except signal sampling are performed at 60 Hz and in separate threads to parallelize them.
The visuo-haptic texture rendering system is based on (1) a real-time interaction loop between the finger movements and a coherent visuo-haptic feedback simulating the sensation of a touched texture, (2) a precise alignment of the virtual environment with its real counterpart, and (3) a modulation of the signal frequency by the estimated finger speed with a phase matching. Fig. 2 shows the interaction loop diagram and Eq. 1 the definition of the vibrotactile signal. The system consists of three main components: the pose estimation of the tracked real elements, the visual rendering of the virtual environment, and the vibrotactile signal generation and rendering.

Pose Estimation and Virtual Environment Alignment
A fiducial marker (AprilTag) is glued to the top of the actuator (see Fig. 3a) to track the finger pose with a camera (StreamCam, Logitech) placed above the experimental setup and capturing 1280 px × 720 px images at 60 Hz (see Fig. 3c). Other markers are placed on the tangible surfaces to augment (see Fig. 3). Contrary to similar work, which either constrained the hand to a constant speed to keep the signal frequency constant [3,18] or used mechanical sensors attached to the hand [18,44], vision-based tracking both frees the hand movements and allows any tangible surface to be augmented. A camera external to the AR/VR headset with a marker-based technique is employed to provide accurate and robust tracking with a constant view of the markers [31]. We denote $\mathbf{T}_i^c$, $i = 1..n$, the homogeneous transformation matrix that defines the position and rotation of the $i$-th marker out of the $n$ defined markers in the camera frame $\mathcal{F}_c$, e.g., the finger pose $\mathbf{T}_f^c$ and the texture pose $\mathbf{T}_t^c$. To reduce the noise in the pose estimation while maintaining good responsiveness, the 1€ filter [8] is applied: a low-pass filter with an adaptive cutoff frequency, specifically designed for human motion tracking. The filtered pose is denoted as $\hat{\mathbf{T}}_i^c$.
The optimal filter parameters were determined using the method of Casiez et al. [8], with a minimum cutoff frequency of 10 Hz and a slope of 0.01. The velocity (without angular velocity) of the marker, denoted as $\dot{\mathbf{X}}_i^c$, is estimated using the discrete derivative of the position and another 1€ filter with the same parameters.
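To make the filtering step concrete, the following is a minimal sketch of a scalar 1€ filter in the commonly published form, using the minimum cutoff (10 Hz) and slope (0.01) quoted above. The class name, the derivative cutoff of 1 Hz, and the choice to differentiate against the previous filtered value are assumptions for illustration; the slope value also depends on the position units, which are not stated here. In practice one such filter would be applied to each coordinate of the pose.

```python
import math


class OneEuroFilter:
    """Scalar 1-euro filter: a low-pass filter whose cutoff grows with speed."""

    def __init__(self, rate_hz, min_cutoff=10.0, beta=0.01, d_cutoff=1.0):
        self.dt = 1.0 / rate_hz        # sampling period in seconds
        self.min_cutoff = min_cutoff   # minimum cutoff frequency (Hz), from the text
        self.beta = beta               # slope of the cutoff vs. speed, from the text
        self.d_cutoff = d_cutoff       # cutoff for the derivative (assumed value)
        self.x_prev = None             # previous filtered value
        self.dx_prev = 0.0             # previous filtered derivative

    @staticmethod
    def _alpha(cutoff_hz, dt):
        # Smoothing factor of a first-order low-pass filter
        tau = 1.0 / (2.0 * math.pi * cutoff_hz)
        return 1.0 / (1.0 + tau / dt)

    def __call__(self, x):
        if self.x_prev is None:        # first sample: nothing to smooth yet
            self.x_prev = x
            return x
        # Low-pass filtered derivative of the signal
        dx = (x - self.x_prev) / self.dt
        a_d = self._alpha(self.d_cutoff, self.dt)
        dx_hat = a_d * dx + (1.0 - a_d) * self.dx_prev
        # Faster motion raises the cutoff, which reduces lag
        cutoff = self.min_cutoff + self.beta * abs(dx_hat)
        a = self._alpha(cutoff, self.dt)
        x_hat = a * x + (1.0 - a) * self.x_prev
        self.x_prev, self.dx_prev = x_hat, dx_hat
        return x_hat


# Hypothetical usage on one coordinate sampled at the 60 Hz camera rate
filt = OneEuroFilter(rate_hz=60.0)
smoothed = [filt(x) for x in [0.0, 0.1, 0.15, 0.4, 0.42]]
```

The adaptive cutoff is the design point: at rest the filter smooths jitter strongly, while during fast strokes it follows the finger more closely.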
To compare virtual and augmented realities, we create a virtual environment that closely replicates the real one. Each real element tracked by a marker is modelled virtually, i.e., the hand and the augmented tangible surface (see Fig. 5). In addition, the pose and size of the virtual textures are defined on the virtual replicas. This makes it possible to detect whether a finger touches a virtual texture using a collision detection algorithm (Nvidia PhysX), and to show the virtual elements and textures in real time, aligned with the real environment (see Fig. 5), using the considered AR or VR headset.
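As a rough illustration of how the tracked poses can be combined, the sketch below expresses the finger position in the texture frame and tests whether it lies inside the texture rectangle. This is a simplified stand-in for the contact test, not the PhysX-based collision detection of the system: the function names, the texture frame origin at the rectangle centre, and the example poses are all assumptions. The 50 mm × 15 mm area is the one used in the user study.

```python
def invert_rigid(T):
    """Invert a 4x4 homogeneous rigid transform given as nested lists."""
    R_t = [[T[j][i] for j in range(3)] for i in range(3)]        # transposed rotation
    p = [T[i][3] for i in range(3)]
    p_inv = [-sum(R_t[i][j] * p[j] for j in range(3)) for i in range(3)]
    return [R_t[i] + [p_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def matmul4(A, B):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def finger_in_texture_frame(T_f_c, T_t_c):
    """Finger position in the texture frame from both poses in the camera frame."""
    T_f_t = matmul4(invert_rigid(T_t_c), T_f_c)
    return [T_f_t[i][3] for i in range(3)]


def over_texture(pos_t, width_m=0.050, height_m=0.015):
    """True if the (x, y) position lies inside the centred texture rectangle."""
    return abs(pos_t[0]) <= width_m / 2 and abs(pos_t[1]) <= height_m / 2


# Hypothetical poses: texture rotated by 90 degrees about z in the camera frame
T_t_c = [[0.0, -1.0, 0.0, 0.10],
         [1.0,  0.0, 0.0, 0.20],
         [0.0,  0.0, 1.0, 0.50],
         [0.0,  0.0, 0.0, 1.0]]
T_f_c = [[1.0, 0.0, 0.0, 0.095],
         [0.0, 1.0, 0.0, 0.210],
         [0.0, 0.0, 1.0, 0.500],
         [0.0, 0.0, 0.0, 1.0]]
finger_pos_t = finger_in_texture_frame(T_f_c, T_t_c)
```

With these example poses the finger sits 10 mm along and 5 mm across the texture, so `over_texture(finger_pos_t)` holds.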
In our implementation, the virtual hand and environment are designed with Unity and the Mixed Reality Toolkit (MRTK). The visual rendering is achieved using the Microsoft HoloLens 2, an OST-AR headset with a 43° × 29° field of view (FoV), a 60 Hz refresh rate, and self-localisation capabilities. It was chosen over VST-AR because OST-AR only adds virtual content to the real environment, while VST-AR streams a real-time video capture of the real environment, introducing many supplementary visual limitations [25,28]. Indeed, one of our objectives is to directly compare a virtual environment that replicates a real one. To simulate a VR headset, a cardboard mask (with holes for sensors) is attached to the headset to block the view of the real environment (see Fig. 3b).

Vibrotactile Signal Generation and Rendering
A voice-coil actuator (HapCoil-One, Actronika) is used to display the vibrotactile signal, as it allows the frequency and amplitude of the signal to be controlled independently over time, covers a wide frequency range (10 Hz to 1000 Hz), and outputs the signal accurately with relatively low acceleration distortion¹. The voice-coil actuator is encased in a 3D-printed plastic shell and firmly attached to the middle phalanx of the user's index finger with a Velcro strap, to enable the fingertip to directly touch the environment (see Fig. 3a). The actuator is driven by a class D audio amplifier (XY-502 / TPA3116D2, Texas Instruments). The amplifier is connected to the audio output of a computer that generates the signal using the WASAPI driver in exclusive mode and the NAudio library. The represented haptic texture is a series of parallel virtual grooves and ridges, similar to real grating textures manufactured for psychophysical roughness perception studies [18,26,48]. It is generated as a square wave audio signal $s_k$, sampled at 48 kHz, with a period $\lambda$ and an amplitude $A$. Its frequency is the ratio of the absolute filtered (scalar) finger velocity $\hat{\dot{x}}_f = |\hat{\dot{X}}_f^t|$, transformed into the texture frame $\mathcal{F}_t$, and the texture period $\lambda$ [18]. As the finger is moving horizontally on the texture, only the $x$ component of the velocity is used. When a new finger velocity $\hat{\dot{x}}_{f,j}$ is estimated at time $t_j$, the phase $\phi_j$ of the signal also needs to be adjusted to ensure continuity in the signal. In other words, the sampling of the audio signal runs at 48 kHz, and its frequency and phase are updated at a far lower rate of 60 Hz when a new finger velocity is estimated. A sample $s_k$ of the audio signal at sampling time $t_k$, with $t_k \geq t_j$, is thus given by:
$$ s_k = A \, \mathrm{sgn}\left( \sin\left( 2\pi \frac{\hat{\dot{x}}_{f,j}}{\lambda} t_k + \phi_j \right) \right), \qquad \phi_j = \phi_{j-1} + 2\pi \frac{\hat{\dot{x}}_{f,j-1} - \hat{\dot{x}}_{f,j}}{\lambda} t_j \tag{1} $$
This rendering preserves the sensation of a constant spatial frequency of the virtual texture while the finger moves at various speeds, which is crucial for the perception of roughness [26,48].
The phase matching avoids sudden changes in the actuator movement, which would affect the texture perception in an uncontrolled way (see Fig. 4) and, contrary to previous work [3,18], it enables free exploration of the texture by the user, with no constraints on the finger speed. Finally, a square wave is chosen to get a rendering closer to a real grating texture with the sensation of crossing edges [47], and because the roughness perception of sine wave textures has been shown not to reproduce the roughness perception of real grating textures [48]. The tactile texture is described and rendered in this work as a one-dimensional signal by integrating the finger movement relative to the texture along a single direction, but it is easily extended to a two-dimensional texture by simply generating a second signal for the orthogonal direction and summing the two signals in the rendering.
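The signal generation with phase matching described above can be sketched as follows. This is an illustrative Python version under stated assumptions, not the NAudio/WASAPI implementation: the function names are hypothetical, the phase update is the one that keeps the sine argument continuous at each 60 Hz velocity update, and an exact zero of the sine is mapped to the positive level for simplicity.

```python
import math

SAMPLE_RATE_HZ = 48_000   # audio sampling rate (from the text)
UPDATE_RATE_HZ = 60       # rate of new finger velocity estimates (from the text)
PERIOD_M = 0.002          # 2 mm texture period, as used in the user study


def matched_phase(phi_prev, f_prev, f_new, t_j):
    """Phase that keeps the sine argument continuous when f changes at t_j."""
    return phi_prev + 2.0 * math.pi * (f_prev - f_new) * t_j


def render_texture(velocities_m_s, amplitude=1.0, period_m=PERIOD_M):
    """Return square-wave samples for a sequence of 60 Hz velocity estimates."""
    samples_per_update = SAMPLE_RATE_HZ // UPDATE_RATE_HZ   # 800 samples
    samples, phi, f_prev, k = [], 0.0, 0.0, 0
    for v in velocities_m_s:
        f = abs(v) / period_m                     # temporal frequency in Hz
        phi = matched_phase(phi, f_prev, f, k / SAMPLE_RATE_HZ)
        for _ in range(samples_per_update):
            arg = 2.0 * math.pi * f * (k / SAMPLE_RATE_HZ) + phi
            # Square wave: sign of the sine (zero mapped to +amplitude)
            samples.append(amplitude if math.sin(arg) >= 0.0 else -amplitude)
            k += 1
        f_prev = f
    return samples


# Hypothetical input: one second at a constant 0.1 m/s, i.e. a 50 Hz square wave
signal = render_texture([0.1] * UPDATE_RATE_HZ)
```

Without the phase update, each change of velocity would make the sine argument jump, producing the sudden actuator movements that the phase matching is meant to avoid.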

System Latency
Because the chosen AR headset is a standalone device (like most current AR/VR headsets) and cannot directly control the sound card and haptic actuator, the image capture, pose estimation and audio signal generation steps are performed on an external computer. Each computation step runs in a separate thread to parallelize them and reduce latency, and all steps are synchronised with the headset via a local network and the ZeroMQ library. This complex assembly inevitably introduces latency, which must be measured.
The rendering system provides a user with two interaction loops between the movements of their hand and the visual (loop 1) and haptic (loop 2) feedback. Measures are shown as mean ± standard deviation (when it is known). The end-to-end latency from finger movement to feedback is measured at (36 ± 4) ms in the haptic loop and (43 ± 9) ms in the visual loop. Both include the latency in image capture (16 ± 1) ms, marker tracking (2 ± 1) ms and network communication (4 ± 1) ms. The haptic loop also includes the voice-coil latency of 15 ms (as specified by the manufacturer¹), whereas the visual loop includes the latency in 3D rendering (16 ± 5) ms (60 frames per second) and display (5 ms). The total haptic latency is below the 60 ms detection threshold in vibrotactile feedback [35]. The total visual latency can be considered slightly high, yet it is typical for an AR rendering involving vision-based tracking [27].
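As a quick consistency check on this latency budget, the component means quoted above can be summed per loop. The sketch below is only arithmetic on the reported means (the dictionary names are illustrative); the sums, 37 ms and 43 ms, agree with the measured end-to-end values of (36 ± 4) ms and (43 ± 9) ms.

```python
# Mean latencies in milliseconds, taken from the text
SHARED_MS = {"image capture": 16, "marker tracking": 2, "network": 4}
HAPTIC_ONLY_MS = {"voice-coil actuator": 15}
VISUAL_ONLY_MS = {"3D rendering": 16, "display": 5}

shared_ms = sum(SHARED_MS.values())
haptic_ms = shared_ms + sum(HAPTIC_ONLY_MS.values())
visual_ms = shared_ms + sum(VISUAL_ONLY_MS.values())

print(f"haptic loop: {haptic_ms} ms, visual loop: {visual_ms} ms")
```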
The two filters also introduce a constant lag between the finger movement and the estimated position and velocity, measured at (160 ± 30) ms. With respect to the real hand position, it causes a distance error in the displayed virtual hand position, and thus a delay in the triggering of the vibrotactile signal. This is proportional to the speed of the finger, e.g., the distance error is (12.0 ± 2.3) mm when the finger moves at 75 mm s⁻¹.
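The distance error follows directly from multiplying the finger speed by the filter lag, as this short sketch shows (the function name is illustrative). At 75 mm s⁻¹, a lag of 160 ms gives 12.0 mm, and the 30 ms standard deviation gives about 2.3 mm.

```python
def lag_distance_error_mm(speed_mm_s, lag_s):
    """Distance travelled by the finger during the filter lag."""
    return speed_mm_s * lag_s


error_mm = lag_distance_error_mm(75.0, 0.160)    # mean lag of 160 ms
spread_mm = lag_distance_error_mm(75.0, 0.030)   # standard deviation of 30 ms
print(f"distance error: {error_mm:.1f} mm, spread: {spread_mm:.2f} mm")
```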

User Study
The user study aimed to investigate the effect of visual hand rendering in AR or VR on the perception of roughness texture augmentation. In a 2AFC task, participants compared the roughness of different tactile texture augmentations in three visual rendering conditions: without any visual augmentation (see Fig. 5, Real), in AR with a realistic virtual hand superimposed on the real hand (see Fig. 5, Mixed), and in VR with the same virtual hand as an avatar (see Fig. 5, Virtual). In order not to influence the perception, as vision is an important source of information and influence for the perception of texture [5,33,49,50], the touched surface was visually a uniform white; thus only the visual aspect of the hand and the surrounding environment is changed.

Participants
Twenty participants were recruited for the study (16 males, 3 females, 1 preferred not to say), aged between 18 and 61 years (Mdn = 26, IQR = 6.8). All participants had normal or corrected-to-normal vision, and none had a known hand or finger impairment. One was left-handed and the rest were right-handed; they all performed the task with their right index finger. When rating their experience with haptics, AR and VR ("I use it several times a year"), 12 were experienced with haptics, 5 with AR, and 10 with VR. Experience was correlated between haptics and VR (r = 0.59), and AR and VR (r = 0.67), but not haptics and AR (r = 0.20), nor haptics, AR, or VR with age (r = 0.05 to r = 0.12). Participants were recruited at the university on a voluntary basis. They all signed an informed consent form before the user study and were unaware of its purpose.

Apparatus
An experimental environment was created to ensure a similar visual rendering in AR and VR (see Fig. 5). It consisted of a 300 mm × 210 mm × 400 mm medium-density fibreboard (MDF) box with a paper sheet glued inside and a 50 mm × 15 mm rectangle printed on the sheet to delimit the area where the tactile textures were rendered. A single light source of 800 lm placed 70 cm above the table fully illuminated the inside of the box. Participants rated the roughness of the paper (without any texture augmentation) before the experiment on a 7-point Likert scale (1 = Extremely smooth, 7 = Extremely rough) as quite smooth (M = 2.5, SD = 1.3).
The virtual environment carefully reproduced the real environment, including the geometry of the box, textures, lighting, and shadows (see Fig. 5, Virtual). The virtual hand model was a gender-neutral human right hand with realistic skin texture, similar to that used by Schwind et al. [42]. Its size was adjusted to match the real hand of the participants before the experiment. The visual rendering of the virtual hand and environment is described in Sec. 3.1. To ensure the same FoV in all Visual Rendering conditions, a cardboard mask was attached to the AR headset (see Fig. 3b). In the Virtual rendering, the mask only had holes for sensors to block the view of the real environment and simulate a VR headset. In the Mixed and Real conditions, the mask had two additional holes for the eyes that matched the FoV of the HoloLens 2 (see Fig. 3b). Fig. 5 shows the resulting views in the three considered Visual Rendering conditions.
Participants sat comfortably in front of the box at a distance of 30 cm, wearing the HoloLens 2 with a cardboard mask attached, so that only the inside of the box was visible, as shown in Fig. 3c. The generation of the virtual texture and the control of the virtual hand are described in Sec. 3. They also wore headphones playing pink noise to mask the sound of the voice-coil. The experiment was held in a quiet room with no windows. The user study took on average one hour to complete.

Procedure
Participants were first given written instructions about the experimental setup and procedure, the informed consent form to sign, and a demographic questionnaire. A calibration was then performed to adjust the HoloLens 2 to the participant's interpupillary distance, the virtual hand to the real hand size, and the fiducial marker to the finger position. They familiarised themselves with the task by completing four training trials with the most different pair of textures. The trials were divided into three blocks, one for each Visual Rendering condition, with a break and questionnaire between each block. Before each block, the experimenter ensured that the virtual environment and the virtual hand were correctly aligned with their real equivalents, that the haptic device was in place, and attached the cardboard mask corresponding to the next Visual Rendering condition to the headset.
The participant started the trial by clicking the middle button of a mouse with the left hand. The first texture was then rendered on the augmented area of the paper sheet for 3 s and, after a 1 s pause, the second texture was also rendered for 3 s. The participant then had to decide which texture was the rougher by clicking the left (for the first texture) or right (for the second texture) button of the mouse and confirming their choice by clicking the middle button again. If the participant moved their finger away from the texture area, the texture timer was paused until they returned. Participants were asked to explore the textures as they would in real life by moving their finger back and forth over the texture area at different speeds.
One of the textures in the tested pair was always the reference texture, while the other was the comparison texture. Participants were not told that there was a reference and a comparison texture. The order of presentation was randomised and not revealed to the participants. All textures were rendered as described in Sec. 3.2 with a period of 2 mm, but with different amplitudes to create different levels of roughness. Preliminary studies allowed us to determine a range of amplitudes that could be felt by the participants and were not too uncomfortable. The reference texture was chosen to be the one with the middle amplitude to compare it with lower and higher roughness levels and to determine key perceptual variables such as the point of subjective equality (PSE) and the just noticeable difference (JND) of each Visual Rendering condition. The chosen 2AFC task is a common psychophysical method used in haptics to determine PSE and JND by testing comparison stimuli against a fixed reference stimulus and by fitting a psychometric function to the participant's responses [24].

Experimental Design
The user study was a within-subjects design with two factors:
• Visual Rendering consists of the augmented or virtual view of the environment, the hand and the wearable haptic device, with 3 levels: real environment and real hand view without any visual augmentation (see Fig. 5, Real), real environment and hand view with the virtual hand (see Fig. 5, Mixed) and virtual environment with the virtual hand (see Fig. 5, Virtual).
• Amplitude Difference consists of the difference in amplitude of the comparison texture with the reference texture (which is identical for all visual renderings), with 6 levels: ±12.5 %, ±25.0 % and ±37.5 %.
A trial consisted of a 2AFC task in which the participant touched two virtual vibrotactile textures one after the other and decided which one was the rougher. To avoid any order effect, the order of Visual Rendering conditions was counterbalanced between participants using a balanced Latin square design. Within each condition, the presentation order of the reference and comparison textures was also counterbalanced, and all possible texture pairs were presented in random order and repeated three times. A total of 3 visual renderings × 6 amplitude differences × 2 texture presentation orders × 3 repetitions = 108 trials were performed by each participant.
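The trial count can be checked by enumerating the design. The sketch below builds one participant's session under the assumption that the 36 trials of each block are shuffled as a whole (the text does not say whether randomisation was done per block or per repetition); the function and label names are hypothetical, and the order of the Visual Rendering blocks is passed in rather than derived from the Latin square.

```python
import random

AMPLITUDE_DIFFS = [-37.5, -25.0, -12.5, 12.5, 25.0, 37.5]   # % relative to reference
PRESENTATION_ORDERS = ["reference_first", "comparison_first"]
REPETITIONS = 3


def build_session(visual_order, seed=0):
    """Return the list of (rendering, amplitude difference, order) trials."""
    rng = random.Random(seed)
    session = []
    for rendering in visual_order:
        block = [(rendering, diff, order)
                 for diff in AMPLITUDE_DIFFS
                 for order in PRESENTATION_ORDERS
                 for _ in range(REPETITIONS)]
        rng.shuffle(block)   # random presentation within the block (assumption)
        session.extend(block)
    return session


# Hypothetical block order for one participant
session = build_session(["Mixed", "Virtual", "Real"])
print(len(session))   # 3 x 6 x 2 x 3 trials
```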

Collected Data
For each trial, the Texture Choice by the participant as the rougher of the pair was recorded. The Response Time between the end of the trial and the choice of the participant was also measured as an indicator of the difficulty of the task. At each frame, the Finger Position and Finger Speed were recorded to control for possible differences in texture exploration behaviour. Participants also rated their experience after each Visual Rendering block of trials using the questions shown in Table 1. For all questions, participants were shown only labels (e.g., "Not at all" or "Extremely") and not the actual scale values (e.g., 1 or 5) [32].

Trial Measures
All measures from trials were analysed using linear mixed models (LMM) or generalised linear mixed models (GLMM) with Visual Rendering, Amplitude Difference and their interaction as within-participant factors, and by-participant random intercepts. Depending on the data, different random effect structures were tested. Only the best converging models are reported, with the lowest Akaike Information Criterion (AIC) values. Post-hoc pairwise comparisons were performed using Tukey's Honest Significant Difference (HSD) test. Each estimate is reported with its 95% confidence interval (CI) as follows: [lower limit, upper limit].

Discrimination Accuracy.
A GLMM was fitted to the Texture Choice in the 2AFC vibrotactile texture roughness discrimination task, with by-participant random intercepts but no random slopes, and a probit link function (see Fig. 6a). The PSEs (see Fig. 6b) and JNDs (see Fig. 6c) for each visual rendering and their respective differences were estimated from the model, along with their corresponding 95% CI, using a non-parametric bootstrap procedure (1000 samples). The PSE represents the estimated amplitude difference at which the comparison texture was perceived as rougher than the reference texture 50% of the time. The Real rendering had the highest PSE (7.9 % [1.2, 4.1]) and was statistically significantly different from the Mixed rendering (1.9 % [−2.4, 6.1]) and from the Virtual rendering (5.1 % [2.4, 7.6]). The JND represents the estimated minimum amplitude difference between the comparison and reference textures that participants could perceive, calculated at the 84th percentile of the predictions of the GLMM (i.e., one standard deviation of the normal distribution) [17]. The Real rendering had the lowest JND (26 % [23, 29]), the Mixed rendering had the highest (33 % [30, 37]), and the Virtual rendering was in between (30 % [28, 32]). All pairwise differences were statistically significant.
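To illustrate how a PSE and a JND are read off a probit psychometric function, the sketch below fits a cumulative Gaussian to response proportions by linear regression on probit-transformed values. It is a deliberately simplified stand-in for the analysis above: there are no random effects and no bootstrap, the function name is hypothetical, and the proportions are synthetic values generated from a known curve, not data from the study.

```python
from statistics import NormalDist


def fit_probit(amplitude_diffs, proportions):
    """Fit z = (x - PSE) / JND by least squares on probit-transformed proportions."""
    zs = [NormalDist().inv_cdf(p) for p in proportions]
    n = len(amplitude_diffs)
    mean_x = sum(amplitude_diffs) / n
    mean_z = sum(zs) / n
    slope = (sum((x - mean_x) * (z - mean_z) for x, z in zip(amplitude_diffs, zs))
             / sum((x - mean_x) ** 2 for x in amplitude_diffs))
    intercept = mean_z - slope * mean_x
    pse = -intercept / slope   # amplitude difference judged rougher 50% of the time
    jnd = 1.0 / slope          # distance from the PSE to the ~84% point (one SD)
    return pse, jnd


# Synthetic example: proportions generated from a curve with PSE = 5 and JND = 28
diffs = [-37.5, -25.0, -12.5, 12.5, 25.0, 37.5]
true_curve = NormalDist(mu=5.0, sigma=28.0)
props = [true_curve.cdf(x) for x in diffs]

pse, jnd = fit_probit(diffs, props)
print(f"PSE = {pse:.1f} %, JND = {jnd:.1f} %")
```

A positive PSE means the comparison texture needs a larger amplitude than the reference to feel equally rough, and a larger JND means a shallower curve, i.e. lower sensitivity.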

Questionnaires
Friedman tests were employed to compare the ratings to the questions (see Table 1), with post-hoc Wilcoxon signed-rank tests and Holm-Bonferroni adjustment, except for the questions regarding the virtual hand, which were directly compared with Wilcoxon signed-rank tests. Fig. 8 shows these ratings for questions where statistically significant differences were found (results are shown as mean ± standard deviation):
• Hand Ownership: participants slightly felt the virtual hand as their own with the Mixed rendering (2.3 ± 1.0) but quite felt so with the Virtual rendering (3.5 ± 0.9, p < 0.001).
• Hand Latency: the virtual hand was found to have a moderate latency with the Mixed rendering (2.8 ± 1.2) but a low one with the Virtual rendering (1.9 ± 0.7, p < 0.001).
the best JND (26 %), followed by the Virtual (30 %) and Mixed (33 %) renderings (see Fig. 6c). These JND values are in line with and at the upper end of the range of previous studies [10], which may be due to the location of the actuator on the top of the finger middle phalanx, being less sensitive to vibration than the fingertip. Thus, compared to no visual rendering (Real), the addition of a visual rendering of the hand or environment reduced the roughness sensitivity (JND) and the roughness perception (PSE), as if the virtual vibrotactile textures felt "smoother". Differences in user behaviour were also observed between the visual renderings (but not between the haptic textures). On average, participants responded faster (−16 %), explored textures at a greater distance (+21 %) and at a higher speed (+16 %) without visual augmentation (Real rendering) than in VR (Virtual rendering) (see Fig. 7). The Mixed rendering was always in between, with no significant difference from the other two. This suggests that touching a virtual vibrotactile texture on a tangible surface with a virtual hand in VR is different from touching it with one's own hand: users were more cautious or less confident in their exploration in VR. This does not seem to be due to the realism of the virtual hand or the environment, nor to the control of the virtual hand, all of which were rated high to very high by the participants (see Sec. 5.2) in both the Mixed and Virtual renderings. Very interestingly, the evaluation of the vibrotactile device and the textures was also the same between the visual renderings, with a very high sense of control, a good realism and a very low perceived latency of the textures (see Sec. 5.2).
Conversely, the perceived latency of the virtual hand (Hand Latency question) seemed to be related to the perceived roughness of the textures (with the PSEs). The Mixed rendering had the lowest PSE and highest perceived latency, the Virtual rendering had a higher PSE and lower perceived latency, and the Real rendering had the highest PSE and no virtual hand latency (as it was not displayed).
Our visuo-haptic augmentation system aimed to provide a coherent multimodal virtual rendering integrated with the real environment. Yet, it involves different sensory interaction loops between the user's movements and the visuo-haptic feedback (see Fig. 2), which may not feel synchronised with each other or with proprioception. Therefore, we hypothesise that the differences in the perception of vibrotactile roughness are due less to the visual rendering of the hand or the environment and their associated differences in exploration behaviour than to the difference in the perceived latency between one's own hand (visual and proprioception) and the virtual hand (visual and haptic). The perceived delay was greatest in AR, where the virtual hand visually lags significantly behind the real one, and smaller in VR, where only the proprioceptive sense can help detect the lag. This delay was not perceived when touching the virtual haptic textures without visual augmentation, because only the finger velocity was used to render them and, despite the varied finger movements and velocities while exploring the textures, the participants did not perceive any latency in the vibrotactile rendering (see Sec. 5.2). Di Luca et al. [15] demonstrated similarly, in a VST-AR setup, how visual latency relative to proprioception increased the perception of stiffness of a virtual piston, while haptic latency decreased it. Another complementary explanation could be a pseudo-haptic effect of the displacement of the virtual hand, as already observed with this vibrotactile texture rendering, although seen on a screen in a non-immersive context [47]. Such hypotheses could be tested by manipulating the latency and tracking accuracy of the virtual hand or the vibrotactile feedback.
We can outline recommendations for future AR/VR studies or applications using wearable haptics. Attention should be paid to the respective latencies of the visual and haptic sensory feedbacks inherent in such systems and, more importantly, to the perception of their possible asynchrony. Latencies should be measured [19], minimised to an acceptable level for users and kept synchronised with each other [16]. It seems that the visual aspect of the hand or the environment in itself has little effect on the perception of haptic feedback, but the degree of visual reality-virtuality can affect the perceived asynchrony of the latencies, even though they remain identical. Therefore, when designing for wearable haptics or integrating it into AR/VR, it seems important to test its perception in real, augmented and virtual environments.
The main limitation of our study is the absence of a visual representation of the virtual texture. This is indeed a source of information as important as haptic sensations for the perception of both real textures [4,5,49] and virtual textures [14,22,33], and their interaction in the overall perception is complex.
Also, our study was conducted with an OST-AR headset, but the results may be different with a VST-AR headset. Finally, we focused on the perception of roughness sensations using wearable haptics in AR vs. VR using a square wave vibrotactile signal, but different haptic texture rendering methods should be considered. More generally, many other haptic feedbacks could be investigated in AR vs. VR using the same system and methodology, such as stiffness, friction, local deformations, or temperature.

Conclusion
We investigated virtual textures that modify the roughness perception of real, tangible surfaces, using a wearable vibrotactile device worn on the finger. To this end, we first designed and implemented a visuo-haptic texture rendering system that allows free exploration of the augmented surface using a visual AR/VR headset. We then conducted a psychophysical user study with 20 participants to assess the roughness perception of these virtual texture augmentations directly touched with the finger (1) without visual augmentation, (2) with a realistic virtual hand rendering in AR, and (3) with the same virtual hand in VR. The textures were on average perceived as "rougher" and with a higher sensitivity when touched with the real hand alone than with a virtual hand either in AR or VR. We hypothesised that this difference in perception was due to the perceived latency between the finger movements and the different visual, haptic and proprioceptive feedbacks, which were the same in all visual renderings, but were more noticeable in AR and VR. With a better understanding of how visual factors influence the perception of haptically augmented tangible objects, the many wearable haptic systems that already exist but have not yet been fully explored with AR can be better applied and new visuo-haptic renderings adapted to AR can be designed.

Figure 2: Diagram of the visuo-haptic texture rendering system. Fiducial markers attached to the voice-coil actuator and to the tangible surfaces to be tracked are captured by a camera. The positions and rotations (the poses) of the defined markers in the camera frame are estimated, then filtered with an adaptive low-pass filter. These poses are used to move and display the virtual model replicas aligned with the real environment. A collision detection algorithm detects a contact of the virtual hand with the virtual textures. If a contact is detected, the velocity of the finger marker is estimated using a discrete derivative of its position and adaptive low-pass filtering, then transformed into the texture frame. The vibrotactile signal is generated by modulating the (scalar) finger velocity in the texture direction with the texture period (see Eq. 1). The signal is sampled at 48 kHz and sent to the voice-coil actuator via an audio amplifier. All computation steps except signal sampling are performed at 60 Hz and in separate threads to parallelize them.
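The velocity-to-signal step described in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not the implementation used in the study. Eq. 1 is not reproduced in this section, so the mapping "frequency = speed / texture period" is an assumption, as are the fixed-coefficient exponential filter (a simple stand-in for the adaptive low-pass filter), all function names, and the numerical values in the usage lines. Only the square waveform, the 60 Hz update rate and the 48 kHz sampling rate come from the text.

```python
import math

UPDATE_RATE = 60       # Hz, rate of the tracking and computation loop
SAMPLE_RATE = 48_000   # Hz, rate of the signal sent to the actuator


def estimate_velocity(positions, dt=1 / UPDATE_RATE, alpha=0.5):
    """Discrete derivative of 1D positions (m), smoothed by a fixed
    exponential low-pass filter (assumption: the real filter is adaptive)."""
    smoothed = 0.0
    velocities = []
    for prev, curr in zip(positions, positions[1:]):
        raw = (curr - prev) / dt
        smoothed = alpha * raw + (1 - alpha) * smoothed
        velocities.append(smoothed)
    return velocities


def texture_frequency(velocity, period):
    """Assumed mapping: temporal frequency (Hz) = speed (m/s) / period (m)."""
    return abs(velocity) / period


def square_wave_frame(freq, amplitude=1.0, phase=0.0):
    """One update frame of a square wave; returns (samples, end phase).
    Carrying the phase across frames keeps the signal continuous."""
    n = SAMPLE_RATE // UPDATE_RATE
    step = 2 * math.pi * freq / SAMPLE_RATE
    samples = []
    for _ in range(n):
        samples.append(amplitude if math.sin(phase) >= 0 else -amplitude)
        phase = (phase + step) % (2 * math.pi)
    return samples, phase


# Hypothetical usage: finger sliding at a constant 0.1 m/s over a 2 mm period.
positions = [i * 0.1 / UPDATE_RATE for i in range(50)]
v = estimate_velocity(positions)[-1]
f = texture_frequency(v, period=0.002)
frame, phase = square_wave_frame(f)
```

With these illustrative values the filtered speed settles close to 0.1 m/s, which the assumed mapping turns into a vibration of about 50 Hz; a faster stroke over the same texture would raise the frequency proportionally.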

Figure 3: Visuo-haptic texture rendering system setup. (a) HapCoil-One voice-coil actuator with a fiducial marker on top attached to a participant's right index finger. (b) HoloLens 2 AR headset, the two cardboard masks to switch the real or virtual environments with the same field of view, and the 3D-printed piece for attaching the masks to the headset. (c) User exploring a virtual vibrotactile texture on a tangible sheet of paper.

Figure 4: Change in frequency of a sinusoidal signal with and without phase matching. Phase matching ensures continuity and avoids glitches in the rendering of the signal. A sinusoidal signal is shown here for clarity, but a different waveform will give a similar effect.
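The effect illustrated in Figure 4 can be reproduced numerically. The sketch below is one common way to obtain phase matching, namely accumulating the phase sample by sample instead of recomputing it from absolute time; it is an assumption that the system works this way. The function names and the 100 Hz and 250 Hz test frequencies are illustrative only.

```python
import math

SAMPLE_RATE = 48_000  # Hz


def render_naive(segments, sample_rate=SAMPLE_RATE):
    """sin(2*pi*f*t) with absolute time t: the phase jumps when f changes."""
    out = []
    n = 0
    for freq, count in segments:
        for _ in range(count):
            t = n / sample_rate
            out.append(math.sin(2 * math.pi * freq * t))
            n += 1
    return out


def render_phase_matched(segments, sample_rate=SAMPLE_RATE):
    """Accumulate the phase so that a frequency change only alters the
    phase increment, never the current phase."""
    out = []
    phase = 0.0
    for freq, count in segments:
        step = 2 * math.pi * freq / sample_rate
        for _ in range(count):
            out.append(math.sin(phase))
            phase = (phase + step) % (2 * math.pi)
    return out


def max_jump(signal):
    """Largest difference between two consecutive samples."""
    return max(abs(b - a) for a, b in zip(signal, signal[1:]))


# Frequency switches from 100 Hz to 250 Hz after 1000 samples.
segments = [(100.0, 1000), (250.0, 1000)]
naive = render_naive(segments)
matched = render_phase_matched(segments)
```

Comparing `max_jump(naive)` with `max_jump(matched)` shows the glitch: the naive signal has one large discontinuity at the switch, whereas the phase-matched signal never changes by more than one phase increment between consecutive samples.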

Figure 5: The three visual rendering conditions and the task of the user study. During a trial, two tactile textures were rendered on the paper sheet (black rectangle), one after the other, then the participant chose which one was the rougher. The visual rendering stayed the same during the trial. (Real) The real environment and real hand view without any visual augmentation. (Mixed) The real environment and hand view with the virtual hand. (Virtual) Virtual environment with the virtual hand.

Figure 6: GLMM results in the vibrotactile texture roughness discrimination task, with non-parametric bootstrap 95% CIs. (a) Percentage of trials in which the comparison texture was perceived as rougher than the reference texture, as a function of the amplitude difference between the two textures and the visual rendering. Curves represent predictions (probit link function) and points are estimated marginal means. (b) Estimated PSE of each visual rendering. (c) Estimated JND of each visual rendering.
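For readers unfamiliar with how a PSE and a JND are read off a probit psychometric curve such as those in Figure 6a, the following sketch may help. It assumes a curve of the form P(rougher) = Φ(intercept + slope × difference), where Φ is the standard normal CDF, and defines the JND as the distance between the 50 % and 75 % points. This is one common convention and may differ from the definition used in the study; the coefficient values are made up for illustration and are not results.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def probability_rougher(difference, intercept, slope):
    """Probit psychometric function: P = Phi(intercept + slope * difference)."""
    return STD_NORMAL.cdf(intercept + slope * difference)


def pse_and_jnd(intercept, slope, upper=0.75):
    """PSE: difference at which P = 0.5.
    JND: distance from the PSE to the point where P = upper (assumed 0.75)."""
    pse = -intercept / slope
    jnd = STD_NORMAL.inv_cdf(upper) / slope
    return pse, jnd


# Hypothetical coefficients, chosen only to exercise the functions.
pse, jnd = pse_and_jnd(intercept=-0.5, slope=0.05)
```

A steeper slope gives a smaller JND, that is, a finer discrimination, while a shift of the intercept moves the PSE without changing the JND.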

Figure 7: Boxplots and geometric means of response time at the end of a trial, and finger position and finger speed measures when exploring the comparison texture, with pairwise Tukey's HSD tests: * is p < 0.05, ** is p < 0.01 and *** is p < 0.001. (a) Response time of a trial. (b) Distance traveled by the finger in a trial. (c) Speed of the finger in a trial.