搜索
您的当前位置:首页正文

GigaVoxel

来源:二三娱乐
GigaVoxels:Ray-GuidedStreamingforEfficientandDetailedVoxelRendering

CyrilCrassinFabriceNeyretLJK/INRIA/GrenobleUniversities/CNRS

SylvainLefebvreINRIASophia-Antipolis

ElmarEisemann

MPIInformatik/SaarlandUniversity

Figure1:Imagesshowvolumedatathatconsistofbillionsofvoxelsrenderedwithourdynamicsparseoctreeapproach.Ouralgorithmachievesreal-timetointeractiveratesonvolumesexceedingGPUmemorycapacitybyfar,tankstoefficientstreamingandray-castingsolutions.Basically,thevolumeisonlyusedattheresolutionthatisneededtoproducethefinalimage.Besidesthegaininmemoryandspeed,ourrenderingprovidesisinherentlyanti-aliased.

Abstract

Weproposeanewapproachtoefficientlyrenderlargevolumetricdatasets.Thesystemachievesinteractivetoreal-timerenderingperformanceforseveralbillionvoxels.

Oursolutionisbasedonanadaptivedatarepresentationdepend-ingonthecurrentviewandocclusioninformation,coupledtoanefficientray-castingrenderingalgorithm.Onekeyelementofourmethodistoguidedataproductionandstreamingdirectlybasedoninformationextractedduringrendering.

OurdatastructureexploitsthefactthatinCGscenes,detailsareoftenconcentratedontheinterfacebetweenfreespaceandclustersofdensityandshowsthatvolumetricmodelsmightbecomeavalu-ablealternativeasarenderingprimitiveforreal-timeapplications.Inthisspirit,weallowaquality/performancetrade-offandexploittemporalcoherence.Wealsointroduceamipmapping-likeprocessthatallowsforanincreaseddisplayrateandbetterqualitythroughhighqualityfiltering.Tofurtherenrichthedataset,wecreateaddi-tionaldetailsthroughavarietyofproceduralmethods.

Wedemonstrateourapproachinseveralscenarios,liketheexplo-rationofa3Dscan(81923resolution),ofhypertexturedmeshes(163843virtualresolution),orofafractal(theoreticallyinfiniteres-olution).Allexamplesarerenderedoncurrentgenerationhardwareat20-90fpsandrespectthelimitedGPUmemorybudget.Thisistheauthor’sversionofthepaper.TheultimateversionhasbeenpublishedintheI3D2009conferenceproceedings.

1Introduction

Volumedatahasoftenbeenusedinthecontextofscientificdatavisualization,butitisalsopartofmanyspecialeffects.Companies

datastreamingfromlarger,butslowermemoryunits.Thispointisparticularlycrucialforreal-timeapplicationsbecausethememoryrestrictionsontheGPUaremoreseverethanforCPUsoftwarepro-ductions.EvenifwewereabletoiterativelyfilltheGPUmemoryinabrute-forcemanner,thetransferof512MBeachframe(whichisthestandardmemorysizeoncurrentGPUs)alreadypreventsreal-timeperformance.

Thesecondchallengeistheactualrenderingofvolumes.Display-ingdatawithpurefixed-stepraymarchingcanleadtoaliasartifactsandiscostlyduetothelargeamountofvoxelsthatneedstobevis-ited,shaded,andblended.

Fortunately,mostofthetime,onlypartsofthevolumeareneededatthefullresolution.Othersareinvisibleordistantenoughtobeapproximatedwithoutqualityloss.Occlusioncanalsocoverpartsthatinconsequencecanbeomitted.Thiscanhappenevenifthescenehasnoopaqueentity.Transparentmaterialscanquicklyac-cumulatetodenser,thusopaque,elementswhichblocktheview.Thispaperaimsatmakingvoxelsanalternativerenderingprimitiveforreal-timeapplications.Ourframeworkishighlyinspiredbyvoxelenginetoolsusedinspecialeffectproductions:itliftsfeaturesknowntobetimeandmemoryconsuming(eveninthescopeofofflineproduction)tointeractiveandreal-timerates.

2PreviousWork

Muchworkhasfocusedonvolumevisualization,toomuchtobecoveredinthispaper.Forarecentoverviewofreal-timevolumerenderingtechniques,wereferthereaderto[EHK∗06].Also,manyapproachesfocusonovercomingdiscretizedrepresentationstore-coverasmoothsignal.Thisalsoincludeshigh-qualitynormalre-construction(e.g.,[MMMY97]).Ourfocusliesmoreondataman-agementandefficientartifact-freerendering.Thehighresolutionofourinputalsohides,tosomeextent,aliasingissues.

Fullvoxelgridshavebeenusedinmanyapplicationstobenefitfromthevisualcomplexityallowedbyvolumesfor,e.g.,fur[KK89],vegetation[DN04],orpseudo-surfaces[Ney98].Becauseoftheverydetailedandrealisticappearancethatarisesfromtheuseofvoxels,mucheffortwasspentonacceleratingvolumerendering.Recently,manyGPU-basedapproacheswerepublished.Earlierso-lutionsreliedonvolumeslicing[LL94],whereasnewerapproachesperformaraymarchinginthefragmentshader.

Manyefficientmethodsassumethattheentirevolumeispresentinthegraphicscard’smemory,thushighlylimitingthepossibledetaillevel.Therefore,inthecontextofreal-timerendering,vox-elsremainedmostlyavaluablerepresentationfordistantmodels(e.g.,[MN00,GM05,DN09])becauseresolutioncanbekeptlow,counteractingtheusualmemorycost.

Mostrecentmethodsrelyonsomeraymarchingprocedurewhereopaquemodelsareofparticularinterestastheyallowforanearlyraytermination.Heightfieldssharethisconcept.Italsoaimsatimprovingperformanceviasophisticatedpreprocessingmeth-ods[BD06]andintelligentraymarching[PO07],butaremostlylimitedtothisrestrictedtypeofdata.

Itistruethough,thatafullvolumeisnotalwaysneeded.Forcom-putergraphicsscenes,itoftensufficestohavedetaillocatedmostlyinalayerattheinterfacebetweenempty-andfilledspace.Theobservationhasbeenusedinrenderinginformofspecializedrep-resentationssuchasshellmaps[PBFJ05],relieftextures[OBM00],bidirectionaltextures[TZL∗02],andhypertextures[PH89].Inallcases,volumedataisrepresentedinalimitedinterfaceattachedtoanobject’ssurface,ortothespacearound,asisthecaseforsomerecentGPUstructures[LHN05b,LH06,LD07].Nonetheless,the

Figure2:InCGscenes,detailstendtobeconcentratedatinterfacesbetweendenseclustersandemptyspace.

storageandrenderingofthesepreviousmethodsareonlyefficientifthevolumelayerremainssmallonthescreen.Instead,wewilldealwithgeneralvolumes.

Generalvolumetricdatasetsalsooftenconsistofdenseclustersinfreespace(compareFigure2)so-calledsparse-twice.Onecanben-efitfromclustereddata,e.g.,bycompaction[KE02]orfasttraver-salofemptyspace,avoidanceofoccludedregions[LMK03],orbystoppingrayswhenacertainopacitylevelisreached[Sch05].Wealsowanttomakeuseofregionsofconstantdensityforaccelerationandcompactionpurposes.

Althoughinspiredbytheaforementionedwork,thispaperfocusesonout-of-corevoxelrenderingtoachievetheaimofrichnessoftherepresentation,inconditionswherethememoryoftheGPUistypicallyordersofmagnitudesmallerthanthedata,whichcanbeverydetailed.Muchresearchhasfocusedonthetopicofmassivevolumerendering.Boadaetal.[BNS01]analyzethevolumedatabycomputingamipmap-likestructurebasedonanoctree.Theythenchooseacutthroughthetreeandusetheleaves’mipmapdataduringrendering.LaMaretal.[LHJ99]keepallthesedif-ferentresolutionlevelsinmemoryandchoosetheblocksaccordingtothedistancefromtheobserverduringrendering.Ourgoalisacombinationofboth:view-dependentrenderingformassivevol-umes.Thisiscloseto[GS04],wheredataiscompressed,usedinaview-dependentmanner,andemptyspaceisskippedefficiently.Al-thoughrelativelyefficient,theapproachinvolvesmuchCPUwork,doesnotachievetemporalcontinuity,nordoesitincludeocclusionteststocullhiddenpartsofthevolume.

Volumedecompositionsintoblockshavebeenusedinmanyotherapproaches,e.g.,[HQK05],butwithoutfiltering,theblockstruc-tureremainsvisibleandaliasingartifactscanoccur.Filteringwasdiscussedin[VSE06]wheretwoapproachesarepresented.Onebasedon[LHN05a],butleadingtoamoredifficultfilteringscheme,andonewithhigherprecisionbasedon[LKS∗06],butthattendstooverrefinesomepartsofthescene.Thestructureisview-independentthough,whereasourapproachadaptsresolutioncontinuously.Thismakesdiscontinuitiesdisappearandtemporaltransitionstoadifferentsubdivisionlevelaresmooth.

Ourstructuresharessimilaritieswithbrickmaps[CB04],butisdynamic,and,asaconsequence,itscontentisview-dependent(throughLOD(level-of-detail)andvisibility).Wethuscombineanadaptiveversionofthestructurefrom[LHN05a]withregularconstant-sizedmemoryblocks[LKS∗06],whichallowforveryef-ficienthardware-basedfiltering.

ThemostrelatedworkhasveryrecentlybeenpublishedbyGobbettietal.[GMAG08]andwewilltakeacloserlookattheirsolutioninthenextsection.Preliminaries

OurapproachwasdevelopedinparalleltotheworkofGobbettietal.[GMAG08].Whileitsharessimilaritieswiththiswork,ourap-proachprovidesbetterquality/performanceandoffersnewpossi-

bilitiessuchasproceduralcontentandlevel-of-detailmanagementthuspushingforwardtheideaofmakingvolumesarenderingprim-itive.Tobetterunderstandourcontributionsandmakethispaperselfcontained,wewillfirstgiveanoverviewofwhatourapproachshareswithGobbettietal.’sone.

Let’sstartwithanaiveconsideration.Ifthevolumeissmall,GPUsallowforanefficientrenderingbysimplysteppingthroughthedatasetstoredina3Dtextureandaccumulatingthevoxels’colors.Inthisway,onecanbenefitfromseveralhardwarerelatedadvantages,likedirect3Daddressing,tri-linearinterpolation,anda3Dcoherenttexturecachemechanism.Forlargervolumes,eveniftheGPUhadenoughmemory,therearetwoproblems:first,thealgorithmwouldbeslowduetomanystepsthatneedtobetakeninlargedatasetsand,second,thewholedatasetwillnotfitintotheGPUmemory.Oneinsightisthat,foragivenpointofview,nottheentirevolumeneedstobeinmemory.Byorganizingthedatainaspatialsub-divisionstructure,emptypartscanbeletunsubdividedanddistantpartscanbereplacedbylowermipmaplevels,leadingtoalowerresolution(adifferentlevel-of-detail,LOD),thuslessGPUmem-oryrequirements.Inaddition,foragivenpointofview,hiddenpartsdonotneedtobeloadedatall(seeFigure2).AsGobbettietal.,wechoseanoctreestructure,whichisconvenienttorepresentandtraverseontheGPU[LHN05b,LHN05a],andiswelladaptedtostoreregulardatalikevoxels.

Theoctreeisrefinedastoreflecttheneededprecisioninthescene.Eachtreenodecontainsapointertoaso-calledbrickorisindicatedasconstant/emptyspace.Abrickisasmallvoxelgridofsomepre-definedsizeM3(usuallyM=32)thatapproximatesthepartoftheoriginalvolumethatcorrespondstotheoctree’snode.Forexample,thebrickforarootnodewouldbeanM3voxelapproximationoftheentiredataset.Thedatarepresentationthuscombinesthemem-oryefficiencyofanadaptedstructurewithsmall3Dtextureunitsthatcanbeefficientlyevaluatedthroughraymarching.

Allbricksaregroupedinalarge3Dtexture,thebrickpool,storedontheGPU.Thismemoryislimitedandsoitscontentneedstobechosenwiselyandupdatedwhentheviewpointchanges.Atalltimes,thealgorithmmakessurethatthebricksofallvisibleleafnodesareinthepool.Thisenablesanappropriaterenderingoftheentirevolumefromthecurrentviewpoint.Forothernodes,bricksmightbemissing,inwhichcasepointersareinvalidated.

Whentheviewpointismoved,nodesinthetreearefusedorcol-lapsedbasedontheneededresolutionandvisibility.Anupdateistriggeredifdatainthebrickpoolismissing.Eachbrickinthepoolhasacertaintimestampthatisresetuponusage.IfanoctreenodeneedsasubdivisionandnewbricksaretransferredtotheGPU,thealgorithmwillusethememorylocationsthatwerepreviouslyre-servedfortheoldest,thusunused,bricks(thisconceptisreferredtoasLRU-LastRecentlyUsed).

Tokeeptrackofthecurrentdataorganizationandfacilitateupdates,thestructureismirroredontheCPU.Structuremodificationscanthusbedoneonthehostwithallitsgeneralcomputationalcapaci-ties,andonlychangesneedtobetransferredtotheGPU.Thisfa-cilitatessomeoftheoperations,suchastheLRUorderingontheCPU.Italsoallowsforabrickcacheinthemainmemory,whichisimportantifthevolumeissolargethatharddiskaccessesarenecessary.

3OverviewandContributions

Withrespecttotheworkdescribedintheprevioussection,wein-troduceseveralimportantimprovements.InSection4,wepresentourgeneralizeddatastructureanddiscussitscondensedrepresenta-tionontheGPU.Itreducesmemoryrequirements,simplifiesimple-

mentation,acceleratescomputations,aswellasupdates,andallowsustoexploitthesparse-twicedatastructuresoftenencounteredincomputergraphics.InSection5,weexplainabasicrenderingstrat-egythatensuresacorrectlyproducedimage.Thissectionalsode-tailsthebasicprocessesinvolvedtotraversethegrid.

Wealsoaddressaliasing,whichappearsbecausethevalueofapixelshouldnotbedefinedbyaccumulationsalongaray,butbytheintegrationalongaconedefinedbytheeyeandthepixel’sex-tent.Computingthisvaluewithseveralrays(FSAAmethods)in-creasescomputationtimesignificantly.Asfor2Dtexturing,awaytoovercomethisissueistousemipmapping.Equivalently,avoxelhierarchycanbebuiltbyiterativelydownsamplingandaveragingneighbors.Duetothefilteringprocess,readingfromamipmapdeliverstheintegratedvalueofasmallvolume.Ifthestepsizedur-ingtheraymarchingischosenaccordinglytothedistancefromtheobserver,itispossibletoobtainagoodapproximationoftheac-tualconeintegralandacceleratecomputations.Mipmappingthusaddressesaliasingtoalargeextent,smoothestheresultsandcanincreaserenderingperformance,butitincreasesmemoryconsump-tionandmakestreeupdatesmorechallenging.

WethendiscussamoreadvancedsolutionthatreliesonanLRUmechanisminSection6.Here,weprovideaunifiedrender-ing/visibility/LODframeworkthatisefficient,eliminatesmuchofthepreviousCPUinteraction,andiseasytoimplement.

Finally,Section7willshowseveralresultsandpossibilitiestoam-plifyandincreasethedetailleveloftheoriginaldataprocedurally.Thisenableshigherrenderingqualitythanpreviouslypossible.

Figure4:OurN3-tree+brickstructure(illustratedasa22quad-treeforclarity).

Inourstructure,onlyasinglepointerisusedpernode,whereGob-bettietal.[GMAG08]neededeightsuchpointersforthechild,aswellaspointerstoneighboringnodes.Tomakethispossible,wekeepallnodesina3Dtexture,whichwerefertoasthenodepool.ThetextureisorganizedintoblocksofN3nodes.Groupingthechildren3ofanodeinonesuchblockmakesitpossibletoaccessallNchildnodeswithasinglepointer.Inthisway,notonlycoher-enceislargelyimprovedduringthetraversal(3DtexturecachesonmodernGPUshelpdrastically),butalsomemoryrequirementsarelargelyreduced.Eventhoughourstructuredoesnotexhibitneigh-borpointersbetweenadjacentnodes,wewillshowinSection5.1thatanefficienttraversalremainspossible.

Foreachnode,wereservedatatoeitherstoreaconstantvalue(ho-mogenousvolume),orapointertowardsabrick.Withthisinfor-mation,wereducethepreviouseightRGBA(32bit/channel)tex-els[GMAG08]totwo32bitvalues.Thisresultsina16timesstor-ageimprovement.ItmayseemlessimportantbecausetheN3-treememoryoccupationismuchlowerthanforthebrickpool,buttheamountofdatareadduringthetraversalofthetreeiscriticalfortheGPUrenderingperformancesandupdatestothestructurecanbeachievedwithsignificantlylessinformationtransferfromtheCPU.

Figure5:Nodeimplementation.

The64bitnodedataisrepartitionedasfollows(seeFigure5):-30bits-encodeapointertothechildnodesfornon-leaves(zeromeaningthatthereisnochild);

-1bit-indicateswhetherthenodeisrefinedtoamaximum,orwhethertheoriginalvolumestillcontainsmoredata;

-1bit-storeswhetherthecontentisaconstantRGBA8value(e.g.,emptyorcoreregions)ordescribedbyabrick;-30/32bits-accordinglyrepresenteither:

-ApointertoanM3brickfornon-constantleaves(30bits);-Theaveragevalueatthislocationforhomogenousleaves(4*8bits).Inthispaper,wealsointroduceahigh-quality(quadrilinear)filter-ingbasedonmipmapping(Section5.2).Withoutthisanti-aliasing,theamountofdatapernodecouldbefurtherreducedtoonly32bitsbecausethenonlyleavesneeddatapointers.Thus,childpoint-ers,brickpointers,andconstantvalues(storedasRGBA6)areallexclusiveandcansharethesame30bits.Theremainingtwobitsleaveenoughroomfortherestofthenecessarynodedata.Hence,

volumedataisnotonlyneededatthehighestresolution(leavesofthetree),butalsoforinteriornodes.Theperformancecostofusing64insteadof32bitsisminorbecauseitstillfitsinasingletexelofaluminance-alphatexture.

5BasicApproach

Inthissection,wewilldiscusshowtorenderandupdatethevolumerepresentedbytheN3-tree.Itisusefultofirstconsiderthesimplercaseofstandardrendering,whereweassumethatallnecessarydataispresent.Thiswillallowthereadertofurtherfamiliarizewiththedatastructurebeforeweaddressthedynamicupdates.Inparticular,ouradvancedapproachinSection6combinesthisrenderalgorithmandavisibilitymechanismtotriggerupdatesinasingleunifiedmethod.Previousworkreliedonaseparatesteptoanalyzehowtoadaptthestructureandwhichbricksshouldbeloadedtonext.ThisusuallyinvolvedmuchCPUinteraction,whereasmostofourcomputationswillbeexecutedontheGPU.5.1Ray-Casting

Ourrenderingconsistsofmarchingthedatainthestructurealongtheviewrayswhileaccumulatingcolorandopacity.Hierarchically,raysneedtotraversetheN3-tree,andwhenreachingaleafnode,theM3-bricksorhomogenousregion.

Toinitializetherays,weuserasterizationtodrawsomeproxyge-ometrythatdeliverstheoriginsanddirectionscorrespondingtotheviewraystothefragmentshader.Onecoulduseascreen-coveringquadonthenearplane,theboundingboxofthevolumedata,orsomeapproximategeometrythatcontainsthenon-emptyareasofthevolume.Suchaproxyfortheinitialpositioncanalsobeusedtodeterminewheretherayleavesthevolume.

Bothextremescanbecomputedinasinglepre-renderingbyacti-vatinganalphablendingsettoamaximumblendinganddrawingthedepthofthefront-facingsurfacesintheluminance,andoneminusthedepthoftheback-facingonesinthealphachannel.Al-ternatively,wecantiletwodepthbuffersinonetextureandletthegeometryshadermovetheback-facingtrianglesinthesecondtilewhileinvertingtheirdepth.Inpractice,toallowafaircomparisontopreviouswork,allourtestswereperformedusingthenearplanetoinitializetheraysandthevolumesboundingboxtostopthem.OneefficientmethodtotraversethetreealongtheinitialrayswouldbearecursiveDDA(i.e.,generalizedBresenham)throughtheN3treenodes.Thisalgorithmreliesonastackwhichcanbeimple-mentedonGPUasdynamicallyindexedmemory.ThisisinefficientoncurrentGPUs(duetothelackoffastindexablememorywithinshaderprocessors).

Instead,weuseaniterativedescentfromthetreerootsimilarlytothekd-restartalgorithm[HSHH07].Itisparticularlyefficientbe-causewehighlybenefitfromtheN3-treestructure.3WestartwiththeoriginoftheraysandlocatethemintheN-tree(top-down).Thedescentstopswhenwereachanodewiththeappropriatelevel-of-detail(notnecessarilyaleaf).Suchanodeeitherrepresentsaconstantregionofspace,orcontainsabrickwhoseresolutionisfineenoughsothatavoxelprojectstoatmostonepixel.Incaseofaconstantnode,thevalueissimplyintegratedanalyticallyalongthedistancetheraytraversedinthenode.Incaseofabrick,astan-dardray-marchingisapplieduntilweleavethecurrentnode.Thenewpositionthenservesastheoriginforthenextdescent.Oneim-portantobservationisthatourtraversaldoesnotneedthestructuretoindicatecorrectlevel-of-details(asdonepreviously[GMAG08]).Thisisdeterminedintheshader.Aswewillseelater,thisisakeyfeaturetominimizeupdateoperations.

ThedescentisfastbecausethecoordinatesofapointcanbeuseddirectlytolocateitintheN3-tree.Letx∈[0,1]3bethepoint’slocalcoordinatesinthevolume’sboundingboxandcbethepointertothechildrenoftherootnode.Theoffsettothechildnodecontainingxissimplyint(x∗N),theintegerpartofthemultiplicationbetweenxandN.Wefetchthechildatc+int(x∗N)andcontinuewiththedescentbyupdatingxwithx∗N−int(x∗N).Eventhoughanewdescentisneededeverytimeanodeisleft,mostlythesamepointersinthestructurewillbefollowed,thusthehardwaretexturecacheisverywellprepared.

Thisray-castingisperformedusingonebigfragmentshaderforbothtop-downtraversalandbricksampling.Itprovedmoreeffi-cientthanmakingthesetwostepsseparatepasses,probablyduetolocaldatastorageandtexturecache.5.2HighQualityFiltering

Figure6:Ourmethod(top)doesnotshowthenoiseofstandardtracing(bottom).

Therenderingofthebricksshouldactuallybeperformeddifferentlythanwhatwepreviouslyexplainedbecausewewanttoperformabetterfilteringofthevalues.Hence,weadaptthesamplingrateandtherelativemipmaplevelduringtheraymarchingdependingontheviewpoint.

Asmentionedbefore,ourtraversalstopswhenitreachestheappro-priatenode.Theideabehinditwastwofold.Ontheonehand,itallowstoincreaserenderingspeed,ontheotherhand,itisagoodapproximationofaconeinsteadofraytracing.Forsmoothtransi-tionsbetweendifferenttreeconfigurationsandabetterfittotheac-tualconesize,weneedaccesstoothermipmaplevelsofthebricks.Fortunately,thesecorrespondtothebricksinnodesweencounterduringthetreedescent(seeFigure4).

Onemightthinkthatmanymipmaplevelsmightbenecessarytoperformthisinternalfiltering,butabrickonlydescribesasmallextentofspace.Itcanbeshownthatforasmallnearplaneoffsetthreelevelsareenough.Threeisalsotheminimumbecause,ifthefilterkernelattheentrypointisonlyslightlybelowthenextmipmaplevel,itmightexceeditwhentheconeleavesthebrick.Forproperblendingthreelevelsareamust.

Tomakethesebricksavailablefortraversal,wecollectthemdur-ingthetop-downtraversal.Weuseasmallqueueofthreeelementsimplementedinshaderregisterswithoutusingdynamicindexingoperations.Storingdataonlyinleavesandusingneighborpoint-erstostepthroughthevolumearebothelementsthatwouldmakeappropriatefilteringverydifficult.

Therecoveredcolorandopacityvaluesarethenaccumulatedwithpre-integratedtransferfunctions(e.g.,[EKE01]).Aphasefunc-tionandpseudo-Phonglighting(usingthedensitygradientasthenormal)caneasilybeaccountedfor.

5.3TreeUpdatesviaInterruptingandResuming

Sofar,wehavediscussedtherenderingprocedureinthecasethatalldataispresent,butforcomplexdata,notallofitmightresideinGPUmemory.Now,thesimplestwaytoachieveacorrectoutputimageistostarttheraytracingprocessandtostoprayswheneverdataismissing.ThisiscommunicatedtotheCPU,whichinitiatestheloading,thenthealgorithmresumesatthelastrayposition.Theconsequenceisthatseveralpassesareperformedtocompletetheoutputforagivenframe.Toonlytreatactiverays,weuseastandard

techniqueinGPUraytracingwhichistoblockterminatedpixelswiththeZ-Buffer,relayingonthez-cullandearly-ZGPUfeaturestopreventtheirexecution.

Wheneverwestoparay,weneedtomaintainsomestatusinfor-mationabouttheray:itspreliminaryoutputcoloraccumulation,themissingnodeID,andthecurrentmarchingpositioninspace.WecanfitthisinformationintwoMultipleRenderTargets(MRTs).ThemissingIDsaredownloadedtotheCPUthatthenadaptsthedatastructure.

Duringthenextrenderingpass,initiatedwithaproxysurface,thesebuffersareusedasinputtextureandtheearlyZtechniquewillen-surethatonlyactivepixelswillhavethefragmentshaderexecuted.Eventhoughthisisveryrudimentary,thisalgorithmalreadyshowssomeniceproperties.Loadeddataisdirectlytraversedbytheray,andnodatalyingoutsidethefrustumwilleverbeloaded,eventhoughwedidnotexplicitlytestforit.Further,theCPUdoesnotneedtotracktheneededLODs,theraysthemselveswillindicatethroughtheirtraversalwhichprecisionisneededineachareaofthescreen.Nevertheless,thesimplesystemasdescribedsofarwouldnotknowwhatdatabecameuseless.Inthefollowing,wewilldis-cussamorecomplexLRUsystemandmodificationsofthebasicalgorithmthatallowforreal-timeperformance.

6AdvancedAlgorithm

Wediscussedourimplementationofthespatialsubdivisionstruc-turethatenablesrenderingoflargevolumetricmodels.Wealsoex-plainedhowrenderingisperformed,andpresentedasimplemethodtoupdatethedatainthetree.ThisfirstapproachinSection5.3interruptsraytracingifdataismissing,triggersanupdateandre-sumesoncethenewdatahasbeenuploadedtotheGPU.Thisleadstoaccuratelyrenderedimagesbutcostsmuchtime.

Theseconddisplaymodewewilldiscussnowintroducessomeap-proximation.Ifdataattheneededresolutionismissing,ahighermipmaplevelistried.Onlyifnoneispresentortheavailabledataistoocoarse,weresorttotheiterativemethodwehaveseenbefore.Usinghighermipmaplevelsisoftenagoodchoice,asithasbeenshownthatweperceivelessdetailsduringstrongmotion,whenitismostlikelytoencountermissingdata.Oncethecamerasettles,anaccuraterenderingisachievedwithinafewframes.Fixingabudgetfortheuploadthusdeliversarelativelystableframerate.

Nonetheless,informationaboutnodeusagestillneedstobecom-municatedtotheCPUandnotonlyforthefirsthitnode,butfortheentiretrajectoryoftheray.Inthissection,wewilldiscussoursolutiontothisproblem.

6.1CombiningRenderingandVisibility

ThebasicprincipleistosubdivideorfusenodesintheN3-treeinanLRUmanner.Eachnodeandbrickhasatimestampthatisresetuponusageandwheneveranewelementisaddeditwillreplacetheoldest.Let’ssupposeforamomentthattheCPUhadaccesstothisinformationforallrays.Thenitcanconvenientlyupdatethetimes-tampofthepoolelements(nodesandbricks)initsmirroreddatastructure.Furthermore,itcantriggertheloadingofneededdata.Itevenknowswheretostorethebrickinthepoolbecausetimes-tampsaremaintainedinhostmemory.AllnecessarymodificationsarethentransferredtotheGPUviatexture-updatecalls.ThisleadstoaunifiedmanagementofthebrickpoolandthenodepoolastwoLRUcontrolledcaches.

Thequestionthatremainsishowtodeterminewhichnodeswereused,whichonesneedrefinementorcanbecoarsened,andwhatdataismissingontheGPU.In[GMAG08],thisinformationisde-rivedbyevaluatingnodeLOD’sandfrustumcontainmentthrough

treetraversalontheCPU,thatinducesasignificantoverhead.Fur-ther,visibilityisdeterminedviaocclusionqueriesagainsttheren-deredimageofthepreviousframe.Butevenwheninterleavedin-telligently,occlusionqueriesresultinasignificantcost.

Followingourstrivetoestablishvolumedataasarenderingprim-itive,wewanttoavoidmuchoftheCPUinteraction.Ultimately,theGPUshouldsteerwhichdataisloadedandindicatewhatdatawashelpfulinthepreviousframes.Interestingly,thisinformationisavailableontheGPUwhenraytracingterminatesbecausethecurrentvolumewasjusttraversed.Thisisthekeyinsightthatmo-tivatedustocombinerenderingandvisibilityupdatephase.Inoursolution,besidestheoutputcolor,eachrayalsooutputssomesupplementaryinformationaboutdatausageorupdateneeds.Be-causetheinformationiscollectedbytherays,frustumcontainment,visibilityandLODselectionarenaturallyhandledinoneunifiedwayandallworkloadistakenofftheCPU.

Therearetwomajorproblemsthatneedstobeaddressedandthatwewilldescribeinmoredetailinthefollowingsections.Firstandespeciallyinthepresenceoftransparency,eachraytraversesalargenumberofnodes,butwecanonlyoutputalimitedamountofinfor-mationperfragmentintheframebuffer(evenwhenusingMRTs).Ad-hocsolutions,e.g.,storingonlythefirstnode,wouldnotleadtoacceptableresultswithaLRUscheme.Second,thisinformationneedstobesentbacktotheCPUtotriggertheupdates.Asband-widthislimited,wewillshowhowtoperformacompactionbeforedataexchange.

6.1.1

RenderedFeedbackInformation

Duringrenderingwetraversethetreeandstopwhenwereachthenodecorrespondingtotheneededlevel-of-detail.Iftherequireddataispresent,wetraversethemipmappedbrick.TomakesurethattheCPUkeepstheseelementsinthepool,wecollectthecur-rentnodeindexinanode-list(itwillbetheCPUthatmakessurethatallthenode’sparentsneededforthemipmappingalsoremaininthecache).Iftheappropriatelevel-of-detailismissingorthenodeisterminal,meaningthatthereisnomorerefineddataintheoriginalmodel,wesimplyaddittothenode-list.Rememberthatthisinformationisindicatedviaonebitinthenode(Figure5).Oth-erwise,wealsoaddittothenode-list,butindicatethatitneedsfurthersubdivision.Inpractice,itworksbesttolimitsubdivisionstoonerequestperray.Asinglesubdivisionmightalreadychangeopacityandallowanearlyrayterminationinthenextframe.Asstatedbefore,arbitrarilysizednode-listscannotbeefficientlyoutputoncurrentGPUs(currentlyonlyeightRGBA32rendertar-getsarepossible).Anadaptedrepartitionofthedataisnecessary.Wereservethefirstrendertargetfortheoutputcolor.Theremain-ingbufferswillkeepthenodeinformationandwewillrefertothemasthenode-listtextures.

Eachnode-listtextureprovidesroomforfourencounterednodes(oneperchannel)andtheirrespectivesubdivisiontag.Thisindi-catorwillmaketheCPUsimplyresetthetimestamp,orinduceasubdivisionbecausetheneededlevel-of-detailiscurrentlymissing.Interestingly,thereisnoneedtosendcollapseinformation.Iftherenderingalgorithmnolongerdescendsintoanode,itsindexwillneverbeputintothelist,thustheLRUmechanismwillnotresetitstimestampandthusreplacethedataatsomepoint.Thisestablishesalazyevaluationschemewhichsimplifiessubstantiallyprevioussolutionsthatneededtomaintainthetreeactively.

Using30bitsforthenodeindex,1bitforthesubdivisiontag,andsevennode-listtexturesallowfora7×4=28-nodeoutputperray.Generally,thisisinsufficientforcomplexscenes.Toensurethatnonodesaremissed,weexploittwoimportantpropertiesofour

algorithm:spatial(onscreen),andtemporalcoherence(betweensuccessiveframes).Thiswillintroducesomeadditionalcomputa-tionsandinpracticewefoundthatthreeMRTsarethebestchoice.Hence,eachpixelcanstore3×4=12nodes.

Neighboringrayswillvisitmostlysimilarnodesbecausethebrickinanodealwaysprojectsontoseveralpixels.Toexploitthisspatialcoherence,wegroupraysinpackagesintheimageplane.E.g.,forapackagesizeof2×2,theupper-leftraywillstorethefirst12traversednodes,theupper-rightthesecond12nodes,andsoon(cf.Figure7).Thistotalsin48nodeindices.

Figure7:Weexploitspatialcoherenceusingapatternattributingdifferentnodesets.

Inaddition,temporalcoherencecanbeused.DuringtraversaltherenderednodesarepushedinaFIFO.Inthefirstframe,westoppushingafter48,inthesecondframeafter96elements.This48-elementwindowisshiftedoverasmallsetofframes,thisincreasesthenumberofretrievablenodesfurther.Forregionswithlessnodes,oneshouldnoticethattheuseofaFIFOensuresaninformationoutputineachframe.Thereisnopenalizationduetotemporalcoherence.Inconsequence,wegainreactivitywherewecan.

6.1.2CompactionofUpdateInformation

Tosimplifyexplanations,wewillignorethespatial-coherencepat-ternandassumethat12nodesperraysuffice.

Oncethenode-listtexturesareconstructed,itwouldbepossibletoreadbackthememorytotheCPU,buttheresultingbandwidthwouldbeenormousandparsingadditionallycostlyonCPU.Ide-ally,oneinstanceofeachnodereferencewouldbeenough.AnaivewaytoachievethisconsistsinsortingthenodeindicesontheGPUandthenuseastreamreductiontoremovemultipleentries.Sortingisexpensiveandwewouldfurthersortmanyelementsthatwouldafterwardsjustbedeleted.Fortunately,wejustsawthatcoherencebetweenneighboringraysisapowerfulpropertyandcanbeusedtocompactionthetexture.Itispossibletoreducethesetsufficientlytomakethesortingstepunnecessary.

Figure8:Selectionstep:(Left:)Listsarecomparedfollowinga2Dpattern-(N)K=(Not)Kept.(Right:)Listentriesareonlycomparedonalocalscope.

Inafirstpass,wederiveaselectionmask.Here,thethei-thbitinanoutputpixelindicatesthatthei-thelementinthelistoftheunderlyingnode-listtexelshouldbekept.Theselectionmaskisbasedonalocalcomparison.WeuseapatternliketheoneshowninFigure8.Itcomparesthecenterlist(orange,Figure8-top-right)againstitssurroundinglistsandanentryinthelistisonlykeptifitdoesnotappearinthesurrounding(Figure8-left).Theoretically,thiscanbecostlyifthelistsarelong,butmostnodeswillbecrossedapproximativelyatthesamemomentbytwoneighboringrays.Thisallowsustoperformanefficientapproximatecomparison:thei-thlistelementisonlytestedagainstthe(i-1)-th,i-th,(i+1)-thelements

Figure9:Someresults:Trabecularbonedata(augmentedwithPerlinnoise)andHypertexturedbunny(basedonadistancefieldapproximatedontheflyontheGPU)

oftheneighboringlists(Figure8-right).Thiscaninduce\"falsepositives\butitremainsconservative.

Afterthisfirststep,eachpixeloftheselectionmaskcontainsabitvector,whosenon-zeroentriesrepresentthelistelementstokeep.WerelyontheHistoPyramids[ZTTS06]toreducethese-lectionmask.Thisalgorithmisaspecialcaseoftheprefix-sumscanandbinarysearchalgorithm,highlyoptimizedtoexploit2DlocalityandGPU’stextureprocessingunits.Thefinalsteprecoverstheactualcorrespondingnodeindices,thenthecompactedtextureistransferredtotheCPU.Duringthisrearrangementprocess,theoriginalpixelpositionwillbelost.Having12nodeentriesperlist,wecanavoidthisproblembyusingtheremaining20bits(over32)toencodethepixelpositionoftheray.Onlywiththislocationin-formationwewillbeabletorecovertheactualnodeindices.Afterthereduction,eachpixelcontainsaraypixel-positionandabitmask.ThisinformationallowsthecreationofacompactnodeindextexturethatwillbereadbackbytheCPU.Inpractice,thebitmaskusuallycontains2-3non-zeroentries.ItisthuspossibletomakethefinalcompactedtextureonesingleRGBA32texturewhichallowsustostoreindicesoffournodes.Ifthislimitisexceededtherequestsareautomaticallypostponedtothenextframe.

7Results

AlltestswereperformedonaCore2bi-coreE6600at2.4GHz,andanNVIDIA8800GTS512graphicscard(G92GPU)with512MB.Allimagesarerenderedat512×512(seeFigures1and9).Example1:Explicitvolume(trabecularbone).

Weuseda10243scannedvolumeofatrabecularbone.Mipmapswereprecomputedandanalyzedforconstantdensity.Thisexam-plestillfitintheCPUmemory.Wecopiedthisvolume8timesineachdirectioninordertosimulatea81923resolution.Inourdatastructure,weusedN=2andM=16.Thealgorithmachieves20-40Hzwithsmoothmipmappingactivated.Without,anaverageof60fpswaspossible.WealsoaddedscalesofPerlinnoisetoincreaserichness.ThisnoiseisaddedprocedurallyontheGPU.Evaluatingnoiseineachframewouldbeexpensive,atalowpenalty,wecre-atebricksontheGPUthatwecanthenuseinsubsequentframes.Incomparisonto[GMAG08],thisindicatesapproximatelya50%speedup(nevertheless,thisisbasedonourimplementation).Example2:Proceduralvolumebyinstantiation(Sierpinski).

Weusedoneuniquebrickofsize813.WenaturallychoseN=3sothatwecanrelyononeuniqueinstantiationforallnon-emptychildren.Theresolutionispotentiallyinfinite,butinpractice,thefloatingpointprecisionofcoordinateslimitsthezoomto219sothemaximumvirtualresolutionis8.4M3.Performanceoftenreaches90Hz,andusuallystaysaround60.

Example3:Hypertexturesandamplificationofamesh.

Forthisexample,weuseavolumedefinedbytheinteriorofamesh.WederiveadistancefieldfromthemeshontheflyontheGPU.andusethesurfacevicinityforhypertexturelookups.Hereweaimedat

veryhighqualityanduse20octavesofPerlinnoise,shading,andcomplexmaterials.DuetothecomplexcomputationsontheGPUtoproducethevolumeinformation,theframerateisrelativelylow(around20FPSfora10243volume).

Example4:Cumuluscloud.

Ourmethodwasusedtoencodetheclouddetailsin[BNM∗08](apaperdealingwithmultiplescatteringinclouds).Evenincombi-nationwithcomplexshading[BNM∗08],ouralgorithmallowsforanincreaseofdetail.WeusedN=2,M=32,and5octavesofPerlinnoisetosimulate3enhancethecumuluscloud,withavirtualresolutionof2048.

Memoryusage:Inmostexamples,thenodepoolwassmall(4MB)correspondingto643entries.Using163bricks,10243indicescanbereferenced.Thebrickpoolused430MBgivingroomfor423bricks.

Weusuallytrade-offsomecomputationtimeformemoryefficiency:inallourexamples,normalsarecomputedon-the-fly.Theprocedu-ralnoiseexamplesfurthershowedthatbrickscanalsobecreatedontheGPU.Thison-the-flynoisecreationprovesactuallytobemoreefficientthanaCPUevaluationandtransfer.

8ConclusionandFutureWork

Wehavepresentedamethodforinteractiverenderingoflargeandverydetailedvolumes.Ourworkshowsthatreal-timeperformancewithhighqualityvolumerenderingispossible.ThealgorithmavoidsmostoftheCPUinteraction.OurcompactdatastructureminimizesmemoryusageontheGPUside.Theintroductionofsmoothtransitionsbasedonmipmappingallowtemporalcoher-enceandanti-aliasing.Thishintsatthepossibilitythatvolumedatacouldbeanimportantfuturereal-timeprimitive.

Currently,animationisabigproblemforvolumedata.Inthefuture,wewouldliketoinvestigatepossiblesolutions.

Anotherinterestingavenuewouldbetoapplyourmethodtootherhierarchicalrepresentations:generalmeshsubdivisions,pointren-dering,orrecentstructureslikevolume-surfacetrees[BHGS06].

Wewouldliketothank:Digisensfortheirsupport;AntoineBouthors,EricBrune-ton,andHedlenaBezerraforproofreading,andtheanonymousreviewersfortheirhelpfulcomments.ThisworkhasbeenpartlysupportedbytheExcellenceClusteronMultimodalComputingandInteraction(MMCI-www.m2ci.org).

References

BABOUDL.,DÉCORETX.:Realisticwatervolumesinreal-time.InEGWorkshoponNaturalPhenomena(2006),Eurographics.BOUBEKEURT.,HEIDRICHW.,GRANIERX.,SCHLICKC.:Volume-surfacetrees.ComputerGraphicsForum25,3(2006),399–409.ProceedingsofEUROGRAPHICS2006.

BOUTHORSA.,NEYRETF.,MAXN.,BRUNETONE.,CRASSINC.:Interactivemultipleanisotropicscatteringinclouds.InACMSymposiumonInteractive3DGraphicsandGames(I3D)(2008).BOADAI.,NAVAZOI.,SCOPIGNOR.:Multiresolutionvolumevisualizationwithatexture-basedoctree.Vis.Comput.13,3(2001).BARGEB.L.,TESSENDORFJ.,GADDIPATIV.:TetradvolumeandparticlerenderinginX2.InSIGGRAPHSketch(2003).http://portal.acm.org/ft_gateway.cfm?id=965491.CHRISTENSENP.H.,BATALID.:Anirradianceatlasforglobalilluminationincomplexproductionscenes.InRenderingTech-niques(EGSR)(2004),pp.133–142.DECAUDINP.,NEYRETF.:Renderingforestscenesinreal-time.InRenderingTechniques(EGSR)(june2004),pp.93–102.DECAUDINP.,NEYRETF.:Volumetricbillboards.Computer

GraphicsForum(2009).

ENGELK.,HADWIGERM.,KNISSJ.,REZK-SALAMAC.,WEISKOPFD.:Real-timeVolumeGraphics.AK-Peters,2006.ENGELK.,KRAUSM.,ERTLT.:High-qualitypre-integratedvolumerenderingusinghardware-acceleratedpixelshading.InACMSIGGRAPH/EUROGRAPHICSworkshoponGraphicshardware(HWWS)(2001),pp.9–16.GOBBETTIE.,MARTONF.:Farvoxels:amultiresolutionframe-workforinteractiverenderingofhugecomplex3dmodelsoncommoditygraphicsplatforms.InACMTransactionsonGraph-ics(ProceedingsofSIGGRAPH)(2005),ACM.GOBBETTIE.,MARTONF.,ANTONIOJ.,GUITIANI.:Asingle-passGPUraycastingframeworkforinteractiveout-of-coreren-deringofmassivevolumetricdatasets.Vis.Comput.24,7(2008),797–806.GUTHES.,STRASSERW.:Advancedtechniquesforhighqualitymultiresolutionvolumerendering.InComputers&Graphics(2004),ElsevierScience,pp.51–58.HONGW.,QIUF.,KAUFMANA.:GPU-basedobject-orderray-castingforlargedatasets.InVolumeGraphics,FourthInterna-tionalWorkshopon(2005),pp.177–240.HORND.R.,SUGERMANJ.,HOUSTONM.,HANRAHANP.:In-teractivek-dtreeGPUraytracing.InACMSiggraphsymposiumonInteractive3Dgraphicsandgames(I3D)(2007).KAPLERA.:Avalanche!snowyFXforXXX.InSIGGRAPHSketch(2003).

http://portal.acm.org/ft_gateway.cfm?id=965492.KRAUSM.,ERTLT.:Adaptivetexturemaps.InACMSIG-GRAPH/EUROGRAPHICSconferenceonGraphicshardware(HWWS)(2002),pp.7–15.KRALLJ.,HARRINGTONC.:Modelingandrenderingofcloudson\"stealth\".InSIGGRAPHSketch(2005).http://portal.acm.org/ft_gateway.cfm?id=1187214.KISACIKOGLUG.:Themakingofblack-holeandnebulacloudsforthemotionpicture`Sphere´withvolumetricren-deringandthef-repofsolids.InSIGGRAPHSketch(1998).http://portal.acm.org/ft_gateway.cfm?id=282285.KAJIYAJ.T.,KAYT.L.:Renderingfurwiththreedimensionaltextures.InSIGGRAPH(1989),pp.271–280.

LEFEBVRES.,DACHSBACHERC.:Tiletrees.InACMSIG-GRAPHSymposiumonInteractive3DGraphicsandGames(I3D)(2007).LEFEBVRES.,HOPPEH.:Perfectspatialhashing.InSIGGRAPH(2006),pp.579–588.LAMARE.,HAMANNB.,JOYK.I.:Multiresolutiontechniquesforinteractivetexture-basedvolumevisualization.InProceed-ingsofVisualization(VIS)(1999),pp.355–361.LEFEBVRES.,HORNUSS.,NEYRETF.:GPUGems2.2005,ch.OctreeTexturesontheGPU,pp.595–613.LEFEBVRES.,HORNUSS.,NEYRETF.:Texturesprites:Textureelementssplattedonsurfaces.InACM-SIGGRAPHSymposiumonInteractive3DGraphics(I3D)(April2005).LEFOHNA.,KNISSJ.M.,STRZODKAR.,SENGUPTAS.,OWENSJ.D.:Glift:Generic,Efficient,Random-AccessGPUDataStructures.ACMTransactionsonGraphics25,1(2006).LACROUTEP.,LEVOYM.:Fastvolumerenderingusingashear-warpfactorizationoftheviewingtransformation.InSIGGRAPH(1994),pp.451–458.LIW.,MUELLERK.,KAUFMANA.:Emptyspaceskippingandocclusionclippingfortexture-basedvolumerendering.InPro-ceedingsofIEEEVisualization(VIS)(2003),p.42.MÖLLERT.,MACHIRAJUR.,MUELLERK.,YAGELR.:Acom-parisonofnormalestimationschemes.InProceedingsofVIS(IEEEConferenceonVisualization)(1997),pp.19–.MEYERA.,NEYRETF.:Multiscaleshadersfortheefficientre-alisticrenderingofpine-trees.InProceedingsofGI(GraphicsInterface)(2000).NEYRETF.:Modelinganimatingandrenderingcomplexscenesusingvolumetrictextures.IEEETransactionsonVisualizationandComputerGraphics4,1(Jan.–Mar.1998),55–70.OLIVEIRAM.M.,BISHOPG.,MCALLISTERD.:Relieftexturemapping.InSIGGRAPH(2000),pp.359–368.PORUMBESCUS.D.,BUDGEB.,FENGL.,JOYK.I.:Shellmaps.InSIGGRAPH(2005),pp.626–633.PERLINK.,HOFFERTE.M.:Hypertexture.

InSIGGRAPH

(1989),pp.253–262.

POLICARPOF.,OLIVEIRAM.:GPUGems3.Addison-Wesley,2007,ch.18:RelaxedConeSteppingForReliefMapping.RECHE-MARTINEZA.,MARTINI.,DRETTAKISG.:Volumet-ricreconstructionandinteractiverenderingoftreesfrompho-tographs.InSIGGRAPHproceedings(2004),pp.720–727.SCHARSACHH.:AdvancedGPUraycasting.InCentralEuropeanSeminaronComputerGraphics(2005),pp.69–76.TONGX.,ZHANGJ.,LIUL.,WANGX.,GUOB.,SHUMH.-Y.:Synthesisofbidirectionaltexturefunctionsonarbitrarysur-faces.ACMTransactionsonGraphics21,3(2002),665–672.(ProceedingsofACMSIGGRAPH2002).VOLLRATHJ.E.,SCHAFHITZELT.,ERTLT.:EmployingCom-plexGPUDataStructuresfortheInteractiveVisualizationofAdaptiveMeshRefinementData.InoftheInternationalWork-shoponVolumeGraphics(2006).ZIEGLERG.,TEVSA.,THEOBALTC.,SEIDELH.-P.:GPUpointlistgenerationthroughhistogrampyramids.InProc.ofVMV(2006),pp.137–141.

因篇幅问题不能全部显示,请点此查看更多更全内容

Top